One of the more interesting AWS Big Data Services is Amazon Athena. Athena can process S3 data in a few seconds. One of the ways I like using it is to look for patterns in ALB access logs.
AWS provides a detailed instruction on how to setup Athena on how to setup ALB access logs. I’m not going to recap the configuration in this blog article, but share 3 of my favorite queries.
What is the most visited page by the client and total traffic on my website:
SELECT sum(received_bytes) as total_received, sum(sent_bytes) as total_sent, client_ip,
count(client_ip) as client_requests, request_url
FROM alb_logs
GROUP BY client_ip, request_url
ORDER BY total_sent desc;
How long does it take to process requests on average?
SELECT sum(request_processing_time) as request_pt, sum(target_processing_time) as target_pt,
sum (response_processing_time) respone_pt,
sum(request_processing_time + target_processing_time + response_processing_time) as total_pt,
count(request_processing_time) as total_requests,
sum(request_processing_time + target_processing_time + response_processing_time) / count(request_processing_time) as avg_pt,
request_url, target_ip
FROM alb_logs WHERE target_ip <> ''
GROUP BY request_url, target_ip
HAVING COUNT (request_processing_time) > 4
ORDER BY avg_pt desc;
This last one is looking for requests the site doesn’t process. It’s usually some person trying to find some vulnerable PHP code.
SELECT count(client_ip) as client_requests, client_ip, target_ip, request_url,
target_status_code
FROM alb_logs
WHERE target_status_code not in ('200','301','302','304')
GROUP BY client_ip, target_ip, request_url, target_status_code
ORDER BY client_requests desc;
Athena is a serverless tool, and it sets up in seconds and the charges based on TB scanned with a 10MB minimum for the query.