Using Athena to Query ALB Logs

One of the more interesting AWS Big Data Services is Amazon Athena. Athena can process S3 data in a few seconds. One of the ways I like using it is to look for patterns in ALB access logs.

AWS provides a detailed instruction on how to setup Athena on how to setup ALB access logs. I’m not going to recap the configuration in this blog article, but share 3 of my favorite queries.

What is the most visited page by the client and total traffic on my website:

SELECT sum(received_bytes) as total_received, sum(sent_bytes) as total_sent, client_ip, 
count(client_ip) as client_requests, request_url  
FROM alb_logs 
GROUP BY client_ip, request_url  
ORDER BY total_sent  desc;

How long does it take to process requests on average?

SELECT sum(request_processing_time) as request_pt, sum(target_processing_time) as target_pt,
sum (response_processing_time) respone_pt, 
sum(request_processing_time + target_processing_time + response_processing_time) as total_pt, 
count(request_processing_time) as total_requests,
sum(request_processing_time + target_processing_time + response_processing_time) / count(request_processing_time) as avg_pt,
request_url, target_ip
FROM alb_logs WHERE target_ip <> ''
GROUP BY request_url, target_ip 
HAVING COUNT (request_processing_time) > 4 
ORDER BY avg_pt desc;

This last one is looking for requests the site doesn’t process. It’s usually some person trying to find some vulnerable PHP code.

SELECT count(client_ip) as client_requests, client_ip, target_ip, request_url, 
target_status_code 
FROM alb_logs 
WHERE target_status_code not in ('200','301','302','304') 
GROUP BY client_ip, target_ip, request_url, target_status_code
ORDER BY client_requests desc; 

Athena is a serverless tool, and it sets up in seconds and the charges based on TB scanned with a 10MB minimum for the query.

One of the more interesting AWS Big Data Services is Amazon Athena. Athena can process S3 data in a few seconds. One of the ways I like using it is to look for patterns in ALB access logs.

AWS provides a detailed instruction on how to setup Athena on how to setup ALB access logs. I’m not going to recap the configuration in this blog article, but share 3 of my favorite queries.

What is the most visited page by the client and total traffic on my website:

SELECT sum(received_bytes) as total_received, sum(sent_bytes) as total_sent, client_ip, 
count(client_ip) as client_requests, request_url  
FROM alb_logs 
GROUP BY client_ip, request_url  
ORDER BY total_sent  desc;

How long does it take to process requests on average?

SELECT sum(request_processing_time) as request_pt, sum(target_processing_time) as target_pt,
sum (response_processing_time) respone_pt, 
sum(request_processing_time + target_processing_time + response_processing_time) as total_pt, 
count(request_processing_time) as total_requests,
sum(request_processing_time + target_processing_time + response_processing_time) / count(request_processing_time) as avg_pt,
request_url, target_ip
FROM alb_logs WHERE target_ip <> ''
GROUP BY request_url, target_ip 
HAVING COUNT (request_processing_time) > 4 
ORDER BY avg_pt desc;

This last one is looking for requests the site doesn’t process. It’s usually some person trying to find some vulnerable PHP code.

SELECT count(client_ip) as client_requests, client_ip, target_ip, request_url, 
target_status_code 
FROM alb_logs 
WHERE target_status_code not in ('200','301','302','304') 
GROUP BY client_ip, target_ip, request_url, target_status_code
ORDER BY client_requests desc; 

Athena is a serverless tool, and it sets up in seconds and the charges based on TB scanned with a 10MB minimum for the query.