Problem:- CloudFront logs are delivered to S3 with file names in the following format
distribution-id.YYYY-MM-DD-HH.unique-id.gz
To analyse these logs you need something like Athena, which can run your queries directly over the S3 bucket that stores them.
But to query efficiently, Athena needs partitioned data, which simply means storing the data in a structured layout (e.g. a folder structure). Partitions let you restrict Athena to the data you actually want to analyse; otherwise it scans the entire data set and you pay for reading GBs of data you do not need.
By default Athena tries to "read all" the data. But if you have partitioned it by year/month/day, then you can register each partition like
year=2021/month=02/day=25 -- s3://logs/2021/02/25
This allows you to use a WHERE clause on the partition columns to restrict Athena to reading only the data you are interested in
SELECT uri, count(1)
FROM cloudfront_logs
WHERE status = 404
AND (year || month || day) >= '20200225'
GROUP BY uri
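Registering a partition as described above can be sketched in Python. This is only a minimal sketch: the helper name add_partition_sql is made up for illustration, and it assumes the cloudfront_logs table was created with year, month and day as string partition columns. It only builds the Athena DDL statement; running it (in the Athena console or otherwise) is left out.

```python
def add_partition_sql(table, bucket, year, month, day, prefix=""):
    """Build an Athena DDL statement that registers one day of logs as a partition."""
    # The location mirrors the year/month/day folder layout in the bucket.
    path = "/".join(p for p in (prefix.strip("/"), year, month, day) if p)
    return (
        "ALTER TABLE {table} ADD IF NOT EXISTS "
        "PARTITION (year='{y}', month='{m}', day='{d}') "
        "LOCATION 's3://{bucket}/{path}/'"
    ).format(table=table, y=year, m=month, d=day, bucket=bucket, path=path)


# Hypothetical bucket name "logs", taken from the example mapping above.
print(add_partition_sql("cloudfront_logs", "logs", "2021", "02", "25"))
```

Each day needs its own statement, so a helper like this would typically be called once per new day of logs.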
Solution:-
To set this up, first create the CloudFront distribution and enable standard logging on it so that it delivers its logs to the S3 bucket.
Next, create a Lambda called cdn-log-restructured with the python2.7 runtime (the code also runs on Python 3 runtimes), set the handler to lambda_function.lambda_handler, and use the code below for the Lambda
import boto3

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    print("Restructuring s3 path for cloudfront logs")
    # Iterate over all records in the list provided
    for record in event['Records']:
        # Get the S3 bucket
        bucket = record['s3']['bucket']['name']
        # Get the source S3 object key
        key = record['s3']['object']['key']
        # Get just the filename of the source S3 object, increase to 2 if use distro
        filename = key.split('/')[1]
        #print("f: %s" % filename)
        # Skip anything that is not a raw log file, e.g. an object this
        # function already moved (its second path segment is the year)
        fields = filename.split('.')
        if len(fields) < 3:
            continue
        # Get the yyyy-mm-dd-hh from the source S3 object
        dateAndHour = fields[1]
        #print(dateAndHour)
        year, month, day, hour = dateAndHour.split('-')
        # Create destination path
        dest = 'test/{}/{}/{}/{}'.format(
            year, month, day, filename
        )
        # Display source/destination in Lambda output log
        print("- src: s3://%s/%s" % (bucket, key))
        print("- dst: s3://%s/%s" % (bucket, dest))
        # Perform copy of the S3 object
        s3.copy_object(Bucket=bucket, Key=dest,
                       CopySource={'Bucket': bucket, 'Key': key})
        # Delete the source S3 object
        # Disable this line if a copy is sufficient
        s3.delete_object(Bucket=bucket, Key=key)
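The key parsing done by the Lambda can be tried locally without any AWS access. The following is a minimal sketch, not the Lambda itself: the function name restructured_key and the dest_prefix parameter are made up for illustration, and it assumes the distribution-id.YYYY-MM-DD-HH.unique-id.gz file name format. Unlike the handler above, it takes the last path segment as the file name, so it does not depend on how deep the log prefix is.

```python
def restructured_key(key, dest_prefix="test"):
    """Return the year/month/day destination key for a raw CloudFront log key."""
    filename = key.split("/")[-1]
    # Expected name: <distribution-id>.<YYYY-MM-DD-HH>.<unique-id>.gz
    fields = filename.split(".")
    if len(fields) != 4 or fields[-1] != "gz":
        raise ValueError("not a CloudFront log file name: %r" % filename)
    # The hour is parsed but not used, matching the year/month/day layout.
    year, month, day, hour = fields[1].split("-")
    return "/".join([dest_prefix, year, month, day, filename])


print(restructured_key("test/E2X8QTMLCYW88L.2021-02-24-05.67e5c8b7.gz"))
```

Raising ValueError for unexpected names makes it easy to tell raw log files apart from objects that were already moved.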
You also need to create an execution role for the Lambda with the IAM policy below
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*",
                "s3:PutObject",
                "s3:PutObjectTagging",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::cloudfront-logs",
                "arn:aws:s3:::cloudfront-logs/test/*"
            ]
        }
    ]
}
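If the bucket name or prefix differs from the one used here, the same policy can be generated rather than edited by hand. This is a rough sketch under that assumption; the function name restructure_policy is hypothetical, and it only builds and prints the JSON document without creating anything in IAM.

```python
import json


def restructure_policy(bucket, prefix):
    """Build the IAM policy document for the log-restructuring Lambda role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:Get*",
                    "s3:List*",
                    "s3:PutObject",
                    "s3:PutObjectTagging",
                    "s3:DeleteObject",
                ],
                "Resource": [
                    # The bucket itself (for listing) and the objects under the prefix.
                    "arn:aws:s3:::%s" % bucket,
                    "arn:aws:s3:::%s/%s/*" % (bucket, prefix),
                ],
            }
        ],
    }


print(json.dumps(restructure_policy("cloudfront-logs", "test"), indent=4))
```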
Go to the S3 bucket (cloudfront-logs), create a new event notification for object-creation events, set the prefix (test), select Lambda Function as the destination, and enter the Lambda function ARN (arn:aws:lambda:us-west-2:XXXXXXXXXXX:function:cdn-log-restructured)
This creates a trigger on the S3 bucket, so that whenever CloudFront delivers a new log file to the bucket, the Lambda runs automatically and moves the file into the new folder structure:
[Diagram: CloudFront --(logs)--> S3 Bucket (E2X8QTMLCYW88L.2021-02-24-05.67e5c8b7.gz) --(S3 lambda trigger)--> LAMBDA (cdn-log-restructured) --> S3 Bucket (structured/E2X8QTMLCYW88L/2021/02/24/05/E2X8QTMLCYW88L.2021-02-24-05.67e5c8b7.gz)]
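The event notification set up in the console can also be expressed as a configuration document. The sketch below only builds that document; the helper name notification_config is made up for illustration, and the s3:ObjectCreated:* event type is an assumption, since the steps above do not say which event types were selected.

```python
def notification_config(lambda_arn, prefix):
    """Build an S3 notification configuration that invokes a Lambda for new objects."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                # Assumed event type: fire on every newly created object.
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
                },
            }
        ]
    }


config = notification_config(
    "arn:aws:lambda:us-west-2:XXXXXXXXXXX:function:cdn-log-restructured", "test"
)
print(config)

# Applying it needs AWS credentials and could look roughly like this:
#   import boto3
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="cloudfront-logs", NotificationConfiguration=config)
```

Note that put_bucket_notification_configuration replaces the bucket's whole notification configuration, and that outside the console the Lambda must separately be given permission to be invoked by S3.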