Problem: CloudFront logs are stored as flat files, with names in the following format:
distributionid-year-month-day-hour.gz
To analyse these logs you need a tool such as Athena, which can run queries directly against the S3 bucket that stores them.
Athena does not strictly require partitioned data, but it works far better with it. Partitioning simply means storing the data in a structured layout (e.g. a folder structure). This lets you restrict Athena to the data you actually want to analyse; otherwise it scans the entire dataset by default, and you pay for reading GBs of data you don't need.
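As a rough illustration of what "a folder structure" means here, the sketch below maps a flat log file name to a year/month/day key. The function name `partitioned_key` and the sample file name are hypothetical, and the regular expression only assumes that the name contains a year-month-day-hour stamp as described above.

```python
import re

# Matches the year-month-day-hour stamp embedded in a log file name.
_STAMP = re.compile(r"(\d{4})-(\d{2})-(\d{2})-(\d{2})")


def partitioned_key(filename: str) -> str:
    """Return a year/month/day key for a flat CloudFront log file name."""
    match = _STAMP.search(filename)
    if match is None:
        raise ValueError(f"no timestamp found in {filename!r}")
    year, month, day, _hour = match.groups()
    # The hour is parsed but not used, because the layout is year/month/day.
    return f"{year}/{month}/{day}/{filename}"


# Hypothetical file name, used only for illustration.
print(partitioned_key("E2EXAMPLE-2021-02-25-13.gz"))
```

Copying each object to the key returned here (for example with a scheduled job) would produce the year/month/day layout used in the rest of this post.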
By default Athena tries to read all the data. But if you have partitioned it by year/month/day, then you can register each partition like this:
year=2021/month=02/day=25 -- s3://logs/2021/02/25
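One way to do that registration is an ALTER TABLE ... ADD PARTITION statement per day. The sketch below only builds the statement text; the helper name `add_partition_sql` is hypothetical, and the table name `cloudfront_logs` and bucket `logs` are taken from the examples in this post. It assumes the table was created with year, month and day as string partition columns.

```python
def add_partition_sql(table: str, bucket: str, year: str, month: str, day: str) -> str:
    """Build an ALTER TABLE statement that registers one day as a partition."""
    location = f"s3://{bucket}/{year}/{month}/{day}/"
    # Values are interpolated without escaping, so pass trusted input only.
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year='{year}', month='{month}', day='{day}') "
        f"LOCATION '{location}'"
    )


print(add_partition_sql("cloudfront_logs", "logs", "2021", "02", "25"))
```

The resulting statement can be run from the Athena console or any client you already use to submit queries.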
This lets you use a WHERE clause on the partition columns to restrict Athena to the data you are interested in:
-- Count 404 responses per URI, reading only partitions from 2020-02-25 onward
SELECT uri, count(1)
FROM cloudfront_logs
WHERE status = 404
AND (year || month || day) >= '20200225'
GROUP BY uri