Dovetail CDN Usage

AWS Lambda to query Dovetail CloudFront usage and insert into BigQuery

Overview

  1. Requests to the Dovetail CDN are logged to an S3 bucket.

  2. This lambda queries BigQuery for the MAX(day) FROM dt_bytes, and processes days >= the result (or all the way back to the S3 expiration date).

  3. Then we run an Athena query over a single day of logs, grouping by path and summing the bytes sent.

  4. Paths are parsed and grouped as /<podcast>/<episode>/... or /<podcast>/<feed>/episode/.... Unrecognized paths that use significant bandwidth are logged as warnings.

  5. Resulting bytes usage is inserted back into BigQuery:

    {day: "2024-04-23", feeder_podcast: 123, feeder_episode: "abcd-efgh", feeder_feed: null, bytes: 123456789}
    
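The path grouping in step 4 can be sketched roughly like this. This is an illustrative sketch only: the function name, the return shape, and the assumption that feed-scoped paths carry a literal "episode" segment are guesses, not the actual implementation:

```javascript
// Hypothetical sketch of step 4's path grouping (names and shapes are
// assumptions, not the actual implementation).
// Episode-scoped:  /<podcast>/<episode>/...
// Feed-scoped:     /<podcast>/<feed>/episode/...
function parseUsagePath(path) {
  const parts = path.split("/").filter((p) => p.length > 0);
  const podcast = Number(parts[0]);

  // Podcast ids are numeric; anything else is unrecognized, and the caller
  // can warning-log it if it used a lot of bandwidth
  if (!Number.isInteger(podcast) || parts.length < 2) {
    return null;
  }

  if (parts[2] === "episode") {
    // /<podcast>/<feed>/episode/... (assumed marker for feed-scoped paths)
    return { feeder_podcast: podcast, feeder_feed: parts[1], feeder_episode: parts[3] || null };
  }

  // /<podcast>/<episode>/...
  return { feeder_podcast: podcast, feeder_episode: parts[1], feeder_feed: null };
}
```

The grouped byte totals from the Athena query would then be attached to each parsed path and inserted as rows like the example above.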

Development

Local development is dependency free! Just:

yarn install
yarn test
yarn lint

However, if you actually want to hit Athena/BigQuery, you'll need to cp env-example .env and fill in several variables:

  • ATHENA_DB the Athena database you're using
  • ATHENA_TABLE the Athena table configured to query the Dovetail CDN S3 logs
    • NOTE: you must have your AWS credentials set up and configured locally to reach/query Athena
  • BQ_DATASET the BigQuery dataset to load the dt_bytes table into. Locally, you should use development or something similar (not staging or production)
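For reference, a filled-in .env might look something like this (all values below are placeholders, not real resources):

```shell
# Placeholder values -- replace with your own Athena/BigQuery resources
ATHENA_DB=my_athena_database
ATHENA_TABLE=my_dovetail_cdn_logs
BQ_DATASET=development
```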

Then run yarn start and you're off!

Deployment

This function's code is deployed as part of the usual PRX CI/CD process. The lambda zip is built via yarn build, uploaded to S3, and deployed into the wild.

While that's all straightforward, there are some gotchas setting up access:

  1. AWS permissions (Athena, S3, Glue, etc.) are documented in the CloudFormation stack for this app.
  2. Google is configured via the BQ_CLIENT_CONFIG ENV and Federated Access.
  3. In addition to the steps documented in (2), the Service Account you create must have the following permissions:
    • BigQuery Job User in your BigQuery project
    • Any role on the BigQuery dataset that provides bigquery.tables.create, so the table load jobs can execute. We have a custom role to provide this minimal access, but any role with that create permission will work.
    • BigQuery Data Editor only on the dt_bytes table in the dataset for this environment (click the table name in BigQuery UI -> Share -> Manage Permissions)
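As an example, the project-level BigQuery Job User grant above could be applied with gcloud (the project and service account names here are placeholders, not this app's real resources):

```shell
# Placeholder project and service account -- substitute your own
gcloud projects add-iam-policy-binding my-bq-project \
  --member="serviceAccount:dovetail-cdn-usage@my-bq-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"
```

The dataset- and table-level grants are narrower and, as noted, can be managed from the BigQuery UI.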

License

AGPL-3.0 License
