Add azure storage account export target (#11)
* Add .vscode to .gitignore

* Extract data_types, ignore_keys, rename_cols into separate files to prepare for using ConfigMaps

* Added azure dependencies

* Fixed wrong content for allocation keys

* Implement Azure-specific env vars, create a new test for Azure, add factory pattern for storage backend

* Run & fix all pylint issues

* aws_s3_storage: change to original upload procedure, azure_storage: use to_parquet

* Remove unnecessary print statements

* Added new ENV vars + respective tests. Implemented mechanism for conditional adding of query parameters

* Adding tests for load_config_file

* Added environment variable for json_normalize separator char.

* Added new ENV vars to README. Added section for required permissions on Storage Account and S3

* Added short docs on necessary Azure permissions

* Change back to original window param

* Add files to Dockerfile as per review.

* Add 'command' field to allow for changes in the Dockerfile's 'ENTRYPOINT'.
cklingspor authored Sep 7, 2024
1 parent 6ee0c91 commit 1c0e817
Showing 16 changed files with 550 additions and 135 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -158,3 +158,6 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

# VSCode
.vscode
3 changes: 3 additions & 0 deletions Dockerfile
@@ -10,6 +10,9 @@ RUN apt-get update && apt-get -y upgrade && apt-get -y clean
RUN useradd --create-home --shell /bin/sh --uid 8000 opencost
COPY --from=builder /app /app
COPY src/opencost_parquet_exporter.py /app/opencost_parquet_exporter.py
COPY src/data_types.json /app/data_types.json
COPY src/rename_cols.json /app/rename_cols.json
COPY src/ignore_alloc_keys.json /app/ignore_alloc_keys.json
RUN chmod 755 /app/opencost_parquet_exporter.py && chown -R opencost /app/
USER opencost
ENV PATH="/app/.venv/bin:$PATH"
20 changes: 20 additions & 0 deletions README.md
@@ -20,6 +20,26 @@ The script supports the following environment variables:
* OPENCOST_PARQUET_FILE_KEY_PREFIX: This is the prefix used for the export; by default it is '/tmp'. The export is saved inside this prefix in the following structure: year=window_start.year/month=window_start.month/day=window_start.day, e.g. tmp/year=2024/month=1/day=15
* OPENCOST_PARQUET_AGGREGATE: This is the set of dimensions used to aggregate the data. By default we use "namespace,pod,container", which are the same dimensions used for the native CSV export.
* OPENCOST_PARQUET_STEP: This is the step for the export. By default we use 1h steps, which results in 24 steps per day and makes it easier to match the exported data to AWS CUR, since CUR also exports on an hourly basis.
* OPENCOST_PARQUET_RESOLUTION: Duration to use as resolution in Prometheus queries. Smaller values (i.e. higher resolutions) will provide better accuracy, but worse performance (i.e. slower query time, higher memory use). Larger values (i.e. lower resolutions) will perform better, but at the expense of lower accuracy for short-running workloads.
* OPENCOST_PARQUET_ACCUMULATE: If `"true"`, sum the entire range of time intervals into a single set. Default value is `"false"`.
* OPENCOST_PARQUET_INCLUDE_IDLE: Whether to return the calculated __idle__ field for the query. Default is `"false"`.
* OPENCOST_PARQUET_IDLE_BY_NODE: If `"true"`, idle allocations are created on a per-node basis, which will result in different values when shared and more idle allocations when split. Default is `"false"`. (A sketch after this list shows how these optional toggles might be forwarded as query parameters.)
* OPENCOST_PARQUET_STORAGE_BACKEND: The storage backend to use. Supports `aws`, `azure`. See below for Azure specific variables.
* OPENCOST_PARQUET_JSON_SEPARATOR: The OpenCost API returns nested objects. The [JSON normalization method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html) used to flatten them allows for a custom separator. Use this to specify the separator of your choice.
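
As a rough illustration of how these optional toggles might be forwarded to the OpenCost API (the `build_query_params` helper and the exact parameter names are assumptions for this sketch, not code from the commit):

```python
import os

def build_query_params(window: str) -> dict:
    """Hypothetical sketch: assemble the OpenCost allocation query,
    adding optional parameters only when their env var is set."""
    params = {
        "window": window,
        "aggregate": os.environ.get(
            "OPENCOST_PARQUET_AGGREGATE", "namespace,pod,container"),
        "step": os.environ.get("OPENCOST_PARQUET_STEP", "1h"),
    }
    optional = {
        "OPENCOST_PARQUET_RESOLUTION": "resolution",
        "OPENCOST_PARQUET_ACCUMULATE": "accumulate",
        "OPENCOST_PARQUET_INCLUDE_IDLE": "includeIdle",
        "OPENCOST_PARQUET_IDLE_BY_NODE": "idleByNode",
    }
    # Conditional adding of query parameters: an unset env var simply
    # leaves the corresponding parameter out of the request.
    for env_var, param in optional.items():
        value = os.environ.get(env_var)
        if value is not None:
            params[param] = value
    return params
```

The separator variable, in turn, can be passed straight through as `pandas.json_normalize(response_json, sep=separator)` when flattening the API response.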

## Azure Specific Environment Variables
* OPENCOST_PARQUET_AZURE_STORAGE_ACCOUNT_NAME: Name of the Azure Storage Account you want to export the data to.
* OPENCOST_PARQUET_AZURE_CONTAINER_NAME: The container within the storage account you want to save the data to. The service principal requires write permissions on the container.
* OPENCOST_PARQUET_AZURE_TENANT: Your Azure Tenant ID
* OPENCOST_PARQUET_AZURE_APPLICATION_ID: Client ID of the Service Principal
* OPENCOST_PARQUET_AZURE_APPLICATION_SECRET: Secret of the Service Principal
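
For orientation, a minimal sketch of how these variables could be wired up with the `azure-identity` and `azure-storage-blob` packages pinned in `requirements.txt` (function and variable names are illustrative assumptions, not the commit's actual code):

```python
import os

import pandas as pd
from azure.identity import ClientSecretCredential
from azure.storage.blob import BlobServiceClient

def upload_parquet_to_azure(df: pd.DataFrame, blob_path: str) -> None:
    """Authenticate as a service principal and upload a DataFrame
    to the configured container as a parquet blob (sketch only)."""
    credential = ClientSecretCredential(
        tenant_id=os.environ["OPENCOST_PARQUET_AZURE_TENANT"],
        client_id=os.environ["OPENCOST_PARQUET_AZURE_APPLICATION_ID"],
        client_secret=os.environ["OPENCOST_PARQUET_AZURE_APPLICATION_SECRET"],
    )
    account = os.environ["OPENCOST_PARQUET_AZURE_STORAGE_ACCOUNT_NAME"]
    service = BlobServiceClient(
        account_url=f"https://{account}.blob.core.windows.net",
        credential=credential,
    )
    blob = service.get_blob_client(
        container=os.environ["OPENCOST_PARQUET_AZURE_CONTAINER_NAME"],
        blob=blob_path,
    )
    # to_parquet() with no path returns the parquet file contents as bytes.
    blob.upload_blob(df.to_parquet(), overwrite=True)
```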

# Prerequisites
## AWS IAM

## Azure RBAC
The current implementation allows for authentication via [Service Principals](https://learn.microsoft.com/en-us/entra/identity-platform/app-objects-and-service-principals?tabs=browser) on the Azure Storage Account. Therefore, to use the Azure storage backend you need an existing service principal with the appropriate role assignments. Azure RBAC has built-in roles for Storage Account Blob Storage operations. The [Storage-Blob-Data-Contributor](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/storage#storage-blob-data-contributor) role allows writing data to an Azure Storage Account container. A less permissive custom role can be built and is encouraged!
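
For reference, a write-only custom role could look roughly like the following (illustrative placeholder values; verify the data actions against the current Azure documentation before use):

```json
{
  "Name": "OpenCost Parquet Blob Writer",
  "Description": "Illustrative write-only role for blob exports.",
  "Actions": [],
  "DataActions": [
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write"
  ],
  "AssignableScopes": ["/subscriptions/<your-subscription-id>"]
}
```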


# Usage:

1 change: 1 addition & 0 deletions examples/k8s_cron_job.yaml
@@ -54,6 +54,7 @@ spec:
runAsUser: 1000
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
command: ["/app/.venv/bin/python3"] # Update this if the ENTRYPOINT changes
dnsConfig:
options:
- name: single-request-reopen
2 changes: 2 additions & 0 deletions requirements-dev.txt
@@ -7,6 +7,8 @@ pytz==2023.3.post1
six==1.16.0
tzdata==2023.4
pyarrow==14.0.1
azure-storage-blob==12.19.1
azure-identity==1.15.0
# The dependencies below are only used for development.
freezegun==1.4.0
pylint==3.0.3
2 changes: 2 additions & 0 deletions requirements.txt
@@ -7,3 +7,5 @@ pytz==2023.3.post1
six==1.16.0
tzdata==2023.4
pyarrow==14.0.1
azure-storage-blob==12.19.1
azure-identity==1.15.0
39 changes: 39 additions & 0 deletions src/data_types.json
@@ -0,0 +1,39 @@
{
"cpuCoreHours": "float",
"cpuCoreRequestAverage": "float",
"cpuCoreUsageAverage": "float",
"cpuCores": "float",
"cpuCost": "float",
"cpuCostAdjustment": "float",
"cpuEfficiency": "float",
"externalCost": "float",
"gpuCost": "float",
"gpuCostAdjustment": "float",
"gpuCount": "float",
"gpuHours": "float",
"loadBalancerCost": "float",
"loadBalancerCostAdjustment": "float",
"networkCost": "float",
"networkCostAdjustment": "float",
"networkCrossRegionCost": "float",
"networkCrossZoneCost": "float",
"networkInternetCost": "float",
"networkReceiveBytes": "float",
"networkTransferBytes": "float",
"pvByteHours": "float",
"pvBytes": "float",
"pvCost": "float",
"pvCostAdjustment": "float",
"ramByteHours": "float",
"ramByteRequestAverage": "float",
"ramByteUsageAverage": "float",
"ramBytes": "float",
"ramCost": "float",
"ramCostAdjustment": "float",
"ramEfficiency": "float",
"running_minutes": "float",
"sharedCost": "float",
"totalCost": "float",
"totalEfficiency": "float"
}

3 changes: 3 additions & 0 deletions src/ignore_alloc_keys.json
@@ -0,0 +1,3 @@
{
"keys": ["pvs", "lbAllocations"]
}
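
Together with `data_types.json` and `rename_cols.json`, these config files could plausibly be consumed by the `load_config_file` helper mentioned in the commit message, along these lines (the loader's signature and the post-processing steps are assumptions for illustration; `rename_cols.json` is assumed to hold a flat old-name-to-new-name mapping):

```python
import json

import pandas as pd

def load_config_file(path: str) -> dict:
    """Load one of the JSON config files copied into /app by the Dockerfile."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

def apply_configs(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch: drop ignored allocation keys, rename columns, and cast
    columns to the dtypes configured in data_types.json."""
    ignore_keys = load_config_file("/app/ignore_alloc_keys.json")["keys"]
    rename_cols = load_config_file("/app/rename_cols.json")
    data_types = load_config_file("/app/data_types.json")

    df = df.drop(columns=[k for k in ignore_keys if k in df.columns])
    df = df.rename(columns=rename_cols)
    return df.astype({c: t for c, t in data_types.items() if c in df.columns})
```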