Basic Usage
- Overview
- Dataset information setup
- Repository name consistency check
- Manual calculation of stats for a month
- Restart after failure
- Manual storing of report files on GitHub
- Creating issues on GitHub for stored reports
Overview
The usagestats service can be turned off to avoid abuse and keep usage costs down. The service can be managed at
Base URL: http://<version>.tools-usagestats.vertnet-portal.appspot.com/
Currently available versions:
- prod
- dev (not deployed)
How to build URLs properly:
<Base URL>/<family>/<command>
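As a minimal sketch of the pattern above, the following Python helper assembles an endpoint URL from a version, a family and a command. The name `build_url` is hypothetical and not part of the service; only the host pattern and the `<family>/<command>` layout come from this page.

```python
# Hypothetical helper; only the URL pattern itself is taken from this page.
BASE_URL_TEMPLATE = "http://{version}.tools-usagestats.vertnet-portal.appspot.com"


def build_url(version, family, command):
    """Return <Base URL>/<family>/<command> for the given deployed version."""
    base = BASE_URL_TEMPLATE.format(version=version)
    # Strip stray slashes so the parts always join with exactly one "/".
    return "/".join([base, family.strip("/"), command.strip("/")])


print(build_url("prod", "admin/setup", "datasets"))
```

Keeping the version as an explicit argument makes it harder to send a request to the wrong deployment by accident.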
Dataset information setup
Needed to ensure consistency and proper functioning. The process of extracting the reports relies on information about all datasets already being in the datastore.
Family: admin/setup
Command: datasets
Method: POST
Parameters (bold means mandatory): None
curl -i -X POST -d "" http://tools-usagestats.vertnet-portal.appspot.com/admin/setup/datasets
Repository name consistency check
Needed to ensure consistency and proper functioning. Resource repositories on GitHub must be properly and uniquely referenced in the Carto registry table (resource_staging). Otherwise, the application won't be able to adequately store the report or send a notification issue.
Family: admin/tools
Command: repo_checker
Method: GET or POST
Parameters (bold means mandatory): None
Example:
curl -i -X GET http://tools-usagestats.vertnet-portal.appspot.com/admin/tools/repo_checker
Result: If successful, a JSON-like message: {"result": "success"}. Otherwise, a list of all the mismatching repository/resource names.
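A script that calls repo_checker needs to tell the success message apart from a list of mismatches. The sketch below is one way to do that, assuming the response body is valid JSON; the page only calls it "JSON-like", so that is an assumption, and `repos_consistent` is a hypothetical name.

```python
import json


def repos_consistent(body):
    """Return True only if the body is the documented success message."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Assumption: a body that is not valid JSON is treated as a failure.
        return False
    return isinstance(payload, dict) and payload.get("result") == "success"


print(repos_consistent('{"result": "success"}'))
```

Anything other than the exact success message is treated as a mismatch report, which errs on the side of stopping before reports are stored in the wrong repository.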
Manual calculation of stats for a month
Family: admin/parser
Command: init
Method: POST
Parameters (bold means mandatory):
- period: the period to process (i.e., the month to calculate) in the format 'YYYYMM'. For example, period=201811 for November 2018 usage statistics.
- force: true/false, override existing data for the given period. Defaults to False.
- testing: true/false, use the VertNet/statReports testing repository instead of the repositories of the publishers to store reports and send issues. Defaults to False.
- github_store: true/false, store txt versions of the reports in GitHub repositories. Defaults to False.
- github_issue: true/false, create new issues in GitHub to notify of report availability. Defaults to False.
- table_name: the name of the Carto table to query in order to extract usage data. Defaults to query_log_master. Note: If table_name is provided, no time constraints will be applied to the content of the resulting reports unless the table name is the same as the default. Therefore, the table with the given name must contain all and nothing but the records for the reporting period.
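The parameter list above can be turned into a request body programmatically. The following is a minimal sketch, not the service's own code: `init_payload` is a hypothetical helper, and the check that the month lies between 01 and 12 is an assumption about what the service accepts as a valid 'YYYYMM' value.

```python
import re
from urllib.parse import urlencode


def init_payload(period, force=False, testing=False, github_store=False,
                 github_issue=False, table_name=None):
    """Build the form-encoded body for a POST to admin/parser/init."""
    # 'YYYYMM': a four-digit year followed by a month from 01 to 12.
    if not re.fullmatch(r"\d{4}(0[1-9]|1[0-2])", period):
        raise ValueError("period must be in YYYYMM format, got %r" % period)
    params = [("period", period)]
    # Only send flags that differ from the documented default (False).
    for name, value in (("force", force), ("testing", testing),
                        ("github_store", github_store),
                        ("github_issue", github_issue)):
        if value:
            params.append((name, "true"))
    if table_name is not None:
        params.append(("table_name", table_name))
    return urlencode(params)


print(init_payload("201204", force=True))
```

The resulting string can be passed to curl with -d, in the same way as the hand-written examples that follow.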
curl -i -X POST -d "period=201204" http://tools-usagestats.vertnet-portal.appspot.com/admin/parser/init
curl -i -X POST -d "period=201204&force=true" http://tools-usagestats.vertnet-portal.appspot.com/admin/parser/init
NOTE: The following request does not correctly pass the period on to the github_store step. Run init, github_store and github_issue separately instead.
curl -i -X POST -d "period=201604&github_store=true&github_issue=true" http://tools-usagestats.vertnet-portal.appspot.com/admin/parser/init
curl -i -X POST -d "period=201204&table_name=query_log_201204" http://tools-usagestats.vertnet-portal.appspot.com/admin/parser/init
Restart after failure
Thanks to the modular structure of the application, the process can be restarted at almost any point. Each step has its own request handler, which can be called directly with the required parameters. In addition, for long-running tasks such as storing all reports on GitHub, the process loops over batches of reports until it finishes or receives a DeadlineExceededError from App Engine. In the latter case, it respawns and picks up the last (unfinished) batch of reports.
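The batch-and-respawn behaviour can be illustrated with a small simulation. This is a sketch under stated assumptions, not the application's code: the `DeadlineExceededError` class here is a local stand-in for the App Engine error, and `store_batch`, `store_all`, the batch size and the per-request budget are all hypothetical.

```python
class DeadlineExceededError(Exception):
    """Stand-in for the App Engine error, so the sketch runs outside App Engine."""


def store_batch(batch, stored, budget):
    """Pretend to store each report; fail once the request's budget is spent."""
    for report in batch:
        if budget["left"] == 0:
            raise DeadlineExceededError()
        budget["left"] -= 1
        stored.append(report)


def store_all(reports, batch_size=2, budget_per_request=3):
    """Store every report, respawning after each simulated deadline."""
    if batch_size < 1 or budget_per_request < 1:
        raise ValueError("batch_size and budget_per_request must be at least 1")
    stored, pending, requests = [], list(reports), 0
    while pending:
        requests += 1  # each pass of this loop models one (re)spawned request
        budget = {"left": budget_per_request}
        try:
            while pending:
                batch = pending[:batch_size]
                # Skip reports already stored before the previous deadline.
                store_batch([r for r in batch if r not in stored], stored, budget)
                pending = pending[batch_size:]  # batch done, move to the next
        except DeadlineExceededError:
            pass  # the unfinished batch stays at the front of `pending`
    return stored, requests


print(store_all(["a", "b", "c", "d", "e"]))
```

With the default values, the first simulated request stores three reports before hitting the deadline, and the second one resumes with the unfinished batch and completes the rest.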
Let's assume you forgot to call the datasets setup process and launched the main task with an incomplete collection of dataset information. This is what would likely happen:
- init would end successfully
- get_events would end successfully, extracting the events to parse
- process_events would end successfully, storing the reports in the datastore
- github_store would fail eventually, when trying to get info from a dataset that is not in the datastore. The process will halt here. NOTE: As of 2018-10-11, gbifdatasetids found in the logs but not found in the data store are skipped with the Report entities' 'stored' property set to True. This is to avoid endless loops when an incorrect gbifdatasetid was actually stored in the logs and will never appear in the Dataset Entity list.
At this point, you have all reports in the datastore, some of them with the stored property set to True and others (the one that failed and all those after it) set to False, and all of them with the issue_sent property set to False (since that step has not run yet). The correct way to proceed is as follows:
- Make a request to the datasets setup handler so that the missing data are uploaded to the datastore
- Optionally, remove Report Entities from the Datastore for Datasets that do not exist if those Datasets are in error (this happened for 456058db-f70b-4005-97ad-e08570cf0c56, which is an invalid gbifdatasetid that had been applied to some records in the portal index).
- Make a request to the github_store endpoint, passing these two parameters exactly as you wrote them for the init request: period and github_issue
The task will restart from the github_store endpoint and will process only those reports with stored = False. After that, it will continue as usual depending on the value of the github_issue variable.
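The effect of the restart on the report flags can be sketched as follows. The `Report` class and `resume_store` function are hypothetical models of the datastore entities described above, not the application's actual classes, and the GitHub upload is replaced by a simple list append.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """Hypothetical model of a Report entity and its two status flags."""
    gbifdatasetid: str
    stored: bool = False
    issue_sent: bool = False


def resume_store(reports, known_dataset_ids):
    """Process only the reports that are not stored yet; return the ids uploaded."""
    uploaded = []
    for report in reports:
        if report.stored:
            continue  # already handled before the failure
        if report.gbifdatasetid in known_dataset_ids:
            uploaded.append(report.gbifdatasetid)  # stand-in for the GitHub upload
        # Unknown ids are skipped but still flagged, mirroring the 2018-10-11 note.
        report.stored = True
    return uploaded


reports = [Report("a", stored=True), Report("b"), Report("unknown-id")]
print(resume_store(reports, {"a", "b"}))
```

Flagging skipped reports as stored is what prevents the same invalid gbifdatasetid from being retried on every respawn.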
Manual storing of report files on GitHub
Family: admin/parser
Command: github_store
Method: POST
Parameters (bold means mandatory):
- period: the period to process (i.e., the month to calculate) in the format 'YYYYMM'. For example, period=201811 for November 2018 usage statistics.
- testing: true/false, use the VertNet/statReports testing repository instead of the repositories of the publishers to store reports and send issues. Defaults to False.
- github_issue: true/false, create new issues in GitHub to notify of report availability. Defaults to False.
- gbifdatasetid: process the one resource with the given gbifdatasetid
curl -i -X POST -d "period=201204" http://tools-usagestats.vertnet-portal.appspot.com/admin/parser/github_store
Creating issues on GitHub for stored reports
Family: admin/parser
Command: github_issue
Method: POST
Parameters (bold means mandatory):
- period: the period to process (i.e., the month to calculate) in the format 'YYYYMM'. For example, period=201811 for November 2018 usage statistics.
- testing: true/false, use the VertNet/statReports testing repository instead of the repositories of the publishers to store reports and send issues. Defaults to False.
- gbifdatasetid: process the one resource with the given gbifdatasetid
curl -i -X POST -d "period=201204" http://tools-usagestats.vertnet-portal.appspot.com/admin/parser/github_issue
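Since the note in the init section recommends running init, github_store and github_issue as separate requests, the three calls for one period can be planned together. The sketch below only builds the URLs and bodies and sends nothing; `plan_requests` is a hypothetical helper, and it does not address how to tell when one step has finished before starting the next.

```python
from urllib.parse import urlencode

# Host as written in the curl examples on this page.
HOST = "http://tools-usagestats.vertnet-portal.appspot.com"
STEPS = ("init", "github_store", "github_issue")


def plan_requests(period, testing=False):
    """Return one (url, body) pair per step, in the order they should be sent."""
    params = [("period", period)]
    if testing:
        params.append(("testing", "true"))
    body = urlencode(params)
    return [("%s/admin/parser/%s" % (HOST, step), body) for step in STEPS]


for url, body in plan_requests("201811"):
    # Print the equivalent curl command instead of sending the request.
    print('curl -i -X POST -d "%s" %s' % (body, url))
```

Passing testing=True to every step keeps a trial run inside the VertNet/statReports testing repository.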
This repository is part of the VertNet project.
For more information, please check out the project's home page and GitHub organization page