## Augur

Large tool that builds a database of information from Git repositories ([link to resulting schema](https://oss-augur.readthedocs.io/en/main/schema/toc.html)). It has Docker containers that run a database and an API. Repositories can be added using their git ID ([docs](https://oss-augur.readthedocs.io/en/main/getting-started/command-line-interface/db.html#add-repos)).

Open question: can it be used for repositories that I do not have, e.g., push access to?

## GrimoireLab

[Link](https://chaoss.github.io/grimoirelab/)

Components:

- [Perceval](https://github.com/chaoss/grimoirelab-perceval): Python API for retrieving data from repositories.
- [Arthur](https://github.com/chaoss/grimoirelab-kingarthur): schedules and executes Perceval for larger numbers of software repositories. Uses a Redis queue.

## GH Archive

[Link](https://www.gharchive.org/)

Records the public GitHub timeline, archives it, and makes it easily accessible. Data is available as raw, hourly, JSON-encoded event files from `data.gharchive.org`. It is also on Google BigQuery, which needs Google Developer access but allows SQL-like queries. There is a limit of 1 TB of data processing per month, though.
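
To get a feel for the raw GH Archive files, here is a minimal Python sketch that counts event types in one hourly file. It rests on a few assumptions that are not stated in these notes: that the hourly files follow the URL pattern `https://data.gharchive.org/YYYY-MM-DD-H.json.gz` (hour not zero-padded), that each file is gzip-compressed with one JSON event per line, and that every event carries a `type` field. The function names `archive_url`, `count_event_types`, and `fetch_hour` are made up for illustration.

```python
import gzip
import json
import urllib.request
from collections import Counter
from datetime import date


def archive_url(day: date, hour: int) -> str:
    """Build the URL of one hourly file (assumed pattern, hour not zero-padded)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    return f"https://data.gharchive.org/{day.isoformat()}-{hour}.json.gz"


def count_event_types(gz_bytes: bytes) -> Counter:
    """Count events by their "type" field in gzip-compressed, line-delimited JSON."""
    counts = Counter()
    for line in gzip.decompress(gz_bytes).splitlines():
        if line.strip():  # skip blank lines
            event = json.loads(line)
            counts[event.get("type", "unknown")] += 1
    return counts


def fetch_hour(day: date, hour: int) -> bytes:
    """Download one hourly file into memory (needs network access)."""
    with urllib.request.urlopen(archive_url(day, hour)) as response:
        return response.read()
```

Calling `count_event_types(fetch_hour(date(2015, 1, 1), 15))` would then give a per-type tally for that hour, assuming the server accepts the default `urllib` request. Hourly files can be large, so streaming to disk may be preferable to holding the whole payload in memory.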