A project that seeks to democratize and complement investigative journalism and fact-checking.
Arquivo.pt for justice, journalism and truth.
Desarquivo is designed as a reproducible effort based on a set of configurations from which we highlight:
- The first investigated entities, which seed the subsequent network expansion. In this version, these are José Sócrates (former prime minister of Portugal) and Isabel dos Santos (from Luanda Leaks)
- The period of time covered by the current version, which runs from 2000 to 2020
- The newspapers to search. In this version: Público, Expresso, Diário de Notícias, Correio da Manhã, Sol, Visão and Jornal de Notícias
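As an illustration only, the three settings listed above could be grouped in a single object. The sketch below is not the project's actual configuration format: the class name `DesarquivoConfig` and its field names are hypothetical, and only the values (seed entities, years, newspapers) come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesarquivoConfig:
    """Hypothetical container for the settings described in this README."""

    # Entities from which the network expansion starts.
    seed_entities: tuple = ("José Sócrates", "Isabel dos Santos")
    # First and last year covered by this version (both inclusive).
    year_from: int = 2000
    year_to: int = 2020
    # Newspapers whose archived pages are searched.
    newspapers: tuple = (
        "Público",
        "Expresso",
        "Diário de Notícias",
        "Correio da Manhã",
        "Sol",
        "Visão",
        "Jornal de Notícias",
    )

    def years(self):
        """Return every year in the configured period, inclusive."""
        return range(self.year_from, self.year_to + 1)


config = DesarquivoConfig()
```

Changing the seeds, the period, or the newspaper list in one place like this is what would make a rerun with different settings reproducible.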
The collected news articles were analysed and the entities they mention (people, organizations, places, and others) were identified, along with the links between them. These links form an immense network that can now be explored in this graphical interface, or directly in the open-source raw data.
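To make the idea of a network of entities concrete, here is a minimal sketch. It assumes that two entities are linked whenever they are mentioned in the same article; this README does not state how Desarquivo actually derives its links, so treat that rule as an assumption. The function name `build_cooccurrence_edges` and the sample data are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_edges(articles):
    """Count how often each pair of entities appears in the same article.

    `articles` is an iterable of entity-name lists, one list per article.
    Returns a Counter mapping (entity_a, entity_b) to the number of articles
    that mention both. Pairs are stored in sorted order, so (a, b) and (b, a)
    are counted as the same edge.
    """
    edges = Counter()
    for entities in articles:
        # Deduplicate so repeated mentions in one article count only once.
        unique = sorted(set(entities))
        for pair in combinations(unique, 2):
            edges[pair] += 1
    return edges


# Made-up input: each inner list holds the entities found in one article.
articles = [
    ["Person A", "Organization X", "Place Y"],
    ["Person A", "Organization X"],
    ["Person B", "Place Y", "Place Y"],
]
edges = build_cooccurrence_edges(articles)
```

Each key in `edges` corresponds to one relationship in the graph, and the count can serve as an edge weight.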
Desarquivo rests on two databases: MongoDB (NoSQL) and neo4j (graph).
Presentation video (only in Portuguese)
Can access desarquivo and explore its different functionalities and examples.
Can access our available datasets and run more complex queries on the generated graphs.
Desarquivo is a puzzle with many pieces, as described below.
The code for this piece is available in the collection folder. It handles the interaction with the Arquivo.pt APIs and the subsequent organization of the data in the MongoDB database. This process runs many tasks in parallel, which in practice reduces the total data collection time by over one order of magnitude. Other details are explained in the mentioned folder.
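The parallel collection described above can be sketched roughly as follows. This is not the code from the collection folder. The endpoint and the query parameters (`q`, `siteSearch`, `from`, `to`, `maxItems`) are assumptions that should be checked against the Arquivo.pt API documentation, and the function names are hypothetical. Storing the results in MongoDB is left out.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed Arquivo.pt full-text search endpoint; verify against the API docs.
TEXTSEARCH_URL = "https://arquivo.pt/textsearch"


def build_query_url(entity, site, year):
    """Build one search URL for an entity, a newspaper site and a year."""
    params = {
        "q": f'"{entity}"',           # exact-phrase search (assumed syntax)
        "siteSearch": site,           # restrict results to one site
        "from": f"{year}0101000000",  # assumed YYYYMMDDHHMMSS timestamps
        "to": f"{year}1231235959",
        "maxItems": 50,
    }
    return f"{TEXTSEARCH_URL}?{urlencode(params)}"


def fetch_json(url):
    """Download one URL and decode the JSON body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def collect(entity, sites, years, fetch=fetch_json, max_workers=8):
    """Run one request per (site, year) pair in a thread pool.

    Returns a dict mapping each request URL to its decoded response.
    `fetch` can be swapped for a stub, for example in tests.
    """
    urls = [build_query_url(entity, s, y) for s in sites for y in years]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs URLs correctly.
        results = list(pool.map(fetch, urls))
    return dict(zip(urls, results))


# Example call (performs real network requests, so it is left commented out):
# pages = collect("José Sócrates", ["publico.pt"], range(2000, 2021))
```

Threads fit this workload because each task spends most of its time waiting on the network. The actual speed-up depends on the number of workers and on any rate limits the API applies.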
The API is built on Flask and all its code is available in the api folder. This code interacts with both of our databases (MongoDB and neo4j).
The interface is developed in Vue.js with Nuxt.js and Vuetify, and uses the cytoscape.js library for the graph visualization. All the code for the interface is in the ui folder. The interface is also ready to be automatically deployed to production with gh-pages.
Excluding the collection process and the interface, all the remaining parts of Desarquivo (API, MongoDB, neo4j) run in Docker containers, which gives a lot of flexibility in both development and production. The most important commands for orchestrating these services are:
docker-compose up -d
docker-compose down
Note that, at the moment, running the project on Windows requires deactivating the volume in the mongodb service.
Desarquivo will continue to be improved and can grow into a more comprehensive tool that stands for transparency, freedom of speech, and journalistic investigation. The possibilities are many, and so are the ideas. If you relate to this project and believe in it, we ask you to contribute with time, advice, or ideas.
To get in touch with me, please use LinkedIn.
We welcome all suggestions and bug reports. For those, please use the issues page.