
A project that seeks to democratize and complement investigative journalism and fact-checking.
Arquivo.pt for justice, journalism and truth.



About the Project

Desarquivo is designed as a reproducible effort based on a set of configurations from which we highlight:

  • The first investigated entities, which seed the subsequent network expansion. In this version, these are José Sócrates (former Prime Minister of Portugal) and Isabel dos Santos (from Luanda Leaks)
  • The time period covered, which in this version runs from 2000 to 2020
  • The newspapers to search, which in this version are Público, Expresso, Diário de Notícias, Correio da Manhã, Sol, Visão and Jornal de Notícias
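To make the three settings above concrete, here is a minimal sketch of how they could be grouped in Python. The class name `DesarquivoConfig` and its field names are hypothetical and chosen only for illustration; the actual configuration format is described in the repository folders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesarquivoConfig:
    """Hypothetical container for the three settings listed above."""

    seed_entities: tuple
    year_from: int
    year_to: int
    newspapers: tuple

    def __post_init__(self):
        # Basic sanity checks so that a broken configuration fails early.
        if self.year_from > self.year_to:
            raise ValueError("year_from must not be later than year_to")
        if not self.seed_entities:
            raise ValueError("at least one seed entity is required")


# Values taken from the list above; the structure itself is an assumption.
CONFIG = DesarquivoConfig(
    seed_entities=("José Sócrates", "Isabel dos Santos"),
    year_from=2000,
    year_to=2020,
    newspapers=(
        "Público",
        "Expresso",
        "Diário de Notícias",
        "Correio da Manhã",
        "Sol",
        "Visão",
        "Jornal de Notícias",
    ),
)
```

Reproducing the effort with other seed entities, years, or newspapers would then only require a different `DesarquivoConfig` instance.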

The collected news articles were analysed and the entities they mention (people, organizations, places, and others) were identified, along with the links between them. These links form a large network that can be explored through the graphical interface or directly in the open-source raw data.
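As an illustration of how links between entities can turn into a network, the sketch below counts how often two entities appear in the same article. Treating co-occurrence as the link is an assumption made for this example, not necessarily how Desarquivo defines its links, and the function name and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_edges(articles):
    """Count how often each pair of entities appears in the same article.

    `articles` is an iterable of entity-name lists, one list per article.
    Returns a Counter mapping (entity_a, entity_b) to the number of shared
    articles. Pairs are stored in sorted order, so (a, b) and (b, a) are
    counted as the same edge.
    """
    edges = Counter()
    for entities in articles:
        unique = sorted(set(entities))  # ignore repeated mentions in one article
        for pair in combinations(unique, 2):
            edges[pair] += 1
    return edges


# Hypothetical, hand-written input; real entity lists would come from the analysis step.
sample_articles = [
    ["Person A", "Organization X", "Lisbon"],
    ["Person A", "Organization X"],
    ["Person B", "Lisbon"],
]
edges = build_cooccurrence_edges(sample_articles)
```

Each key in `edges` is one link in the network, and the count can serve as an edge weight when the graph is stored or drawn.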

Desarquivo relies on two databases: MongoDB (a NoSQL document database) and neo4j (a graph database).


Presentation video (only in Portuguese)

Citizens

Can access Desarquivo and explore its features and examples.

Researchers

Can access our available datasets and run more complex queries on the generated graphs.

Building Desarquivo

Desarquivo is a puzzle with many pieces, as described below.

Data Collection and Preparation

The code for this piece is available in the collection folder. It covers the interaction with the Arquivo.pt APIs and the subsequent organization of the data in the MongoDB database. This process runs many tasks in parallel, which in practice reduces the total data collection time by more than an order of magnitude. Other details are explained in that folder.
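The parallel collection described above can be sketched with a thread pool, which suits work that mostly waits on network responses. This is a minimal illustration and not the code in the collection folder: the endpoint and the parameter names (`q`, `siteSearch`, `from`, `to`) are assumptions that should be checked against the Arquivo.pt API documentation, and `build_query_url`, `collect`, `fake_fetch`, and the `publico.pt` domain are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode

# Assumed endpoint; verify it against the Arquivo.pt API documentation.
TEXTSEARCH_URL = "https://arquivo.pt/textsearch"


def build_query_url(entity, site, year):
    """Build one search URL for an entity, a newspaper domain, and a single year."""
    params = {
        "q": entity,
        "siteSearch": site,
        "from": f"{year}0101000000",  # assumed YYYYMMDDHHMMSS timestamp format
        "to": f"{year}1231235959",
    }
    return f"{TEXTSEARCH_URL}?{urlencode(params)}"


def collect(urls, fetch, max_workers=16):
    """Call `fetch(url)` for every URL on a thread pool, keeping the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


# One query per year of the configured period, for one entity and one newspaper.
urls = [build_query_url("José Sócrates", "publico.pt", year) for year in range(2000, 2021)]


def fake_fetch(url):
    """Stand-in for a real HTTP request so the sketch runs without network access."""
    return {"url": url, "items": []}


results = collect(urls, fake_fetch)
```

Replacing `fake_fetch` with a function that performs the HTTP request and writes the response to MongoDB would give the overall shape of a parallel collector.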

API

The API is built on Flask and all its code is available in the api folder. This code interacts with both of our databases (MongoDB and neo4j).

The Interface

The interface is developed in Vue.js with Nuxt.js and Vuetify, and uses the cytoscape.js library for graph visualization. All the code for the interface is in the ui folder. The interface is also ready to be automatically deployed to production with gh-pages.
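For readers who want to feed their own graph data to a cytoscape.js view, the sketch below shapes weighted edges into the elements format that cytoscape.js accepts, where every node and edge is an object with a `data` field and edges carry `source` and `target`. The function name, the `label` and `weight` fields, and the sample edges are hypothetical; the payload actually exchanged between the API and the interface may differ.

```python
import json


def to_cytoscape_elements(edges):
    """Convert {(source, target): weight} into a cytoscape.js elements list."""
    # Collect every entity that appears at either end of an edge.
    nodes = sorted({name for pair in edges for name in pair})
    elements = [{"data": {"id": name, "label": name}} for name in nodes]
    for (source, target), weight in sorted(edges.items()):
        elements.append(
            {
                "data": {
                    "id": f"{source}->{target}",
                    "source": source,
                    "target": target,
                    "weight": weight,
                }
            }
        )
    return elements


# Hypothetical sample data for illustration only.
sample_edges = {("Person A", "Organization X"): 2, ("Person B", "Lisbon"): 1}
elements = to_cytoscape_elements(sample_edges)
payload = json.dumps(elements, ensure_ascii=False)
```

The resulting JSON string can be passed as the `elements` option when a cytoscape.js instance is created.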

Docker

Apart from the collection process and the interface, all the remaining parts of Desarquivo (API, MongoDB, neo4j) run in Docker containers, which gives a lot of flexibility in both development and production. The most important commands for orchestrating these services are:

  • docker-compose up -d
  • docker-compose down

Note that, at the moment, running the project on Windows requires disabling the volume in the mongodb service.

Future of Desarquivo

Desarquivo will continue to be improved and can grow into a more comprehensive tool that stands for transparency, freedom of speech, and journalistic investigation. The possibilities are many, and so are the ideas. If you relate to this project and believe in it, we ask you to contribute with time, advice, or ideas.

To get in touch with me, please use LinkedIn.

We welcome all suggestions and bug reports. For those, please use the issues page.