C++ implementation of the PageRank algorithm (Google) by Sergey Brin and Lawrence Page and the HITS algorithm (Ask.com) by Jon Kleinberg, that uses a CSR (Compressed Sparse Row) matrix and mmap to minimize memory usage.
This script computes the top-k nodes based on the rankings of the two algorithms, adds the in-degree of the node, and calculates the Jaccard coefficient between the results.
This project is the assignment for the course Information Retreival and Web Search 2021/2022.
See Assignment.pdf for more details on the assignment.
See Report.pdf for more details on the implementation.
The assets are from the SNAP Datasets by Jure Leskovec and Andrej Krevl, available here.
mkdir build
cd build
cmake ..
make
./pagerank-hits [filename] [topK] [dampingFactor]
Find top-K results of each algorithm in the graph and the Jaccard's coeffiecient between the algorithms.
OPTIONS:
filename The graph to process (default: assets/web-NotreDame.txt)
topK Set top-K number (default: 20)
dampingFactor Set PageRank's damping factor (default: 0.85)
If you run ./pagerank-hits
without arguments it will run with the default arguments, so it's the same as writing:
./pagerank-hits assets/web-NotreDame.txt 20 0.85