This repository contains the source code for the paper
A Critical Assessment of State-of-the-Art in Entity Alignment
Max Berrendorf, Ludwig Wacker, and Evgeniy Faerman
https://arxiv.org/abs/2010.16314
Set up and activate a virtual environment:
python3.8 -m venv ./venv
source ./venv/bin/activate
Install requirements (in this virtual environment):
pip install -U pip
pip install -U -r requirements.txt
To run the DGMC scripts, you additionally need to set up its requirements as described in the README of the corresponding GitHub repository.
We do not include them in requirements.txt, since their installation is more involved and includes non-Python dependencies.
To track results on an MLflow server, first start it by running
mlflow server
Note: When storing results for many configurations, we recommend setting up a database backend, following the MLflow instructions. For the following examples, we assume that the server is running at
TRACKING_URI=http://localhost:5000
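Before starting long-running experiments, it can help to confirm that something is actually answering at the tracking URI. The following is a minimal sketch using only the Python standard library. The helper name `is_server_reachable` is hypothetical. The sketch assumes that any successful HTTP response from the root URI (where `mlflow server` serves its web UI) means the server is up. It does not check that the server is specifically an MLflow instance.

```python
import urllib.request

# Assumption: the tracking server started above listens at this URI.
TRACKING_URI = "http://localhost:5000"


def is_server_reachable(uri: str, timeout: float = 5.0) -> bool:
    """Hypothetical helper: return True if an HTTP GET on `uri` succeeds."""
    try:
        with urllib.request.urlopen(uri, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError/HTTPError are OSError subclasses: connection refused,
        # DNS failure, timeout, or an HTTP error status all end up here.
        return False


if __name__ == "__main__":
    if is_server_reachable(TRACKING_URI):
        print(f"Server reachable at {TRACKING_URI}")
    else:
        print(f"No server answering at {TRACKING_URI}; start it with `mlflow server`")
```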
Please download the RDGCN embeddings extracted with the OpenEA codebase from here and place them in ~/.kgm/openea_rdgcn_embeddings.
Their file names match the pattern *_*_15K_V2.pt, and in total they require around 160 MiB of storage.
To generate data for the BERT-based initialization, run
(venv) PYTHONPATH=./src python3 executables/prepare_bert.py
We also provide preprocessed files at this URL. If you prefer to use those, please download them and place them in ~/.kgm/bert_prepared.
Their file names match the pattern *_bert-base-multilingual-cased_*, and in total they require around 6.1 GiB of storage.
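To check that the downloaded or generated files landed in the expected places, the following minimal sketch counts the files matching each pattern described above and reports their total size. That total can be compared against the approximate figures given here: around 160 MiB for the RDGCN embeddings and around 6.1 GiB for the BERT files. The helper name `summarize_directory` and the `EXPECTED_DATA` mapping are hypothetical. They only mirror the directories and patterns stated in this README.

```python
from pathlib import Path

# Directories and glob patterns as described above (hypothetical mapping).
EXPECTED_DATA = {
    Path.home() / ".kgm" / "openea_rdgcn_embeddings": "*_*_15K_V2.pt",
    Path.home() / ".kgm" / "bert_prepared": "*_bert-base-multilingual-cased_*",
}


def summarize_directory(directory: Path, pattern: str) -> tuple:
    """Return (number of matching files, total size in MiB).

    A missing directory simply yields (0, 0.0), since Path.glob
    produces no matches for a non-existent directory.
    """
    files = [p for p in directory.glob(pattern) if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return len(files), total_bytes / 2**20


if __name__ == "__main__":
    for directory, pattern in EXPECTED_DATA.items():
        count, size_mib = summarize_directory(directory, pattern)
        print(f"{directory}: {count} file(s) matching {pattern!r}, {size_mib:.1f} MiB")
```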
For all experiments, the results are logged to the running MLflow instance.
Note: The hyperparameter searches take a significant amount of time (several days) and require access to GPU(s). You can abort a script at any time and inspect the current results via the MLflow web interface.
For the zero-shot evaluation, run
(venv) PYTHONPATH=./src python3 executables/zero_shot.py --tracking_uri=${TRACKING_URI}
To run the hyperparameter search for GCN-Align, run
(venv) PYTHONPATH=./src python3 executables/tune_gcn_align.py --tracking_uri=${TRACKING_URI}
To run the hyperparameter search for RDGCN, run
(venv) PYTHONPATH=./src python3 executables/tune_rdgcn.py --tracking_uri=${TRACKING_URI}
To run the hyperparameter search for DGMC, run
(venv) PYTHONPATH=./src python3 executables/tune_dgmc.py --tracking_uri=${TRACKING_URI}
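If you want to run the zero-shot evaluation and all three hyperparameter searches back to back, you could wrap the shell commands above in a small driver. The following is a sketch, not part of the repository. The names `build_command`, `run_all`, and `EXPERIMENT_SCRIPTS` are hypothetical. It only reproduces the script paths, the `--tracking_uri` flag, and the `PYTHONPATH=./src` setting shown above. It assumes it is started from the repository root with the virtual environment's interpreter. DGMC additionally needs its own requirements installed.

```python
import os
import subprocess
import sys

# Script paths exactly as in the commands above.
EXPERIMENT_SCRIPTS = (
    "executables/zero_shot.py",
    "executables/tune_gcn_align.py",
    "executables/tune_rdgcn.py",
    "executables/tune_dgmc.py",
)


def build_command(script: str, tracking_uri: str) -> list:
    """Build the argv for one script, mirroring the shell commands above."""
    # sys.executable is the interpreter running this driver (ideally the venv's).
    return [sys.executable, script, f"--tracking_uri={tracking_uri}"]


def run_all(tracking_uri: str, scripts=EXPERIMENT_SCRIPTS) -> None:
    """Run the scripts sequentially with PYTHONPATH=./src; stop on the first failure."""
    env = dict(os.environ, PYTHONPATH="./src")
    for script in scripts:
        # check=True raises CalledProcessError if a script exits non-zero.
        subprocess.run(build_command(script, tracking_uri), env=env, check=True)
```

For example, `run_all("http://localhost:5000")` would run the four scripts one after another. Running them sequentially keeps GPU usage simple, but the total runtime is the sum of several multi-day searches.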
To summarize the dataset statistics, run
(venv) PYTHONPATH=./src python3 executables/summarize.py --target datasets --force
To summarize all experiments, run
(venv) PYTHONPATH=./src python3 executables/summarize.py --target results --tracking_uri=${TRACKING_URI} --force
To generate the ablation study table, run
(venv) PYTHONPATH=./src python3 executables/summarize.py --target ablation --tracking_uri=${TRACKING_URI} --force