Fix typo in the DSA1 implementation #67

Open
shawnmjones opened this issue Feb 3, 2022 · 0 comments
Labels: bug (Something isn't working)

Comments

@shawnmjones
Member

After reworking Hypercane to use '.halg' formatted files as part of the IIPC 2021 Grant work, the DSA1 algorithm implementation is now wrong. It runs the time-slice step twice: the second call, which should be the DBSCAN step, invokes time-slice again:

# prevent extra work if we already have it from previous runs
if [ ! -e ${TIME_SLICE_FILE} ]; then
    echo "clustering mementos from remainder by time"
    hc cluster time-slice -i mementos -a ${ONLY_ENGLISH_FILE} -o ${TIME_SLICE_FILE} -l ${TIME_SLICE_LOG}
fi
# apply DBSCAN to cluster by Simhash distance
DBSCAN_FILE=${WORKING_DIRECTORY}/dsa1-dbscan.tsv
DBSCAN_LOG=${WORKING_DIRECTORY}/dsa1-cluster-dbscan.log
# prevent extra work if we already have it from previous runs
if [ ! -e ${DBSCAN_FILE} ]; then
    echo "clustering mementos from remainder by Simhash"
    # BUG: this repeats time-slice; it should be the DBSCAN step
    hc cluster time-slice -i mementos -a ${TIME_SLICE_FILE} -o ${DBSCAN_FILE} -l ${DBSCAN_LOG}
fi

It needs to follow AlNoamany's Algorithm again, as it did when I was working on my dissertation.
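
A possible shape for the corrected second step, as a rough sketch rather than the actual fix. It assumes the subcommand is `hc cluster dbscan` and that it accepts the same `-i`, `-a`, `-o`, and `-l` options as the time-slice call; any options for choosing the Simhash feature or the DBSCAN parameters are left out because they need to be checked against Hypercane's own help output. The function name `dsa1_dbscan_step` is hypothetical and only there to keep the sketch self-contained.

```shell
# Sketch of the corrected DBSCAN step for DSA1.
# Assumption: `hc cluster dbscan` exists and takes -i/-a/-o/-l like time-slice.
# dsa1_dbscan_step is a hypothetical wrapper, not part of Hypercane.
dsa1_dbscan_step() {
    working_directory="$1"
    time_slice_file="$2"
    dbscan_file="${working_directory}/dsa1-dbscan.tsv"
    dbscan_log="${working_directory}/dsa1-cluster-dbscan.log"

    # prevent extra work if we already have it from previous runs
    if [ ! -e "${dbscan_file}" ]; then
        echo "clustering mementos from remainder by Simhash"
        # the input is the time-slice output; the subcommand is dbscan, not time-slice
        hc cluster dbscan -i mementos -a "${time_slice_file}" \
            -o "${dbscan_file}" -l "${dbscan_log}"
    fi
}
```

In the script it would be called as `dsa1_dbscan_step "${WORKING_DIRECTORY}" "${TIME_SLICE_FILE}"` right after the time-slice block. Quoting the variables also guards against working directories with spaces in their names.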

shawnmjones self-assigned this Feb 3, 2022
shawnmjones added the enhancement and bug labels and removed the enhancement label Feb 3, 2022