some mistakes in the data. #5

Open
liyongkang123 opened this issue Oct 16, 2021 · 2 comments

@liyongkang123

Hello, I found some mistakes in the data.
I used the mat files provided here together with the full check-in file for data processing. In each mat file, one item is selected_venue_IDs, which maps venue_index => venue ID.

However, I found that only the venue IDs in the NYC file are correct; the venues listed in the other files do not belong to the corresponding city.
For example, the first venue ID in TKY.mat is 4b11a2c7f964a5200e8123e3,
but querying raw_poi.txt and the Foursquare API shows that this POI is in Tucson, United States,
not in Tokyo, Japan.
The same problem (POIs that appear in a city's mat file but do not belong to that city) also occurs in other mat files, such as SaoPaulo.mat.
I would really appreciate an explanation.
Thank you very much.
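A minimal Python sketch of the check described above, for anyone who wants to verify their own copy of the files. It assumes raw_poi.txt is tab-separated with the venue ID in the first column and the country code in the last; the function names are hypothetical. Reading selected_venue_IDs out of the .mat file (e.g. with scipy.io.loadmat) is left out because the resulting array layout depends on how the file was saved. Checking the country rather than the city is a simplification: it separates Tucson from Tokyo, but not two cities in the same country.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def load_poi_countries(poi_file: TextIO) -> Dict[str, str]:
    """Map venue ID -> country code from a raw POI file.

    Assumed layout (not confirmed by the thread): tab-separated rows of
    venue ID, latitude, longitude, category name, country code.
    """
    countries: Dict[str, str] = {}
    for row in csv.reader(poi_file, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 5:
            countries[row[0]] = row[-1].strip()
    return countries


def find_misplaced_venues(venue_ids: Iterable[str],
                          countries: Dict[str, str],
                          expected_country: str) -> List[str]:
    """Return venue IDs that are missing from the POI table or lie outside expected_country."""
    return [v for v in venue_ids if countries.get(v) != expected_country]


# Toy usage with hypothetical rows (placeholder IDs and coordinates, not real data).
toy_poi = io.StringIO("venue_a\t0.0\t0.0\tCafe\tJP\nvenue_b\t0.0\t0.0\tBar\tUS\n")
countries = load_poi_countries(toy_poi)
print(find_misplaced_venues(["venue_a", "venue_b", "venue_c"], countries, "JP"))
# → ['venue_b', 'venue_c']
```

With an up-to-date TKY.mat, the list returned for expected_country "JP" would be expected to be empty.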

@dingqi
Contributor

dingqi commented Oct 18, 2021

Thx for your interest. I fixed this issue in the mat files a while ago, so you are probably using an old version of the mat file. The venue IDs in the current mat files can be mapped to the raw dataset.

The issue was caused by a mismatch between venue ID dictionaries processed at different times (we have a venue dictionary for every six months). BTW, this does not affect any embedding learning results for an individual city, because within each dataset, all venue IDs/indices are obtained from the same venue dictionary.
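To illustrate why a dictionary mismatch corrupts the index-to-ID mapping without changing per-city embedding results, here is a toy Python sketch. The dictionaries, IDs, and function names are hypothetical, and it assumes (as the reply implies) that embedding learning only sees venue indices, so what matters is which check-ins share a venue, not which Foursquare ID that venue is labeled with.

```python
from typing import Dict, List, Sequence


def decode(indices: Sequence[int], dictionary: Sequence[str]) -> List[str]:
    """Map venue indices to venue IDs with one venue dictionary (position i holds the ID of index i)."""
    return [dictionary[i] for i in indices]


def same_grouping(a: Sequence[str], b: Sequence[str]) -> bool:
    """True if two labelings of the same check-ins differ only by a one-to-one renaming."""
    if len(a) != len(b):
        return False
    forward: Dict[str, str] = {}
    backward: Dict[str, str] = {}
    for x, y in zip(a, b):
        # Each label must always map to the same partner, in both directions.
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


# Hypothetical dictionaries from two processing rounds (toy IDs, not real venues).
dict_old = ["v0", "v1", "v2", "v3"]
dict_new = ["w0", "v3", "v0", "v1"]
checkin_indices = [2, 0, 2, 3, 1]  # venue index of each check-in in one city dataset

ids_old = decode(checkin_indices, dict_old)  # ['v2', 'v0', 'v2', 'v3', 'v1']
ids_new = decode(checkin_indices, dict_new)  # ['v0', 'w0', 'v0', 'v1', 'v3']
print(ids_old != ids_new, same_grouping(ids_old, ids_new))  # True True
```

Decoding with the wrong dictionary yields the wrong IDs (the symptom reported above), but because each dictionary lists distinct IDs, the renaming is one-to-one and the grouping of check-ins by venue is unchanged.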

@liyongkang123
Author

Thank you very much for your answer. I checked and I was indeed using the old version of the file; now everything is correct.
