Add a stamp/handshake to let automated clients know the data for a new release has finished uploading #161

Open
hroongtatrip opened this issue May 22, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@hroongtatrip

Each release of Overture data in S3 sits under a dated, versioned folder, for example 2024-05-16-beta.0/, but the Parquet files inside it are uploaded at different times.
It would be helpful to have a "DONE" signal in the form of an empty file so that clients know when to trigger ingestion of the data.
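
A minimal sketch of how a client might consume such a marker, assuming it is a zero-byte object named `_SUCCESS` directly under the release prefix. The marker name and the `release_is_complete` helper are hypothetical, the keys are illustrative, and the listing would come from whatever S3 listing call the client already uses.

```python
from typing import Iterable

MARKER_NAME = "_SUCCESS"  # assumed marker name, not an existing Overture convention


def release_is_complete(keys: Iterable[str], release: str,
                        marker: str = MARKER_NAME) -> bool:
    """Return True if the marker object for `release` appears in `keys`."""
    expected = f"{release.rstrip('/')}/{marker}"
    return any(key == expected for key in keys)


# Illustrative listing taken mid-upload: data exists, the marker does not yet.
listing = ["2024-05-16-beta.0/part-0.parquet"]
print(release_is_complete(listing, "2024-05-16-beta.0/"))  # False

listing.append("2024-05-16-beta.0/_SUCCESS")
print(release_is_complete(listing, "2024-05-16-beta.0/"))  # True
```

A client would only start ingesting once this check returns True for the release it is watching.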

@jwass
Contributor

jwass commented Jun 6, 2024

@hroongtatrip I know that Spark or Hadoop can write _SUCCESS files, which are empty and are created only once all files have been written. We could do something similar, I'd think.

@hroongtatrip
Author

Yes, that would work. Just some signal in the form of a 0-byte file when all is done. Thanks.

@jwass
Contributor

jwass commented Jun 6, 2024

@varapmsft @ibnt1 Any thoughts here? I think Spark can be configured to write this file automatically, or we could do it manually. If an entire dataset is copied, we'd have to be careful that the marker is written only after all the other files are.
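
The ordering concern for copies can be sketched as follows. This is a local-filesystem analogue only, not the actual release tooling: `copy_release` is a hypothetical helper, `_SUCCESS` is an assumed marker name, and a real S3 copy would need the same "marker last" ordering applied to object uploads.

```python
import shutil
import tempfile
from pathlib import Path

MARKER_NAME = "_SUCCESS"  # assumed marker name


def copy_release(src: Path, dst: Path, marker: str = MARKER_NAME) -> list[str]:
    """Copy every file under src to dst, creating the marker only at the end.

    Returns the relative paths in the order they were written.
    """
    written = []
    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = path.relative_to(src)
        if rel == Path(marker):
            continue  # skip the top-level source marker; it is recreated last
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(path, target)
        written.append(rel.as_posix())
    dst.mkdir(parents=True, exist_ok=True)
    (dst / marker).touch()  # zero-byte marker, written after all data files
    written.append(marker)
    return written


with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "src"), Path(tmp, "dst")
    (src / "theme=places").mkdir(parents=True)
    (src / "theme=places" / "part-0.parquet").write_bytes(b"data")
    (src / MARKER_NAME).touch()
    order = copy_release(src, dst)
    print(order)  # ['theme=places/part-0.parquet', '_SUCCESS']
```

Skipping the source marker and recreating it at the end means a client polling the destination never sees the marker before the data.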

@jwass
Contributor

jwass commented Jun 6, 2024

@ibnt1 pointed out that some tools might crash on the presence of additional _SUCCESS (and similar) files. We should at least check Athena, DuckDB, and pyarrow.
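
One reader-side way to sidestep this, sketched under assumptions, is to hand tools an explicit list of data files rather than a whole prefix. The filter below follows the Hadoop/Spark convention of treating file names that start with `_` or `.` as hidden. `data_files` is a hypothetical helper, and whether Athena, DuckDB, or pyarrow actually need it is exactly what would have to be checked.

```python
from pathlib import PurePosixPath
from typing import Iterable


def data_files(keys: Iterable[str], suffix: str = ".parquet") -> list[str]:
    """Keep only keys that look like data files.

    Drops any key whose file name starts with '_' or '.' (e.g. _SUCCESS)
    and any key whose file name does not end with `suffix`.
    """
    kept = []
    for key in keys:
        name = PurePosixPath(key).name
        if name.startswith(("_", ".")):
            continue
        if name.endswith(suffix):
            kept.append(key)
    return kept


# Illustrative keys only.
keys = [
    "2024-05-16-beta.0/_SUCCESS",
    "2024-05-16-beta.0/part-0.parquet",
    "2024-05-16-beta.0/.part-0.parquet.crc",
]
print(data_files(keys))  # ['2024-05-16-beta.0/part-0.parquet']
```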

@jenningsanderson made a simple script https://github.com/OvertureMaps/data/blob/main/utils/fetch-releases-from-s3.py whose output can be published as a file. Clients would then check that file periodically for new data. This would allow new data to land in the right folder and for some testing to take place before that file is published to mark the release as "officially" released.
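
The polling approach could look roughly like the sketch below. The manifest format is an assumption (a JSON array of release names), since the script's actual output is not described in this thread. `new_releases` is a hypothetical helper, and the release names are illustrative.

```python
import json


def new_releases(manifest_text: str, seen: set[str]) -> list[str]:
    """Return releases listed in the manifest that have not been ingested yet."""
    published = json.loads(manifest_text)  # assumed: a JSON array of names
    return sorted(r for r in published if r not in seen)


# Illustrative values: one release already ingested, one newly published.
seen = {"2024-04-16-beta.0"}
manifest = '["2024-04-16-beta.0", "2024-05-16-beta.0"]'
print(new_releases(manifest, seen))  # ['2024-05-16-beta.0']
```

Because the manifest is only updated after testing, a release appearing in it doubles as the "done" signal without adding any file to the data folder itself.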

@hroongtatrip
Author

The _SUCCESS file does not need to be in the same folder. Is that what the other script does? If so, that would work too.

@atiannicelli atiannicelli added the enhancement New feature or request label Oct 14, 2024