Skip uploading unchanged files #23

Open
pimterry opened this issue Oct 4, 2022 · 4 comments


@pimterry

pimterry commented Oct 4, 2022

Since you fixed my previous issue so quickly, I thought it might be worth adding a more complicated one 😄.

I have a fairly large site (325MB - it includes some videos etc) that I want to deploy to BunnyCDN and it can take a while. I'd like to speed this up.

I think there's a good way to do this by extending the current upload process:

  • Before uploading, query the existing content via https://docs.bunny.net/reference/get_-storagezonename-path-
    • This returns only one level, so we'd need to recurse into subdirectories to collect everything. Those subdirectory requests could happen in parallel though.
  • Store the 'Checksum' field returned for each existing file path.
    • This is the SHA256 checksum for the file, uppercased.
  • When uploading files, check whether the local checksum already matches the remote checksum for each file, and if so skip it.
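A minimal sketch of the comparison step described in the list above, assuming the remote checksums have already been collected into a dictionary keyed by path relative to the storage zone root. The function names (`local_checksum`, `files_to_upload`) are hypothetical and not part of this project; the only detail taken from the API is that the checksum is an uppercased SHA256 of the file content.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def local_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the uppercased SHA256 hex digest of a local file.

    The file is read in chunks so large videos are not loaded into memory.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def files_to_upload(local_root: Path, remote_checksums: Dict[str, str]) -> List[str]:
    """List relative paths whose content is new or differs from the remote copy.

    `remote_checksums` is assumed to map "dir/file.ext" style paths to the
    'Checksum' value reported for that file.
    """
    pending = []
    for path in sorted(local_root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(local_root).as_posix()
        # A missing key means the file does not exist remotely yet.
        if remote_checksums.get(relative) != local_checksum(path):
            pending.append(relative)
    return pending
```

Hashing still reads every local file, but that is local disk I/O rather than network transfer, which is where the time goes for a site of this size.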

This would be useless when combined with the remove option (since that clears the existing content first), but it could be extended later to handle removal as well, by comparing the existing content with the uploaded files and individually deleting any extra remote content afterwards.
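The removal extension amounts to a set difference between remote and local paths. A rough sketch, assuming both sides are expressed as relative paths in the same format; `remote_paths_to_delete` is a hypothetical helper, not something the action provides:

```python
from typing import Iterable, List


def remote_paths_to_delete(
    local_paths: Iterable[str], remote_paths: Iterable[str]
) -> List[str]:
    """Return remote paths that have no local counterpart.

    These are the files that would need individual delete requests after
    the upload finishes. The result is sorted for deterministic output.
    """
    return sorted(set(remote_paths) - set(local_paths))
```

For example, with local files `a.html` and `b/c.css` and remote files `a.html`, `old.js`, `b/c.css` and `b/old.png`, only `b/old.png` and `old.js` would be deleted.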

My best-guess calculations suggest this would reduce the time to do large deploys like mine with only small changes from ~10 minutes to ~10 seconds (i.e. 60x faster).

Would you be open to that? I'm happy to open a PR to add this feature myself, if that's something you might accept.

@ayeressian
Owner

Hmm... is the checksum for the file path or the content? If it's for the path, what happens when the file content changes? If it's for the content, then downloading the files will negate the gains of not uploading them.

@pimterry
Author

pimterry commented Oct 4, 2022

It's the checksum of the file content, not the path.

Downloading would negate this, but you don't have to download the files to calculate the hash. Bunny.net has already calculated the checksums for every file, and exposes them via the API. You can query the "List files" API endpoint above to read the checksum for every single file in a directory in a single request.

The response looks like this:

[Image: Screenshot from 2022-10-04 18-54-27]
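To illustrate how the checksums could be collected without downloading anything, here is a sketch that walks the listing one directory level at a time and fetches all directories on a level in parallel. The HTTP call is deliberately left out: `list_dir` is a hypothetical callable that would wrap the "List files" endpoint and return the parsed JSON entries for one directory. The `Checksum` field is described above; the `ObjectName` and `IsDirectory` field names are my assumption about the response shape and should be checked against the API documentation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Entry = Dict[str, object]


def collect_remote_checksums(
    list_dir: Callable[[str], List[Entry]], max_workers: int = 8
) -> Dict[str, str]:
    """Map every remote file path to its checksum.

    `list_dir(path)` is assumed to return the entries of one directory,
    where "" is the storage zone root and subdirectories end with "/".
    """
    checksums: Dict[str, str] = {}
    pending = [""]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # One request per directory on this level, issued concurrently.
            listings = list(pool.map(list_dir, pending))
            next_level = []
            for directory, entries in zip(pending, listings):
                for entry in entries:
                    relative = f"{directory}{entry['ObjectName']}"
                    if entry["IsDirectory"]:
                        next_level.append(relative + "/")
                    else:
                        checksums[relative] = str(entry["Checksum"])
            pending = next_level
    return checksums
```

The number of requests equals the number of directories, not the number of files, so even a large site should need only a handful of round trips.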

@ayeressian
Owner

Oh, I see. Makes sense. I would be happy if you provide a PR for that. Let me know if you have any questions.

@pimterry
Author

pimterry commented Oct 5, 2022

Sorry, I'm actually not going to open a PR for this. I still think it's quite doable, but after running into other issues with Bunny Storage, I've refactored my deployment strategy to publish content directly to a backend server instead, with a pull zone connected to that server to populate the CDN, without using storage at all. Sorry! I'll leave this open anyway in case anybody else is interested in taking this on in future.
