Skip uploading unchanged files #23

pimterry · 2022-10-04T16:38:22Z

Since you fixed my previous issue so quickly, I thought it might be worth adding a more complicated one 😄.

I have a fairly large site (325MB - it includes some videos etc) that I want to deploy to BunnyCDN and it can take a while. I'd like to speed this up.

I think there's a good way to do this by extending the current approach:

- Before uploading, query the existing content via https://docs.bunny.net/reference/get_-storagezonename-path-
  - This returns only one level, so we'd need to recurse into subdirectories to collect everything. Those subdirectory requests could happen in parallel though.
- Store the 'Checksum' field returned for each existing file path.
  - This is the SHA256 checksum for the file, uppercased.
- When uploading files, check whether the local checksum already matches the remote checksum for each file, and if so skip it.
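
As a rough sketch of the steps above (Python purely for illustration, not the action's actual code): `list_dir` is a hypothetical callable that wraps the "List files" request and returns the parsed JSON entries for one directory, and the `ObjectName` and `IsDirectory` field names are assumptions about the response shape. Only the `Checksum` field and its uppercase SHA256 format come from the description above.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, List


def local_checksum(path: Path) -> str:
    """SHA256 of the file content, uppercased to match the remote 'Checksum' field."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 1 MiB chunks so large files (videos etc.) are not loaded whole.
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def collect_remote_checksums(
    list_dir: Callable[[str], List[dict]], root: str = ""
) -> Dict[str, str]:
    """Walk the remote tree one level at a time, listing sibling directories in parallel.

    list_dir(path) is a hypothetical wrapper around the "List files" request.
    """
    checksums: Dict[str, str] = {}
    pending = [root]
    with ThreadPoolExecutor(max_workers=8) as pool:
        while pending:
            listings = list(pool.map(list_dir, pending))
            next_level = []
            for directory, entries in zip(pending, listings):
                for entry in entries:
                    # 'ObjectName' and 'IsDirectory' are assumed field names.
                    rel = f"{directory}/{entry['ObjectName']}".lstrip("/")
                    if entry.get("IsDirectory"):
                        next_level.append(rel)
                    else:
                        checksums[rel] = entry["Checksum"]
            pending = next_level
    return checksums


def files_to_upload(local_root: Path, remote: Dict[str, str]) -> List[str]:
    """Relative paths whose content is new or differs from the remote copy."""
    changed = []
    for path in sorted(local_root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(local_root).as_posix()
            if remote.get(rel) != local_checksum(path):
                changed.append(rel)
    return changed
```

Anything not returned by `files_to_upload` can be skipped. Injecting `list_dir` keeps the HTTP details (storage zone name, access key, retries) out of the comparison logic, so that part can be tested without touching the network.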

This would be useless when used with the 'remove' option (since that clears the existing content first), but it's technically possible to extend this later to handle removal en route as well (by comparing the existing content with the uploaded files, and individually deleting any extra content afterwards).
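
The removal comparison could be as simple as a set difference over relative paths. A minimal sketch, assuming both sides use the same forward-slash relative paths; `stale_remote_paths` is a hypothetical helper name, not something the action provides:

```python
from typing import Iterable, List


def stale_remote_paths(
    local_paths: Iterable[str], remote_paths: Iterable[str]
) -> List[str]:
    """Remote files with no local counterpart, i.e. candidates for deletion."""
    return sorted(set(remote_paths) - set(local_paths))
```

Each returned path would then be deleted individually after the upload finishes, so the site is never left empty mid-deploy.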

My best-guess calculations suggest that for large deploys like mine, where only a few files change, this would cut the deploy time from ~10 minutes to ~10 seconds (i.e. 60x faster).

Would you be open to that? I'm happy to open a PR to add this feature myself, if that's something you might accept.


ayeressian · 2022-10-04T16:51:39Z

Hmm... is the checksum for the file path or the content? If it's for the path, what happens when the file content changes? If it's for the content, then downloading the files will negate the gains of not uploading them.

pimterry · 2022-10-04T16:57:24Z

It's the checksum of the file content, not the path.

Downloading would negate this, but you don't have to download the files to calculate the hash. Bunny.net has already calculated the checksums for every file, and exposes them via the API. You can query the "List files" API endpoint above to read the checksum for every single file in a directory in a single request.

The response looks like this:

ayeressian · 2022-10-04T17:01:04Z

Oh, I see. Makes sense. I would be happy if you provide a PR for that. Let me know if you have any questions.

pimterry · 2022-10-05T11:59:27Z

Sorry, I'm actually not going to open a PR for this - I still think it's quite doable, but after running into other issues with Bunny Storage, I've refactored my deployment strategy to publish content directly to a backend server instead, and connect a pull zone to that to populate the CDN, without using storage at all. Sorry! I'll leave this open anyway in case anybody else is interested in taking this on in future.
