Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli/delete-cache: Switch to fastwalk #90

Merged
merged 3 commits into from
Sep 12, 2024
Merged

cli/delete-cache: Switch to fastwalk #90

merged 3 commits into from
Sep 12, 2024

Conversation

joebonrichie
Copy link
Contributor

@joebonrichie joebonrichie commented Mar 16, 2024

Use a parallelized version to fileWalk to get directory sizes. Additionally, use the parallelized version of fileWalk to delete files, then call RemoveAll() on the remaining empty directories.

Resolves #89.

Benchmarks
solbuild dc -a (720.1 MiB)

Before: 0m0.174s | Now: 0m0.081s (uncached)

solbuild dc -s (41.8 GiB)

Before : 0m15.849s (uncached) | 0m6.241s (cached)
After : 0m2.134s (uncached) | 0m0.585s (cached)

@joebonrichie joebonrichie changed the title cli/delete-cache: Use parallelized version of filepath.Walk cli/delete-cache: Go fast Mar 16, 2024
@joebonrichie joebonrichie force-pushed the faster-walk branch 5 times, most recently from 457e1c4 to 85b1dfc Compare March 16, 2024 20:56
@joebonrichie
Copy link
Contributor Author

I see there are a couple of alternatives which have been maintained more recently:

https://github.com/saracen/walker
https://github.com/charlievieth/fastwalk

Will report back

@joebonrichie joebonrichie changed the title cli/delete-cache: Go fast cli/delete-cache: Switch to fastwalk Mar 20, 2024
@ermo
Copy link
Contributor

ermo commented May 3, 2024

@joebonrichie Appears to need a rebase. ^^'

@ermo ermo added this to the 1.7.0 milestone May 3, 2024
@ermo ermo added the enhancement New feature or request label May 3, 2024
Keeps the same API usage, just faster.

Resolves #89.

solbuild dc -s ('41.8 GiB')

Before : 0m15.849s (uncached) | 0m6.241s (cached)
After  : 0m2.134s (uncached)  | 0m0.585s (cached)
Make use of the parallelized walk.Walk() function instead of RemoveAll()
to remove files, this is significantly quicker. Additionally, get the
directory size as we're deleting so we only run walk.Walk() once. Left over
directories are still removed with RemoveAll() after all files have been
deleted.

If we could tweak the file walk to search depth first, we could append
all directories to an array and then delete them with Remove() after all files
have been removed which would be slightly faster. However, that is a micro
optimization for the future.

Before: 0m0.174s | Now: 0m0.081s (uncached, 720.1 MiB)
Copy link
Member

@silkeh silkeh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@silkeh silkeh merged commit 125a443 into master Sep 12, 2024
1 check passed
@EbonJaeger EbonJaeger deleted the faster-walk branch September 12, 2024 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace filepath.Walk usage with fast parallel version
3 participants