Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WORK-IN-PROGRESS] Introduce the path walk API into Git for Windows #5146

Closed
wants to merge 45 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
6354d7a
path-walk: introduce an object walk by path
derrickstolee Aug 29, 2024
c8e08c3
backfill: add builtin boilerplate
derrickstolee Jun 7, 2024
b05a276
backfill: basic functionality and tests
derrickstolee Sep 1, 2024
e02f7b3
backfill: add --batch-size=<n> option
derrickstolee Sep 1, 2024
4236e4f
backfill: add --sparse option
derrickstolee Sep 1, 2024
31c9b45
path-walk: allow consumer to specify object types
derrickstolee Sep 1, 2024
356abc9
backfill: assume --sparse when sparse-checkout is enabled
derrickstolee Sep 1, 2024
3a421ff
path-walk: allow visiting tags
derrickstolee Sep 9, 2024
b9471b6
survey: stub in new experimental `git-survey` command
jeffhostetler Apr 29, 2024
c4b3490
survey: add command line opts to select references
jeffhostetler Apr 29, 2024
ca37a49
survey: collect the set of requested refs
jeffhostetler Apr 29, 2024
0a20a17
survey: start pretty printing data in table form
derrickstolee Sep 1, 2024
91c4d57
survey: add object count summary
derrickstolee Sep 2, 2024
53632be
revision: create mark_trees_uninteresting_dense()
derrickstolee Sep 6, 2024
c63928e
survey: summarize total sizes by object type
derrickstolee Sep 2, 2024
3e9b671
path-walk: add prune_all_uninteresting option
derrickstolee Sep 4, 2024
af7d53f
survey: show progress during object walk
derrickstolee Sep 2, 2024
d192ae7
pack-objects: add --path-walk option
derrickstolee Sep 5, 2024
5f7e131
survey: add ability to track prioritized lists
derrickstolee Sep 2, 2024
ab0bc08
pack-objects: extract should_attempt_deltas()
derrickstolee Sep 6, 2024
bd8b5b5
survey: add report of "largest" paths
derrickstolee Sep 2, 2024
c6d4832
pack-objects: introduce GIT_TEST_PACK_PATH_WALK
derrickstolee Sep 6, 2024
c2092f0
p5313: add size comparison test
derrickstolee Aug 28, 2024
bbc57f7
repack: add --path-walk option
derrickstolee Sep 5, 2024
32fca07
pack-objects: enable --path-walk via config
derrickstolee Sep 5, 2024
c145b9e
pack-objects: add --full-name-hash option
derrickstolee Sep 7, 2024
72191a0
test-name-hash: add helper to compute name-hash functions
derrickstolee Sep 8, 2024
5039f03
p5314: add a size test for name-hash collisions
derrickstolee Sep 9, 2024
e43582c
scalar: enable path-walk during push via config
derrickstolee Sep 5, 2024
88fee5b
pack-objects: output debug info about deltas
derrickstolee Aug 28, 2024
d17e503
Merge branch 'backfill'
dscho Sep 15, 2024
d7e7283
Merge branch 'survey'
dscho Sep 15, 2024
98a5786
Merge branch 'pack-path-walk'
dscho Sep 15, 2024
9d0690a
Merge branch 'path-walk'
dscho Sep 15, 2024
556335a
fixup! survey: collect the set of requested refs
dscho Sep 15, 2024
69aa8d8
fixup! pack-objects: output debug info about deltas
dscho Sep 15, 2024
5001883
fixup! survey: summarize total sizes by object type
dscho Sep 15, 2024
3ab1bda
fixup! survey: add report of "largest" paths
dscho Sep 15, 2024
84c8a06
fixup! survey: summarize total sizes by object type
dscho Sep 15, 2024
16cd9a3
fixup! pack-objects: output debug info about deltas
dscho Sep 15, 2024
c8f1239
fixup! survey: start pretty printing data in table form
dscho Sep 15, 2024
b5c2265
fixup! survey: add object count summary
dscho Sep 15, 2024
fee8f88
fixup! survey: summarize total sizes by object type
dscho Sep 15, 2024
489ce0c
test-tool: add the `path-walk` subcommand
dscho Sep 17, 2024
9b78d40
fixup! test-tool: add the `path-walk` subcommand
dscho Sep 17, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
58 changes: 58 additions & 0 deletions path-walk.c
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
#include "revision.h"
#include "string-list.h"
#include "strmap.h"
#include "tag.h"
#include "trace2.h"
#include "tree.h"
#include "tree-walk.h"
Expand Down Expand Up @@ -215,6 +216,9 @@ int walk_objects_by_path(struct path_walk_info *info)
.paths_to_lists = STRMAP_INIT
};

struct oid_array tagged_tree_list = OID_ARRAY_INIT;
struct oid_array tagged_blob_list = OID_ARRAY_INIT;

trace2_region_enter("path-walk", "commit-walk", info->revs->repo);

CALLOC_ARRAY(commit_list, 1);
Expand Down Expand Up @@ -260,6 +264,60 @@ int walk_objects_by_path(struct path_walk_info *info)
oid_array_clear(&commit_list->oids);
free(commit_list);

if (info->tags) {
struct oid_array tags = OID_ARRAY_INIT;

trace2_region_enter("path-walk", "tag-walk", info->revs->repo);

/*
* Walk any pending objects at this point, but they should only
* be tags.
*/
for (size_t i = 0; i < info->revs->pending.nr; i++) {
Comment on lines +319 to +323
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear that we have to handle tags in revs->pending first, accumulating the list before walking the commits, but then only processing them afterward (to avoid double-accounting for blobs that are both tagged directly and that are reachable via a tree).

I've just added a test helper in 489ce0c and used the left-over of t8100-*.sh -d:

$ ./bin-wrappers/test-tool -C ./t/trash\ directory.t8100-git-survey/ path-walk --no-blobs --tags --no-commits -v four
visiting path 'initial' (count: 0) of type 'tag'

$ git -C ./t/trash\ directory.t8100-git-survey/ rev-parse four four^0
dcd3534d11452b820a49532b41e7494f3ddfa2a8
5e53728740668d567dad45ca14d258b8f75fb7c1

$ ./bin-wrappers/test-tool -C ./t/trash\ directory.t8100-git-survey/ path-walk --no-blobs --tags --commits -v four
visiting path 'initial' (count: 7) of type 'commit'
        5e53728740668d567dad45ca14d258b8f75fb7c1
        6cf975cad5d4c3afe9d9c1beb519506ab26f2968
        5e51a08864ae54b185d9ccb441e3b84272a1d962
        bd080f6764f92d2e5f3c14d9ab1241495b919820
        123d744603d8c13997f7dd86b4f066cc68305762
        446a0a30049ffec3f6a1af16c07e8c0941b11660
        d8452bb98e031b9f6d1e3804a8cecc14e3124c9e
visiting path 'initial' (count: 0) of type 'tag'

As you can see, there is a tag object four but it is not reported because the revision walk removes it from the pending list.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tag stuff is the weakest part of the path-walk API and demonstrates the need for a git rev-list-like test helper that checks which objects are collected for each path and object type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully the test helper I implemented is a good start?

struct object_array_entry *pending = info->revs->pending.objects + i;
struct object *obj = pending->item;

while (obj->type == OBJ_TAG) {
struct tag *tag = lookup_tag(info->revs->repo,
&obj->oid);
oid_array_append(&tags, &obj->oid);
obj = tag->tagged;
}

switch (obj->type) {
case OBJ_TREE:
oid_array_append(&tagged_tree_list, &obj->oid);
break;

case OBJ_BLOB:
oid_array_append(&tagged_blob_list, &obj->oid);
break;

case OBJ_COMMIT:
/* skip */
break;

default:
BUG("should not see any other type here");
}
}

info->path_fn("initial", &tags, OBJ_TAG, info->path_fn_data);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably say "tags" instead of "initial"? And likewise the other path_fn("initial", ...) call may want to say "commits" instead?

And maybe even "tags" is misleading, because there could be blobs tracked under the file name tags? It is a valid path name, after all. We could play games by using a path that is invalid, such as .git/refs/tags (but what to do for commits? .git/refs/heads is wrong, they are not necessarily branch tips). Maybe better just use "".

The same goes for tagged-trees and tagged-blobs, too.

Also, do we want to prevent the function from being called with zero tags in case the caller asked for tags to be included but there were none to be found?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "initial" string comes from the way that this is emitted in the other object walk. I'll try to find pointers to that or reasoning for that to be important.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that means that you have to look at the object type first in the callback, and only interpret the path parameter as actual path in the OBJ_BLOB case?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, we want the path to matter for trees, too. When considering deltas in the pack-objects case, these paths don't matter for commits or tags because everything is sorted by type before compared by name-hash, so the set of deltas that are considered are the same in each case.


if (tagged_tree_list.nr)
info->path_fn("tagged-trees", &tagged_tree_list, OBJ_TREE,
info->path_fn_data);
if (tagged_blob_list.nr)
info->path_fn("tagged-blobs", &tagged_blob_list, OBJ_BLOB,
info->path_fn_data);

trace2_data_intmax("path-walk", ctx.repo, "tags", tags.nr);
trace2_region_leave("path-walk", "tag-walk", info->revs->repo);
oid_array_clear(&tags);
oid_array_clear(&tagged_tree_list);
oid_array_clear(&tagged_blob_list);
}

string_list_append(&ctx.path_stack, root_path);

trace2_region_enter("path-walk", "path-walk", info->revs->repo);
Expand Down
1 change: 1 addition & 0 deletions path-walk.h
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ struct path_walk_info {
int commits;
int trees;
int blobs;
int tags;

/**
* Specify a sparse-checkout definition to match our paths to. Do not
Expand Down