Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collapse clean and non clean build requests on clean builders #266

Merged
merged 6 commits into from
Nov 12, 2024

Conversation

DavidSpickett
Copy link
Contributor

@DavidSpickett DavidSpickett commented Sep 24, 2024

Fixes #250

Where build requests were piling up on slower builders because they were clean/don't clean/clean/ don't clean etc. and we could not merge those requests because of that difference.

Galina suggested that if we knew that the builder was going to clean the build first anyway, we could merge them. This commit implements that.

We do this by first setting a "clean" attribute on the builder's factory object, that we can find later via in the collapse callback.

So far only ClangBuilder passes this on but any builder can opt in by copying what it does.

In the collapse function when comparing properties we delete the "clean_obj" key from both sets of properties if we know that the builder's factory is always clean.

With some logging (not included in this commit) we can see it now collapses a clean/non-clean pair of requests:

collapseRequests
selfBuildSetProperties: {'scheduler': ('main:clang,clang-tools-extra,compiler-rt,flang,lld,llvm,mlir', 'Scheduler')}
otherBuildSetProperties: {'scheduler': ('main:clang,clang-tools-extra,compiler-rt,flang,lld,llvm,mlir', 'Scheduler'), 'clean_obj': (True, 'Change')}
Removing clean_obj from build set properties.
Build set properties were the same.

This has beeen tested by my colleague Omair Javaid on a test build master. Of course we only tested it with one of Linaro's builders, so it's possible we see other side effects in production.

This is to address llvm#250.
Where build requests were piling up on slower builders because
they were clean/don't clean/clean/ don't clean etc. and we could
not merge those requests because of that difference.

Galina suggested that if we knew that the builder was going to
clean the build first anyway, we could merge them. This commit
implements that.

We do this by first setting a "clean" attribute on the builder's
factory object, that we can find later via in the collapse callback.

So far only ClangBuilder passes this on but any builder can opt
in by copying what it does.

In the collapse function when comparing properties we delete the
"clean_obj" key from both sets of properties if we know that the
builder's factory is always clean.

With some logging (not included in this commit) we can see it now
collapses a clean/non-clean pair of requests:
```
collapseRequests
selfBuildSetProperties: {'scheduler': ('main:clang,clang-tools-extra,compiler-rt,flang,lld,llvm,mlir', 'Scheduler')}
otherBuildSetProperties: {'scheduler': ('main:clang,clang-tools-extra,compiler-rt,flang,lld,llvm,mlir', 'Scheduler'), 'clean_obj': (True, 'Change')}
Removing clean_obj from build set properties.
Build set properties were the same.
```

This has beeen tested by my colleague Omair Javaid on a test build
master. Of course we only tested it with one of Linaro's builders,
so it's possible we see other side effects in production.
@joker-eph
Copy link
Contributor

Why would we prevent from collapsing clean/non-clean builds?

@DavidSpickett
Copy link
Contributor Author

We already merge:

unclean
unclean

And

clean
clean

As the properties matched.

Now we want to merge:

clean
unclean

Assuming unclean merges into clean, and the result is also clean, we could do this for any builder even if it does not clean by default. In current production we wouldn't because the properties don't match.

unclean
clean

This we can only merge if the builder itself would clean anyway, or if we are able to edit the request after merging to add clean_obj Currently it's just a boolean collapse or not.

I will see if I can edit the collapsed request, then this can apply globally not just to builders marked clean.

@DavidSpickett
Copy link
Contributor Author

DavidSpickett commented Sep 25, 2024

I'm going to see if it's feasible to modify the request that we collapse into, to address Mehdi's question.

And to be clear, Galina has the deciding approval here.

@DavidSpickett
Copy link
Contributor Author

I'm going to see if it's feasible to modify the request that we collapse into, to address Mehdi's question.

This is the buildbot function that calls the callback I'm modifying in this PR: https://github.com/buildbot/buildbot/blob/master/master/buildbot/process/buildrequest.py#L66

If we want to collapse unclean -< clean, we'd need to change this callback to return a new request with the clean_obj added. Or make the default some sort of superset of all options of both requests.

But I'd rather not get into changing Buildbot itself (or trying to monkeypatch it)

Why would we prevent from collapsing clean/non-clean builds?

@joker-eph tell me if I've answered this or not. Might have gone on a sidetrack there.

@DavidSpickett
Copy link
Contributor Author

DavidSpickett commented Sep 26, 2024

One sentence answer based on current buildbot behaviour:
"The result of a request collapse retains the properties of the request being collapsed into, if that request is unclean we would lose the clean property of the clean request that was collapsed."

That is unless we knew that the builder would clean anyway, which is what this PR checks for.

@DavidSpickett DavidSpickett requested a review from ceseo September 26, 2024 12:38
Copy link
Contributor

@slydiman slydiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM,
But it seems it is necessary to add similar clean=clean to LLDBBuilder.py, PollyBuilder.py, OpenMPBuilder.py and BOLTBuilder.py. Note most builders use UnifiedTreeBuilder which already contains clean property. See the full list of factories with clean=True in llvm-zorg/buildbot/osuosl/master/config/builders.py

Copy link
Contributor

@slydiman slydiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value in getattr() is necessary to avoid an exception if some factory does not have the clean property.

@joker-eph
Copy link
Contributor

@joker-eph tell me if I've answered this or not. Might have gone on a sidetrack there.

Yes, this is exactly what I was looking for, thanks.

Copy link
Contributor

@slydiman slydiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Copy link
Contributor

@gkistanova gkistanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this, David!

The patch looks good with a small nit pick. Please see my comments inline.
In the mean time I staged the patch to see it in action.

zorg/buildbot/process/buildrequest.py Outdated Show resolved Hide resolved
zorg/buildbot/process/buildrequest.py Outdated Show resolved Hide resolved
@DavidSpickett
Copy link
Contributor Author

In the mean time I staged the patch to see it in action.

I've cleared the queue for one of our affected builders, https://lab.llvm.org/staging/#/builders/75, and will see how the queue develops.

Copy link
Contributor

@gkistanova gkistanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, David!
The patch works on the staging well.

vitalybuka added a commit that referenced this pull request Nov 4, 2024
@vitalybuka
Copy link
Contributor

Is this on staging?
just started bots on staging https://lab.llvm.org/staging/#/builders/7 and see
image

@vitalybuka
Copy link
Contributor

vitalybuka commented Nov 4, 2024

Is this on staging?

My apologies, I forgot to merge #257.

@vitalybuka
Copy link
Contributor

Thanks, David! The patch works on the staging well.

It does not look like it works. https://lab.llvm.org/staging/#/builders/7
Is this rely on staging?

@gkistanova
Copy link
Contributor

It does not look like it works. https://lab.llvm.org/staging/#/builders/7 Is this rely on staging?

You patch to make that builder clean was not on the staging yet. It is now.
I removed the current build queue, so we can see how it will go further on.

Let's wait for another day to make sure everything is fine. I'll be surprised if it is not, though.

@gkistanova
Copy link
Contributor

However, since sanitizer-buildbot9 is offline, no build requests is going to be scheduled. Is there a way to get that worker online on staging?

Copy link
Contributor

@vitalybuka vitalybuka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vitalybuka
Copy link
Contributor

Any estimates when it will reach lab.llvm.org/buildbot? Asking if we need to return workaround #295.

@gkistanova
Copy link
Contributor

I think this is ready to merge.

It could be in production on Friday late night or Staurday morning.

DavidSpickett added a commit to DavidSpickett/llvm-zorg that referenced this pull request Nov 12, 2024
To take advantage of llvm#266.
We use ccache on all our bots so this will not add much overhead.

The SVE bots are fast enough that the buildup of build requests is
not really a problem, but for consistency I've marked them clean
so all our 2 stage builds are clean.
@DavidSpickett
Copy link
Contributor Author

DavidSpickett commented Nov 12, 2024

Sorry I was away for a long weekend. Linaro's bots also seemed to have no problems with this, didn't break our non-clean builders either.

I will merge this now.

@DavidSpickett DavidSpickett merged commit e22dfe0 into llvm:main Nov 12, 2024
1 check passed
@DavidSpickett DavidSpickett deleted the collapse-pr branch November 12, 2024 12:04
DavidSpickett added a commit to DavidSpickett/llvm-zorg that referenced this pull request Nov 12, 2024
To take advantage of llvm#266.
We use ccache on all our bots so this will not add much overhead.

The SVE bots are fast enough that the buildup of build requests is
not really a problem, but for consistency I've marked them clean
so all our 2 stage builds are clean.
DavidSpickett added a commit that referenced this pull request Nov 14, 2024
To take advantage of #266. We use
ccache on all our bots so this will not add much overhead.

The SVE bots are fast enough that the buildup of build requests is not
really a problem, but for consistency I've marked them clean so all our
2 stage builds are clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Build requests piling up on workers
7 participants