refactor(sparksql): Speed up sparksql compilation by splitting function registrations #11565

Closed

Yohahaha
Contributor

@Yohahaha Yohahaha commented Nov 18, 2024

This PR aims to speed up sparksql compilation by splitting the function
registrations into multiple source files, grouped by function type.
It adds 'velox_functions_spark' for the registrations and renames the previous
'velox_functions_spark' to 'velox_functions_spark_impl'.

Compilation time was measured with the velox_functions_spark_test target to mimic
the usual development loop: build -> modify a cpp file -> build. Compilation
is about 1.5x faster (165s to 104s) in release mode, and the gain is larger in
debug mode.

Fixes #11564.
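
The split described above can be sketched roughly as follows. This is a minimal, single-file illustration and not the code from this PR: the `split_sketch` namespace, the toy `Registry`, the per-category function names, and the registered function names are all assumptions made for the example. Only the file names mentioned in the comments (`RegisterCompare.cpp`, `RegisterMisc.cpp`, `Register.cpp`) come from the review thread.

```cpp
// Toy illustration only: a stand-in registry, not the Velox function registry.
#include <functional>
#include <map>
#include <string>

namespace split_sketch {

using Registry = std::map<std::string, std::function<int(int, int)>>;

// Single shared table, standing in for the engine's global function registry.
inline Registry& registry() {
  static Registry instance;
  return instance;
}

// Would live in its own translation unit, e.g. registration/RegisterCompare.cpp.
// Function name and registered names are hypothetical.
void registerCompareFunctions(const std::string& prefix) {
  registry()[prefix + "equalto"] = [](int a, int b) { return a == b ? 1 : 0; };
  registry()[prefix + "lessthan"] = [](int a, int b) { return a < b ? 1 : 0; };
}

// Would live in its own translation unit, e.g. registration/RegisterMisc.cpp.
void registerMiscFunctions(const std::string& prefix) {
  registry()[prefix + "toy_add"] = [](int a, int b) { return a + b; };
}

// Would live in registration/Register.cpp: it only forwards to the
// per-category functions, so it stays cheap to compile.
void registerFunctions(const std::string& prefix) {
  registerCompareFunctions(prefix);
  registerMiscFunctions(prefix);
}

} // namespace split_sketch
```

With this layout, calling `split_sketch::registerFunctions("spark_")` fills the toy registry with three prefixed entries. The presumed benefit is that editing one category only recompiles that category's source file instead of one large registration file.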

@facebook-github-bot facebook-github-bot added the "CLA Signed" label Nov 18, 2024

netlify bot commented Nov 18, 2024

Deploy Preview for meta-velox canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | fe0df08 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/meta-velox/deploys/67491f5054264f0008d367d2 |

@Yohahaha Yohahaha changed the title from "Split sparksql registration code from velox_functions_spark" to "refactor(sparksql): Split sparksql registration code from velox_functions_spark" Nov 18, 2024
@Yohahaha
Contributor Author

@PHILO-HE @rui-mo @mbasmanova would you help review this PR? It helps speed up sparksql compilation time.

@Yohahaha Yohahaha changed the title from "refactor(sparksql): Split sparksql registration code from velox_functions_spark" to "refactor(sparksql): Speed up sparksql compilation times by split sparksql registration code into serval cpp files" Nov 19, 2024
@Yohahaha
Contributor Author

@FelixYBW @majetideepak @pedroerp would you help review this PR? It helps speed up sparksql compilation times.

Collaborator

@rui-mo rui-mo left a comment

Thanks. Added several comments.

velox/functions/sparksql/Bitwise.cpp
velox/functions/sparksql/JsonObjectKeys.h
velox/functions/sparksql/registration/Register.cpp (outdated)
velox/functions/sparksql/registration/Register.cpp (outdated)
velox/functions/sparksql/registration/RegisterCompare.cpp (outdated)
velox/functions/sparksql/registration/RegisterDatetime.cpp (outdated)
@Yohahaha Yohahaha force-pushed the split-sparkfunc-register branch from 4d135d3 to e2b4294 Compare November 21, 2024 02:09
@Yohahaha
Contributor Author

@rui-mo would you help review again?

Collaborator

@rui-mo rui-mo left a comment

Thanks for the update. Added several comments.

@Yohahaha Yohahaha force-pushed the split-sparkfunc-register branch from e2b4294 to d60c91d Compare November 25, 2024 06:27
@Yohahaha
Contributor Author

@rui-mo @pedroerp @majetideepak @Yuhta would you help review again? This PR is very helpful for improving the sparksql compilation speed.

Collaborator

@rui-mo rui-mo left a comment

Thanks. Added several minor comments.

velox/functions/sparksql/registration/Register.cpp (outdated)
@@ -20,7 +20,7 @@

 namespace facebook::velox::functions::sparksql {

-void registerFunctions(const std::string& prefix);
+void registerFunctions(const std::string& prefix = "");
Collaborator

With the default value added, callers with empty string parameter like sparksql::registerFunctions("") can be simplified.
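
The simplification suggested here can be illustrated with a small stand-alone sketch. It assumes nothing about the real implementation beyond the signature discussed in this thread: the `default_arg_sketch` namespace, the `registeredNames()` helper, and the registered name `abs` are hypothetical and exist only to make the effect observable.

```cpp
// Toy illustration of a defaulted prefix parameter; not the Velox implementation.
#include <string>
#include <vector>

namespace default_arg_sketch {

// Records the names "registered" by the toy function below.
inline std::vector<std::string>& registeredNames() {
  static std::vector<std::string> names;
  return names;
}

// Same shape as the declaration under review: the prefix defaults to "".
void registerFunctions(const std::string& prefix = "") {
  registeredNames().push_back(prefix + "abs");
}

} // namespace default_arg_sketch
```

In this sketch, `default_arg_sketch::registerFunctions()` and `default_arg_sketch::registerFunctions("")` record the same unprefixed name, so call sites that pass an empty string could drop the argument.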

Contributor Author

We can fix it in another PR if needed; many files would be affected.

velox/functions/sparksql/registration/RegisterMisc.cpp (outdated)
velox/functions/sparksql/registration/RegisterMisc.cpp (outdated)
velox/functions/sparksql/registration/Register.cpp (outdated)
@Yohahaha Yohahaha force-pushed the split-sparkfunc-register branch from fedf4a7 to 4826b23 Compare November 26, 2024 11:11
@rui-mo rui-mo changed the title from "refactor(sparksql): Speed up sparksql compilation times by split sparksql registration code into serval cpp files" to "refactor(sparksql): Speed up sparksql compilation by splitting function registrations" Nov 27, 2024
Collaborator

@rui-mo rui-mo left a comment

Thanks. Looks good to me.

@rui-mo
Collaborator

rui-mo commented Nov 27, 2024

Would you take a look if the workflow failures can be resolved after rebasing the code?

@Yohahaha Yohahaha force-pushed the split-sparkfunc-register branch from 15c540d to 2f0a81d Compare November 27, 2024 07:25
@Yohahaha
Contributor Author

> Would you take a look if the workflow failures can be resolved after rebasing the code?

#11677
#11676
#11675

@Yohahaha Yohahaha force-pushed the split-sparkfunc-register branch from e3d7952 to fe0df08 Compare November 29, 2024 01:56
Collaborator

@assignUser assignUser left a comment

Looking at the metrics, this is the slowest file to compile in debug and in third place for release! https://facebookincubator.github.io/velox/bm-report/

I can see this improving dx when working on sparksql functions and reducing the need for rebuilds.

CMake looks good.

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Collaborator

nit: maybe add a comment explaining why this approach was taken.

@Yohahaha
Contributor Author

> Looking at the metrics, this is the slowest file to compile in debug and in third place for release! https://facebookincubator.github.io/velox/bm-report/
>
> I can see this improving dx when working on sparksql functions and reducing the need for rebuilds.
>
> CMake looks good.

Thanks for confirming that the compilation-time improvement works. I will try to improve the compilation time of the sparksql agg/window functions in a follow-up PR.

@rui-mo rui-mo added the "ready-to-merge" label Dec 2, 2024
@FelixYBW

FelixYBW commented Dec 3, 2024

@xiaoxmeng Who can help merge the PR? With this fix, the Spark functions' compile time improves by ~1.5x.

@facebook-github-bot
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@xiaoxmeng merged this pull request in 0bb7e64.

@Yohahaha Yohahaha deleted the split-sparkfunc-register branch December 3, 2024 10:10
@Yohahaha
Contributor Author

Yohahaha commented Dec 3, 2024

@rui-mo @FelixYBW @xiaoxmeng thanks for merging!

TongWei1105 pushed a commit to TongWei1105/velox that referenced this pull request Dec 3, 2024
…on registrations (facebookincubator#11565)

Summary:
This PR aims to speed up sparksql compilation by splitting function
registrations to multiple source files arranged according to function type.
Adds 'velox_functions_spark' for registrations and renames previous
'velox_functions_spark' as 'velox_functions_spark_impl'.

Tested the compilation time using `velox_functions_spark_test` target to mock
the general development process: build -> modify cpp file -> build. The
compilation time speeds up 1.5x(165s to 104s) in release mode and more in debug
mode.

Fixes facebookincubator#11564.

Pull Request resolved: facebookincubator#11565

Reviewed By: miaoever, kagamiori

Differential Revision: D66688101

Pulled By: xiaoxmeng

fbshipit-source-id: 54ba372f08c4ec91062b3d07e8e2b81aabbdef59
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall
Development

Successfully merging this pull request may close these issues.

Speed up sparksql target compilation times
5 participants