Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: update comments and tests for label counts. #1182

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Genesis929
Copy link
Collaborator

@Genesis929 Genesis929 commented Nov 27, 2024

  1. Renamed add_labels to add_and_trim_labels.
  2. Added notes and docstring for future notification.
  3. Added tests for add_and_trim_labels and the functions that calls add_and_trim_labels to ensure the total number of labels won't exceed 64.

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Nov 27, 2024
@Genesis929 Genesis929 added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 27, 2024
@Genesis929 Genesis929 removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 27, 2024
@Genesis929 Genesis929 marked this pull request as ready for review December 2, 2024 18:05
@Genesis929 Genesis929 requested review from a team as code owners December 2, 2024 18:05
@Genesis929 Genesis929 requested a review from tswast December 2, 2024 18:05
@Genesis929 Genesis929 added the owlbot:run Add this label to trigger the Owlbot post processor. label Dec 2, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Dec 2, 2024
@Genesis929 Genesis929 requested a review from shobsi December 2, 2024 18:06
bq_io.add_labels(job_config, api_name=f"dataframe-to_{format.lower()}")

# Note: Ensure no additional labels are added to job_config after this point,
# as `add_and_trim_labels` ensures the label count does not exceed 64.
Copy link
Contributor

@shobsi shobsi Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The repetition of this note at multiple places makes me think - can we not call start_query_with_client instead of bqclient.query from everywhere? That way any preprocessing (including labels) would be centralized and we would have only one note lilke this in our code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic after bqclient.query in start_query_with_client don't seems to match other functions, so this may not work.

We can make a new function that add labels and execute to replace bqclient.query, what do you think? This is my original thought but I'm not sure if it's necessary to add a new function just for two lines of code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a great refactoring but better than leaving repetitive notes IMHO.

Copy link
Contributor

@shobsi shobsi Dec 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we could make a hook in the ClientsProvider._create_bigquery_client, something like:

...
original_query = bq_client.query

def query (query, *args, job_config=None, api_name=None, **kwargs):
    add_and_trim_labels(job_config, api_name=api_name)
    original_query(query, *args, job_config=job_config, **kwargs)

bq_client.query = query
...
return bq_client

@tswast would this be too hacky? What would be the most pythonic way to ensure labels are within the max limit?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would discourage us from monkey patching like this, though it should work in Python. I think your first instinct of using our central helper function makes the most sense to me.

I don't fully understand what you mean by

The logic after bqclient.query in start_query_with_client don't seems to match other functions, so this may not work.

We should be able to put start_query_with_client in a try/except block. Also, it's probably a bug that we aren't showing that a query job is running here.

bq_io.add_labels(job_config)
# Note: Ensure no additional labels are added to job_config after this point,
# as `add_and_trim_labels` ensures the label count does not exceed 64.
bq_io.add_and_trim_labels(job_config)
query_job = self.bqclient.query(sql, job_config=job_config)
_ = query_job.result()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure result() is a no-op for dry-run.

bq_io.add_labels(job_config)
# Note: Ensure no additional labels are added to job_config after this point,
# as `add_and_trim_labels` ensures the label count does not exceed 64.
bq_io.add_and_trim_labels(job_config)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we actually create a job resource in the backend when we do a dry-run, so this might actually lose analytics. I think probably better not to call add_and_trim_labels at all in this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants