chore: update comments and tests for label counts. #1182
base: main
Conversation
bq_io.add_labels(job_config, api_name=f"dataframe-to_{format.lower()}")

# Note: Ensure no additional labels are added to job_config after this point,
# as `add_and_trim_labels` ensures the label count does not exceed 64.
The repetition of this note in multiple places makes me think: can we not call start_query_with_client instead of bqclient.query from everywhere? That way any preprocessing (including labels) would be centralized, and we would have only one note like this in our code.
The logic after bqclient.query in start_query_with_client doesn't seem to match the other functions, so this may not work.
We could add a new function that adds the labels and executes the query, to replace bqclient.query. What do you think? This was my original thought, but I'm not sure it's necessary to add a new function for just two lines of code.
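A minimal sketch of the wrapper function proposed here, so call sites never touch labels directly. The names (`add_and_trim_labels`, the 64-label limit, the plain-dict labels) are stand-ins for illustration, not the real bigframes/BigQuery API:

```python
# Hypothetical sketch: one choke point that trims labels and runs the query.
# `MAX_LABELS_COUNT` and `add_and_trim_labels` mirror the names discussed in
# this thread; the client and labels dict are simplified stand-ins.

MAX_LABELS_COUNT = 64


def add_and_trim_labels(labels, api_name=None):
    """Add the api_name label, then trim to the assumed 64-label limit."""
    labels = dict(labels)
    if api_name is not None:
        labels["bigframes-api"] = api_name
    # Trimming policy is illustrative: keep the first 64 labels.
    return dict(list(labels.items())[:MAX_LABELS_COUNT])


def query_with_labels(bqclient, sql, labels, api_name=None):
    """Single replacement for direct bqclient.query calls."""
    labels = add_and_trim_labels(labels, api_name=api_name)
    return bqclient.query(sql, labels=labels)
```

With this in place, every call site goes through `query_with_labels` and the repeated "no labels after this point" note is needed only once.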
Not a great refactoring but better than leaving repetitive notes IMHO.
Or we could make a hook in ClientsProvider._create_bigquery_client, something like:
...
original_query = bq_client.query

def query(query, *args, job_config=None, api_name=None, **kwargs):
    add_and_trim_labels(job_config, api_name=api_name)
    return original_query(query, *args, job_config=job_config, **kwargs)

bq_client.query = query
...
return bq_client
@tswast would this be too hacky? What would be the most pythonic way to ensure labels are within the max limit?
I would discourage us from monkey patching like this, though it should work in Python. I think your first instinct of using our central helper function makes the most sense to me.
I don't fully understand what you mean by:
The logic after bqclient.query in start_query_with_client doesn't seem to match the other functions, so this may not work.
We should be able to put start_query_with_client in a try/except block. Also, it's probably a bug that we aren't showing that a query job is running here.
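A hedged sketch of the centralization suggested above: call sites route through one helper (a stand-in for the real start_query_with_client, which also does progress reporting) and wrap it in try/except for their own error handling. All names and signatures here are illustrative assumptions:

```python
# Illustrative only: `start_query_with_client` here is a simplified stand-in
# for the real bigframes helper; the real one also handles label
# preprocessing and progress reporting.

def start_query_with_client(bqclient, sql, job_config):
    # Central place for any preprocessing before the query runs.
    job = bqclient.query(sql, job_config=job_config)
    return job, job.result()


def run_sql(bqclient, sql, job_config):
    # Call sites keep their own error handling via try/except around
    # the central helper, instead of calling bqclient.query directly.
    try:
        job, rows = start_query_with_client(bqclient, sql, job_config)
    except Exception:
        # call-site-specific cleanup or error reporting would go here
        raise
    return rows
```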
bq_io.add_labels(job_config)
# Note: Ensure no additional labels are added to job_config after this point,
# as `add_and_trim_labels` ensures the label count does not exceed 64.
bq_io.add_and_trim_labels(job_config)
query_job = self.bqclient.query(sql, job_config=job_config)
_ = query_job.result()
I'm pretty sure result() is a no-op for dry-run.
bq_io.add_labels(job_config)
# Note: Ensure no additional labels are added to job_config after this point,
# as `add_and_trim_labels` ensures the label count does not exceed 64.
bq_io.add_and_trim_labels(job_config)
I don't think we actually create a job resource in the backend when we do a dry-run, so this might actually lose analytics. I think it's probably better not to call add_and_trim_labels at all in this case.
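A minimal sketch of the guard suggested here, assuming `dry_run` mirrors the flag on google.cloud.bigquery's QueryJobConfig and `add_and_trim_labels` is the helper from this PR (both taken as assumptions, with the helper passed in so the sketch stays self-contained):

```python
# Hypothetical guard: skip label trimming for dry-run configs, since a
# dry run creates no job resource server-side, so trimmed labels would
# only lose analytics. `add_and_trim_labels` is injected for illustration.

def maybe_add_and_trim_labels(job_config, add_and_trim_labels):
    """Trim labels only for real jobs; leave dry-run configs untouched."""
    if getattr(job_config, "dry_run", False):
        return
    add_and_trim_labels(job_config)
```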