pandas-gbq auth proposal #161
Comments
💯 , and then the library can manage Clients (i.e. this doesn't mean we'd need to create a new Client each request) |
Trim pydata-google-auth package and add tests
This is the initial version of the proposed pydata-google-auth package (to be used by pandas-gbq and ibis). It includes two methods:
* `pydata_google_auth.default()`
  * A function that does the same auth as pandas-gbq does currently. Tries `google.auth.default()` and then falls back to user credentials.
* `pydata_google_auth.get_user_credentials()`
  * A public `get_user_credentials()` function, as proposed in googleapis/python-bigquery-pandas#161.
Missing in this implementation is a more configurable way to adjust credentials caching. I currently use the `reauth` logic from pandas-gbq.
I drop `try_credentials()`, as it makes less sense when this module might be used for other APIs besides BigQuery. Plus there were problems with `try_credentials()` even for pandas-gbq (googleapis/python-bigquery-pandas#202, googleapis/python-bigquery-pandas#198).
This was released as part of the pydata-google-auth package. Documented at https://pydata-google-auth.readthedocs.io/en/latest/api.html#pydata_google_auth.get_user_credentials |
@tswast glad this was released but what does this mean for pandas-gbq? I'm still having an issue with drive scopes and was hoping this could possibly solve it. Does this solve the issue in some way? |
@christianramsey I'm glad you asked. Yes, the combination of #231 and https://pydata-google-auth.readthedocs.io/en/latest/api.html#pydata_google_auth.get_user_credentials allows you to use drive scopes. I (or some helpful contributor 😃 ) need to
A brief example of using the drive scope:
Until pandas-gbq 0.8.0 is released, install from the latest on GitHub
Install pydata-google-auth
auth_example.py:
import pandas_gbq
import pydata_google_auth
import pydata_google_auth.cache

# Instead of get_user_credentials(), you could do default(), but that may not
# be able to get the right scopes if running on GCE or using credentials from
# the gcloud command-line tool.
credentials = pydata_google_auth.get_user_credentials(
    scopes=[
        'https://www.googleapis.com/auth/drive',
        'https://www.googleapis.com/auth/cloud-platform',
    ],
    # Use reauth to get new credentials if you haven't used the drive scope
    # before. You only have to do this once.
    credentials_cache=pydata_google_auth.cache.REAUTH,
    # Set auth_local_webserver to True to have a slightly more convenient
    # authorization flow. Note, this doesn't work if you're running from a
    # notebook on a remote server, such as with Google Colab.
    auth_local_webserver=True,
)
sql = """SELECT state_name
FROM `my_dataset.us_states_from_google_sheets`
WHERE post_abbr LIKE 'W%'
"""
df = pandas_gbq.read_gbq(
    sql,
    project_id='YOUR-PROJECT-ID',
    credentials=credentials,
    dialect='standard',
)
print(df) |
@christianramsey Actually, you can use
import pandas
import pandas_gbq
import pydata_google_auth
import pydata_google_auth.cache

credentials = pydata_google_auth.get_user_credentials(
    scopes=[
        'https://www.googleapis.com/auth/drive',
        'https://www.googleapis.com/auth/cloud-platform',
    ],
)
# Update the in-memory credentials cache (added in pandas-gbq 0.7.0).
pandas_gbq.context.credentials = credentials
pandas_gbq.context.project = 'your-project-id'
sql = """SELECT state_name
FROM `my_dataset.us_states_from_google_sheets`
WHERE post_abbr LIKE 'W%'
"""
df = pandas_gbq.read_gbq(
    sql,
    dialect='standard',
)
print(df) |
The above code worked! xie xie @tswast |
It appears There is some logic that reads it, but then the value is never used. I guess I don't have to mark it deprecated since it was broken, anyway? For users that do want similar functionality: to choose the cache location with an environment variable, pydata/pydata-google-auth#7 tracks that feature request in pydata-google-auth. |
Overview
The current auth flows for pandas-gbq are a bit confusing and hard to customize.
Final desired state. The `pandas_gbq` module should have the following (changes in bold):
Tasks:
* [ ] Add `client` parameter to `read_gbq`, taking a google.cloud.bigquery.Client object.
* [ ] Add `client` parameter to `to_gbq`, taking a google.cloud.bigquery.Client object.
* [ ] Deprecate `private_key` argument. Show examples of how to do the same thing by passing Credentials to the Client constructor.
* [ ] Add `pandas_gbq.get_user_credentials` with `credentials_cache` argument.
* [ ] Deprecate `reauth` argument. Show examples using `pandas_gbq.get_user_credentials` with `credentials_cache` argument and WriteOnlyCredentialsCache or NoopCredentialsCache. Edit: No reason to deprecate reauth, since we don't need to complicate pandas-gbq's auth with pydata-google-auth's implementation details.
* [ ] Deprecate `auth_local_webserver` argument. Show example using `pandas_gbq.get_user_credentials` with `auth_local_webserver` argument. Edit: No reason to deprecate auth_local_webserver, as that feature is still needed. We don't actually want to force people to use pydata-google-auth for the default credentials case.
Background
pandas-gbq has its own auth flows, which include but are distinct from "application default credentials".
See issue: #129
Current (0.4.0) state of pandas-gbq auth:
* `private_key` parameter. Parameter can be either JSON bytes or a file path.
* `~/.config/pandas_gbq/bigquery_credentials.dat` or in path specified by `PANDAS_GBQ_CREDENTIALS_FILE` environment variable.
Why does pandas-gbq do user auth at all? Aren't application default credentials enough?
Problems with the current flow
`private_key` argument to a call in a test, resulting in surprising failures in CI builds.
Proposal
Document default auth behavior
Current behavior (not changing, except for deprecations).
New default auth behavior.
Add `client` parameter to `read_gbq` and `to_gbq`
The new client parameter, if provided, would bypass all other credentials fetching mechanisms.
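A minimal sketch of that precedence rule, in case it helps the discussion. `resolve_client` and `build_default_client` are hypothetical names made up for illustration, not pandas-gbq functions; the only point is that an explicitly passed client short-circuits every other lookup.

```python
def resolve_client(client=None, build_default_client=None):
    """Return the client to use for a request.

    An explicitly passed client wins, and no credentials lookup runs.
    `build_default_client` is a hypothetical callable standing in for
    the existing credentials-fetching path.
    """
    if client is not None:
        return client
    if build_default_client is None:
        raise ValueError("no client given and no default builder available")
    return build_default_client()
```

With something like this inside `read_gbq` and `to_gbq`, the caller who passes a client never triggers the default or user-credentials flows.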
Why a Client and not an explicit Credentials object?
Helpers for user-based authentication
No helpers are needed for default credentials or service account credentials because these can easily be constructed with the google-auth library. Link to samples for constructing these from the docs.
pandas_gbq.get_user_credentials(scopes=None, credentials_cache=None, client_secrets=None, use_localhost_webserver=False):
If credentials_cache is None, construct a pandas_gbq.CredentialsCache with defaults for arguments.
Attempt to load credentials from cache.
If credentials can't be loaded, start 3-legged oauth2 flow for installed applications. Use provided client secrets if given, otherwise use Pandas-GBQ client secrets. Use command-line flow by default. Use localhost webserver if set to True.
No credentials could be fetched? Raise an AccessDenied error. (Existing behavior of GbqConnector.get_user_account_credentials())
Save credentials to cache.
Return credentials.
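The steps above could be sketched roughly as follows. This is an illustration under assumptions, not the proposed implementation: `run_oauth_flow` is a hypothetical callable standing in for the 3-legged installed-application flow, `InMemoryCache` is a toy stand-in for the cache, and the `load()`/`save()` method names are assumed.

```python
class AccessDenied(Exception):
    """Raised when no user credentials could be obtained."""


class InMemoryCache:
    """Toy cache for this sketch; stands in for the proposed CredentialsCache."""

    def __init__(self):
        self._credentials = None

    def load(self):
        return self._credentials

    def save(self, credentials):
        self._credentials = credentials


def get_user_credentials(run_oauth_flow, scopes=None, credentials_cache=None,
                         client_secrets=None, use_localhost_webserver=False):
    """Cache lookup, then OAuth flow, then save and return.

    `run_oauth_flow` is hypothetical: it receives the flow settings as
    keyword arguments and returns credentials, or None on failure.
    """
    if credentials_cache is None:
        credentials_cache = InMemoryCache()

    # Attempt to load credentials from the cache first.
    credentials = credentials_cache.load()

    # Nothing cached: fall back to the interactive flow.
    if credentials is None:
        credentials = run_oauth_flow(
            scopes=scopes,
            client_secrets=client_secrets,
            use_localhost_webserver=use_localhost_webserver,
        )

    if credentials is None:
        raise AccessDenied("Could not obtain user credentials.")

    credentials_cache.save(credentials)
    return credentials
```

Passing the flow in as a callable is only there to keep the sketch runnable without a browser; a real helper would call the google-auth OAuth libraries directly.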
pandas_gbq.CredentialsCache
Constructor takes optional credentials_path.
If credentials_path not provided, set self._credentials_path to
PANDAS_GBQ_CREDENTIALS_FILE - show deprecation warning that this environment variable will be ignored at a later date.
Default user credentials path at `~/.config/pandas_gbq/bigquery_credentials.dat`
Methods
pandas_gbq.WriteOnlyCredentialsCache
Same as CredentialsCache, but load() is a no-op. Equivalent to "force reauth" in current versions.
pandas_gbq.NoopCredentialsCache
Satisfies the credentials cache interface, but does nothing. Useful for shared systems where you want credentials to stay in memory (e.g. Colab).
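To make the three cache variants concrete, here is a rough sketch. The `load()`/`save()` interface is an assumption (only `load()` is named above), and credentials are treated as plain JSON-serializable dicts to keep the example self-contained; real credentials objects would need their own serialization.

```python
import json
import os
import warnings

DEFAULT_PATH = "~/.config/pandas_gbq/bigquery_credentials.dat"


class CredentialsCache:
    """Reads and writes credentials as JSON at a path on disk."""

    def __init__(self, credentials_path=None):
        if credentials_path is None:
            env_path = os.environ.get("PANDAS_GBQ_CREDENTIALS_FILE")
            if env_path:
                warnings.warn(
                    "PANDAS_GBQ_CREDENTIALS_FILE will be ignored at a later "
                    "date; pass credentials_path instead.",
                    FutureWarning,
                    stacklevel=2,
                )
                credentials_path = env_path
            else:
                credentials_path = os.path.expanduser(DEFAULT_PATH)
        self._credentials_path = credentials_path

    def load(self):
        """Return cached credentials, or None if missing or unreadable."""
        try:
            with open(self._credentials_path) as f:
                return json.load(f)
        except (OSError, ValueError):
            return None

    def save(self, credentials):
        """Write credentials to disk, creating parent directories."""
        directory = os.path.dirname(self._credentials_path)
        if directory:
            os.makedirs(directory, exist_ok=True)
        with open(self._credentials_path, "w") as f:
            json.dump(credentials, f)


class WriteOnlyCredentialsCache(CredentialsCache):
    """Never reads the cache, so every call forces reauthentication."""

    def load(self):
        return None


class NoopCredentialsCache:
    """Satisfies the interface but touches nothing on disk."""

    def load(self):
        return None

    def save(self, credentials):
        pass
```

Keeping the variants as small classes with the same two methods means `get_user_credentials` does not need a separate `reauth` flag; the choice of cache object carries that decision.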
Deprecations
Some time should be given (1-year deprecation?) for folks to migrate to the new `client` argument. It might be used in scripts and older notebooks, and also is a parameter upstream in Pandas.
Deprecate the PANDAS_GBQ_CREDENTIALS_FILE environment variable
Log a deprecation warning suggesting `pandas_gbq.get_user_credentials` with a `pandas_gbq.CredentialsCache` argument.
Deprecate `private_key` argument
Log a deprecation warning suggesting google.oauth2.service_account.Credentials.from_service_account_info instead of passing in bytes and google.oauth2.service_account.Credentials.from_service_account_file instead of passing in a path.
Add / link to service account examples in the docs.
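As a rough idea of what the `private_key` migration path might look like: the sketch below classifies the legacy value as JSON contents or a file path, warns, and hands off to the two google-auth constructors named above. `split_private_key` and `credentials_from_private_key` are hypothetical helper names, the choice of `FutureWarning` is an assumption, and google-auth has to be installed for the second function to run.

```python
import json
import warnings


def split_private_key(private_key):
    """Classify a legacy private_key value.

    Returns ("info", dict) for JSON contents or ("file", str) for a path.
    Hypothetical helper, for illustration only.
    """
    if isinstance(private_key, bytes):
        private_key = private_key.decode("utf-8")
    try:
        info = json.loads(private_key)
    except ValueError:
        return "file", private_key
    if isinstance(info, dict):
        return "info", info
    return "file", private_key


def credentials_from_private_key(private_key):
    """Build service account credentials from the legacy argument."""
    # Imported lazily so the sketch loads without google-auth installed.
    from google.oauth2 import service_account

    warnings.warn(
        "private_key is deprecated; construct credentials with "
        "google.oauth2.service_account.Credentials instead.",
        FutureWarning,
        stacklevel=2,
    )
    kind, value = split_private_key(private_key)
    if kind == "info":
        return service_account.Credentials.from_service_account_info(value)
    return service_account.Credentials.from_service_account_file(value)
```

Users migrating their own code would skip the helper entirely and call `from_service_account_info` or `from_service_account_file` themselves, then pass the result along.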
Deprecate `reauth` argument
Log a deprecation warning suggesting creating a client using credentials from pandas_gbq.get_user_credentials and a pandas_gbq.WriteOnlyCredentialsCache.
Add user authentication examples in the docs.
Deprecate `auth_local_webserver` argument
Log a deprecation warning suggesting creating a client using credentials from pandas_gbq.get_user_credentials and set the auth_local_webserver argument there.
Add user authentication examples in the docs.
/cc @craigcitro @maxim-lian