
Update to_gbq and read_gbq to pandas-gbq 0.5.0 #21628

Merged (1 commit) on Jun 26, 2018

tswast
Contributor

@tswast tswast commented Jun 25, 2018

Closes googleapis/python-bigquery-pandas#177

Closes #21627

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

I've also verified that the docs build and render well with

python doc/make.py --single read_gbq
python doc/make.py --single DataFrame.to_gbq

Output from scripts/validate_docstrings.py pandas.read_gbq:


################################################################################
######################### Docstring (pandas.read_gbq)  #########################
################################################################################

Load data from Google BigQuery.

This function requires the `pandas-gbq package
<https://pandas-gbq.readthedocs.io>`__.

See the `How to authenticate with Google BigQuery
<https://pandas-gbq.readthedocs.io/en/latest/howto/authentication.html>`__
guide for authentication instructions.

Parameters
----------
query : str
    SQL-like query to return data values.
project_id : str, optional
    Google BigQuery Account project ID. Optional when available from
    the environment.
index_col : str, optional
    Name of result column to use for index in results DataFrame.
col_order : list(str), optional
    List of BigQuery column names in the desired order for results
    DataFrame.
reauth : boolean, default False
    Force Google BigQuery to re-authenticate the user. This is useful
    if multiple accounts are used.
private_key : str, optional
    Service account private key in JSON format. Can be file path
    or string contents. This is useful for remote server
    authentication (e.g. Jupyter/IPython notebook on remote host).
auth_local_webserver : boolean, default False
    Use the `local webserver flow`_ instead of the `console flow`_
    when getting user credentials.

    .. _local webserver flow:
        http://google-auth-oauthlib.readthedocs.io/en/latest/reference/google_auth_oauthlib.flow.html#google_auth_oauthlib.flow.InstalledAppFlow.run_local_server
    .. _console flow:
        http://google-auth-oauthlib.readthedocs.io/en/latest/reference/google_auth_oauthlib.flow.html#google_auth_oauthlib.flow.InstalledAppFlow.run_console

    *New in version 0.2.0 of pandas-gbq*.
dialect : str, default 'legacy'
    SQL syntax dialect to use. Value can be one of:

    ``'legacy'``
        Use BigQuery's legacy SQL dialect. For more information see
        `BigQuery Legacy SQL Reference
        <https://cloud.google.com/bigquery/docs/reference/legacy-sql>`__.
    ``'standard'``
        Use BigQuery's standard SQL, which is
        compliant with the SQL 2011 standard. For more information
        see `BigQuery Standard SQL Reference
        <https://cloud.google.com/bigquery/docs/reference/standard-sql/>`__.
location : str, optional
    Location where the query job should run. See the `BigQuery locations
    documentation
    <https://cloud.google.com/bigquery/docs/dataset-locations>`__ for a
    list of available locations. The location must match that of any
    datasets used in the query.

    *New in version 0.5.0 of pandas-gbq*.
configuration : dict, optional
    Query config parameters for job processing.
    For example:

        configuration = {'query': {'useQueryCache': False}}

    For more information see `BigQuery REST API Reference
    <https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query>`__.
verbose : None, deprecated
    Deprecated in Pandas-GBQ 0.4.0. Use the `logging module
    to adjust verbosity instead
    <https://pandas-gbq.readthedocs.io/en/latest/intro.html#logging>`__.

Returns
-------
df : DataFrame
    DataFrame representing results of query.

See Also
--------
pandas_gbq.read_gbq : This function in the pandas-gbq library.
pandas.DataFrame.to_gbq : Write a DataFrame to Google BigQuery.

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	No examples section found
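
Since the validator reports that no Examples section exists, here is a minimal sketch of how the parameters documented above might be combined. It assumes pandas-gbq 0.5.0 or later is installed and that credentials are available from the environment. The project ID, location, and query are placeholders, and `build_read_gbq_kwargs` and `run_query` are hypothetical helpers, not part of pandas or pandas-gbq.

```python
import logging


def build_read_gbq_kwargs(project_id, location=None, use_cache=True):
    """Collect keyword arguments for read_gbq (hypothetical helper)."""
    kwargs = {
        "project_id": project_id,
        "dialect": "standard",
        # The job configuration mirrors the BigQuery REST API layout.
        "configuration": {"query": {"useQueryCache": use_cache}},
    }
    if location is not None:
        # `location` is only understood by pandas-gbq 0.5.0 and later.
        kwargs["location"] = location
    return kwargs


def run_query(query, **kwargs):
    """Run the query; needs pandas-gbq and valid credentials."""
    import pandas_gbq  # imported lazily so the helper above works without it

    # `verbose` is deprecated, so verbosity is adjusted through logging.
    logger = logging.getLogger("pandas_gbq")
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.StreamHandler())
    return pandas_gbq.read_gbq(query, **kwargs)


# Placeholder values; replace with a real project and dataset location.
read_kwargs = build_read_gbq_kwargs(
    "my-project", location="asia-northeast1", use_cache=False
)
# df = run_query("SELECT 1 AS x", **read_kwargs)
```

The call itself is left commented out because it contacts BigQuery. The location has to match that of any datasets the query reads.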

Output from scripts/validate_docstrings.py pandas.DataFrame.to_gbq:

################################################################################
##################### Docstring (pandas.DataFrame.to_gbq)  #####################
################################################################################

Write a DataFrame to a Google BigQuery table.

This function requires the `pandas-gbq package
<https://pandas-gbq.readthedocs.io>`__.

See the `How to authenticate with Google BigQuery
<https://pandas-gbq.readthedocs.io/en/latest/howto/authentication.html>`__
guide for authentication instructions.

Parameters
----------
destination_table : str
    Name of table to be written, in the form ``dataset.tablename``.
project_id : str, optional
    Google BigQuery Account project ID. Optional when available from
    the environment.
chunksize : int, optional
    Number of rows to be inserted in each chunk from the dataframe.
    Use ``None`` to load the dataframe in a single chunk.
reauth : bool, default False
    Force Google BigQuery to re-authenticate the user. This is useful
    if multiple accounts are used.
if_exists : str, default 'fail'
    Behavior when the destination table exists. Value can be one of:

    ``'fail'``
        If table exists, raise an error and do not write any data.
    ``'replace'``
        If table exists, drop it, recreate it, and insert data.
    ``'append'``
        If table exists, insert data. Create it if it does not exist.
private_key : str, optional
    Service account private key in JSON format. Can be file path
    or string contents. This is useful for remote server
    authentication (e.g. Jupyter/IPython notebook on remote host).
auth_local_webserver : bool, default False
    Use the `local webserver flow`_ instead of the `console flow`_
    when getting user credentials.

    .. _local webserver flow:
        http://google-auth-oauthlib.readthedocs.io/en/latest/reference/google_auth_oauthlib.flow.html#google_auth_oauthlib.flow.InstalledAppFlow.run_local_server
    .. _console flow:
        http://google-auth-oauthlib.readthedocs.io/en/latest/reference/google_auth_oauthlib.flow.html#google_auth_oauthlib.flow.InstalledAppFlow.run_console

    *New in version 0.2.0 of pandas-gbq*.
table_schema : list of dicts, optional
    List of BigQuery table fields to which the corresponding DataFrame
    columns conform, e.g. ``[{'name': 'col1', 'type':
    'STRING'},...]``. If a schema is not provided, it will be
    generated according to the dtypes of the DataFrame columns. See the
    BigQuery API documentation for the available names of a field.

    *New in version 0.3.1 of pandas-gbq*.
location : str, optional
    Location where the load job should run. See the `BigQuery locations
    documentation
    <https://cloud.google.com/bigquery/docs/dataset-locations>`__ for a
    list of available locations. The location must match that of the
    target dataset.

    *New in version 0.5.0 of pandas-gbq*.
progress_bar : bool, default True
    Use the library `tqdm` to show the progress bar for the upload,
    chunk by chunk.

    *New in version 0.5.0 of pandas-gbq*.
verbose : bool, deprecated
    Deprecated in Pandas-GBQ 0.4.0. Use the `logging module
    to adjust verbosity instead
    <https://pandas-gbq.readthedocs.io/en/latest/intro.html#logging>`__.

See Also
--------
pandas_gbq.to_gbq : This function in the pandas-gbq library.
pandas.read_gbq : Read a DataFrame from Google BigQuery.

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	No returns section found
	No examples section found
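
The missing Examples section could be filled along the following lines. This is a sketch only: it assumes pandas-gbq 0.5.0 or later and working credentials. The table, project, and location names are placeholders, and `build_to_gbq_kwargs` and `upload` are hypothetical helpers rather than library functions.

```python
def build_to_gbq_kwargs(destination_table, project_id=None, if_exists="fail",
                        location=None, table_schema=None):
    """Collect keyword arguments for to_gbq (hypothetical helper)."""
    # The docstring asks for the form ``dataset.tablename``.
    if "." not in destination_table:
        raise ValueError("destination_table must look like 'dataset.tablename'")
    if if_exists not in ("fail", "replace", "append"):
        raise ValueError("if_exists must be 'fail', 'replace', or 'append'")
    kwargs = {
        "destination_table": destination_table,
        "project_id": project_id,
        "if_exists": if_exists,
    }
    if table_schema is not None:
        # Without a schema, one is generated from the DataFrame dtypes.
        kwargs["table_schema"] = table_schema
    if location is not None:
        # Must match the location of the target dataset.
        kwargs["location"] = location
    return kwargs


def upload(df, **kwargs):
    """Write `df` to BigQuery; needs pandas-gbq and valid credentials."""
    import pandas_gbq  # imported lazily so the helper above works without it

    pandas_gbq.to_gbq(df, **kwargs)


# Placeholder values; replace with a real dataset, table, and project.
write_kwargs = build_to_gbq_kwargs(
    "my_dataset.my_table",
    project_id="my-project",
    if_exists="append",
    location="asia-northeast1",
    table_schema=[{"name": "col1", "type": "STRING"}],
)
# upload(df, **write_kwargs)
```

The upload call is commented out because it starts a BigQuery load job.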

tswast added a commit to tswast/python-bigquery-pandas that referenced this pull request Jun 25, 2018
The docs need some corrections in order to pass the Pandas docs linter
in pandas-dev/pandas#21628
tswast added a commit to googleapis/python-bigquery-pandas that referenced this pull request Jun 25, 2018
The docs need some corrections in order to pass the Pandas docs linter
in pandas-dev/pandas#21628

codecov bot commented Jun 25, 2018

Codecov Report

Merging #21628 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #21628   +/-   ##
=======================================
  Coverage    91.9%    91.9%           
=======================================
  Files         154      154           
  Lines       49562    49562           
=======================================
  Hits        45549    45549           
  Misses       4013     4013

Flag                    Coverage Δ
#multiple               90.3% <100%> (ø) ⬆️
#single                 41.85% <100%> (ø) ⬆️

Impacted Files          Coverage Δ
pandas/io/gbq.py        25% <100%> (ø) ⬆️
pandas/core/frame.py    97.19% <100%> (ø) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 36422a8...7330463.

@tswast
Contributor Author

tswast commented Jun 25, 2018

Travis failure appears to be unrelated to this change.

Use the `local webserver flow`_ instead of the `console flow`_
when getting user credentials.

.. _local webserver flow:
Contributor

I think these break linting; they probably need a # noqa.

Contributor Author

Adding # noqa breaks the link. Sphinx generates a link like pandas/doc/build/html/generated_single/pandas.read_gbq.html#noqahttp://google-auth-oauthlib.readthedocs.io/en/latest/reference/google_auth_oauthlib.flow.html#google_auth_oauthlib.flow.InstalledAppFlow.run_local_server when I add this.

I do not get any lint errors when I run git diff upstream/master -u -- "*.py" | flake8 --diff and scripts/validate_docstrings.py pandas.read_gbq also passes.

Name of table to be written, in the form 'dataset.tablename'.
project_id : str
Google BigQuery Account project ID.
Name of table to be written, in the form ``dataset.tablename``.
Contributor

I think this needs to be single backticks.

Contributor Author

Two backticks is code font in Sphinx RST, which is what I want.

@@ -18,6 +18,8 @@ Other Enhancements
- :func:`Series.mode` and :func:`DataFrame.mode` now support the ``dropna`` parameter which can be used to specify whether NaN/NaT values should be considered (:issue:`17534`)
- :func:`to_csv` now supports ``compression`` keyword when a file handle is passed. (:issue:`21227`)
- :meth:`Index.droplevel` is now implemented also for flat indexes, for compatibility with MultiIndex (:issue:`21115`)
- :func:`to_gbq` and :func:`read_gbq` signature and documentation updated to
Contributor

can you add here a link to the pandas-gbq docs whatsnew for 0.5.0?

Contributor Author

* Add link to Pandas-GBQ 0.5.0 in what's new.
* Remove unnecessary sleep in GBQ tests.

Closes googleapis/python-bigquery-pandas#177

Closes pandas-dev#21627
Contributor Author

@tswast tswast left a comment

Updated with link to Pandas-GBQ changelog and rebase on master.

@tswast
Contributor Author

tswast commented Jun 26, 2018

Looks like Travis and Circle are both happy now that I rebased.

@jorisvandenbossche
Member

I am not really familiar with the gbq code, but did we consider making this just a passthrough of *args, **kwargs? Then it would automatically work with the installed pandas_gbq.

@tswast
Contributor Author

tswast commented Jun 26, 2018

@jorisvandenbossche I considered (and even coded) that in #20564

The development experience is nicer (IDEs understand them better) without using **kwargs. It is somewhat rare to add new arguments to Pandas-GBQ, though I do have a couple of changes in mind to streamline auth (googleapis/python-bigquery-pandas#161).

The location field is special because it's a new field on the BigQuery API as opposed to a new field in the BigQuery API resources.

@jreback
Contributor

jreback commented Jun 26, 2018

I think writing out the kwargs is nice from a usability pov; we do this for other APIs as well.

@jorisvandenbossche
Member

Yep, no problem, was just wondering!
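
The trade-off discussed in this thread can be sketched as follows. The functions here are simplified stand-ins written for illustration, not the actual pandas or pandas-gbq code: `backend_to_gbq` plays the role of the installed backend, and the two wrappers show the passthrough and the explicit-signature designs.

```python
import inspect


def backend_to_gbq(dataframe, destination_table, project_id=None, location=None):
    """Stand-in for the installed backend; it only echoes its arguments."""
    return {
        "destination_table": destination_table,
        "project_id": project_id,
        "location": location,
    }


def to_gbq_passthrough(dataframe, *args, **kwargs):
    """Forward everything: new backend arguments work without a wrapper change."""
    return backend_to_gbq(dataframe, *args, **kwargs)


def to_gbq_explicit(dataframe, destination_table, project_id=None, location=None):
    """Spell the arguments out: visible to IDEs, generated docs, and help()."""
    return backend_to_gbq(
        dataframe, destination_table, project_id=project_id, location=location
    )


# Both wrappers behave the same when called...
same_result = (
    to_gbq_passthrough(None, "ds.tbl", location="US")
    == to_gbq_explicit(None, "ds.tbl", location="US")
)

# ...but only the explicit wrapper exposes the parameter names to tooling.
passthrough_params = list(inspect.signature(to_gbq_passthrough).parameters)
explicit_params = list(inspect.signature(to_gbq_explicit).parameters)
```

Here `passthrough_params` contains only `dataframe`, `args`, and `kwargs`, while `explicit_params` lists every documented argument. The cost of the explicit form is that each new backend argument, such as `location`, needs a matching change in the wrapper.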

@jreback jreback added this to the 0.24.0 milestone Jun 26, 2018
@jreback jreback merged commit 2a7e1f2 into pandas-dev:master Jun 26, 2018
@jreback
Contributor

jreback commented Jun 26, 2018

thanks @tswast !

@alimcmaster1 alimcmaster1 mentioned this pull request Aug 28, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
* Add link to Pandas-GBQ 0.5.0 in what's new.
* Remove unnecessary sleep in GBQ tests.

Closes googleapis/python-bigquery-pandas#177

Closes pandas-dev#21627