Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Ability to Distinguish between Table and Views #8382

Open
1 task done
sarsasid opened this issue Feb 16, 2024 · 16 comments · May be fixed by #9832
Open
1 task done

feat: Ability to Distinguish between Table and Views #8382

sarsasid opened this issue Feb 16, 2024 · 16 comments · May be fixed by #9832
Labels
ddl Issues related to creating or altering data definitions feature Features or general enhancements

Comments

@sarsasid
Copy link

sarsasid commented Feb 16, 2024

Is your feature request related to a problem?

I see that we have an option to find Tables using

con.list_tables() method, but in many DBs Views are also considered as tables, as in INFORMATION_SCHEMA most DB's list Views as well

If there is a way to add a param to backend to remove Views that would be great

Describe the solution you'd like

Add a default param table_types=[] to con.list_tables() Function

Suggesting as table_types, as some DB will have different other types as well and will be flexible for users to specify table type

In all DB's logic add a condition,
if table_types:

  • SQL Server, add filter table_type IN ( ','.join(table_type) )
  • Postgres, add filter table_type IN ( ','.join(table_type) )
  • BigQuery, add filter table_type IN ( ','.join(table_type) )
    etc

What version of ibis are you running?

8.0.0

What backend(s) are you using, if any?

SQL Server

Code of Conduct

  • I agree to follow this project's Code of Conduct
@sarsasid sarsasid added the feature Features or general enhancements label Feb 16, 2024
@jcrist jcrist added the ddl Issues related to creating or altering data definitions label Feb 26, 2024
@mfatihaktas mfatihaktas self-assigned this Mar 28, 2024
@mfatihaktas mfatihaktas moved this from backlog to todo in Ibis planning and roadmap Mar 28, 2024
@mfatihaktas mfatihaktas moved this from todo to cooking in Ibis planning and roadmap Apr 3, 2024
@mfatihaktas mfatihaktas moved this from cooking to review in Ibis planning and roadmap Apr 18, 2024
@gforsyth
Copy link
Member

This is an exceptionally hairy problem for something that seems so simple.

Current API

The goal of the existing API is to treat everything tabular as an abstract table, so tables, views, materialized views, etc, are all "tables" in this context. So you list_tables(), you get all the abstract tables in your particular catalog.database namespace, so tables, views, materialized views, temp tables, etc.

If we separate into list_views and list_tables

First, we break a bunch of stuff, but more importantly, I think this is bad for usability.

Consider the following if we have separate list_views and list_tables:

>>> import ibis

>>> con = ibis.duckdb.connect()
>>> con.read_parquet("some_file.parquet")
>>> con.list_tables()
[]

For a new user, especially, I think this would be suboptimal. We're now requiring them to understand implementation details specific to the backend they are using. You need to know that DuckDB creates a view so that it can do lazy-loading and pushdowns into the parquet file. Depending on the backend, it might also get loaded in as a temporary table, so now you have to know which engine you are on so you know which command to run to check whether the parquet file has been loaded in as expected.

Propoal: ddl accessor

Conceptually, the distinction between and view and a table is entirely at the DDL-level, so let's treat it that way.

We leave list_tables alone, it still returns a list of tabular things.

We can add an accessor called ddl (or whatever), and expose methods like con.ddl.list_views() and con.ddl.list_materialized_views(), and whatever other DDL-specific things we want to filter on.

Then we can re-build the existing top-level list_tables on top of those more DDL-specific methods and avoid maintaining the same code in two-or-more places.

Users who need the more DDL-specific filters now have access to them, and users who don't care or need to know can continue using Ibis without considering engine-specific details.

@jcrist
Copy link
Member

jcrist commented Jul 23, 2024

I think the suggestion of separating out backend terminology (where "Table" is different from "view", "materialized view", ...) from ibis terminology (where everything tabular is an ibis.Table) via an accessor has a certain appeal. A few wrinkles:

  • Accessors are slightly unfriendly for discoverability, since the methods a user may be looking for won't show up directly on tab completion on con. This can of course be remedied by docs, just noting it as a rough edge.
  • Having con.list_tables and con.ddl.list_tables may be confusing for users (or us!), as they'd have different behaviors but the same name. At best, we'd have to clarify which list_tables method we mean when talking about them.
  • ddl is meaningful for users familiar with SQL, but not a great term otherwise. No strong thoughts on a better name though (sorry) - perhaps an acronym is the best we can do. It also means "data definition language", implying that the "definition" verbs should also be there (create table/view, drop table/view, ...)
  • It feels weird to me to relegate list_tables/list_views under an accessor, but keep the equivalent create_*/drop_* methods at the top level. The API design isn't symmetric. If we go the route of an accessor, the other DDL methods should probably move there as well IMO (with sufficient deprecation cycle).

@ncclementi
Copy link
Contributor

ncclementi commented Jul 23, 2024

ddl is meaningful for users familiar with SQL, but not a great term otherwise.

Sure, but the people that are not familiarized with SQL won't be looking at "views" for example, they'll treat everything as a table.

I'm not sure about the name either, but the idea is to just be able to list the views separately. If that's not a good name because of the implications behind its meaning, we should change it. But I'd be against of moving create table/view, drop table/view to the ddl accessor, because it'll make it more confusing for people not coming from SQL.

Having con.list_tables and con.ddl.list_tables may be confusing for users (or us!),

What I understood was that we will only have con.list_tables() (staying as is meaning thsi will have tables and views at least that's how duckdb works) and then adding con.ddl.list_views() for those that want to get just the views. But maybe I got it wrong.

It feels weird to me to relegate list_tables/list_views under an accessor...

I think we only want to move/create list_views under an accessor, but not touch list_tables.

@jcrist
Copy link
Member

jcrist commented Jul 23, 2024

I think we only want to move/create list_views under an accessor, but not touch list_tables.

I think that'd be even more confusing. That would mean that users wanting to see only the physical tables in a database (perhaps to enumerate and drop them all), they'd need to do something like:

actual_tables = set(con.list_tables(database="my-database")).difference(con.ddl.list_views(database="my-database"))

If we're going to add an accessor that lets you explicitly list out each of the tabular things in a backend, I think we'll want a way to do this for all the different kinds of things, not everything except tables.

@jcrist
Copy link
Member

jcrist commented Jul 23, 2024

An alternate proposal:

  • Repurpose the existing tables accessor to a method that returns all tabular-things. This would take the place of the current list_tables method. We can keep the accessor features of it as well.
  • Change con.list_tables to only list tables. This would necessarily have to be a breaking change since there's no real way to do this otherwise.

This keeps everything at the top level (nice for discoverability, and no need to name a ddl accessor), and lets us keep the symmetry of con.list_*/con.create_*/con.drop_* for all tabular things. The downside is it requires a breaking change.

(to be clear, I'm just talking through issues I see with the ddl accessor option - I'm not 100% against it, just want us to talk through the issues)

@ncclementi
Copy link
Contributor

ncclementi commented Jul 23, 2024

(in reference to the previous to last comment )If that's that case then I think the accessor complicate things more for the user.

The current open PR separates physical tables from views, without an accessor. But it sounds like the name list() to list both physical tables and views is what it was not convincing, and started the accessor conversation. Maybe just thinking of a better name instead of con.list() ? although I can't think of one.

@ncclementi ncclementi removed their assignment Jul 25, 2024
@gforsyth
Copy link
Member

  • Accessors are slightly unfriendly for discoverability, since the methods a user may be looking for won't show up directly on tab completion on con. This can of course be remedied by docs, just noting it as a rough edge.

Agreed that it's a rough edge, but I think of this as a thing more advanced users might want, and they will 🤞 read docs to find out how to do Thing™

  • Having con.list_tables and con.ddl.list_tables may be confusing for users (or us!), as they'd have different behaviors but the same name. At best, we'd have to clarify which list_tables method we mean when talking about them.

It might be confusing for users, but I think it's an acceptable trade-off. I don't want to expose new users to needing to think about engine-specific details.

  • ddl is meaningful for users familiar with SQL, but not a great term otherwise. No strong thoughts on a better name though (sorry) - perhaps an acronym is the best we can do. It also means "data definition language", implying that the "definition" verbs should also be there (create table/view, drop table/view, ...)

I think that's why ddl is maybe the right choice -- for users who would think to look for the thing it's a good signal, but otherwise doesn't really stand out.

  • It feels weird to me to relegate list_tables/list_views under an accessor, but keep the equivalent create_*/drop_* methods at the top level. The API design isn't symmetric. If we go the route of an accessor, the other DDL methods should probably move there as well IMO (with sufficient deprecation cycle).

Yeah, I think that's fine - it would clean up the top-level namespace.

I do not like list() as a method name, and I think having con.tables be the "default" thing users reach for is unlikely. They'll see a function called "list_tables" and try that.

@jcrist
Copy link
Member

jcrist commented Jul 26, 2024

I do not like list() as a method name, and I think having con.tables be the "default" thing users reach for is unlikely. They'll see a function called "list_tables" and try that.

That's fair.

Could we deprecate con.list_tables in favor of con.tables AND do the ddl accessor thing? In that case:

  • Backend specific terminology always lives in the ddl accessor.
  • There's no conflicting method names - con.tables lists ibis.Table objects, con.ddl.list_tables lists whatever the backend calls a "table".

If we go the route of deprecating con.create_table in favor of con.ddl.create_table, then for a while there will be a con.create_table that's a thin shim over con.ddl.create_table, BUT con.list_tables wouldn't be a thin shim over con.ddl.list_tables. I worry about users developing incorrect mental models here - avoiding calling APIs that do different things the same name would be helpful here IMO.

@gforsyth
Copy link
Member

Could we deprecate con.list_tables in favor of con.tables AND do the ddl accessor thing? In that case:

* Backend specific terminology always lives in the `ddl` accessor.

* There's no conflicting method names - `con.tables` lists `ibis.Table` objects, `con.ddl.list_tables` lists whatever the backend calls a "table".

I think I like this. It'll be a big change for long-time users, but I think a shorter method (property) name is nice.

To summarize the target state:

con.tables: shows all abstract table objects -- the repr lists all of those objects, and then a user can further tab-complete from con.tables.<tab> to pull out a table object

con.ddl contains all methods that create, list, or destroy all backend-specific objects

@jcrist
Copy link
Member

jcrist commented Jul 26, 2024

Yay, I agree this is a good end state. One small addition, I think con.tables should (or at least could) also be modified to be callable as a method, returning the same results that con.list_tables currently does.

@cpcloud
Copy link
Member

cpcloud commented Jul 26, 2024

I thought it was a mapping already, so can't you call list(con.tables) and/or iterate over it?

@jcrist
Copy link
Member

jcrist commented Jul 26, 2024

You can, but I don't see a reason not to make it callable as well to mirror the existing con.list_tables API 🤷. Not 100% for this, just tossing it out as an idea for something that might make sense to do as part of the transition.

@cpcloud
Copy link
Member

cpcloud commented Jul 26, 2024

I guess I'm maybe ... -0.25 on that only because we'd then have at least 3 ways to get the list of tables:

  • con.tables()
  • con.list_tables()
  • list(con.tables)

seems like the latter two are probably sufficient.

@gforsyth
Copy link
Member

well, in the long run con.list_tables() would go away

@ncclementi
Copy link
Contributor

ncclementi commented Aug 5, 2024

Based on what's discussed here and after chatting with @gforsyth

End Goal

  • con.ddl. (backend dependent)

    • list_tables: only physical tables
    • list_temp_tables
    • list_views: list of views
    • list_temp_views
    • list_materialized_views: when applies
    • create table/view/temp_table/temp_view (keep flag temp=bool re-evaluate need of create_temp_table separate method)
    • drop table /view
    • create/remove/ list database
    • create/remove/list catalog
  • deprecate 10 and remove after
    con.create/remove/list ...

  • con.tables() in the future will behave like con.list_tables() now. Meaning that con.tables() display a list of table and views, support like=, database=

@ncclementi
Copy link
Contributor

Given that we are planning on having a create temporary view, seems appropriate to tackle this one first #8727 before continuing with the accessor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ddl Issues related to creating or altering data definitions feature Features or general enhancements
Projects
Status: cooking
Development

Successfully merging a pull request may close this issue.

6 participants