feat(repr): add `show_count` option to interactive repr #10518

NickCrews · 2024-11-21T15:51:16Z

Resolves #10231

Doesn't have tests yet. I want to get feedback on the semantics before I do that. I assume CI is going to fail all over the place because all the docstrings etc are going to change.

CHANGELOG

left-justified the dimensions per review suggestion

Currently looks like

import ibis

ibis.options.repr.interactive.max_rows = 3
ibis.options.repr.interactive.show_count
# True

t = ibis.memtable({"foo": range(1_000), "bar": range(1_000), "baz": range(1_000)})
t
# 3 cols by 1_000 rows
# ┏━━━━━━━┳━━━━━━━┳━━━━━━━┓
# ┃ foo   ┃ bar   ┃ baz   ┃
# ┡━━━━━━━╇━━━━━━━╇━━━━━━━┩
# │ int64 │ int64 │ int64 │
# ├───────┼───────┼───────┤
# │     0 │     0 │     0 │
# │     1 │     1 │     1 │
# │     2 │     2 │     2 │
# │     … │     … │     … │
# └───────┴───────┴───────┘
t.preview(show_count=False)
# 3 cols by … rows
# ┏━━━━━━━┳━━━━━━━┳━━━━━━━┓
# ┃ foo   ┃ bar   ┃ baz   ┃
# ┡━━━━━━━╇━━━━━━━╇━━━━━━━┩
# │ int64 │ int64 │ int64 │
# ├───────┼───────┼───────┤
# │     0 │     0 │     0 │
# │     1 │     1 │     1 │
# │     2 │     2 │     2 │
# │     … │     … │     … │
# └───────┴───────┴───────┘
t.foo
# 1_000
# rows
# ┏━━━━━━━┓
# ┃ foo   ┃
# ┡━━━━━━━┩
# │ int64 │
# ├───────┤
# │     0 │
# │     1 │
# │     2 │
# │     … │
# └───────┘
# t.foo.preview(show_count=False)
# … rows
# ┏━━━━━━━┓
# ┃ foo   ┃
# ┡━━━━━━━┩
# │ int64 │
# ├───────┤
# │     0 │
# │     1 │
# │     2 │
# │     … │
# └───────┘

I made the config default be show_count=True. I can change this with pushback, but this is what I would start with. I would love to brainstorm a few benchmark test cases to run to see the perf difference. Some ideas:

ibis.duckdb.connect("mydb.db").table("my_table") (I expect ~0 difference)
ibis.duckdb.connect().read_parquet("t.pq") (I expect~0 difference)
ibis.duckdb.connect().read_csv("thousand_rows.csv") (I expect small difference)
ibis.duckdb.connect().read_csv("billion_rows.csv") (I expect large difference)
ibis.duckdb.connect().read_csv("thousand_rows.csv").some_expensive computation() (I expect a difference, size depends on semantics of the function)

I considered adding a .repr_options attribute to expressions as described in #10231 (comment), but I decided that was too complicated.

I considered showing the table name in the repr, eg with table.get_name(), but that is a separate question.

Currently, the option has the semantics of show_count: bool, and we ALWAYS show the column count. I considered other encodings such as show_shape: Literal["rows", "cols", "both", None], but I thought that was overkill.

I considered adding the row count to the bottom of the table, eg something like

┏━━━━━━━┓
┃ foo   ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     0 │
│     1 │
│     2 │
│     … │
│ 1_000 total rows│
└───────┘

but then if you set a high max_rows it would be hard to see. Plus then this info would need to be repeated in every column. Anyway, if you have other ideas on the graphic design of where to present the counts, I'm all ears.

lboller-pwbm · 2024-11-21T18:55:47Z

This looks fantastic!

I tried it out in a Jupyter kernel with a fairly wide table, which appears as a horizontally scrollable output using VSCode's interactive window. The dimensions appear but are center-justified, so the user has to scroll horizontally quite a bit to see them. I would suggest making it left-justified.

ibis/expr/types/pretty.py

gforsyth · 2024-11-22T16:07:36Z

ibis/config.py

@@ -76,6 +79,7 @@ class Interactive(Config):
    max_string: int = 80
    max_depth: int = 1
    show_types: bool = True
+    show_count: bool = True


I think this is a nice feature to have. I do not think it should default to true.

Interactive mode is supposed to be interactive, and something that can be computationally expensive isn't something we want to turn on by default.

The main failure case I can think of here is reading from CSVs in a blob store. With show_count=False, this can be done with a range request and only download the first N rows (ish), except if there's an order-by. show_count=True would mean any operation would trigger a download of the entire file.

And yes, that is one more reason not to use CSVs, but 🤷

OK, I am OK with defaulting to False. Let's just wait for Philip to chime in to verify, but I assume he is going to have the same opinion as you.

I really don't think it is worth the complexity/cleverness, but throwing it out there that we could make each relation have its own config, and that could be auto-determined based on if it's a CSV/parquet/physical table, local/remote, etc.

I assume he is going to have the same opinion as you.

We are slowly merging into the same person. The transformation is nearly complete.

I really don't think it is worth the complexity/cleverness, but throwing it out there that we could make each relation have its own config, and that could be auto-determined based on if it's a CSV/parquet/physical table, local/remote, etc.

I mean, it's definitely a reliably cheap(er) operation on anything that isn't a CSV, but I agree this seems like too much complexity. I think a better move is your suggestion of persistent default configuration

Yes, let's leave the default to False, otherwise it's a huge performance footgun/bug magnet.

cpcloud · 2024-12-17T11:49:06Z

ibis/expr/types/pretty.py

+    else:
+        nrows = "…"
+    if isinstance(tablish, ir.Table):
+        dims = f"{orig_ncols:_} cols by {nrows} rows"


Can we make this follow the long-standing convention of "rows by columns"?

refactor(repr): pass Interactive dataclass around, not raw values

7440d28

NickCrews force-pushed the interactive-count branch from 18eafb3 to 6dee4e6 Compare November 21, 2024 16:03

NickCrews mentioned this pull request Nov 21, 2024

feat: show df.count(), len(df.schema()) when displaying interactive = True tables #10231

Open

1 task

kdheepak reviewed Nov 22, 2024

View reviewed changes

ibis/expr/types/pretty.py Outdated Show resolved Hide resolved

feat(repr): show col and row count in rich repr

6868cd8

NickCrews force-pushed the interactive-count branch from 6dee4e6 to 6868cd8 Compare November 22, 2024 15:26

gforsyth reviewed Nov 22, 2024

View reviewed changes

cpcloud reviewed Dec 17, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(repr): add `show_count` option to interactive repr #10518

feat(repr): add `show_count` option to interactive repr #10518

NickCrews commented Nov 21, 2024 •

edited

Loading

lboller-pwbm commented Nov 21, 2024 •

edited

Loading

gforsyth Nov 22, 2024

NickCrews Nov 22, 2024

gforsyth Nov 22, 2024

cpcloud Dec 17, 2024

cpcloud Dec 17, 2024

feat(repr): add show_count option to interactive repr #10518

Are you sure you want to change the base?

feat(repr): add show_count option to interactive repr #10518

Conversation

NickCrews commented Nov 21, 2024 • edited Loading

CHANGELOG

lboller-pwbm commented Nov 21, 2024 • edited Loading

gforsyth Nov 22, 2024

Choose a reason for hiding this comment

NickCrews Nov 22, 2024

Choose a reason for hiding this comment

gforsyth Nov 22, 2024

Choose a reason for hiding this comment

cpcloud Dec 17, 2024

Choose a reason for hiding this comment

cpcloud Dec 17, 2024

Choose a reason for hiding this comment

feat(repr): add `show_count` option to interactive repr #10518

feat(repr): add `show_count` option to interactive repr #10518

NickCrews commented Nov 21, 2024 •

edited

Loading

lboller-pwbm commented Nov 21, 2024 •

edited

Loading