Add write support to pg_analytics #107

Open
philippemnoel opened this issue Aug 27, 2024 · 6 comments
Labels
feature (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed), priority-high (High priority issue), user-request (This issue was directly requested by a user)

Comments

@philippemnoel
Collaborator

What feature are you requesting?

This feature would let users write data to AWS S3, GCS, and Azure Blob Storage, primarily to tier data off and minimize cost. At first, we would support only the main file formats and object stores, not open table formats like Delta Lake and Iceberg.

Why are you requesting this feature?

To let users easily tier data off to AWS S3 and other object stores.

What is your proposed implementation for this feature?

Needs proper investigation. DuckDB has capabilities for this which we would need to expose properly.

Full Name:

Philippe Noël

Affiliation:

ParadeDB

@philippemnoel added the feature, help wanted, priority-medium, and user-request labels on Aug 27, 2024
@rebasedming
Contributor

rebasedming commented Sep 2, 2024

After an initial investigation, it looks like we can use DuckDB replacement scans, which allow you to register a custom callback to fire if DuckDB tries to read a table that doesn't exist in DuckDB.

So if the user tries to COPY a Postgres table to S3, we can:

  1. Intercept it in the utility hook
  2. Register a replacement scan that tells DuckDB how to scan the Postgres table
  3. Have DuckDB execute the entire COPY statement
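
The three steps above can be sketched as a toy model. This is only an illustration of the control flow, written in plain Python with no DuckDB or Postgres dependency: `ToyEngine`, `utility_hook`, `postgres_tables`, and `object_store` are all hypothetical names, and none of them correspond to the real DuckDB C API or to pg_analytics code. The point is just to show how a lookup for a table the engine does not know about can fall through to a registered callback.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple
ScanFn = Callable[[], List[Row]]

# Hypothetical, deliberately naive matcher for: COPY <table> TO '<destination>'
COPY_RE = re.compile(r"^\s*COPY\s+(\w+)\s+TO\s+'([^']+)'", re.IGNORECASE)


class ToyEngine:
    """Stand-in for DuckDB: native tables plus replacement-scan callbacks."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[Row]] = {}
        self.replacement_scans: List[Callable[[str], Optional[ScanFn]]] = []

    def register_replacement_scan(
        self, callback: Callable[[str], Optional[ScanFn]]
    ) -> None:
        self.replacement_scans.append(callback)

    def scan(self, name: str) -> List[Row]:
        # Native tables win; callbacks fire only when the table is unknown.
        if name in self.tables:
            return self.tables[name]
        for callback in self.replacement_scans:
            scan_fn = callback(name)
            if scan_fn is not None:
                return scan_fn()
        raise LookupError(f"table {name!r} does not exist")

    def copy_to(self, name: str, sink: Dict[str, List[Row]], destination: str) -> int:
        # Step 3: the engine runs the whole COPY, reading through scan().
        rows = self.scan(name)
        sink[destination] = list(rows)
        return len(rows)


def utility_hook(
    sql: str,
    engine: ToyEngine,
    postgres_tables: Dict[str, List[Row]],
    object_store: Dict[str, List[Row]],
) -> Optional[int]:
    # Step 1: intercept only COPY ... TO statements; ignore everything else.
    match = COPY_RE.match(sql)
    if match is None:
        return None
    table, destination = match.groups()

    # Step 2: tell the engine how to scan a table that lives in Postgres.
    def postgres_replacement(name: str) -> Optional[ScanFn]:
        if name in postgres_tables:
            return lambda: postgres_tables[name]
        return None

    engine.register_replacement_scan(postgres_replacement)
    return engine.copy_to(table, object_store, destination)


if __name__ == "__main__":
    store: Dict[str, List[Row]] = {}
    pg = {"events": [(1, "a"), (2, "b")]}
    copied = utility_hook(
        "COPY events TO 's3://bucket/events.parquet'", ToyEngine(), pg, store
    )
    print(copied, sorted(store))
```

In this sketch the callback is registered on every intercepted statement, which a real implementation would presumably do once per connection instead. The dictionary `object_store` stands in for S3, so nothing here speaks to file formats or credentials.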

@Weijun-H
Contributor

Weijun-H commented Sep 24, 2024

After the merge at duckdb/duckdb-rs#370, we can utilize the complete DuckDB C API to solve this ticket.

@philippemnoel
Collaborator Author

> After the merge at duckdb/duckdb-rs#370, we can utilize the complete DuckDB C API to solve this ticket.

Great find -- Here is the PR: duckdb/duckdb-rs#381

@philippemnoel
Collaborator Author

@Weijun-H this just got merged, 5 days ago! This would be a really wonderful PR. Write support is our most requested feature.

@philippemnoel added the good first issue and priority-high labels and removed the priority-medium label on Oct 13, 2024
@Weijun-H
Contributor

/take

@datasalaryman

Really excited to see this feature. Thanks!
