datalake
Here are 249 public repositories matching this topic...
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Updated Dec 27, 2024 - Java
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
Updated Dec 27, 2024 - Java
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
Updated Dec 19, 2024 - Python
Postgres for Search and Analytics
Updated Dec 27, 2024 - Rust
Upserts, Deletes And Incremental Processing on Big Data.
Updated Dec 25, 2024 - Java
lakeFS - Data version control for your data lake | Git for data
Updated Dec 27, 2024 - Go
Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.
Updated Dec 25, 2024 - Java
LakeSoul is an end-to-end, real-time, cloud-native lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
Updated Nov 25, 2024 - Java
The LeoFS Storage System
Updated Jun 2, 2020 - Erlang
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
Updated Dec 27, 2024 - Java
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Updated Dec 28, 2024 - Java
A collection of Apache Hudi resources
Updated Dec 22, 2024
A free-to-use dbt package for creating and loading Data Vault 2.0 compliant data warehouses (powered by dbt, an open source data engineering tool and registered trademark of dbt Labs)
Updated Aug 21, 2024
DuckDB-powered analytics for Postgres
Updated Dec 19, 2024 - Rust
Open Control Plane for Tables in Data Lakehouse
Updated Dec 20, 2024 - Java
Use SQL to build ELT pipelines on a data lakehouse.
Updated May 25, 2022 - JavaScript
A curated list of open source tools used in analytics platforms and data engineering ecosystem
Updated Nov 6, 2024