FAQs about DataHub [drafts for comment] #1166
rufuspollock
asked this question in
General
How does DataHub relate to CKAN?
DataHub can be used as a complement to CKAN. Specifically, DataHub PortalJS allows you to rapidly build powerful data portal frontends and integrates directly with CKAN. It is inspired by the "headless" CMS model and uses CKAN as a headless DMS. It can be used to create an alternative frontend to CKAN using modern frontend tooling and to integrate alternative content sources.
DataHub can also be used as a full replacement for CKAN if you want to manage your metadata and content differently from the way CKAN does. DataHub offers its own native metadata management, and it also provides an innovative integration with GitHub as well as connectors to other metadata management solutions (see XX for a full list).
DataHub's real power is that it allows you to mix and match, especially in the backend. Modern data portals often want to create an integrated experience with content and data woven seamlessly together. With DataHub you can quickly and flexibly create a unified data portal experience that combines data in CKAN, docs in Markdown in GitHub, and blog posts and other content in WordPress.
Finally, DataHub Cloud (Enterprise) provides all of this as SaaS at an extremely attractive price point (due to our experience and scale and the features of the underlying DataHub framework).
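To make the "mix and match" idea above concrete, here is a minimal sketch of merging items from CKAN, a GitHub repo of Markdown docs, and WordPress into one listing. It is not the DataHub or PortalJS API: `PortalItem`, `fromCkan`, `fromMarkdown`, `fromWordpress` and `buildIndex` are hypothetical names, and the input shapes are assumed minimal subsets of what each source returns.

```typescript
// Hypothetical unified shape for anything the portal lists.
type PortalItem = {
  source: "ckan" | "github" | "wordpress";
  kind: "dataset" | "doc" | "post";
  title: string;
  url: string;
};

// Assumed minimal subsets of each source's records (illustrative only).
type CkanDataset = { name: string; title: string };
type MarkdownDoc = { path: string; title: string };
type WordpressPost = { link: string; title: { rendered: string } };

// A CKAN dataset becomes a "dataset" item linking to its CKAN page.
function fromCkan(baseUrl: string, d: CkanDataset): PortalItem {
  return { source: "ckan", kind: "dataset", title: d.title, url: `${baseUrl}/dataset/${d.name}` };
}

// A Markdown file in a GitHub repo becomes a "doc" item.
function fromMarkdown(repo: string, branch: string, doc: MarkdownDoc): PortalItem {
  return {
    source: "github",
    kind: "doc",
    title: doc.title,
    url: `https://github.com/${repo}/blob/${branch}/${doc.path}`,
  };
}

// A WordPress post becomes a "post" item.
function fromWordpress(p: WordpressPost): PortalItem {
  return { source: "wordpress", kind: "post", title: p.title.rendered, url: p.link };
}

// Merge items from all sources into one listing, sorted by title.
function buildIndex(...groups: PortalItem[][]): PortalItem[] {
  const all = ([] as PortalItem[]).concat(...groups);
  return all.sort((a, b) => a.title.localeCompare(b.title));
}
```

The point of the sketch is only that each source is normalized to a common shape before the frontend sees it, so the portal can treat datasets, docs and posts uniformly.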
Why would I use DataHub if I have CKAN?
You can use the parts of the DataHub framework that complement CKAN. For example:
How do DataHub and Git(Hub) go together?
DataHub can use many different data "backends", i.e. sources for data and metadata. For example, you can use CKAN, S3, etc. However, Git(Hub) is our favorite and default backend.
NB: GitHub on its own has issues with large files, so this comes with a caveat: you need a good solution for large files on GitHub, e.g. Git LFS, or Git LFS with your own storage (something DataHub has a solution for!).
What is a backend exactly? In essence, a data portal is about presenting data. That data, and its metadata, is stored somewhere. By backend we mean the data storage and data catalog that hold them. It could be files on disk, cloud storage, or a combination of a dedicated data catalog and storage.
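As a rough illustration of that definition, a backend can be pictured as one interface with two roles: a catalog that lists metadata, and storage that returns data. The sketch below is an assumption for illustration, not DataHub's actual interface. `Backend`, `DatasetMetadata` and `createInMemoryBackend` are hypothetical names, and the calls are kept synchronous for brevity, whereas a real backend (disk, cloud storage, CKAN) would almost certainly be asynchronous.

```typescript
// Hypothetical metadata record held by the "data catalog" side of a backend.
interface DatasetMetadata {
  name: string;
  title: string;
  resources: string[]; // names of the data files in this dataset
}

// Hypothetical backend contract: catalog plus storage.
interface Backend {
  listDatasets(): DatasetMetadata[]; // catalog role
  readResource(dataset: string, resource: string): string; // storage role
}

// Simplest possible backend: everything lives in memory.
// `files` maps "<dataset>/<resource>" to the file content.
function createInMemoryBackend(
  catalog: DatasetMetadata[],
  files: Record<string, string>,
): Backend {
  return {
    listDatasets() {
      return catalog.slice(); // return a copy so callers cannot mutate the catalog
    },
    readResource(dataset, resource) {
      const key = `${dataset}/${resource}`;
      if (!Object.prototype.hasOwnProperty.call(files, key)) {
        throw new Error(`No such resource: ${key}`);
      }
      return files[key];
    },
  };
}
```

Under this view, swapping files on disk for cloud storage or a dedicated catalog means providing another implementation of the same two operations, while the portal code that presents the data stays unchanged.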
[Draft] More generally, data infrastructure (? dms?) is about managing data, i.e. storing, processing, presenting, discovering etc.
Why Git(Hub)? As Data Engineers and Data Scientists ourselves, we love using Git and GitHub (or GitLab) for managing source code. We've also pioneered (since the late 2000s!) using Git(Hub) for managing data. BUT ... GitHub has a lot of limitations as a "DataHub": most obviously, it's about presenting and managing source code, not data. At the same time, reinventing Git isn't a great idea (though people keep trying!). That's why we built DataHub "on top of" Git, so we could get the best of both worlds: all the power of Git, but with the presentation and management we want for data.
Can I get a hosted edition of DataHub PortalJS?
Yes, use DataHub Cloud Enterprise: https://datahub.io/enterprise
Can I get a hosted edition of DataHub?
Yes, use DataHub Cloud or DataHub Cloud Enterprise.