[CT-725] [CT-630] Sources are not passed into catalog.json #368
Comments
@hanna-liashchuk thank you for writing this up and providing details on your setup. I'm not exactly sure what's happening here, as I cannot reproduce this behavior, so I have a few follow-up questions.
This is a bit of an educated guess without more information, but you may need to specify the schema if the source lives somewhere else and is poorly named. If the source does not live in the target database you can also define the database where it does live. |
Source's schema is already defined in source definition, or should I also create separate schema under If I open catalog.json after dbt docs generate, I cannot see my source, there is |
Well that all seems as expected. We will have to dig a bit more! Can you see the "catalog" query being run for the schema where your source tables live? To find the catalog query, check
So you should be able to find:
Once you let me know if you can find that we may know where to look next. |
Hi @emmyoop |
@hanna-liashchuk would you be willing to share your logs and artifacts (catalog.json & manifest.json) with me via email? Since I can't reproduce this behavior, they may help me determine what's happening. |
@emmyoop yes, sure |
Great! Just zip up the relevant log, manifest and catalog and send over to [email protected]. |
@emmyoop, the zip is sent :) |
@hanna-liashchuk Thanks for sharing those logs! So we can indeed see:
So it seems likely that either the column info is missing from that metadata query's result, or it's showing up in a text format that's slightly different from the one our regex is expecting. Could I ask you to try running these queries yourself, and share the results:
If we find that this is indeed due to missing or mismatched info in those statements, we'll want to transfer this issue over to the |
hi @jtcohen6
|
Indeed, it looks like the
Does not match the regex that
INFORMATION_COLUMNS_REGEX = re.compile(r"^ \|-- (.*): (.*) \(nullable = (.*)\b", re.MULTILINE)
We're expecting an output looking like this:
It looks like you're using a Delta table... though perhaps not a (I'm going to transfer this issue to the |
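To make the mismatch concrete, here is a minimal sketch that applies the regex quoted above to two pieces of text. Both sample strings are hypothetical and only illustrate the shape the regex looks for (a tree of `|-- name: type (nullable = ...)` lines); they are not the actual output from this thread. The helper `parse_columns` is also an illustrative name, not a dbt-spark function.

```python
import re

# Regex as quoted in the comment above.
INFORMATION_COLUMNS_REGEX = re.compile(
    r"^ \|-- (.*): (.*) \(nullable = (.*)\b", re.MULTILINE
)

# Hypothetical text in the shape the regex expects (assumed for illustration).
matching_info = (
    "root\n"
    " |-- id: long (nullable = true)\n"
    " |-- name: string (nullable = false)\n"
)

# Hypothetical text with no schema tree at all (assumed for illustration).
non_matching_info = "Provider: delta\nLocation: /some/path\n"


def parse_columns(information):
    """Return (name, type, nullable) tuples for every line the regex matches."""
    return [
        (name, dtype, nullable == "true")
        for name, dtype, nullable in INFORMATION_COLUMNS_REGEX.findall(information)
    ]


print(parse_columns(matching_info))
print(parse_columns(non_matching_info))
```

When the text carries no lines in that shape, the regex finds nothing and the column list comes back empty, which is consistent with a source showing up without a schema.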
@jtcohen6 It's Spark 3.1.2 with Delta 1.0 |
Open-source Delta Lake doesn't provide schema details in query
We may need a similar approach for docs generate as well. |
Also this dbt docs issue is already being discussed #295 |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days. |
Is there any chance of this issue being fixed? There is still no schema after generating docs and ingesting metadata into Datahub. And there is still no |
Hi, I'm working with dbt-spark and I have a dbt project with a source and a model that uses this source. I'm generating docs and ingesting the metadata into Datahub. My issue is that the schema for the source is not transferred.
Source definition:
Model definition:
dbt_project.yml
According to the Datahub documentation, catalog.json should contain the schema, but I can see that it is generated without the sources' schema, while manifest.json has this information. I tried adding the sources to catalog.json manually, and the metadata was ingested correctly.
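The check described above (sources present in manifest.json but absent from catalog.json) can be sketched as a small comparison. This assumes both artifacts keep their sources under a top-level `sources` key indexed by unique ID; the sample dictionaries and the `missing_sources` helper are hypothetical and stand in for the real files.

```python
def missing_sources(manifest, catalog):
    """Return source IDs that appear in the manifest but not in the catalog."""
    manifest_sources = set(manifest.get("sources", {}))
    catalog_sources = set(catalog.get("sources", {}))
    return sorted(manifest_sources - catalog_sources)


# Hypothetical, heavily trimmed artifacts (assumed structure, not real output).
manifest = {
    "sources": {
        "source.my_project.raw.events": {"name": "events"},
    }
}
catalog = {
    "nodes": {"model.my_project.events_clean": {"columns": {}}},
    "sources": {},
}

print(missing_sources(manifest, catalog))
```

With real artifacts, the two dictionaries would instead be loaded with `json.load` from the generated manifest.json and catalog.json files.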