Developing PUDL as an application rather than a library #1669
Replies: 3 comments 10 replies
-
Do folks have feels about the proposed changes? Will those create problems for anyone that we're not thinking about? Are there any additional portions of the existing PUDL codebase that could/should be broken out into abstracted libraries / packages? Do the two mentioned above make sense?
-
Thinking about breaking out the metadata structures into their own package, which PUDL would depend on... The raw data archiving process also depends on the metadata to populate some of the metadata fields associated with the Zenodo archives. I wonder if it would make sense to encapsulate not only the metadata class infrastructure but also the metadata definitions in a separate package? Then the archivers could depend on the metadata package rather than on everything in PUDL. The PUDL Catalog could also depend on the metadata package, and be able to generate catalogs with rich metadata without needing to depend on everything in PUDL.
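To make the idea concrete, here is a minimal sketch of what a standalone metadata package could expose. Everything in it is hypothetical: the `DataSourceMetadata` class, the `SOURCES` registry, the `archive_metadata` helper, and the placeholder entry are invented for illustration and are not PUDL's actual metadata classes or the Zenodo API schema.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DataSourceMetadata:
    """Hypothetical description of one raw data source."""

    name: str
    title: str
    description: str
    license_name: str
    keywords: Tuple[str, ...] = ()


# Hypothetical registry of metadata definitions. In the proposal, these
# definitions would live in the metadata package alongside the classes.
SOURCES: Dict[str, DataSourceMetadata] = {
    "example_source": DataSourceMetadata(
        name="example_source",
        title="Example Data Source",
        description="Placeholder entry used only for illustration.",
        license_name="CC-BY-4.0",
        keywords=("example", "placeholder"),
    ),
}


def archive_metadata(source_name: str) -> dict:
    """Build a plain dict an archiver could attach to an archive.

    The keys are illustrative only; a real archiver would map them onto
    whatever fields its archive service expects.
    """
    src = SOURCES[source_name]
    return {
        "title": src.title,
        "description": src.description,
        "license": src.license_name,
        "keywords": list(src.keywords),
    }
```

An archiver or catalog generator would then import only this small package and call something like `archive_metadata("example_source")`, without pulling in the rest of PUDL and its dependencies.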
-
PUDL is a dependency of our current project, as we use several of the |
-
Background
When we originally packaged up the PUDL repo for installation, we didn't appreciate the distinction between software libraries, frameworks, and applications, and we also didn't really know which of those things PUDL was going to become. Initially we imagined that users would run the reproducible ETL pipeline locally and regenerate the outputs. We followed the Python packaging norms that we encountered most frequently, which were designed around publishing libraries: abstracted, reusable code that can be put to use for a variety of different purposes, i.e. in different applications.
Is PUDL really a library?
It's become increasingly clear that this is not the right way to think about PUDL:
Or is PUDL an application?
Instead, PUDL seems more like an application: end-use software that does a particular job, in this case processing a bunch of specific datasets into a coherent whole.
Developing PUDL as an application
We've already started making some changes to PUDL to enable continuous deployment and the integration of much more data. Some of the patterns found in the Twelve-Factor App make sense in this context, and so do a lot of the abstractions provided by Dagster.
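One of the Twelve-Factor App patterns is keeping configuration in the environment rather than in code or in files bundled with the package. A minimal sketch of that pattern follows; the `Settings` class, the `load_settings` function, and the variable names `PUDL_INPUT` and `PUDL_OUTPUT` are assumptions chosen for illustration, not a description of how PUDL is actually configured.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical application settings read from the environment."""

    input_dir: Path
    output_dir: Path


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read required settings from environment variables.

    Passing a mapping explicitly makes the function easy to test; by
    default it reads from os.environ.
    """
    env = os.environ if env is None else env
    required = ("PUDL_INPUT", "PUDL_OUTPUT")  # assumed variable names
    missing = [key for key in required if key not in env]
    if missing:
        # Fail fast at startup instead of partway through a long run.
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        input_dir=Path(env["PUDL_INPUT"]),
        output_dir=Path(env["PUDL_OUTPUT"]),
    )
```

With this approach the same code can run unchanged on a laptop, in CI, or in a deployment, with only the environment differing between them.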
Proposed changes
Additional complementary changes that could make sense in light of treating PUDL like an application rather than a library:
pudl and pudl-catalog packages.
Benefits
Breaking out reusable components?
There are also a few parts of the PUDL codebase that are potentially more reusable, and could (should?) be broken out into their own packages in separate repositories, and then used by PUDL (and potentially in some of our other projects). This doesn't have to be done now, but would move us gradually in the direction of separating the PUDL application from the more general tools which are used to create it.