Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache virtual object membership #586

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jcoyne
Copy link
Contributor

@jcoyne jcoyne commented Mar 30, 2023

This is required for adding relatedItem entries on the public xml. Fixes #585

@jcoyne jcoyne force-pushed the cache-virtual-object-membership branch from 5dc768a to 1f0c931 Compare March 30, 2023 21:12
@@ -1,7 +1,7 @@
FactoryBot.define do
factory :purl do
sequence :druid do |n|
"druid:zz#{n.to_s * 3}yy#{n.to_s * 4}"
"druid:zz#{"%03d" % rand(1000)}yy#{"%04d" % rand(10000)}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sequence approach stopped working when we had more than 10 uses of this factory.

@jcoyne jcoyne force-pushed the cache-virtual-object-membership branch from 1f0c931 to bcc7512 Compare March 30, 2023 21:18
@cbeer
Copy link
Member

cbeer commented Mar 31, 2023

I think I don't understand this architecturally. Maintaining a parallel database of COCINA-derived data seems unfortunate. It still feels like SDR could send us everything we need to know up-front?

@cbeer
Copy link
Member

cbeer commented Mar 31, 2023

And, not just unfortunate, but doesn't this rely on us maintaining an accurate database? Async or parallel operations seem like we could get some inconsistent behavior.

It feels like it'd be nicer if infra maintained the only source of truth and hand over all the data we need in their original request.

@jcoyne
Copy link
Contributor Author

jcoyne commented Apr 4, 2023

It feels like it'd be nicer if infra maintained the only source of truth and hand over all the data we need in their original request.

They do hand over all the data. We just need a way to look at this relationship inversely, just the same as we do with collection membership. This is just a cache of the data.

@jcoyne jcoyne force-pushed the cache-virtual-object-membership branch from bcc7512 to 892530d Compare April 4, 2023 13:11
This is required for adding relatedItem entries on the public xml.
Fixes #585
@jcoyne jcoyne force-pushed the cache-virtual-object-membership branch from 892530d to 6256f2a Compare April 4, 2023 13:54
@cbeer
Copy link
Member

cbeer commented Apr 4, 2023

Maybe I don't understand what we're planning to use this data for. It seems to me, if it's data we need internally to construct the public xml, SDR should provide that to us to avoid any hint of race conditions, non-idempotent behavior, or need for synchronous processing.

@jcoyne
Copy link
Contributor Author

jcoyne commented Apr 4, 2023

@cbeer it will be used to construct the virtual object relationships (e.g. fedora:isConstituentOf in public xml) as seen here: https://github.com/sul-dlss/dor-services-app/blob/main/app/services/published_relationships_filter.rb#L16

At present the infrastructure team is looking this up in the Argo solr (which they are trying to decouple from).
Currently there is no way in the cocina to pass this data, because maintaining bidirectional relationships in flat files is a consistency challenge. In this PR the access team takes on the responsibility for having an index of this data, just as we do for collection membership data. Instead, if we're to require SDR to send this to us, we need to create a new data model that purl-fetcher can use as it's API. This option seems like a bigger project than we can tackle in the dwindling time we have in this maintenance cycle.

@cbeer
Copy link
Member

cbeer commented Apr 4, 2023

In this PR the access team takes on the responsibility for having an index of this data, just as we do for collection membership data

The difference with the collection membership data is that we're not trying to consume that synchronously.

Without infra handing us the data, won't we need to process a virtual object and the constituents in a well-defined order (make sure we've processed all the constituents before the virtual object?) or have some logic to update the virtual object whenever a constituent changes? That seems like at least as much of a pain.

I guess I don't know enough about how the Cocina is managed on the infra side; I guess I assumed there was just a big database they could ask?

@jcoyne jcoyne marked this pull request as draft April 4, 2023 17:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Index virtual object memberships
2 participants