What is the origin of a packaged publication? #45
Comments
This issue is a spin-off of the telco discussion on 2014-04-29, see Meeting minutes. See also #37
The Readium document https://github.com/readium/architecture/blob/master/server/origin.md is also related to this issue; it focuses on the problems Reading Systems face when setting the origin of content.
I had a discussion with @danielweck on this subject. Here is a summary; I hope it will help those of us participating in this discussion.

Let's consider a Package; let's imagine that once exposed on the web (either statically after unpackaging or dynamically via a "publication server"), its manifest is served from

Optionally, in json-ld/json-ld.org#604, it seems that the JSON-LD WG has agreed that a @base property can override the default base URL inside the JSON structure, which mimics what exists with the <base> element in HTML documents. But let's keep that on the side for now.

For sure, defining a base URL is not always simple: if the manifest is served from

But in practice, what affects the processing of relative URIs in the manifest is the base URL associated with the manifest; and this base URL, for any web resource, incl. JSON-LD, is defined by standard web practice -> Document base URL

The case of a manifest embedded in the PEP was discussed in w3c/json-ld-syntax#23. Maybe @iherman or @BigBlueHat can summarize the conclusion of this thread?
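To make the point about the document base URL concrete, here is a minimal sketch of how the same relative URI in a manifest resolves differently depending on where the manifest is served from. The URLs and the `resolve` helper are hypothetical and only illustrate standard relative-reference resolution; this is not taken from any Readium or W3C implementation.

```python
from urllib.parse import urljoin


def resolve(relative: str, manifest_url: str) -> str:
    """Resolve a relative URI against the URL the manifest was served from."""
    return urljoin(manifest_url, relative)


# Hypothetical locations for the same unpacked publication.
on_the_web = "https://retailer.com/pubs/book/manifest.json"
in_a_reading_app = "http://localhost:8080/pub42/manifest.json"

# The same relative entry from the manifest's reading order.
entry = "html/chapter1.html"

print(resolve(entry, on_the_web))
print(resolve(entry, in_a_reading_app))
```

The manifest content is identical in both cases; only the URL it is fetched from changes the absolute URLs of its resources.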
What if I, at publisher.org, created the package, and then sent it to you at retailer.com? The manifest would be served from retailer.com. If you consider retailer.com the origin of the publication, what's to stop the publisher from including malicious scripts that, for example, rewrite the DOM at retailer.com?
I think the conclusion is what is in the current JSON-LD 1.1 draft:
The critical piece is the reference to the HTML spec, which establishes the base URL for an HTML document. The question is whether what @llemeurfr and @danielweck describe above stands or not for the
@dauwhe, I wonder what a malicious publisher can do to hack the distributor's platform; could you detail what "rewrite the DOM" would look like and what could happen to the distributing platform? @iherman IMO, the PEP index.html being an HTML resource, the way relative URLs are processed by web user agents is even clearer than for JSON-LD processing: the Document Base URL drives it.
@llemeurfr what I was worried about is using an HTML parser and telling it, in some way or other, to use a specific, external base URL, for which there is no standard. But, re-reading your comment, I realized that I did not understand what you meant by 'publication server'. Do you mean

Of course, for those cases, we do have the types of problems described in the Readium note. But I wonder whether this should not be the point where we simply acknowledge that we are not defining a perfect packaging format but a lightweight one, which does have its limitations (described in the note), and that the 'real' solution would be a future Web Packaging format that, somehow, would take care of maintaining the origin of the content.
@iherman by "publication server" I mean any piece of software capable of dynamically exposing a packaged publication (LPF or EPUB format) as a Web Publication. In Readium speak we call it a "streamer".
Yes, such a middleware can expose the Web Publication with a localhost origin or a "web" origin (domain name, IP address), depending on its usage (as part of a reading app or "on the web"). The problems exposed in the Readium note have to do with the 'origin' of the Web Publication, not really its 'base URL' (and not the origin of the Packaged publication, as there is none); I was fooled by the title of this issue.
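The distinction between 'base URL' and 'origin' can be sketched as follows. This is a simplified illustration, assuming only http(s) URLs and treating the origin as the (scheme, host, port) tuple; opaque origins (e.g. for `file:` or `data:` URLs) are not handled. The `origin` and `same_origin` helpers and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit

# Default ports, so that an explicit ":443" and an implicit one compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return a simplified (scheme, host, port) origin for an http(s) URL."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(a: str, b: str) -> bool:
    """True when both URLs share scheme, host and port."""
    return origin(a) == origin(b)


# A publisher's script served by the retailer shares the retailer's origin.
print(same_origin("https://retailer.com/store/",
                  "https://retailer.com/pubs/book/script.js"))

# Two publications exposed by the same local streamer also share one origin,
# even though their base URLs differ.
print(same_origin("http://localhost:8080/pub1/index.html",
                  "http://localhost:8080/pub2/index.html"))
```

Under these assumptions both checks succeed, which is why changing the path (and thus the base URL) of a served publication does nothing to isolate its scripts.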
That is correct, although I am not sure we should rely on a strongly 1.1 feature; at the moment, all our manifests are JSON-LD 1.0 compatible, and it would be fairly difficult to explain to the average users of our authored manifests what this would mean... But already in JSON-LD 1.0 it was possible to use
"@context" : [
"https://schema.org",
"https://www.w3.org/ns/wp-context",
{ "@base": "https://example.org"}
]
... (I have just checked, and the structured data testing tool indicates that this is accepted and properly handled by at least that schema.org processor.) I am not sure how that would solve the problem at hand, however, because the big issue with the origin is to ensure that the various JavaScript files have the right origin URL when they, e.g., fetch external resources... That being said, canonicalization should be able to handle
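As a rough sketch of what the `@base` override means for relative URIs, the snippet below picks the effective base URL for a manifest: the URL the document was served from, unless an object in `@context` carries a string `@base`. This is a deliberate simplification and not a JSON-LD processor; the `effective_base` helper and the retailer URL are hypothetical, and only the `@context` array mirrors the one quoted above.

```python
from urllib.parse import urljoin


def effective_base(manifest: dict, document_url: str) -> str:
    """Return the base URL to resolve the manifest's relative URIs against."""
    context = manifest.get("@context", [])
    if not isinstance(context, list):
        context = [context]
    base = document_url
    for entry in context:
        # Only string values are handled here; a null @base is ignored.
        if isinstance(entry, dict) and isinstance(entry.get("@base"), str):
            # @base may itself be relative, so resolve it against the current base.
            base = urljoin(base, entry["@base"])
    return base


manifest = {
    "@context": [
        "https://schema.org",
        "https://www.w3.org/ns/wp-context",
        {"@base": "https://example.org"},
    ]
}

base = effective_base(manifest, "https://retailer.com/pubs/book/manifest.json")
print(urljoin(base, "html/chapter1.html"))
```

Note that this only changes how relative URIs in the manifest are expanded; it does not change the origin under which scripts fetched from the serving host actually run.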
Aren't we talking about the same set of problems?
For EPUB and LPF, there is truly no origin for these resources. To serve them, we have to adopt various strategies as described in the Readium document, but IMO these are technical implementation details rather than a true origin.
A precise answer to this question should (probably) be included in the document. This origin affects the way relative URIs in the manifest are turned into absolute ones, it affects the behavior of scripts, etc.