Null exception on checking data URL - EpubCheck 5.0.1 and 5.1.0 #1536

Open
rachanak-dk opened this issue Aug 2, 2023 · 12 comments
Labels: status: has PR (the issue is being processed in a pull request); type: bug (the issue describes a bug)

@rachanak-dk

Hi,

When I check a fixed-layout EPUB with EpubCheck 5.0.1 or EpubCheck 5.1.0, I get the exception below. EpubCheck 5.0.0 and earlier versions report no errors. The OPF file also validates and has no resource path issues:

java.lang.NullPointerException: null input
at io.mola.galimatias.URLParser.parse(URLParser.java:215)
at io.mola.galimatias.URL.withPath(URL.java:397)
at io.mola.galimatias.canonicalize.DecodeUnreservedCanonicalizer.canonicalize(DecodeUnreservedCanonicalizer.java:41)
at org.w3c.epubcheck.util.url.URLUtils.normalize(URLUtils.java:188)
at com.adobe.epubcheck.ocf.OCFContainer.contains(OCFContainer.java:88)
at com.adobe.epubcheck.ocf.OCFContainer.isRemote(OCFContainer.java:134)
at org.w3c.epubcheck.core.references.ResourceReferencesChecker.checkReference(ResourceReferencesChecker.java:120)
at org.w3c.epubcheck.core.references.ResourceReferencesChecker.check(ResourceReferencesChecker.java:102)
at com.adobe.epubcheck.opf.OPFChecker.checkPackage(OPFChecker.java:149)
at com.adobe.epubcheck.opf.OPFChecker30.checkPackage(OPFChecker30.java:67)
at com.adobe.epubcheck.opf.OPFChecker.check(OPFChecker.java:94)
at com.adobe.epubcheck.ocf.OCFChecker.check(OCFChecker.java:173)
at com.adobe.epubcheck.api.EpubCheck.doValidate(EpubCheck.java:218)
at com.adobe.epubcheck.tool.EpubChecker.validateFile(EpubChecker.java:250)
at com.adobe.epubcheck.tool.EpubChecker.processFile(EpubChecker.java:325)
at com.adobe.epubcheck.tool.EpubChecker.run(EpubChecker.java:150)
at com.adobe.epubcheck.tool.Checker.main(Checker.java:31)

@prithiviclteam

prithiviclteam commented Aug 2, 2023 via email

@titusz

titusz commented Aug 4, 2023

I can confirm the issue. I also get:

java.lang.NullPointerException: null input
        at io.mola.galimatias.URLParser.parse(URLParser.java:215)
        at io.mola.galimatias.URL.withPath(URL.java:397)
        at io.mola.galimatias.canonicalize.DecodeUnreservedCanonicalizer.canonicalize(DecodeUnreservedCanonicalizer.java:41)
        at org.w3c.epubcheck.util.url.URLUtils.normalize(URLUtils.java:188)
        at com.adobe.epubcheck.ocf.OCFContainer.contains(OCFContainer.java:88)
        at com.adobe.epubcheck.ocf.OCFContainer.isRemote(OCFContainer.java:134)
        at org.w3c.epubcheck.core.references.ResourceReferencesChecker.checkReference(ResourceReferencesChecker.java:120)
        at org.w3c.epubcheck.core.references.ResourceReferencesChecker.check(ResourceReferencesChecker.java:102)
        at com.adobe.epubcheck.opf.OPFChecker.checkPackage(OPFChecker.java:149)
        at com.adobe.epubcheck.opf.OPFChecker30.checkPackage(OPFChecker30.java:67)
        at com.adobe.epubcheck.opf.OPFChecker.check(OPFChecker.java:94)
        at com.adobe.epubcheck.ocf.OCFChecker.check(OCFChecker.java:173)
        at com.adobe.epubcheck.api.EpubCheck.doValidate(EpubCheck.java:218)
        at com.adobe.epubcheck.tool.EpubChecker.validateFile(EpubChecker.java:250)
        at com.adobe.epubcheck.tool.EpubChecker.processFile(EpubChecker.java:325)
        at com.adobe.epubcheck.tool.EpubChecker.run(EpubChecker.java:150)
        at com.adobe.epubcheck.tool.Checker.main(Checker.java:31)

Reproduce with attached sample EPUB:
nullpointer_epub.zip

@mattgarrish
Member

> Reproduce with attached sample EPUB:

Looks like the data url in the background-image declaration in the Style.css file is causing it.

@rachanak-dk
Author

> > Reproduce with attached sample EPUB:
>
> Looks like the data url in the background-image declaration in the Style.css file is causing it.

@mattgarrish: Thank you for your response. We are checking the URL pointed out by you which indeed is the one causing the issue.
Thanks

@mattgarrish
Member

Reopening this until @rdeltour has a chance to look into it. It seems to be symptomatic of a bigger issue with data URLs, as you shouldn't get a null pointer exception. Even if I put the data URL into an img tag, or remove the fxl metadata, I get the exception.

Off the top of my head, the only requirement on their use is that they define a core media type, which appears to be the case here.

mattgarrish reopened this Aug 7, 2023
mattgarrish changed the title from "Null exception on checking fixed layout epubs - EpubCheck 5.0.1 and 5.1.0" to "Null exception on checking data URL - EpubCheck 5.0.1 and 5.1.0" Aug 7, 2023
@b-t-k

b-t-k commented Aug 14, 2023

As far as I can tell, this affects 100% of our fxl files, and Overdrive is now bouncing them. I am trying to figure out why the original producers bothered to add the background, as it seems to be a 1-pixel dot:

.trn_link {
	background-image:url('data:image/gif;base64,R0lGODlhAQABAPAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICR??AEAOw==');
	position:absolute;
}

@titusz

titusz commented Aug 15, 2023

1 pixel images are often used for clickable links that do not wrap any text. It seems that the Data URL is problematic. If you replace it with data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7 it works fine.

@SimonPRH

SimonPRH commented Aug 17, 2023

Can confirm all our FXL files are getting the same null pointer exception.

Failure case:
data:image/gif;base64,R0lGODlhAQABAPAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICR??AEAOw==

When replaced with the URL Titusz gave above, no issues.

File is ANCIENT and produced by some old supplier. Offhand I don't actually know what that string after base64 actually means/does!
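On what the string after `base64,` is: in a data URL it is the resource's bytes in Base64 text form. A minimal Java sketch follows, using only the standard `java.util.Base64` decoder. The class and method names are made up for illustration and are not part of EPUBCheck. It decodes the working URL from this thread and shows that the failing one is rejected, because `?` is not a character of the Base64 alphabet.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, for illustration only (not EPUBCheck code).
final class DataUrlPayload {

    /** Returns the text after the first comma of a data URL, i.e. its payload. */
    static String payloadOf(String dataUrl) {
        int comma = dataUrl.indexOf(',');
        if (!dataUrl.startsWith("data:") || comma < 0) {
            throw new IllegalArgumentException("not a data URL: " + dataUrl);
        }
        return dataUrl.substring(comma + 1);
    }

    /** Decodes the payload; returns null if the input is not a data URL or not valid Base64. */
    static byte[] decodeOrNull(String dataUrl) {
        try {
            return Base64.getDecoder().decode(payloadOf(dataUrl));
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String working = "data:image/gif;base64,"
            + "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7";
        String failing = "data:image/gif;base64,"
            + "R0lGODlhAQABAPAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICR??AEAOw==";

        byte[] bytes = decodeOrNull(working);
        // The first six bytes of a GIF file are its signature.
        System.out.println(new String(bytes, 0, 6, StandardCharsets.US_ASCII)); // GIF89a
        System.out.println(decodeOrNull(failing) == null);                      // true
    }
}
```

So the working string is a tiny GIF image, while the failing one is not decodable as written.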

@mattgarrish
Member

Ya, I think I see what's going on. Per the data URL syntax, characters outside the safe range of URL characters have to be escaped. If you percent encode the question marks in the original then you no longer get a null exception from epubcheck.

Of course, epubcheck should report the URL as invalid, not throw a null exception error, so that still needs fixing.

(It also doesn't look like the original data URL is properly base64 encoded, as even after fixing the problem I still get an invalid image.)
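As a rough illustration of the workaround described above, the following Java sketch percent-encodes the question marks in a data URL. The class and method names are hypothetical, and this is not how EPUBCheck handles such URLs. Per the comment above, it only avoids the exception on the affected versions and does not make the image itself valid.

```java
// Hypothetical helper, for illustration only (not EPUBCheck code).
final class DataUrlEscaper {

    /**
     * Percent-encodes every '?' in a data URL as "%3F".
     * Strings that do not start with the "data:" scheme are returned unchanged.
     */
    static String escapeQuestionMarks(String url) {
        // Scheme names are case-insensitive, so compare the prefix ignoring case.
        if (!url.regionMatches(true, 0, "data:", 0, 5)) {
            return url;
        }
        return url.replace("?", "%3F");
    }

    public static void main(String[] args) {
        String failing = "data:image/gif;base64,"
            + "R0lGODlhAQABAPAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICR??AEAOw==";
        System.out.println(escapeQuestionMarks(failing));
    }
}
```

Replacing the data URL with a well-formed one, such as the URL suggested earlier in this thread, is the cleaner repair for the content itself.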

@rdeltour
Member

+1 on @mattgarrish's analysis: the URL looks non-conforming so should be reported, but EPUBCheck should definitely not throw a NullPointerException in that case. I'll look into it for the next milestone. Thanks all for the report!

rdeltour self-assigned this Aug 31, 2023
rdeltour added the labels type: bug (the issue describes a bug) and status: accepted (ready to be further processed) Aug 31, 2023
rdeltour added this to the "Next maintenance release" milestone Aug 31, 2023
@vengattech

The latest ePUBChecker 5.1.0 ignores the Thumbs.db error. The availability of Thumbs.db inside the image folder needs to be checked manually before hosting or delivering the ePUB files to clients.

@vengattech

The issue below is also present in the earlier version 5.0.1.

The latest ePUBChecker 5.1.0 ignores the Thumbs.db error. The availability of Thumbs.db inside the image folder needs to be checked manually before hosting or delivering the ePUB files to clients.

rdeltour added the label status: in progress (the issue is being implemented by the development team) and removed the label status: accepted (ready to be further processed) Dec 21, 2024
rdeltour added a commit that referenced this issue Dec 23, 2024
A NullPointerException was raised on `data` URLs ending with a query-like
string.
This was caused by a double-bug in Galimatias:
- removing the query-like string (as we do when registering references)
  with `URL#withQuery(null)` resulted in a hierarchical URL despite
  `data` URLs being non-hierarchical.
- such hybrid `data` URLs caused an NPE to be raised when canonicalized

We now add some check to not remove the query component on non-hierarchical
URLs.

Also, `OCFContainer#isRemote(URL)` now returns false without further checks
for `data` URLs.

Fixes #1536
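The guard described in the commit message can be sketched as follows. This is an illustration only: it uses the JDK's `java.net.URI` as a stand-in for Galimatias, where `isOpaque()` plays the role of the "non-hierarchical" test, and the class and method names are invented here rather than taken from the EPUBCheck source.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch, not the actual EPUBCheck fix.
final class QueryStripper {

    /** Removes the query component, but leaves non-hierarchical (opaque) URLs untouched. */
    static URI withoutQuery(URI url) {
        if (url.isOpaque()) {
            // e.g. data: or mailto: URLs; a '?' here is not a query separator.
            return url;
        }
        try {
            return new URI(url.getScheme(), url.getAuthority(),
                           url.getPath(), null, url.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Mirrors the second change: a data URL is never treated as remote. */
    static boolean isDataUrl(URI url) {
        return "data".equalsIgnoreCase(url.getScheme());
    }

    public static void main(String[] args) {
        URI remote = URI.create("http://example.org/img/dot.gif?v=1");
        URI data = URI.create("data:image/gif;base64,R0lG?x");

        System.out.println(withoutQuery(remote)); // http://example.org/img/dot.gif
        System.out.println(withoutQuery(data));   // data:image/gif;base64,R0lG?x
        System.out.println(isDataUrl(data));      // true
    }
}
```

Checking the URL kind before touching the query keeps the `data` URL in its original form, so later canonicalization never sees a hybrid URL.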
rdeltour added the label status: has PR (the issue is being processed in a pull request) and removed the label status: in progress (the issue is being implemented by the development team) Dec 23, 2024