Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spec and WPT inconsistencies #239

Open
anonrig opened this issue Dec 18, 2024 · 9 comments · May be fixed by web-platform-tests/wpt#49782
Open

Spec and WPT inconsistencies #239

anonrig opened this issue Dec 18, 2024 · 9 comments · May be fixed by web-platform-tests/wpt#49782

Comments

@anonrig
Copy link

anonrig commented Dec 18, 2024

What is the issue with the URL Pattern Standard?

There is a web-platform test that is even implemented by Chrome that is not covered with the URLPattern spec.

new URLPattern({ "protocol": "http", "port": "80 " })

This fails on Chromium due to this function validation: https://chromium.googlesource.com/chromium/src/+/main/extensions/common/url_pattern.cc#101

But other than canonicalizePort there is no place that actually validates the validity of the port, and canonicalizePort calls url parser setter which removes leading and trailing spaces which makes "80 " input valid.

Relevant WPT: https://github.com/web-platform-tests/wpt/blob/0c1d19546fd4873bb9f4147f0bbf868e7b4f91b7/urlpattern/resources/urlpatterntestdata.json#L1146

@anonrig
Copy link
Author

anonrig commented Dec 19, 2024

Another issue @annevk : new URLPattern({ hostname: 'bad#hostname' }); should not throw but there is a WPT that validates that it throws.

{
    "pattern": [{ "hostname": "bad#hostname" }],
    "expected_obj": "error"
  },

This is wrongly implemented on Deno's URLPattern, Cloudflare's workerd and Chromium.

@annevk
Copy link
Member

annevk commented Dec 19, 2024

Hmm, but # is a https://url.spec.whatwg.org/#forbidden-host-code-point so that does seem weird?

@anonrig
Copy link
Author

anonrig commented Dec 19, 2024

If I understand it correctly: This doesn't fail, hence it shouldn't fail on URLPattern as well:

const url = new URL('fake://dummy.test')
u.hostname = 'bad#hostname';

@anonrig
Copy link
Author

anonrig commented Dec 19, 2024

Host parser (specifically domain to ASCII with domain and false) strip all trailing values whenever it sees # https://url.spec.whatwg.org/#concept-host-parser

@anonrig
Copy link
Author

anonrig commented Dec 24, 2024

Another test case is invalid: ada-url/ada@d17f000

If you run the following on Google Chrome, you'll get the following error:

> new URLPattern("https://{sub.}?example{.com/}foo")

VM970:1 Uncaught TypeError: Failed to construct 'URLPattern': Invalid hostname pattern '{sub.}?example{.com/}foo'. Invalid hostname pattern 'example.com/foo'.
    at <anonymous>:1:1

But, canonicalize_hostname() method spec doesn't mention anything about certain edge cases: https://urlpattern.spec.whatwg.org/#canonicalize-a-hostname

Particularly, the following should work and works according to URL spec:

Welcome to Node.js v23.4.0.
Type ".help" for more information.
> let u = new URL('fake://dummy.test')
undefined
> u
URL {
  href: 'fake://dummy.test',
  origin: 'null',
  protocol: 'fake:',
  username: '',
  password: '',
  host: 'dummy.test',
  hostname: 'dummy.test',
  port: '',
  pathname: '',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}
> u.hostname = 'example.com/foo'
'example.com/foo'
> u
URL {
  href: 'fake://example.com',
  origin: 'null',
  protocol: 'fake:',
  username: '',
  password: '',
  host: 'example.com',
  hostname: 'example.com',
  port: '',
  pathname: '',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}

Therefore, this test case shouldn't fail.

@jeremyroman
Copy link
Collaborator

Going to see how many of these I can get through today.

For this example:

new URLPattern({ "protocol": "http", "port": "80 " })

that's not the relevant Chromium code, but instead that we use url::CanonicalizePort here and here, but that function does not strip whitespace, which the URL port setter does as part of the basic URL parser, when it does this:

Remove all ASCII tab or newline from input.

I think this change is probably minor enough (especially since it only makes previously invalid patterns valid) that changing the implementation(s) to match the spec is okay.

@anonrig
Copy link
Author

anonrig commented Dec 27, 2024

@jeremyroman I think there are more invalid cases like this. I've removed and updated the following test cases on urlpatterntestdata.json. Ref: https://github.com/ada-url/ada/pull/785/files#diff-b8289182a638cd2b53ad33fee3a3aeaeb6fcd4d73e2cd9ebb0974a0fc8c2236f

Appreciate if you could take a look. They are mostly around hostname canonicalize method due to Chromium not using the URL parser.

@jeremyroman
Copy link
Collaborator

Yeah, I'm in the process of looking into what you've said.

For the port example, on further inspection the change I mention does address some whitespace (newlines and tabs) but not spaces. In the Chromium implementation, it also winds through ParsePortFromStringPosition which simply ignores any leading zeroes and any junk after the ASCII digit sequence, whose spec counterpart is here.

@jeremyroman
Copy link
Collaborator

For the hostname ones, the cases of /?# seem quite parallel to the port ones.

The case of \ is quite weird -- during pattern parsing we can't tell for sure if the URL is special so treat it as not, but for interpreting the init dictionary to test/exec etc we really could do so. The spec currently doesn't really try to do anything about that, though. Thoughts?

For the other bad characters, we have comments in Chromium linking to https://issues.chromium.org/u/0/issues/40124263. If it's just a matter of a (long-standing?) Chromium-specific bug I suppose we should probably test the standard behavior, though that kinda suggests it might be tough for us to fix which might motivate the spec changing. Not familiar enough with that bug yet to comment off the top of my head.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

3 participants