Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make URL encoding fully WHATWG-compliant #1454

Merged
merged 3 commits into from
Sep 21, 2024

Conversation

cyqsimon
Copy link
Contributor

While debugging the failing tests in #1093, I ran into some trouble with improperly-encoded URLs at the front end. It turns out percent-encode sets defined by miniserve aren't fully compliant with the spec. This PR fixes that.

@svenstaro
Copy link
Owner

Hey, much appreciated! Is there any chance you could add a test that would actually ensure compliance?

@cyqsimon
Copy link
Contributor Author

cyqsimon commented Sep 21, 2024

Ugh, I'm confused. How do you write a test for something like this? This is quite literally the verbatim implementation of the spec:

The query percent-encode set is the C0 control percent-encode set and U+0020 SPACE, U+0022 ("), U+0023 (#), U+003C (<), and U+003E (>).
The special-query percent-encode set is the query percent-encode set and U+0027 (').
The path percent-encode set is the query percent-encode set and U+003F (?), U+0060 (`), U+007B ({), and U+007D (}).
The userinfo percent-encode set is the path percent-encode set and U+002F (/), U+003A (:), U+003B (;), U+003D (=), U+0040 (@), U+005B ([) to U+005E (^), inclusive, and U+007C (|).
The component percent-encode set is the userinfo percent-encode set and U+0024 ($) to U+0026 (&), inclusive, U+002B (+), and U+002C (,).

What I mean to say is, isn't this something we just double, triple check manually here once and be done?

P.s. I did notice the erroneous BASE set that was previously left in; removed in latest commit.

@cyqsimon
Copy link
Contributor Author

Woah I must have been drunk or something to make such basic errors. Almost looks like I couldn't read.

@svenstaro
Copy link
Owner

Fair enough.

By the way, do you think it might be nice to try to upstream this into percent_encoding? They already have some constants but this looks like a logical thing to upstream.

@svenstaro svenstaro merged commit be39fa4 into svenstaro:master Sep 21, 2024
17 checks passed
@svenstaro
Copy link
Owner

Thanks for this. I cross-checked the spec and this seems good!

svenstaro added a commit that referenced this pull request Sep 21, 2024
@cyqsimon
Copy link
Contributor Author

By the way, do you think it might be nice to try to upstream this into percent_encoding? They already have some constants but this looks like a logical thing to upstream.

Yeah I questioned why this wasn't already a part of percent_encoding as well. But it turns out they used to have these things and decided to remove them. The reason I'm not sure, but regardless I don't think they would be interested in having them re-added.

@cyqsimon cyqsimon deleted the whatwg-spec branch September 22, 2024 13:26
@svenstaro
Copy link
Owner

svenstaro commented Sep 23, 2024

That's really odd. Thanks for looking into it. Perhaps open an issue to discuss upstream? Seems like it was removed a few years ago and the mindset might've changed.

@cyqsimon
Copy link
Contributor Author

cyqsimon commented Sep 23, 2024

That's really odd. Thanks for looking into it. Perhaps open an issue to discuss upstream? Seems like it was removed a few years ago and the mindset might've changed.

Found this: servo/rust-url#837. Seems like someone has the same idea but the PR looks to be stalled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants