Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make safe_url_string safer #201

Closed
wants to merge 27 commits into from
Closed

Conversation

Gallaecio
Copy link
Member

@Gallaecio Gallaecio commented Nov 8, 2022

Tasks:

  • Provide an alternative implementation for safe_url_string based on the URL living standard
  • Add missing typing information
  • Provide comprehensive test coverage for both implementations, highlighting the differences, and in the process make the new implementation support also RFC 2396 (used by java.net.URI) and RFC 3986 (the previous standard, still popular).
  • Ensure the following issues are addressed by the new implementation:
  • Have the URL living standard parse implementation mostly pass the upstream tests.
  • Improve performance
    The new implementation is currently up to 40 times slower than the previous implementation for test cases where both have the same, non-error outcome.
  • Clean up the implementation: cleaner code, docstrings…
  • Discuss what to do API-wise (deprecate safe_url_string in favor of safe_url? Remove safe_url and make safe_url_string use the new implementation?

@codecov
Copy link

codecov bot commented Nov 8, 2022

Codecov Report

Merging #201 (88f32ae) into master (fb70566) will decrease coverage by 0.99%.
The diff coverage is 94.48%.

@@            Coverage Diff             @@
##           master     #201      +/-   ##
==========================================
- Coverage   95.98%   94.98%   -1.00%     
==========================================
  Files           6       14       +8     
  Lines         473     1377     +904     
  Branches       90      333     +243     
==========================================
+ Hits          454     1308     +854     
- Misses          9       40      +31     
- Partials       10       29      +19     
Impacted Files Coverage Δ
w3lib/_utr46.py 75.73% <75.73%> (ø)
w3lib/_encoding.py 88.23% <88.23%> (ø)
w3lib/_url.py 98.00% <98.00%> (ø)
w3lib/_infra.py 100.00% <100.00%> (ø)
w3lib/_rfc2396.py 100.00% <100.00%> (ø)
w3lib/_rfc3986.py 100.00% <100.00%> (ø)
w3lib/_rfc5892.py 100.00% <100.00%> (ø)
w3lib/_util.py 100.00% <100.00%> (ø)
w3lib/url.py 98.69% <100.00%> (+0.06%) ⬆️

@Gallaecio
Copy link
Member Author

The new implementation is currently up to 40 times slower than the previous implementation for test cases where both have the same, non-error outcome.

@kmike Shall I look into using Cython here?

@Gallaecio
Copy link
Member Author

I have opened #203 as a simpler approach that is good enough to address #80 and similar issues without any major rework of safe_url_string.

I am closing this pull request, as I am stopping work on it. But I have created #204 so that someone, not necessarily me, can take over and finish the work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant