feat: url_decode function #2957

rodcoffani · 2023-11-30T10:42:40Z

url_decode function

Creates:

url_decode
url_encode (just to keep the names similar)
tests for both of them.

The code for these functions was based on Rosetta Code, as suggested by other maintainers.

Motivation

There are several related issues asking for this implementation, for example #2261 and even older ones like #798; this PR would close those issues.

emanuele6 · 2023-11-30T11:01:49Z

This implementation is just not correct.
You are not converting bytes back to codepoints; you are just using bytes as if they were codepoints:

$ ./jq -na '"è" | @uri | ., url_decode'
"%C3%A8"
"\u00c3\u00a8"

$ ./jq -na '"è" | ., (@uri | ., (url_decode | ., (@uri | ., url_decode)))'
"\u00e8"
"%C3%A8"
"\u00c3\u00a8"
"%C3%83%C2%A8"
"\u00c3\u0083\u00c2\u00a8"

The correct result should have been an è/\u00e8

$ jq -na '"è"'
"\u00e8"

"\u00e8"
"%C3%A8"
"\u00e8"
"%C3%A8"
"\u00e8"
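
To make the byte/codepoint distinction concrete, here is a minimal sketch in Python rather than jq. The two function names are hypothetical and exist only for this illustration. The first reproduces the behaviour reported above by turning every %XX escape into its own code point. The second collects each run of escapes into bytes and decodes the run as UTF-8.

```python
import re

def decode_bytes_as_codepoints(s: str) -> str:
    # Hypothetical name. Mimics the reported bug: every %XX escape is
    # mapped directly to the code point with that numeric value.
    return re.sub(r"%([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), s)

def decode_bytes_as_utf8(s: str) -> str:
    # Hypothetical name. Gathers each run of %XX escapes into a byte
    # string and decodes the whole run as UTF-8.
    def decode_run(m: re.Match) -> str:
        raw = bytes.fromhex(m.group(0).replace("%", ""))
        return raw.decode("utf-8", errors="replace")
    return re.sub(r"(?:%[0-9A-Fa-f]{2})+", decode_run, s)

print(ascii(decode_bytes_as_codepoints("%C3%A8")))  # '\xc3\xa8'
print(ascii(decode_bytes_as_utf8("%C3%A8")))        # '\xe8'
```

The first result is the two-character string "\u00c3\u00a8" shown in the failing output; the second is the single character è.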

rodcoffani · 2023-11-30T11:51:11Z

> This implementation is just not correct. You are not converting bytes back to codepoints; you are just using bytes as if they were codepoints:
>
> $ ./jq -na '"è" | @uri | ., url_decode'
> "%C3%A8"
> "\u00c3\u00a8"
>
> $ ./jq -na '"è" | ., (@uri | ., (url_decode | ., (@uri | ., url_decode)))'
> "\u00e8"
> "%C3%A8"
> "\u00c3\u00a8"
> "%C3%83%C2%A8"
> "\u00c3\u0083\u00c2\u00a8"
>
> The correct result should have been an è/\u00e8
>
> $ jq -na '"è"'
> "\u00e8"
>
> "\u00e8"
> "%C3%A8"
> "\u00e8"
> "%C3%A8"
> "\u00e8"

You were right! Sorry, we didn't catch those cases in the first implementation. Looking through the issues we mentioned (especially #798), there was a function that matches your test cases exactly, and even handles emotes. (This one

Example:

rodcoffani · 2023-11-30T12:06:42Z

A failed test showed up in the checks, but since it is only in one check and that check is labeled "disabled", I don't believe it is related to this new feature.

emanuele6 · 2023-11-30T12:14:37Z

You have added those tests to tests/jq.test instead of tests/onig.test, but your implementation requires regular expression support (provided by liboniguruma) for gsub/2, so they fail in the build without liboniguruma.

wader · 2023-11-30T12:17:27Z

#2261 and #798 are about doing this as a format, and gojq also implemented this as @urid; I think that would be better. It would also be great to be consistent about uri vs url.

itchyny · 2023-11-30T12:18:36Z

Why not just add @urid?

emanuele6 · 2023-11-30T12:24:08Z

URL decoding a byte sequence that only contains non-utf-8 bytes triggers an error:

$ ./jq -na '"%ff%fa" | url_decode'
jq: error (at <unknown>): null (null) and number (128) cannot be subtracted

URL decoding a byte sequence that contains non-utf-8 bytes, but also contains valid utf-8 subsequences, results in a single \ufffd:

$ ./jq -na '"%ff%c3%a7%fa" | url_decode'
"\ufffd"

The correct behaviour would be that every non-utf-8 byte becomes \ufffd:

$ jq -Ra . <<< $'\xff\xfa'
"\ufffd\ufffd"
$ jq -Ra . <<< $'\xff\xc3\xa7\xfa'
"\ufffd\u00e7\ufffd"
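
As a point of comparison, the expected lossy behaviour can be sketched in Python (not jq) with the standard library. The helper name url_decode_lossy is hypothetical. It assumes the desired semantics are the ones described above: percent-decode to raw bytes first, then decode as UTF-8 and substitute U+FFFD for bytes that are not valid UTF-8.

```python
from urllib.parse import unquote_to_bytes

def url_decode_lossy(s: str) -> str:
    # Hypothetical helper. Step 1: turn %XX escapes into raw bytes.
    raw = unquote_to_bytes(s)
    # Step 2: decode as UTF-8, replacing undecodable bytes with U+FFFD
    # instead of raising an error.
    return raw.decode("utf-8", errors="replace")

print(ascii(url_decode_lossy("%ff%fa")))        # '\ufffd\ufffd'
print(ascii(url_decode_lossy("%ff%c3%a7%fa")))  # '\ufffd\xe7\ufffd'
```

Both inputs from the comment above produce one U+FFFD per invalid byte here, and the valid sequence %c3%a7 survives as ç.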

emanuele6 · 2023-11-30T22:54:20Z

I've just noticed that the implementation is literally just entirely copied from this gist: https://gist.github.com/jcracknell/52cf4e0f8c4518a853784638db8a258d

Come on now. :/

rodcoffani · 2023-12-01T11:28:54Z

Since the code was never merged into the main repo, I thought it would be a good addition; that's why I referenced the gist in my comment. Sorry for not being clearer.

But thanks for the feedback! I read the comments and I would like to know what the best approach is here:

Would it be better to add only @urid, or the functions url_decode/url_encode AND @urid?
With the code fixed and more accurate tests implemented, would it still be a nice thing to add?

rodcoffani added 2 commits November 30, 2023 07:21

feat: url_decode and url_encode

df67b6b

feat: tests for url_encode/url_decode

26d11ae

feat: new function and tests

c5f8a44

emanuele6 closed this Nov 30, 2023
