Fix empty regular expression matches (fix #2565) #2677

Merged: 1 commit, Jul 9, 2023


itchyny (Contributor) commented on Jul 9, 2023

As reported by #2565, "ab" | match(""; "g") should yield 3 matches, one at each character boundary (offsets 0, 1, and 2). Fixing this behavior also fixes "a" | gsub(""; "a") to emit "aaa" not "aa". Fixes #2565.
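To make the expected offsets concrete, here is a minimal Python sketch of a global ("g") match loop that reports one empty match per character boundary. It only models the semantics described in this PR; it is not jq's implementation (jq does its matching in C via Oniguruma), and the function name global_match_offsets is hypothetical.

```python
import re
from typing import List, Tuple


def global_match_offsets(pattern: str, text: str) -> List[Tuple[int, str]]:
    """Return (offset, matched_text) for each match of a global search.

    Hypothetical helper modelling the fixed behaviour: after an empty match
    the search resumes one character later, so the empty match at the very
    end of the input (offset == len(text)) is still reported.
    """
    rx = re.compile(pattern)
    results: List[Tuple[int, str]] = []
    pos = 0
    while pos <= len(text):
        m = rx.search(text, pos)
        if m is None:
            break
        results.append((m.start(), m.group()))
        # A non-empty match resumes at its end; an empty match must step
        # forward one character, otherwise it would match the same spot forever.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return results


print(global_match_offsets("", "ab"))  # [(0, ''), (1, ''), (2, '')]
print(global_match_offsets("", "a"))   # [(0, ''), (1, '')]
```

The loop assumes the engine prefers a non-empty match when one is available at a position, which holds for greedy patterns such as 1*; patterns whose first alternative is empty (e.g. "|a") are a corner case this sketch does not try to settle.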

itchyny added the bug label on Jul 9, 2023
itchyny added this to the 1.7 release milestone on Jul 9, 2023
pkoppstein (Contributor)

@itchyny - Congratulations!

You could remove the disclaimer at onig.test:91
("The following is a regression test ...")

nicowilliams merged commit 600e602 into jqlang:master on Jul 9, 2023
nicowilliams (Contributor)

Thanks!

emanuele6 (Member)

> Fixing this behavior also fixes "a" | gsub(""; "a") to emit "aaa" not "aa".

I think that is a bug, it should emit "aa", no?

itchyny (Contributor, Author) commented on Jul 18, 2023

Why? An empty regex matches at every character boundary, including the start and end of the string. An easier example: "xyz" | gsub(""; "a") should be "axayaza".

emanuele6 (Member)

@itchyny
Oh, you are right, I got confused.

pkoppstein (Contributor)

> I think that is a bug, it should emit "aa", no?

No. @itchyny and (not surprisingly :-) gojq are correct.

In any case, gsub relies (and ought to rely) on match(_;"g"). For "a" | gsub(""; "a"), the call to match produces two results (offsets 0 and 1), ergo two insertions, giving "aaa".
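The point that gsub is just a splice of the match(_; "g") results can be sketched in Python. Here re.finditer stands in for jq's match(_; "g"); on Python 3.7 or later it reports empty matches at every boundary, giving the same offsets as the fixed jq for these inputs. The name gsub_via_matches is hypothetical, and the replacement is a plain string rather than a jq filter.

```python
import re


def gsub_via_matches(text: str, pattern: str, replacement: str) -> str:
    """Rebuild the string from the global match list, one insertion per match.

    Hypothetical sketch: re.finditer stands in for jq's match(_; "g").
    Assumes Python 3.7+ for its empty-match behaviour.
    """
    pieces = []
    last = 0  # end offset of the previous match
    for m in re.finditer(pattern, text):
        pieces.append(text[last:m.start()])  # unmatched text before this match
        pieces.append(replacement)
        last = m.end()
    pieces.append(text[last:])  # tail after the final match
    return "".join(pieces)


print(gsub_via_matches("a", "", "a"))    # aaa  (matches at offsets 0 and 1)
print(gsub_via_matches("xyz", "", "a"))  # axayaza
```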

emanuele6 (Member)

I was used to the sed behaviour, where sed 's/1*/2/g' <<< '11' outputs 2. But it looks like in JavaScript and other languages, if the pattern can match the empty string, it also matches the empty string at the end of the input, even when the previous match extends to the last character of the input: node -p '"1111".replaceAll(/1*/g, "2")' outputs 22.
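The difference described here comes down to one rule: whether an empty match may begin exactly where the previous non-empty match ended. The Python sketch below (hypothetical function name replace_all, Python 3.7+ assumed) takes the full match list from re.finditer and, when sed_style is true, drops such adjacent empty matches, which reproduces both outputs quoted above.

```python
import re


def replace_all(text: str, pattern: str, replacement: str, sed_style: bool = False) -> str:
    """Global replace with a switch between two empty-match conventions.

    sed_style=True: an empty match starting where the previous non-empty
    match ended is ignored, as in sed's s///g.
    sed_style=False: every match from re.finditer is replaced, which agrees
    with JavaScript's replaceAll for these examples.
    """
    pieces = []
    last = 0         # end offset of the previous kept match
    prev_end = None  # end offset of the previous non-empty match
    for m in re.finditer(pattern, text):
        empty = m.start() == m.end()
        if sed_style and empty and m.start() == prev_end:
            continue  # sed-style: skip an empty match glued to the last match
        pieces.append(text[last:m.start()])
        pieces.append(replacement)
        last = m.end()
        if not empty:
            prev_end = m.end()
    pieces.append(text[last:])
    return "".join(pieces)


print(replace_all("11", "1*", "2", sed_style=True))  # 2
print(replace_all("1111", "1*", "2"))               # 22
```

Only the trailing empty match differs between the two modes; an empty match at the start of the input or after a non-matching character is replaced either way, so "baaa" with a* becomes "XbX" sed-style and "XbXX" otherwise.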

emanuele6 (Member)

jq's gsub used to behave like sed's s///g, but after this patch it behaves more like JavaScript's replaceAll (jq is the older installed binary, ./jq a build with this patch):

$ jq -n '"1111" | gsub("1*";"2")'
2
$ ./jq -n '"1111" | gsub("1*";"2")'
22

Fine, I guess.

Development: merging this pull request may close the issue "Zero length regular expression match misbehaviour".
4 participants