Add string parsing (builtin, test and doc). #373

Closed
wants to merge 1 commit into from

fblondiau

I had to process a log file like the one below, where one value in each JSON object is a query string sent to our server.

...
{"date":"2014/03/05 15:25:42","ip":"81.83.144.50","query":"version=2014_02_20&language=nl&UUID=BE7FE9D9-83E1-4117-9ECF-69AEA4FF4124&dev=iPad3%2C3&os=7.0.6"}
{"date":"2014/03/05 15:27:43","ip":"91.183.33.9","query":"version=2014_02_20&language=fr&UUID=494018D1-6CAF-4D99-906B-088A9F9DCD02&dev=iPhone3%2C1&OS=7.0.6"}
...

Adding "parse" (somewhat related to "split") made my processing feel more natural. I hope it helps. Thanks for creating jq.
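
For readers who want to see the intended processing outside jq, here is a minimal Python sketch: decode each log line as JSON, then expand the `query` value into key/value pairs. The helper name `parse_log_line` is hypothetical and is not part of this PR; `json.loads` and `urllib.parse.parse_qsl` are from the Python standard library, and `parse_qsl` also percent-decodes values.

```python
import json
from urllib.parse import parse_qsl

def parse_log_line(line):
    """Decode one JSON log line and expand its "query" string into a dict."""
    record = json.loads(line)
    # parse_qsl splits on "&" and "=" and percent-decodes ("%2C" becomes ",").
    record["query"] = dict(parse_qsl(record["query"]))
    return record

line = (
    '{"date":"2014/03/05 15:25:42","ip":"81.83.144.50",'
    '"query":"version=2014_02_20&language=nl'
    '&UUID=BE7FE9D9-83E1-4117-9ECF-69AEA4FF4124&dev=iPad3%2C3&os=7.0.6"}'
)
record = parse_log_line(line)
print(record["query"]["dev"])  # iPad3,3
```

Note that the second sample line uses `OS` rather than `os`, so a consumer of the parsed object would probably want to normalize key case as well.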

@nicowilliams
Contributor

This is really cool!

Since it's so specific to parsing URI query parameters, maybe the function should be named something more indicative of that?

@ghost

ghost commented Jun 2, 2014

On the one hand, I'm going to use this a lot. On the other hand, I'm not sure this belongs in jq's standard library, although I certainly see the value of a "batteries included" approach. I agree with @nicowilliams that changing the name would be nice.

A few thoughts on this:

  • It would come in handy if it parsed this[that]=whatever into {"this": {"that": "whatever"}}. Similarly, it could parse foo[] occurrences into arrays. I have no idea how it should react if both cases were combined in one URI string, though.
  • How does it handle several query arguments with the same name? Does it create an array of values, or does it override the first one with the second? Modern HTTP servers do the former, PHP does the latter.
  • What's the situation with this and @uri? Perhaps a function that decoded URI encoding could be extracted from here.
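
The first two points above can be sketched in Python, purely as an illustration of the semantics being discussed and not of anything in this PR. The names `parse_nested` and `_container` are hypothetical. Only one level of brackets is handled, and the unresolved case (the same key used in conflicting forms) is rejected with an error rather than guessed at.

```python
from urllib.parse import parse_qs, parse_qsl

def _container(result, name, kind):
    """Return result[name], creating it as kind() and rejecting mixed forms."""
    existing = result.setdefault(name, kind())
    if not isinstance(existing, kind):
        raise ValueError(f"conflicting forms for key {name!r}")
    return existing

def parse_nested(query):
    """Parse a query string, honoring key[sub]=v and key[]=v (one level only)."""
    result = {}
    for key, value in parse_qsl(query, keep_blank_values=True):
        if key.endswith("[]"):
            # foo[]=1&foo[]=2 -> {"foo": ["1", "2"]}
            _container(result, key[:-2], list).append(value)
        elif key.endswith("]") and "[" in key:
            # this[that]=whatever -> {"this": {"that": "whatever"}}
            outer, inner = key[:-1].split("[", 1)
            _container(result, outer, dict)[inner] = value
        else:
            # Plain key: the last occurrence overrides earlier ones.
            result[key] = value
    return result

print(parse_nested("this[that]=whatever"))  # {'this': {'that': 'whatever'}}
print(parse_nested("foo[]=1&foo[]=2"))      # {'foo': ['1', '2']}

# Repeated plain keys: collect into arrays versus override.
print(parse_qs("a=1&a=2"))         # {'a': ['1', '2']}
print(dict(parse_qsl("a=1&a=2")))  # {'a': '2'}
```

Python's standard library happens to expose both duplicate-key behaviors: `parse_qs` collects every value into a list, while building a dict from `parse_qsl` keeps only the last value.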

@fblondiau
Author

Thanks for the feedback and suggestions. I like the idea of parsing this[that]=whatever into {"this": {"that": "whatever"}}...
Several query arguments with the same name are handled the same way object addition works in jq: if the query contains several values for the same key, the value on the right wins (as in PHP, indeed).
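
As a rough illustration of that "right wins" rule, the following Python sketch folds the pairs into an object one at a time, the way repeated object addition would. The names `add_objects` and `parse_last_wins` are hypothetical, and percent-decoding is deliberately left out to keep the focus on the merge.

```python
def add_objects(left, right):
    """Mimic jq's `left + right` for objects: keys from `right` override."""
    return {**left, **right}

def parse_last_wins(query):
    """Fold key=value pairs into one object; later duplicates override."""
    result = {}
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        result = add_objects(result, {key: value})
    return result

print(parse_last_wins("a=1&b=2&a=3"))  # {'a': '3', 'b': '2'}
```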

@pkoppstein
Contributor

Please do NOT include this URL-parsing function as currently named and implemented. The name "parse" should be reserved for something as generic as parsing. Here is a list of some alternatives to consider for the "parse URL" function:

  1. Name it parse_url (like PHP)
  2. Name it parseUrl (like http://api.jquerymobile.com/jQuery.mobile.path.parseUrl)
  3. Name it ParseURL (or something beginning with a capital letter)
  4. Change the syntax to parse(<string>; <rule>) so that for URL-parsing, you would specify <rule> as "URL".
  5. Wait until details about support for generic parsing (via regex and/or PEG) become available.

Each of these options naturally has its pros and cons. It may also be worthwhile considering more complete parsing of URLs, e.g. along the lines of http://api.jquerymobile.com/jQuery.mobile.path.parseUrl
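
To make option 4 and the "more complete parsing" idea concrete, here is a small Python sketch of a rule-dispatching `parse(text, rule)` with a single "URL" rule. Everything here is hypothetical (the names `parse`, `parse_url`, `RULES`, and the shape of the returned object); it only illustrates the design, using `urlsplit` and `parse_qsl` from the Python standard library.

```python
from urllib.parse import urlsplit, parse_qsl

def parse_url(text):
    """Split a URL into its components, with the query expanded to a dict."""
    parts = urlsplit(text)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": dict(parse_qsl(parts.query)),
        "fragment": parts.fragment,
    }

# Hypothetical rule table: extending parse() means registering a new rule here.
RULES = {"URL": parse_url}

def parse(text, rule):
    """Generic entry point: look up the parser named by `rule` and apply it."""
    return RULES[rule](text)

result = parse("http://example.com:8080/a/b?x=1&y=2#top", "URL")
print(result["host"], result["port"], result["query"])
```

The rule table is one possible answer to the extension question: a generic entry point needs some registry of this kind, whereas separately named functions do not.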

@nicowilliams
Contributor

@pkoppstein Yes, I like a parse(what) idea, but since we'd then need a mechanism by which to extend it, I prefer parse_what. Since this parser is just for URI query parameters, I think the right name for it is parse_uri_query_params or something of the sort. I wouldn't ask the submitter to produce a complete URI parser, as that's a fairly ambitious project by comparison.

(On a related note, I'd like to have something like a URI template (RFC6570) facility in jq! Level 4 template support would be particularly awesome.)

@nicowilliams
Contributor

I'll review, and I may merge a version with your function renamed as discussed above.

@pkoppstein
Contributor

As noted in #439, the support jq now offers for regular expressions means we can for example parse URL-style "queries" very simply. Using capture/2 as defined in #439, the task becomes trivial:

def parseQuery:
  reduce capture( "&?(?<tag>[^=]*)=(?<value>[^&]*)"; "g" ) as $v
  ({}; . + { ($v.tag) : $v.value });

For example:

Input: "a=b&c=d&e=f"
Output:
{
  "a": "b",
  "c": "d",
  "e": "f"
}

Using jq's support for string interpolation, the above parseQuery can be generalized to take the separator and the equals sign as arguments:

# sep and eq should be non-empty strings corresponding
#  to the separator and equals sign used in the form: &tag=value
def parseQuery(sep; eq):
  reduce capture( "\(sep)?(?<tag>[^\(eq)]*)\(eq)(?<value>[^\(sep)]*)"; "g" ) as $v
  ({}; . + { ($v.tag) : $v.value });
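
One caveat with building a regex by interpolation is that the delimiters become part of the pattern, so a delimiter such as "|" or "." would be read as a regex metacharacter. The Python sketch below mirrors the same pattern and shows one way to guard against that with `re.escape`; the function name `parse_query` and its defaults are assumptions made for this illustration, not a translation endorsed by the thread.

```python
import re

def parse_query(text, sep="&", eq="="):
    """Regex-based key/value parser mirroring the capture-based approach."""
    # re.escape guards against metacharacters such as "|" or "." in sep/eq.
    # As in the interpolated version, the negated classes treat sep and eq as
    # sets of characters, so this is only exact for one-character delimiters.
    s, e = re.escape(sep), re.escape(eq)
    pattern = re.compile(f"(?:{s})?(?P<tag>[^{e}]*){e}(?P<value>[^{s}]*)")
    # Later duplicates override earlier ones, like `. + {...}` in the reduce.
    return {m["tag"]: m["value"] for m in pattern.finditer(text)}

print(parse_query("a=b&c=d&e=f"))               # {'a': 'b', 'c': 'd', 'e': 'f'}
print(parse_query("a:1|b:2", sep="|", eq=":"))  # {'a': '1', 'b': '2'}
```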

@dtolnay
Member

dtolnay commented Jul 25, 2015

I agree with @pkoppstein that parse_url is much less valuable now that capture/2 makes it easy to do this in a general way.

The un-percent-encoding functionality in this PR may be useful in an external library if it can be translated to jq code.
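
For reference, the decoding step is small enough to sketch by hand. The Python function below is an illustration only (the name `percent_decode` is hypothetical and this is not the code from the PR): it replaces each %XX escape with the corresponding byte, leaves malformed escapes untouched, and interprets the result as UTF-8. It does not treat "+" as a space, which form-encoded queries would additionally require.

```python
HEX_DIGITS = set(b"0123456789abcdefABCDEF")

def percent_decode(text):
    """Decode %XX escapes as UTF-8 bytes; leave malformed escapes as they are."""
    data = text.encode("utf-8")
    out = bytearray()
    i = 0
    while i < len(data):
        chunk = data[i + 1:i + 3]
        is_escape = (
            data[i] == 0x25  # the "%" character
            and len(chunk) == 2
            and all(c in HEX_DIGITS for c in chunk)
        )
        if is_escape:
            out.append(int(chunk, 16))
            i += 3
        else:
            out.append(data[i])
            i += 1
    # Invalid UTF-8 sequences are replaced rather than raising an error.
    return out.decode("utf-8", errors="replace")

print(percent_decode("iPad3%2C3"))  # iPad3,3
```

In Python itself, `urllib.parse.unquote` from the standard library covers the same ground; the hand-written loop is shown only because it is the kind of logic that would have to be expressed in jq code.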

@dtolnay closed this on Jul 25, 2015
@dtolnay mentioned this pull request on Jul 25, 2015