
jq, xq and yq - Handy tools for the command line

The classic Unix command line tools like grep, sed and awk do not cope well with structured data formats like JSON, YAML or even XML. Although I have used them on such formats with some acrobatics, a proper solution requires tools that natively understand those formats.

JSON

The jq tool for JSON seems to be well known already, and indeed it is a very handy tool, as many software packages use JSON for serialized data. Here is a short example parsing the metadata of an npm package:

dzu@krikkit:~$ jq .name ~/node_modules/bash-language-server/package.json
"bash-language-server"
dzu@krikkit:~$ jq .repository ~/node_modules/bash-language-server/package.json
{
  "type": "git",
  "url": "https://github.com/bash-lsp/bash-language-server"
}
dzu@krikkit:~$ jq .repository.url ~/node_modules/bash-language-server/package.json
"https://github.com/bash-lsp/bash-language-server"
dzu@krikkit:~$ jq -r .repository.url ~/node_modules/bash-language-server/package.json
https://github.com/bash-lsp/bash-language-server
dzu@krikkit:~$ 

Note the use of the "-r" option for "raw output" in the last example. This allows us to easily use the result in the shell via command substitution:

dzu@krikkit:~$ firefox $(jq -r .repository.url ~/node_modules/bash-language-server/package.json)
dzu@krikkit:~$ 

The GNU/Linux distributions I use daily, e.g. Debian and Ubuntu, include jq as a package, so it is easy to install and keep up to date.
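
On Debian and Ubuntu, for example, installation is a single command:

dzu@krikkit:~$ sudo apt install jq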

Note that jq will apply syntax coloring and proper indentation to the output whatever filter you run. So if you are looking at a valid but difficult-to-read JSON file, the trivial filter . will give you a nice rendering of the data:

dzu@krikkit:~$ cat json-stringify
{"orderID":12345,"shopperName":"John Smith","shopperEmail":"johnsmith@example.com","contents":[{"productID":34,"productName":"SuperWidget","quantity":1},{"productID":56,"productName":"WonderWidget","quantity":3}],"orderCompleted":true}
dzu@krikkit:~$ jq . json-stringify
{
  "orderID": 12345,
  "shopperName": "John Smith",
  "shopperEmail": "johnsmith@example.com",
  "contents": [
    {
      "productID": 34,
      "productName": "SuperWidget",
      "quantity": 1
    },
    {
      "productID": 56,
      "productName": "WonderWidget",
      "quantity": 3
    }
  ],
  "orderCompleted": true
}
dzu@krikkit:~$

JSON in the pipeline

More and more tools are learning to output complex data as JSON, in which case jq should probably advance into the standard Linux command line toolkit. See my other post for an example of that.
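
The ip tool from iproute2 is one such example: its -j switch turns on JSON output, which jq can then slice like any other input. A small sketch (the exact field names depend on the iproute2 version):

dzu@krikkit:~$ ip -j addr show lo | jq -r '.[0].addr_info[0].local'
127.0.0.1
dzu@krikkit:~$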

XML

While working on the scripts for my GPX collection, I wondered if there is an equivalent for the XML format used there. It would be a significant improvement over the improvised grep and sed commands, which I know to be not very robust.

It turns out that there are converters from YAML and XML to JSON, and these can easily be combined with jq into command-line YAML and XML processors.
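
Getting the converters from PyPI is a one-liner (jq itself still needs to be installed separately, as the converters pipe their data through it):

dzu@krikkit:~$ pip install yq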

Having installed yq with the Python package installer pip, we get the yq binary for working with YAML files and the xq binary for XML. Using the jq manual, it is straightforward to apply them to the GPX files in my collection. Here are a few examples intended to convey an idea of the possibilities. Extracting the name of a track is easy, as is finding the number of track points in it:

dzu@krikkit:/tmp/gpx$ xq .gpx.trk.name 20191231.gpx
"GC - Laushalde"
dzu@krikkit:/tmp/gpx$ xq '.gpx.trk.trkseg.trkpt | length' 20191231.gpx
328
dzu@krikkit:/tmp/gpx$
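
The yq binary works the same way on YAML input. A minimal sketch with a hypothetical config.yaml; note that the output is JSON, as the data is converted before being handed to jq:

dzu@krikkit:/tmp/gpx$ cat config.yaml
name: demo
ports:
  - 80
  - 443
dzu@krikkit:/tmp/gpx$ yq .name config.yaml
"demo"
dzu@krikkit:/tmp/gpx$ yq '.ports | length' config.yaml
2
dzu@krikkit:/tmp/gpx$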

Array indices make it easy to extract specific track points, for example the start and end times of the whole track:

dzu@krikkit:/tmp/gpx$ xq '.gpx.trk.trkseg.trkpt[0]' 20191231.gpx
{
  "@lat": "48.46705179",
  "@lon": "10.03310901",
  "ele": "382",
  "time": "2019-12-31T11:49:09.000Z"
}
dzu@krikkit:/tmp/gpx$ xq '.gpx.trk.trkseg.trkpt[0,-1].time' 20191231.gpx
"2019-12-31T11:49:09.000Z"
"2019-12-31T13:03:05.000Z"
dzu@krikkit:/tmp/gpx$
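
jq's date builtins even allow small computations on these timestamps. As a sketch, the duration of the track in seconds can be derived from the two times above; fromdate cannot handle the fractional seconds, so they are stripped first with sub:

dzu@krikkit:/tmp/gpx$ xq '[.gpx.trk.trkseg.trkpt[0,-1].time | sub("\\.[0-9]+"; "") | fromdate] | .[1] - .[0]' 20191231.gpx
4436
dzu@krikkit:/tmp/gpx$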

Converting GPX to CSV

In the comments to this post, Tom Brown contributed a very nice one-liner to convert a GPX track to a CSV file. I’ll try to break the script down into smaller chunks and retrace the steps that lead to the clever result.

The aim of the jq script is to transform the track points into single-line records starting with the time attribute. Searching the jq manual, we find that the function @csv renders an array as a single CSV line. So one important task is to transform the JSON objects of the GPX file into arrays. To show the process, here is an example using only the first two track points. The array for @csv is built by the map operator, which transforms the array of track points (i.e. JSON objects) into an array of string arrays holding the attribute values:

dzu@krikkit:/tmp/gpx$ xq -r '[.gpx.trk.trkseg.trkpt[0,1]] | map([.time, ."@lat", ."@lon", .ele ])' 20191231.gpx
[
  [
    "2019-12-31T11:49:09.000Z",
    "48.46705179",
    "10.03310901",
    "382"
  ],
  [
    "2019-12-31T11:49:10.000Z",
    "48.46705087",
    "10.03310868",
    "382"
  ]
]
dzu@krikkit:/tmp/gpx$ 

Note that map works on an array, so for this (contrived) example we had to assemble the two track points explicitly into an array before passing it to map. The final invocation will not need this, as .gpx.trk.trkseg.trkpt without any index is already an array.
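
Incidentally, the jq manual defines map(f) as [.[] | f], so the transformation can also be spelled out by hand. With compact output (-c) and trimmed to two fields for brevity, the equivalence looks like this:

dzu@krikkit:/tmp/gpx$ xq -c '[.gpx.trk.trkseg.trkpt[0,1]] | map([.time, .ele])' 20191231.gpx
[["2019-12-31T11:49:09.000Z","382"],["2019-12-31T11:49:10.000Z","382"]]
dzu@krikkit:/tmp/gpx$ xq -c '[.gpx.trk.trkseg.trkpt[0,1]] | [.[] | [.time, .ele]]' 20191231.gpx
[["2019-12-31T11:49:09.000Z","382"],["2019-12-31T11:49:10.000Z","382"]]
dzu@krikkit:/tmp/gpx$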

Now, for @csv to work, we again have to remove this (top-level) extra array:

dzu@krikkit:/tmp/gpx$ xq -r '[.gpx.trk.trkseg.trkpt[0,1]] | map([.time, ."@lat", ."@lon", .ele ]) | .[]' 20191231.gpx
[
  "2019-12-31T11:49:09.000Z",
  "48.46705179",
  "10.03310901",
  "382"
]
[
  "2019-12-31T11:49:10.000Z",
  "48.46705087",
  "10.03310868",
  "382"
]
dzu@krikkit:/tmp/gpx$

And finally we can apply @csv to each of these arrays:

dzu@krikkit:/tmp/gpx$ xq -r '[.gpx.trk.trkseg.trkpt[0,1]] | map([.time, ."@lat", ."@lon", .ele ]) | .[] | @csv' 20191231.gpx
"2019-12-31T11:49:09.000Z","48.46705179","10.03310901","382"
"2019-12-31T11:49:10.000Z","48.46705087","10.03310868","382"
dzu@krikkit:/tmp/gpx$
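
If the consuming program expects a header row, it can be prepended as just another array rendered through @csv; a small sketch, again with only the first two track points:

dzu@krikkit:/tmp/gpx$ xq -r '(["time", "lat", "lon", "ele"] | @csv), ([.gpx.trk.trkseg.trkpt[0,1]] | map([.time, ."@lat", ."@lon", .ele ]) | .[] | @csv)' 20191231.gpx
"time","lat","lon","ele"
"2019-12-31T11:49:09.000Z","48.46705179","10.03310901","382"
"2019-12-31T11:49:10.000Z","48.46705087","10.03310868","382"
dzu@krikkit:/tmp/gpx$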

According to the GPX 1.1 Schema, the time field in wptType is of type xsd:dateTime, with the letter T between the date and the time and an optional offset of the local timezone against UTC at the end. Usually this is just a plain Z, specifying an offset of 0, or in other words denoting the time to be in UTC. Tom writes that the letters T and Z are not parsed correctly by Google Sheets, so he uses string slicing to cut out only the wanted characters:

dzu@krikkit:/tmp/gpx$ xq -r '.gpx.trk.trkseg.trkpt[0].time' 20191231.gpx
2019-12-31T11:49:09.000Z
dzu@krikkit:/tmp/gpx$ xq -r '.gpx.trk.trkseg.trkpt[0].time | "\(.[0:10]) \(.[11:19])"' 20191231.gpx
2019-12-31 11:49:09
dzu@krikkit:/tmp/gpx$
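
The same cleanup could also be done with regex substitution instead of slicing, for example:

dzu@krikkit:/tmp/gpx$ xq -r '.gpx.trk.trkseg.trkpt[0].time | sub("T"; " ") | sub("\\.[0-9]+Z$"; "")' 20191231.gpx
2019-12-31 11:49:09
dzu@krikkit:/tmp/gpx$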

Putting all this together yields this clever one-liner to convert the whole track at once:

dzu@krikkit:/tmp/gpx$ xq -r '.gpx.trk.trkseg.trkpt | map([(.time|"\(.[0:10]) \(.[11:19])"), ."@lat", ."@lon", .ele ]) | .[] | @csv' 20191231.gpx | head -2
"2019-12-31 11:49:09","48.46705179","10.03310901","382"
"2019-12-31 11:49:10","48.46705087","10.03310868","382"
dzu@krikkit:/tmp/gpx$

Summary

So jq and its relatives are definitely a worthwhile addition to every command line toolbox, but for my GPX files I would love to see tools capable of doing real processing of the track points. Although I did look for such tools, I have not yet been successful. So if you know of command line tools that calculate the usual statistics like distance, average speed, maximum speed, etc., I would love to see pointers in the comments section.

Update 2021-04-29
  • Add example of "-r" as it eases interoperability with the shell

  • Add example of the trivial filter '.' as a JSON beautifier.

  • Add link to ip post as an example of a classic command line tool with optional JSON output

Update 2021-09-19
  • Add example on how to convert GPX to CSV data and explain it in more depth. This was contributed by Tom Brown through a comment.
