Processing data with JQ

Introduction

JQ is a tool to process JSON data. It’s great for inspecting, filtering, parsing and reshaping JSON data.

We will be looking at:

  • The commands JQ offers us
  • How to inspect some JSON data
  • How to filter
  • How to map our JSON to a new JSON data structure
  • How to convert our JSON into another data structure

Installation

There are full installation instructions for most platforms in the official documentation.

Let’s get some data

The data we will be using in our examples is about cars. The dataset can be found here if you want to follow along. I have downloaded the data and saved it locally in a file called cars.json to be used in the examples later on.

curl https://raw.githubusercontent.com/vega/vega/master/docs/data/cars.json >> cars.json


The data consists of a list of objects with the following properties:

{
    "Name": "chevy s-10",
    "Miles_per_Gallon": 31,
    "Cylinders": 4,
    "Displacement": 119,
    "Horsepower": 82,
    "Weight_in_lbs": 2720,
    "Acceleration": 19.4,
    "Year": "1982-01-01",
    "Origin": "USA"
  }


There are some operators we should familiarise ourselves with before we go ahead and try to do anything with this data.

Identity

The identity filter returns the input to JQ as the output. The syntax is a full stop (.). Here is a simple example.

echo '{"foo": "bar"}' | jq .


Will output.

{
  "foo": "bar"
}

Filtering

This is how we can get information out of our JSON objects. The syntax is .foo where foo is the key of the value we want to return.

echo '{"foo": "bar"}' | jq .foo
"bar"


What if we have a JSON object that is more complex and we want to get a value where the key is not at the top level? We can string filters together to go deeper into the JSON structure.

echo '{"foo": {"hello": "world"}}' | jq .foo.hello
"world"


If the filter you are applying does not find any value then null will be returned.

echo '{"foo": {"hello": "world"}}' | jq .foo.hell
null


The above command will not work if the key starts with a number, a special characters or contains a full stop so we need to wrap that portion of our filter in quotes.

echo '{"foo": {"1hello": "world"}}' | jq '.foo."1hello"'
"world"

echo '{"foo": {"*hello": "world"}}' | jq '.foo."*hello"'
"world"

echo '{"foo": {"hello.test": "world"}}' | jq '.foo."hello.test"'
"world"


You can have multiple filters by using a comma ,. This will pass the input into each filter and output the results of each filter.

echo '{"foo": "bar", "hello": "world"}' | jq .foo,.hello
"bar"
"world"


If you have multiple filters but you want the output of the first filter to be the input of the second filter you can use a pipe | to achieve this.

echo '[{"foo": {"x": 4}}, {"hello": {"world": 6}}]' | jq '.[] | .foo | .x'
4
null


The select function is used to filter the input based on a boolean expression. The output is the unchanged input only if the boolean expression is true.

echo '[{"foo": "bar"}, {"hello": "world"}]' | jq '.[] | select(.foo == "bar")'
{
  "foo": "bar"
}

Arrays

We can get items from an array by using the index of the array to get the items we want.

echo '["zero", "one", "two"]' | jq '.[1]'
"one"


If the index is out of bounds the output will be null.

echo '["zero", "one", "two"]' | jq '.[7]'
null


You can also do a reverse look up by using a negative index. If you wanted to get the last item of the array you can use -1.

echo '["zero", "one", "two"]' | jq '.[-1]'
"two"


If you omit the index then all the results will be output each on a new line in ascending order.

echo '["zero", "one", "two"]' | jq '.[]'
"zero"
"one"
"two"

Functions

Math

You can add values together using the + operator. There will be a different effect depending on the data type of the value.

Numbers are added normally.

echo '{"x": 5, "y": 10}' | jq '.x + .y'
15


Array are concatenated.

echo '{"x": [1, 2], "y": [3, 4]}' | jq '.x + .y'
[1,2,3,4]


Strings are joined together.

echo '{"x": "foo", "y": "bar"}' | jq '.x + .y'
"foobar"


Objects are merged together. If both objects contain the same key the value from the right object wins.

echo '{"x": {"foo": "bar"}, "y": {"hello": "world"}}' | jq '.x + .y'
{
  "foo": "bar",
  "hello": "world"
}


You can subtract values from one another by using the - operator. This operator works on number like you would expect but can also remove all values in the first array that are present in the second array.

Subtracting numbers.

echo '{"x": 5, "y": 10}' | jq '.x - .y'
-5


Subtracting arrays.

echo '{"x": [1, 2], "y": [1, 4]}' | jq '.x - .y'
[2]


You can multiply values together using the * operator. There will be a different effect depending on the data type of the value.

Numbers are multiplied normally.

echo '{"x": 5, "y": 10}' | jq '.x * .y'
50


String are duplicated and concatenated if multiplied by a number.

echo '{"x": "foo", "y": 3}' | jq '.x * .y'
"foofoofoo"


Objects are merged together recursively.

echo '{"x": {"foo": {"z": 3}}, "y": {"foo": {"hello": "world"}}}' | jq '.x * .y'
{
  "foo": {
    "z": 3,
    "hello": "world"
  }
}


How we have familiarised ourselves with some of the most useful commands JQ offers, lets now take a look at how we can apply these to a JSON dataset.

Inspecting the data

First thing we will want to know is what exactly is included in the data we have and how is the JSON structured. We can do that by using JQ to pretty print the file.

cat cars.json | jq
[
  {
    "Name": "chevrolet chevelle malibu",
    "Miles_per_Gallon": 18,
    "Cylinders": 8,
    "Displacement": 307,
    "Horsepower": 130,
    "Weight_in_lbs": 3504,
    "Acceleration": 12,
    "Year": "1970-01-01",
    "Origin": "USA"
  },
  ...

Filtering

Lets look at some examples of how we may want to filter this data.

Filter for all cars built in the year 1972.

cat cars.json | jq '.[] | select(.Year == "1972-01-01")'


Step by step the following is happening:

  1. We are reading our JSON file using cat
  2. We are piping it into JQ
  3. We are getting each element of the array
  4. For each element of the array we are selecting only those where the Year property is equal to 1972-01-01

This will filter all the objects where the Year is equal to 1972-01-01 and return reach object on a new line. If we wanted to return an array object we would need to wrap the entire expression in an array.

cat cars.json | jq '[ .[] | select(.Year == "1972-01-01") ]'

Mapping the data into another JSON format

We have filtered some objects we want but the JSON structure is not what we want so we need to map the current structure to a structure we want.

Lets say we want this structure.

 {
    "name": "dodge colt (sw)",
    "performance": {
        "horsepower": 80,
        "acceleration": 15,
        "cylinders": 4,
        "displacement": 98
    },
    "efficiency": {
        "milesPerGallon": 28,
        "weightInLbs": 2164
    },
    "build": {
        "year": "1972-01-01",
        "origin": "USA"
    }
}


We can do this by piping the filtered results and building up the JSON object we want.

cat cars.json | jq '[ .[] | select(.Year == "1972-01-01") | {name: .Name, performance: {horsepower: .Horsepower, acceleration: .Acceleration, cylinders: .Cylinders, displacement: .Displacement}, efficiency: {milesPerGallon: .Miles_per_Gallon, weightInLbs: .Weight_in_lbs}, build: {year: .Year, origin: .Origin}} ]'

Converting to CSV

If the data is being used by a human being and not by a machine, you may need to change the format into something else. A CSV file is one option so lets go ahead and convert the filtered data into a CSV format.

cat cars.json | jq -r '.[] | select(.Year == "1972-01-01") | [.Name, .Horsepower, .Acceleration, .Cylinders, .Displacement, .Miles_per_Gallon, .Weight_in_lbs, .Year, .Origin] | @csv'


Theres a few things in here worth explaining.

  1. We add the -r flag to the JQ command to return the raw output. This will prevent converting the output into a string that would cause double quoting, allowing us to import the results into a program that can read CSV data
  2. After filtering we format each line of the results into an array
  3. We pass the array into the @csv operator which will format the array into a line of CSV

I hope you enjoyed this introduction to JQ and learned something new. JQ is very powerful and very useful if you use JSON data on a regular basis.