jq reduce

2019-01-11 19:38:00 +0000

I found myself having to process a large JSON file using jq --slurp, and it was using up a lot of memory, so I thought it was time to learn about jq’s reduce function.

reduce works like this:

reduce inputs as $line (INIT, REDUCER)

INIT is whatever you want; commonly you’ll use an empty object, {} or an empty array, [].

REDUCER is a jq transform that is given . as the current value of the accumulator, can access $line, and is expected to return the new accumulator value.

A do-nothing reduce would look like this:

reduce inputs as $line
({}; {})

That is: the initial accumulator value is the empty object, and the reducer ignores the input line and returns the empty object.

The following example returns the last value passed to it:

reduce inputs as $line
({}; .)

You can replicate --slurp with the following:

reduce inputs as $line
([]; . + [$line])

A more complicated example

Simplified, the JSON logs that I was processing look like this:

{"timestamp": "2019-01-09T06:52:58.079Z", "user": 1, "data": { ... }}
{"timestamp": "2019-01-09T06:52:58.148Z", "user": 1, "data": { ... }}
{"timestamp": "2019-01-09T06:52:58.171Z", "user": 2, "data": { ... }}
{"timestamp": "2019-01-09T06:52:58.178Z", "user": 1, "data": { ... }}
{"timestamp": "2019-01-09T06:52:58.179Z", "user": 3, "data": { ... }}
{"timestamp": "2019-01-09T06:52:58.231Z", "user": 2, "data": { ... }}

…and I want to count the number of events, per user, per minute.

I came up with the following:

jq 'reduce inputs as $line
    ({};
     ($line.user | tostring) as $user
     | .[$user] as $current
     | $line.timestamp[0:16] as $bucket
     | $current[$bucket] as $count
     | { ($bucket): ($count + 1) } as $this
     | ($current + $this) as $next
     | . + { ($user): ($next) }
    )' \
    raw.json

…which results in the following:

{
  "1": {
    "2019-01-09T06:52": 4,
    "2019-01-09T06:53": 178,
    "2019-01-09T06:54": 202,
  ... etc.

Here’s how it works

For each record ($line) in the input, reduce, starting with an empty object:

jq 'reduce inputs as $line
    ({};

Set $user to the value of the user field (an integer), converted to a string. It needs converting to a string because I’m going to use it as an object key later, and object keys can’t be numbers:

     ($line.user | tostring) as $user

The accumulator (.) is an object. .[$user] gets the current value of the field identified by $user:

     | .[$user] as $current

In our example data, that’s equivalent to, say, acc["1"] or acc["2"]. Assign that to $current.

Then take the timestamp field, trim it to represent just the minute, by using the range expression [0:16], and assign it to $bucket:

     | $line.timestamp[0:16] as $bucket

The current record for the user is an object, keyed by bucket, where the value is the current count:

     | $current[$bucket] as $count

Create a new record with the same key, incrementing the count. The name $this could be confusing, but I couldn’t come up with a better name; stet:

     | { ($bucket): ($count + 1) } as $this

Merge it with the current record:

     | ($current + $this) as $next

Merge that with the user-keyed record:

     | . + { ($user): ($next) }

Then there’s just closing brackets and stuff:

    )' \
    raw.json