Replace leaf arrays with joined strings in a nested structure in jq
Consider the following arbitrarily nested JSON as input to a jq filter:
```shell
echo '[{"foo": [1, 2]}, {"bar": [{"baz": ["foo", "baz"]}]}]' | jq '.'
```
My goal is to join leaf arrays into strings:
```json
[{"foo": "12"}, {"bar": [{"baz": "foobaz"}]}]
```
The following filter produces the above output:
```jq
walk(if type == "array" and
        all(type != "array" and type != "object")
     then join("") else . end)
```
But if I use this filter on an array-only structure like
```json
[[1, 2], [[3, 4], [[5, 6], [7]]]]
```
I get:
```json
"1234567"
```
instead of the expected:
```json
["12", ["34", ["56", "7"]]]
```
Instead of replacing the leaf arrays with strings, it's reduced the nested structure down to a single string!
If I change my filter to:
```jq
walk(if type == "array" and
        all(type != "array" and type != "object")
     then [join("")] else . end)
```
At least the structure is now preserved, but each joined string is wrapped in an extra single-element array that shouldn't be there:
```json
[["12"], [["34"], [["56"], ["7"]]]]
```
It's not feasible to strip off these single-element arrays after the fact, since a second pass won't differentiate a single-element array that's supposed to be there from a temporary artificial one. I suppose I could add some metadata to enable second-pass differentiation, but this seems messy to code and hacky. I feel like there should be a simple, direct solution.
I also tried changing my `if` branch to `. = join("")` and `. |= join("")`, but these give me the same old `"1234567"`.
How can I reliably join leaf arrays into strings in an arbitrarily nested structure in place, without modifying the structure? And why is `walk` flattening arrays but not objects here, anyway?
The following users marked this post as Works for me:
User | Comment | Date |
---|---|---|
ggorlen | (no comment) | Sep 4, 2023 at 05:53 |
As the `walk` documentation describes:
> When an array is encountered, f is first applied to its elements and then to the array itself
In other words, `walk` is bottom-up. So when you apply your filter to your nested-array input, you first flatten the innermost arrays into strings. Then the arrays at the next level out are arrays of strings, so they get flattened too, and so on up the structure. It's not doing anything to your objects because your filter alters only arrays, and any array that contains an object fails the `all(...)` test, so the joining stops wherever an object appears.
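A small trace makes the cascade concrete. This is a sketch assuming jq 1.6 or later (for the `walk` builtin and for `join` accepting numbers); `leaf` is a hypothetical shell variable holding the question's filter. Applying the filter to each child of `[[5, 6], [7]]`, which is what `walk` does first, turns the children into strings; the parent is then an array of scalars, so it passes the same test.

```shell
# Hypothetical variable holding the question's leaf-joining filter.
leaf='if type == "array" and all(type != "array" and type != "object")
      then join("") else . end'

# Step 1 (what walk does first): apply the filter to each child.
# [5, 6] and [7] hold only scalars, so each one is joined.
echo '[[5, 6], [7]]' | jq -c "map($leaf)"
# ["56","7"]

# Step 2: when the filter then reaches the parent, it is an array of
# strings, which passes the same test and is joined as well.
echo '["56", "7"]' | jq -c "$leaf"
# "567"

# The two steps together are what walk produces for the whole input.
echo '[[5, 6], [7]]' | jq -c "walk($leaf)"
# "567"
```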
You can see the definition of `walk` in the jq repository; from it, it is easy to make a top-down variation that does what you want, by moving `f` from the right side of the pipe to the left:
```jq
def walktd(f):
  def w:
    f |
    if type == "object"
    then map_values(w)
    elif type == "array" then map(w)
    else .
    end;
  w;
walktd(if type == "array" and
          all(type != "array" and type != "object")
       then join("") else . end)
```
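To check that only the position of `f` matters, here is a sketch (again assuming jq 1.6 or later) that defines two walkers side by side. `walkbu` and `leaf_to_string` are hypothetical names for illustration: `walkbu` applies `f` after recursing, which gives the same bottom-up behaviour as the built-in `walk`, while `walktd` is the top-down version above, applying `f` before recursing. On the nested-array input, only `walktd` keeps the structure.

```shell
# Hypothetical helpers for illustration:
#   walkbu        - bottom-up: recurse first, then apply f (like built-in walk)
#   walktd        - top-down: apply f first, then recurse (as in the answer)
#   leaf_to_string - the question's filter, given a name for reuse
defs='
def walkbu(f):
  def w:
    if type == "object" then map_values(w)
    elif type == "array" then map(w)
    else .
    end
    | f;
  w;
def walktd(f):
  def w:
    f
    | if type == "object" then map_values(w)
      elif type == "array" then map(w)
      else .
      end;
  w;
def leaf_to_string:
  if type == "array" and all(type != "array" and type != "object")
  then join("") else . end;
'
input='[[1, 2], [[3, 4], [[5, 6], [7]]]]'

echo "$input" | jq -c "$defs walkbu(leaf_to_string)"
# "1234567"
echo "$input" | jq -c "$defs walktd(leaf_to_string)"
# ["12",["34",["56","7"]]]
```

Because `walktd` examines a parent before its children, the parent still contains arrays at that point and is left alone; each leaf array is joined only when `w` reaches it, and its parent is never revisited.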
Alternatively, if you don't want to define your own function, you can do this with `walk`, but you need to do a bunch of wrapping and unwrapping in order to distinguish between a string that was produced by collapsing an array and a string that appeared in the original input:
```jq
walk(if type == "object"
  then {
    value: map_values(.value),
    atom: false
  }
elif type == "array"
  then {
    value: (if all(.atom) then map(.value) | join("") else map(.value) end),
    atom: false
  }
else {
    value: .,
    atom: true
  }
end).value
```
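A quick check of the wrapping approach, as a sketch assuming jq 1.6 or later; `plain` and `wrapped` are hypothetical shell variables holding the question's filter and the filter above. The input `["a", ["b", "c"]]` puts an original string next to a leaf array: once `["b", "c"]` has become `"bc"`, the plain filter can no longer tell the two apart and joins the outer array as well, whereas the wrapped version marks `"bc"` with `atom: false` and leaves the outer array intact.

```shell
# Hypothetical variable names: plain holds the question's filter,
# wrapped holds the wrapping/unwrapping filter from the answer.
plain='walk(if type == "array" and all(type != "array" and type != "object")
  then join("") else . end)'
wrapped='walk(if type == "object"
  then {value: map_values(.value), atom: false}
elif type == "array"
  then {value: (if all(.atom) then map(.value) | join("") else map(.value) end),
        atom: false}
else {value: ., atom: true}
end).value'

# An original string next to a leaf array.
echo '["a", ["b", "c"]]' | jq -c "$plain"
# "abc"  (the joined "bc" looks just like the original "a")
echo '["a", ["b", "c"]]' | jq -c "$wrapped"
# ["a","bc"]

# The array-only input from the question.
echo '[[1, 2], [[3, 4], [[5, 6], [7]]]]' | jq -c "$wrapped"
# ["12",["34",["56","7"]]]
```

The cost is a wrapper object around every node plus the final `.value` unwrap, in exchange for relying only on the built-in `walk`.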