Welcome to Software Development on Codidact!

Replace leaf arrays with joined strings in a nested structure in jq

Consider the following arbitrarily nested JSON as input to a jq filter:

echo '[{"foo": [1, 2]}, {"bar": [{"baz": ["foo", "baz"]}]}]' | jq '.'

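My goal is to join each leaf array (an array with no arrays or objects among its elements) into a string, leaving the rest of the structure intact: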

[{"foo": "12"}, {"bar": [{"baz": "foobaz"}]}]

The following filter produces the above output:

walk(if type == "array" and
       all(type != "array" and type != "object")
       then join("") else . end)
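To try the filter on the sample input, a small shell script like the following should work. It assumes jq 1.6 or later is on the PATH (walk is a builtin from 1.6 on, and older versions may reject numbers in join); the variable names are only illustrative.

```shell
#!/bin/sh
# Leaf-joining filter from the question; assumes jq >= 1.6 on PATH.
filter='walk(if type == "array" and
               all(type != "array" and type != "object")
             then join("") else . end)'

input='[{"foo": [1, 2]}, {"bar": [{"baz": ["foo", "baz"]}]}]'

# -c prints compact JSON, so the result fits on one line.
out=$(printf '%s\n' "$input" | jq -c "$filter")
echo "$out"
```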

But if I use this filter on an array-only structure like

[[1, 2], [[3, 4], [[5, 6], [7]]]]

I get:

"1234567"

Instead of the expected

["12", ["34", ["56", "7"]]]

Instead of replacing the leaf arrays with strings, it's reduced the nested structure down to a single string!
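The collapse is easy to reproduce from a shell. A minimal sketch, assuming jq 1.6 or later is installed; the filter and input are the ones above, and the variable names are illustrative.

```shell
#!/bin/sh
# Same filter as above, applied to the array-only input; assumes jq >= 1.6.
filter='walk(if type == "array" and
               all(type != "array" and type != "object")
             then join("") else . end)'

out=$(echo '[[1, 2], [[3, 4], [[5, 6], [7]]]]' | jq -c "$filter")
echo "$out"   # the whole structure ends up as a single JSON string
```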

If I change my filter to

walk(if type == "array" and
       all(type != "array" and type != "object")
       then [join("")] else . end)

At least the structure is preserved now, but each joined string is wrapped in an extra single-element array that shouldn't be there:

[["12"], [["34"], [["56"], ["7"]]]]

It's not feasible to strip off these single-element arrays after the fact, since a second pass can't distinguish a single-element array that's supposed to be there from a temporary, artificial one. I suppose I could add some metadata so that a second pass could tell them apart, but that seems messy and hacky to code. I feel like there should be a simple, direct solution.

I also tried changing my then branch to . = join("") and to . |= join(""), but both give me the same old "1234567".

How can I reliably join leaf arrays into strings in place in an arbitrarily nested structure, without otherwise modifying the structure? And why is walk flattening arrays but not objects here, anyway?

Answer

As the walk documentation describes:

When an array is encountered, f is first applied to its elements and then to the array itself

In other words, walk is bottom-up. So when you apply your filter to your nested-array input, it first joins the innermost arrays into strings. The arrays at the next level out are then arrays of strings, so they get joined too, and so on up the structure. Your objects stop this because the filter only alters arrays: an object whose values have been joined is still an object, so any array that contains it fails the all(...) test and is left alone.
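To see the point about objects concretely, here is a small sketch (again assuming jq 1.6 or later) on two inputs invented for illustration: one where a leaf array sits inside an object, and one without any object.

```shell
#!/bin/sh
filter='walk(if type == "array" and
               all(type != "array" and type != "object")
             then join("") else . end)'

# [1] is joined to "1" but stays inside its object, so the outer array
# still contains an object and is not itself joined.
with_obj=$(echo '[{"a": [1]}, [2]]' | jq -c "$filter")

# Without the object, both inner arrays become strings first, and then
# the outer array (now all strings) is joined as well.
without_obj=$(echo '[[1], [2]]' | jq -c "$filter")

echo "$with_obj"
echo "$without_obj"
```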

You can see the definition of walk in the jq repository; from it, it is easy to make a top-down variation that does what you want, simply by moving the f from the right side of the pipe to the left.
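For comparison, the builtin is defined roughly like this in jq 1.7 (older releases such as 1.6 use an equivalent reduce-based definition). It is renamed walk_bu here, a hypothetical name, so that it does not shadow the builtin; note that f sits on the right of the pipe, after the children have been processed. A sketch assuming jq 1.6 or later:

```shell
#!/bin/sh
# Bottom-up walk in the style of jq's builtin definition.
# walk_bu is a hypothetical name chosen for this sketch.
prog='
def walk_bu(f):
  def w:
    if type == "object" then map_values(w)
    elif type == "array" then map(w)
    else .
    end
    | f;   # f runs AFTER the children: bottom-up
  w;

walk_bu(if type == "array" and
          all(type != "array" and type != "object")
        then join("") else . end)
'
out=$(echo '[[1, 2], [[3, 4], [[5, 6], [7]]]]' | jq -c "$prog")
echo "$out"   # same collapse as the builtin walk
```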

def walktd(f):
  def w:
    f |
    if type == "object"
    then map_values(w)
    elif type == "array" then map(w)
    else .
    end;
  w;

walktd(if type == "array" and
       all(type != "array" and type != "object")
       then join("") else . end)
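Putting the definition and the call together in one jq program gives a self-contained check. This is a sketch assuming jq 1.6 or later, using the two inputs from the question; the shell variable names are illustrative.

```shell
#!/bin/sh
prog='
def walktd(f):
  def w:
    f |            # f runs BEFORE the children: top-down
    if type == "object" then map_values(w)
    elif type == "array" then map(w)
    else .
    end;
  w;

walktd(if type == "array" and
         all(type != "array" and type != "object")
       then join("") else . end)
'
nested=$(echo '[[1, 2], [[3, 4], [[5, 6], [7]]]]' | jq -c "$prog")
mixed=$(echo '[{"foo": [1, 2]}, {"bar": [{"baz": ["foo", "baz"]}]}]' | jq -c "$prog")
echo "$nested"
echo "$mixed"
```

Because f runs on each array before its children are processed, every parent is tested against its original children, so only genuine leaf arrays pass the all(...) check.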

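Alternatively, if you don't want to define your own function, you can do this with the builtin walk, but you need to do a bunch of wrapping and unwrapping in order to distinguish between a string that was produced by collapsing an array and a string that appeared in the original input. Each value is wrapped as {value: ..., atom: ...}, where atom is true only for scalars from the original input; an array is joined only if all of its elements are atoms, and the final .value unwraps the result: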

walk(if type == "object"
    then {
        value: map_values(.value),
        atom: false
    }
    elif type == "array"
    then {
        value: (if all(.atom) then map(.value) | join("") else map(.value) end),
        atom: false
    }
    else {
        value: .,
        atom: true
    }
    end).value
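One way to check the wrap/unwrap version, assuming jq 1.6 or later: run it next to the plain walk filter on a made-up input where a leaf array sits beside a string from the original data. The plain filter cannot tell the two kinds of string apart; the wrapped one can.

```shell
#!/bin/sh
input='[["a"], "b"]'

naive='walk(if type == "array" and
              all(type != "array" and type != "object")
            then join("") else . end)'

wrapped='walk(if type == "object"
    then { value: map_values(.value), atom: false }
    elif type == "array"
    then { value: (if all(.atom) then map(.value) | join("") else map(.value) end),
           atom: false }
    else { value: ., atom: true }
    end).value'

naive_out=$(echo "$input" | jq -c "$naive")
wrapped_out=$(echo "$input" | jq -c "$wrapped")
echo "$naive_out"     # the joined "a" and the original "b" get joined together
echo "$wrapped_out"   # only the leaf array is joined
```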
Comments

Is there a reasonably simple solution without def?
ggorlen wrote about 1 year ago

Thanks! Your explanation makes sense. I'd accept this as the answer, but I'm still hoping to see a version that works with a fairly minimal adjustment using the builtin walk, without defining my own function. Is there such a straightforward solution, or is it enough of a hassle that this approach is more or less optimal?

r~~ wrote about 1 year ago

Maybe not ‘reasonably simple’, but I've added a solution that uses walk without defining a new function. I don't think you'll be able to do better than this, without cutting some corners in terms of what inputs you accept.