Post History

Q&A How to group a flat list of attributes into a nested lists?

posted 8mo ago by hkotsubo‭  ·  edited 8mo ago by hkotsubo‭

Answer
#3: Post edited by hkotsubo · 2024-03-14T14:38:53Z (8 months ago)
You could create a dictionary to map each attribute to its respective list of items. Then you get the dictionary values to create the final list.

Something like this:

```python
import re
pattern = re.compile(r'attr\d+')

# just to simulate a "file"
file = [ 'attr1 apple 1', 'attr1 banana 2', 'attr2 grapes 1', 'attr2 oranges 2', 'attr3 watermelon 0' ]

##############################################################
all_attrs = {} # dictionary to map each attribute to its items
for line in file:
    # regex pattern match
    if pattern.search(line):
        attr, item = line.strip().split(maxsplit=1)
        # if attr is not in the dictionary, create an empty list for it
        # add item to attr's list
        all_attrs.setdefault(attr, []).append(f'{attr} {item}')

# get all the sub-lists and create a list with them
grouped_elements = list(all_attrs.values())
print(grouped_elements) # [['attr1 apple 1', 'attr1 banana 2'], ['attr2 grapes 1', 'attr2 oranges 2'], ['attr3 watermelon 0']]
```

When reading the input, you map each attribute to a list. `setdefault(attr, [])` creates a new list if the attribute is not in the dictionary yet, otherwise it returns the existing list. Then I add the current string ("attribute + item name") to this list.

By the end, the dictionary will have all attributes as keys ("attr1", "attr2", etc), and their respective values will be the lists with the strings associated with that attribute - so the "attr1" key will have the list `['attr1 apple 1', 'attr1 banana 2']` as value, and so on.

To get the final list, just take all the dictionary values and convert them to a list.

---

As a side note, you can also use the regex to extract the attribute and item names directly, instead of splitting the string:

```python
import re
pattern = re.compile(r'(attr\d+) ([^\n]+)')

all_attrs = {} # dictionary to map each attribute to its items
for line in file:
    match = pattern.match(line)
    if match:
        attr, item = match.group(1, 2)
        all_attrs.setdefault(attr, []).append(f'{attr} {item}')
```

Now the regex has two [capturing groups](https://www.regular-expressions.info/brackets.html) (each pair of parentheses is a group): the first one has the attribute name, and the second one has the rest of the string, except for the new line at the end (thus eliminating the need to call `strip()`).

And if you're using Python >= 3.8, you can use an [Assignment Expression](https://peps.python.org/pep-0572/):

```python
for line in file:
    if match := pattern.match(line): # assignment expression: assigns "match" and tests it on the same line
        attr, item = match.group(1, 2)
        # ... the rest is the same
```

Of course you can change the regex to match a specific pattern (such as "items must have only letters or numbers", etc). But the exact format wasn't specified, so I'm assuming it's just "everything after the attribute name".

---

Finally, to get the formatted output, you can use the `json` module:

```python
import json
print(json.dumps(grouped_elements, indent=2))
```

Output:

```json
[
  [
    "attr1 apple 1",
    "attr1 banana 2"
  ],
  [
    "attr2 grapes 1",
    "attr2 oranges 2"
  ],
  [
    "attr3 watermelon 0"
  ]
]
```

But I guess that's beside the point. Once you have the final list, you can format it any way you want.

---

# Alternative (considering previous edit)

A [previous version](https://software.codidact.com/posts/291046/history#2) of the question **suggests that the file has blank lines separating each group of items**, which means that it'd be something like this:

```
attr1 item 1
attr1 item 2
<--- blank line separating attr1 from attr2
attr2 item 4
attr2 item 5
<--- blank line separating attr2 from attr3
attr3 item 5
```

I'm also assuming (as it wasn't clearly stated in the question) that the attributes are not shuffled - which means that the file has all items related to attr1, then a blank line, then all attr2's items, a blank line, and so on.

**If that's the case**, you just need to create a new sublist when a blank line is found:

```python
import re
pattern = re.compile(r'(attr\d+) ([^\n]+)')

grouped_elements = []
current_group = []
for line in file:
    if match := pattern.match(line):
        attr, item = match.group(1, 2)
        current_group.append(f'{attr} {item}')
    else:
        # any line that doesn't match (such as a blank line) closes the current group
        grouped_elements.append(current_group)
        current_group = []
if current_group: # if the current group is not empty
    grouped_elements.append(current_group)
```

Your code didn't work because when reading the file, you discarded the blank lines, so in the second loop all attributes were considered to be in the same group.

Please note that the code above makes all the assumptions previously mentioned (file has blank lines separating each group). If that's not the case, it won't work, and the first approach using the dictionary is the preferred solution.
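
Under the same assumption (blank lines between groups, attributes not shuffled), the splitting could also be sketched with `itertools.groupby`, keyed on whether a line is blank. This is only an illustrative variant, not the approach from the answer: the `lines` list is hypothetical sample data, and the sketch skips the regex check, so it treats every non-blank line as an item.

```python
from itertools import groupby

# hypothetical sample: lines as read from a file, with blank lines between groups
lines = [
    'attr1 item 1\n', 'attr1 item 2\n', '\n',
    'attr2 item 4\n', 'attr2 item 5\n', '\n',
    'attr3 item 5\n',
]

# groupby yields consecutive runs of lines sharing the same key;
# the key is True for blank lines and False for lines with content
grouped_elements = [
    [line.strip() for line in chunk]
    for is_blank, chunk in groupby(lines, key=lambda line: not line.strip())
    if not is_blank  # keep only the runs of non-blank lines
]
print(grouped_elements)
```

Because consecutive blank lines fall into a single run, this variant doesn't produce empty sublists when the file has more than one blank line between groups.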
#2: Post edited by hkotsubo · 2024-03-14T14:15:34Z (8 months ago)
#1: Initial revision by hkotsubo · 2024-03-14T12:40:52Z (8 months ago)
You could create a dictionary to map each attribute to its respective list of items. Then you get the dictionary values to create the final list.

Something like this:

```python
import re
pattern = re.compile(r'attr\d+')

# just to simulate a "file"
file = [ 'attr1 apple 1', 'attr1 banana 2', 'attr2 grapes 1', 'attr2 oranges 2', 'attr3 watermelon 0' ]

##############################################################
all_attrs = {} # dictionary to map each attribute to its items
for line in file:
    # regex pattern match
    if pattern.search(line):
        attr, item = line.strip().split(maxsplit=1)
        # if attr is not in the dictionary, create an empty list for it
        # add item to attr's list
        all_attrs.setdefault(attr, []).append(f'{attr} {item}')

# get all the sub-lists and create a list with them
grouped_elements = list(all_attrs.values())
print(grouped_elements) # [['attr1 apple 1', 'attr1 banana 2'], ['attr2 grapes 1', 'attr2 oranges 2'], ['attr3 watermelon 0']]
```

When reading the input, you map each attribute to a list. `setdefault(attr, [])` creates a new list if the attribute is not in the dictionary yet, otherwise it returns the existing list. Then I add the current string ("attribute + item name") to this list.

By the end, the dictionary will have all attributes as keys ("attr1", "attr2", etc), and their respective values will be the lists with the strings associated with that attribute - so "attr1" key will have the list `['attr1 apple 1', 'attr1 banana 2']` as value, and so on.

To get the final list, just take all the dictionary values and convert them to a list.
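
For comparison, the same grouping can be sketched with `collections.defaultdict`, which creates the empty list automatically the first time a key is accessed. This is just another way to write the `setdefault` approach, reusing the sample data from above; for brevity it skips the regex check and assumes every line starts with an attribute name. It also relies on dictionaries preserving insertion order (guaranteed since Python 3.7), so the groups come out in the order the attributes first appear.

```python
from collections import defaultdict

# same sample data as above, simulating the lines of a file
file = ['attr1 apple 1', 'attr1 banana 2', 'attr2 grapes 1', 'attr2 oranges 2', 'attr3 watermelon 0']

all_attrs = defaultdict(list)  # a missing key starts out as an empty list
for line in file:
    attr, item = line.strip().split(maxsplit=1)
    all_attrs[attr].append(f'{attr} {item}')

grouped_elements = list(all_attrs.values())
print(grouped_elements)
```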

---

As a side note, you can also use the regex to extract the attribute and item names directly, instead of splitting the string:

```python
import re
pattern = re.compile(r'(attr\d+) ([^\n]+)')

all_attrs = {} # dictionary to map each attribute to its items
for line in file:
    match = pattern.match(line)
    if match:
        attr, item = match.group(1, 2)
        all_attrs.setdefault(attr, []).append(f'{attr} {item}')
```

Now the regex has two [capturing groups](https://www.regular-expressions.info/brackets.html) (each pair of parentheses is a group): the first one has the attribute name, and the second one has the rest of the string, except for the new line at the end (thus eliminating the need to call `strip()`).

And if you're using Python >= 3.8, you can use an [Assignment Expression](https://peps.python.org/pep-0572/):

```python
for line in file:
    if match := pattern.match(line): # assignment expression: assigns "match" and tests it on the same line
        attr, item = match.group(1, 2)
        # ... the rest is the same
```

Of course you can change the regex to match a specific pattern (such as "items must have only letters or numbers", etc). But the exact format wasn't specified, so I'm assuming it's just "everything after the attribute name".
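
As an illustration of a stricter pattern, the sketch below assumes (hypothetically, since the question doesn't specify the format) that an item is a name made only of letters, followed by a space and a number. The names `strict_pattern` and `lines` are made up for this example. It uses `fullmatch` so that lines with anything extra before or after the expected format are skipped.

```python
import re

# hypothetical format: attribute name, then a letters-only name, then digits
strict_pattern = re.compile(r'(attr\d+) ([A-Za-z]+ \d+)')

# sample lines; the last two don't follow the assumed format
lines = ['attr1 apple 1', 'attr2 grapes 1', 'attr2 ??? 2', 'other banana 3']

all_attrs = {}
for line in lines:
    # fullmatch requires the whole (stripped) line to match the pattern
    match = strict_pattern.fullmatch(line.strip())
    if match:
        attr, item = match.group(1, 2)
        all_attrs.setdefault(attr, []).append(f'{attr} {item}')

grouped = list(all_attrs.values())
print(grouped)
```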

---

Finally, to get the formatted output, you can use the `json` module:

```python
import json
print(json.dumps(grouped_elements, indent=2))
```

Output:

```json
[
  [
    "attr1 apple 1",
    "attr1 banana 2"
  ],
  [
    "attr2 grapes 1",
    "attr2 oranges 2"
  ],
  [
    "attr3 watermelon 0"
  ]
]
```

But I guess that's beside the point. Once you have the final list, you can format it any way you want.
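
For instance, a plain-text layout with one item per line and a blank line between groups could be sketched like this (the layout is just an assumption about what a desired format might look like):

```python
# the final list produced by the grouping code above
grouped_elements = [
    ['attr1 apple 1', 'attr1 banana 2'],
    ['attr2 grapes 1', 'attr2 oranges 2'],
    ['attr3 watermelon 0'],
]

# join the items of each group with newlines, and the groups with a blank line
text = '\n\n'.join('\n'.join(group) for group in grouped_elements)
print(text)
```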