Post History

Q&A How to group a flat list of attributes into a nested lists?

posted 1y ago by hkotsubo‭ · edited 1y ago by hkotsubo‭

Answer

#3: Post edited by

hkotsubo‭ · 2024-03-14T14:38:53Z (about 1 year ago)

You could create a dictionary to map each attribute to its respective list of items. Then you get the dictionary values to create the final list.
Something like this:
```python
import re
pattern = re.compile(r'attr\d+')

# just to simulate a "file"
file = [ 'attr1 apple 1', 'attr1 banana 2', 'attr2 grapes 1', 'attr2 oranges 2', 'attr3 watermelon 0' ]

##############################################################
all_attrs = {} # dictionary to map each attribute to its items
for line in file:
    # regex pattern match
    if pattern.search(line):
        attr, item = line.strip().split(maxsplit=1)
        # if attr is not in the dictionary, create an empty list for it
        # add item to attr's list
        all_attrs.setdefault(attr, []).append(f'{attr} {item}')

# get all the sub-lists and create a list with them
grouped_elements = list(all_attrs.values())
print(grouped_elements) # [['attr1 apple 1', 'attr1 banana 2'], ['attr2 grapes 1', 'attr2 oranges 2'], ['attr3 watermelon 0']]
```
When reading the input, you map each attribute to a list. `setdefault(attr, [])` creates a new list if the attribute is not in the dictionary yet; otherwise it returns the existing list. Then you append the current string ("attribute + item name") to this list.

By the end, the dictionary will have all attributes as keys ("attr1", "attr2", etc.), and their respective values will be the lists with the strings associated with that attribute. So the "attr1" key will have the list `['attr1 apple 1', 'attr1 banana 2']` as its value, and so on.

To get the final list, just take all the dictionary values and convert them to a list. Since Python 3.7, dictionaries preserve insertion order, so the groups come out in the order in which each attribute first appears.
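
If `setdefault` feels awkward, `collections.defaultdict` from the standard library can play the same role. This is only a minimal sketch, assuming the same sample lines as above; the names `lines` and `groups` are illustrative and not part of the original code:

```python
from collections import defaultdict

# sample input mirroring the simulated "file" (assumed data)
lines = ['attr1 apple 1', 'attr1 banana 2', 'attr2 grapes 1', 'attr2 oranges 2', 'attr3 watermelon 0']

# accessing a missing key creates it with an empty list as its value
groups = defaultdict(list)
for line in lines:
    attr, _item = line.split(maxsplit=1)
    groups[attr].append(line)

grouped_elements = list(groups.values())
print(grouped_elements)
```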
---
As a side note, you can also use the regex to extract the attribute and item names directly, instead of splitting the string:
```python
import re
pattern = re.compile(r'(attr\d+) ([^\n]+)')

all_attrs = {} # dictionary to map each attribute to its items
for line in file:
    match = pattern.match(line)
    if match:
        attr, item = match.group(1, 2)
        all_attrs.setdefault(attr, []).append(f'{attr} {item}')
```
Now the regex has two [capturing groups](https://www.regular-expressions.info/brackets.html) (each pair of parentheses is a group): the first one has the attribute name, and the second one has the rest of the string, except for the newline at the end (thus eliminating the need to call `strip()`).
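
To see what each group captures, here is a small sketch that applies the same pattern to a single line, assuming it still carries its trailing newline as it would when read from a real file:

```python
import re

pattern = re.compile(r'(attr\d+) ([^\n]+)')

# a line as it would come from a real file, trailing newline included
match = pattern.match('attr1 apple 1\n')
attr, item = match.group(1, 2)  # group(1, 2) returns both groups as a tuple

print(repr(attr))  # 'attr1'
print(repr(item))  # 'apple 1' (the newline is not part of the group)
```

Note that `[^\n]+` only excludes the newline: any trailing spaces on the line would still be part of the second group.
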
And if you're using Python >= 3.8, you can use an [Assignment Expression](https://peps.python.org/pep-0572/):
```python
for line in file:
    if match := pattern.match(line): # assignment expression: assigns "match" and tests it on the same line
        attr, item = match.group(1, 2)
        # ... the rest is the same
```
Of course you can change the regex to match a specific pattern (such as "items must have only letters or numbers", etc). But the exact format wasn't specified, so I'm assuming it's just "everything after the attribute name".
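
As an illustration of a stricter pattern, the sketch below accepts only items made of letters and digits separated by single spaces. That rule is a made-up assumption for the example (the question doesn't define one), and `strict_pattern`, `samples` and `results` are hypothetical names:

```python
import re

# hypothetical rule: item = words of letters/digits separated by single spaces
strict_pattern = re.compile(r'(attr\d+) ([A-Za-z0-9]+(?: [A-Za-z0-9]+)*)')

samples = ['attr1 apple 1', 'attr2 gr@pes 1', 'attr3 watermelon 0\n']
results = []
for line in samples:
    # fullmatch requires the whole (stripped) line to follow the rule
    match = strict_pattern.fullmatch(line.strip())
    results.append(match.group(1, 2) if match else None)

print(results)  # [('attr1', 'apple 1'), None, ('attr3', 'watermelon 0')]
```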
---
Finally, to get the formatted output, you can use the `json` module:
```python
import json
print(json.dumps(grouped_elements, indent=2))
```
Output:
```json
[
  [
    "attr1 apple 1",
    "attr1 banana 2"
  ],
  [
    "attr2 grapes 1",
    "attr2 oranges 2"
  ],
  [
    "attr3 watermelon 0"
  ]
]
```
But I guess that's beside the point. Once you have the final list, you can format it any way you want.
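
For instance, a plain-text layout with one item per line and a blank line between groups could be sketched like this (the list is repeated here only to keep the snippet self-contained):

```python
grouped_elements = [
    ['attr1 apple 1', 'attr1 banana 2'],
    ['attr2 grapes 1', 'attr2 oranges 2'],
    ['attr3 watermelon 0'],
]

# one item per line, with a blank line between groups
text = '\n\n'.join('\n'.join(group) for group in grouped_elements)
print(text)
```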
---
# Alternative (considering previous edit)
A [previous version](https://software.codidact.com/posts/291046/history#2) of the question **suggests that the file has blank lines separating each group of items**, which means that it'd be something like this:
```
attr1 item 1
attr1 item 2
<--- blank line separating attr1 from attr2
attr2 item 4
attr2 item 5
<--- blank line separating attr2 from attr3
attr3 item 5
```
I'm also assuming (as it wasn't clearly stated in the question) that the attributes are not shuffled - which means that the file has all items related to attr1, then a blank line, then all attr2's items, a blank line, and so on.
**If that's the case**, you just need to create a new sublist when a blank line is found:
```python
import re
pattern = re.compile(r'(attr\d+) ([^\n]+)')

grouped_elements = []
current_group = []
for line in file:
    if match := pattern.match(line):
        attr, item = match.group(1, 2)
        current_group.append(f'{attr} {item}')
    else:
        # a non-matching line (such as a blank one) closes the current group
        grouped_elements.append(current_group)
        current_group = []
if current_group: # if the current group is not empty
    grouped_elements.append(current_group)
```
Your code didn't work because when reading the file, you discarded the blank lines, so in the second loop all attributes were considered to be in the same group.
Please note that the code above makes all the assumptions previously mentioned (file has blank lines separating each group). If that's not the case, it won't work, and the first approach using the dictionary is the preferred solution.
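
Under the same blank-line assumption, `itertools.groupby` offers a more compact variant. This is only a sketch: the `lines` list simulates the assumed file layout, and unlike the loop above it does not check the `attr\d+` pattern, so any non-blank line counts as an item:

```python
from itertools import groupby

# simulated file content with blank lines between groups (assumed layout)
lines = ['attr1 item 1\n', 'attr1 item 2\n', '\n', 'attr2 item 4\n', 'attr2 item 5\n', '\n', 'attr3 item 5\n']

grouped_elements = [
    [line.strip() for line in chunk]
    # consecutive lines are grouped by whether they are blank or not
    for is_content, chunk in groupby(lines, key=lambda line: bool(line.strip()))
    if is_content  # skip the runs of blank lines
]
print(grouped_elements)
```

Because runs of blank lines are skipped as a whole, several consecutive blank lines don't produce empty groups.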


#2: Post edited by

hkotsubo‭ · 2024-03-14T14:15:34Z (about 1 year ago)


#1: Initial revision by

hkotsubo‭ · 2024-03-14T12:40:52Z (about 1 year ago)

