Communities

Writing
Writing
Codidact Meta
Codidact Meta
The Great Outdoors
The Great Outdoors
Photography & Video
Photography & Video
Scientific Speculation
Scientific Speculation
Cooking
Cooking
Electrical Engineering
Electrical Engineering
Judaism
Judaism
Languages & Linguistics
Languages & Linguistics
Software Development
Software Development
Mathematics
Mathematics
Christianity
Christianity
Code Golf
Code Golf
Music
Music
Physics
Physics
Linux Systems
Linux Systems
Power Users
Power Users
Tabletop RPGs
Tabletop RPGs

Dashboard
Notifications
Mark all as read
Q&A

Separate digits of a number in groups with different sizes

+4
−0

I have a list of 11-digit numbers stored in one single column in Excel, and I need to separate the digits according to this pattern: 2-2-1-3-3.

Example: 00002451018 becomes 00 00 2 451 018.

How can I do it in Python?

I have tried the following:

Python implementation

If you notice the last column new Value is getting only from the last row. Which is ending with 416. Somehow I want inside of this to update and create a new column to contain for each row the updated pattern.

Why does this post require moderator attention?
You might want to add some details to your flag.
Why should this post be closed?

1 comment thread

Provide code, not a picture (1 comment)

2 answers

+6
−0

Before we start, I'd like to be a little bit pedantic regarding 00002451018 being a number.

When we talk about numeric types/values, the zeroes at the beginning are irrelevant: 2, 02 and 000002 all refer to the number two. The numeric value is 2, and only the representation - the way they're formatted - differs.

That said, this is invalid Python code:

n = 00002451018

Because it gives this error:

SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers

Therefore, if we were to be pedantic, 00002451018 is not a number in Python. When we read numbers from a file, we're actually reading characters that can be interpreted as a number. But when such conversion happens (text to number), the leading zeroes are "lost"/"ignored", and all we get is the numeric value 2451018.

So first we need to know what kind of data you have. If you're seeing this number as 00002451018 in Excel, it can be because the cells are formatted as text, or as numbers with a custom formatting (such as "11-digit length with leading zeroes").

The thing is, when you read those cells values as numbers into your Python code, the actual value will be the integer 2451018 (if the column is text, though, the value will be the string '00002451018').

Anyway, as I don't know what you actually have in your spreadsheet, let's assume it's a number (if it's text, the value will be a Python string, which can reuse the code below).

There are two approaches to this problem: extract the digits with maths, or convert the number to string and extract pieces of it with slices.


Doing the math

To get the first two digits of a 11-digit number, we can simply divide it by 109 and discard the decimal places.

Then, to get the next two digits (third and fourth ones), we calculate the remainder of the previous division (the number divided by 109). The result will be a 9-digit number, so we just divide it by 107 to get its first two digits (which correspond to the third and fourth digits of the original number).

We repeat this process until we get all the pieces we want. At each step, the exponent of the power of 10 decreases by the quantity of digits we want.

Note that the number actually doesn't need to have 11 digits. If it has less, the result of the divisions will be zero, so this algorithm already takes care of the leading zeroes. The code is like that:

def get_slices(n, sizes, n_digits=11):
    for size in sizes:
        n_digits -= size
        # get the division and the remainder
        val, n = divmod(n, 10 ** n_digits)
        yield f'{val:0{size}}'

n = 2451018
print(' '.join(get_slices(n, [2, 2, 1, 3, 3]))) # 00 00 2 451 018

The get_slices function takes the number, a list of the sizes and the total number of digits to consider.

The yield keyword makes it a generator function, which yields one part of the number at a time. I used a f-string to properly format the value to have the correct length. This handles cases when the value is zero, or it has less digits than the required size (such as 12 when the size is 3, so it becomes 012).

To perform the calculations I used the divmod built-in, which returns both the result of the division (ignoring the decimal places) and the remainder, all in one step.

Then I use join to build a single string with all the pieces together.
The result is the string 00 00 2 451 018.

String and slices

Another approach is to convert the number to a string with the leading zeroes, and then take slices of this string with the desired lengths.

def get_slices(n, sizes, n_digits=11):
    # convert to string with the desired number of digits and leading zeroes
    s = f'{n:0{n_digits}}'
    prev = 0
    for size in sizes:
        yield s[prev : prev + size]
        prev += size

n = 2451018
print(' '.join(get_slices(n, [2, 2, 1, 3, 3]))) # 00 00 2 451 018

I use the slice notation to get pieces of the string, and then join those pieces together using join.

Of course you could also do it like this:

n = 2451018
s = f'{n:011}'
print(f'{s[0:2]} {s[2:4]} {s[4]} {s[5:8]} {s[8:11]}')

But I think the get_slices function provides a more flexible way, as you can customize the length and the size of each piece.


Regardless of the solution you choose (math or string slices), you can go further and provide a formatting function that uses the get_slices function internally. You could also make improvements, such as allowing a different separator:

def get_slices(n, sizes, n_digits=11):
    # use one of the solutions above...

def format_number(n, sizes, n_digits=11, separator=' '):
    return separator.join(get_slices(n, sizes, n_digits))

n = 2451018
print(format_number(n, [2, 2, 1, 3, 3])) # 00 00 2 451 018

# use a different separator
print(format_number(n, [2, 2, 1, 3, 3], separator='-')) # 00-00-2-451-018

# use different number of digits and sizes
print(format_number(n, [3, 4, 2, 2, 4], n_digits=15)) # 000 0000 02 45 1018

Finally, as I said, if the value is already a string, you can simply skip the first step (convert the number to a string) and use the slice solution.

Why does this post require moderator attention?
You might want to add some details to your flag.

1 comment thread

Your answer is amazing. I will spend time to check everything that you comment and explain. Thank you... (1 comment)
+0
−1

I found a way how to fix it after I debug. It was only using the last used row because there was nothing to check that I want to edit only the current row. That's why I decide to use at

for index, row in priceListTest.iterrows(): 
    #print(index,row)
    def get_slices(n, sizes, n_digits=11):
        for size in sizes:
            n_digits -= size
            
            val, n = divmod(n, 10 ** n_digits)
            yield f'{val:0{size}}'

    n = row['Парт номер']
    newVar = (' '.join(get_slices(n, [2, 2, 1, 3, 3])))
    priceListTest.at[index,['New Value']] = newVar

print("after changes")
print(priceListTest)
Why does this post require moderator attention?
You might want to add some details to your flag.

0 comment threads

Sign up to answer this question »