Separate digits of a number in groups with different sizes
I have a list of 11-digit numbers stored in one single column in Excel, and I need to separate the digits according to this pattern: 2-2-1-3-3
.
Example: 00002451018
becomes 00 00 2 451 018
.
How can I do it in Python?
I have tried the following:
If you notice the last column new Value is getting only from the last row. Which is ending with 416. Somehow I want inside of this to update and create a new column to contain for each row the updated pattern.
2 answers
The following users marked this post as Works for me:
User | Comment | Date |
---|---|---|
sfrow | (no comment) | Nov 8, 2021 at 20:26 |
Before we start, I'd like to be a little bit pedantic regarding 00002451018
being a number.
When we talk about numeric types/values, the zeroes at the beginning are irrelevant: 2
, 02
and 000002
all refer to the number two. The numeric value is 2
, and only the representation - the way they're formatted - differs.
That said, this is invalid Python code:
n = 00002451018
Because it gives this error:
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers
Therefore, if we were to be pedantic, 00002451018
is not a number in Python. When we read numbers from a file, we're actually reading characters that can be interpreted as a number. But when such conversion happens (text to number), the leading zeroes are "lost"/"ignored", and all we get is the numeric value 2451018
.
So first we need to know what kind of data you have. If you're seeing this number as 00002451018
in Excel, it can be because the cells are formatted as text, or as numbers with a custom formatting (such as "11-digit length with leading zeroes").
The thing is, when you read those cells values as numbers into your Python code, the actual value will be the integer 2451018
(if the column is text, though, the value will be the string '00002451018'
).
Anyway, as I don't know what you actually have in your spreadsheet, let's assume it's a number (if it's text, the value will be a Python string, which can reuse the code below).
There are two approaches to this problem: extract the digits with maths, or convert the number to string and extract pieces of it with slices.
Doing the math
To get the first two digits of a 11-digit number, we can simply divide it by 10^{9} and discard the decimal places.
Then, to get the next two digits (third and fourth ones), we calculate the remainder of the previous division (the number divided by 10^{9}). The result will be a 9-digit number, so we just divide it by 10^{7} to get its first two digits (which correspond to the third and fourth digits of the original number).
We repeat this process until we get all the pieces we want. At each step, the exponent of the power of 10 decreases by the quantity of digits we want.
Note that the number actually doesn't need to have 11 digits. If it has less, the result of the divisions will be zero, so this algorithm already takes care of the leading zeroes. The code is like that:
def get_slices(n, sizes, n_digits=11):
for size in sizes:
n_digits -= size
# get the division and the remainder
val, n = divmod(n, 10 ** n_digits)
yield f'{val:0{size}}'
n = 2451018
print(' '.join(get_slices(n, [2, 2, 1, 3, 3]))) # 00 00 2 451 018
The get_slices
function takes the number, a list of the sizes and the total number of digits to consider.
The yield
keyword makes it a generator function, which yields one part of the number at a time. I used a f-string to properly format the value to have the correct length. This handles cases when the value is zero, or it has less digits than the required size (such as 12
when the size is 3, so it becomes 012
).
To perform the calculations I used the divmod
built-in, which returns both the result of the division (ignoring the decimal places) and the remainder, all in one step.
Then I use join
to build a single string with all the pieces together.
The result is the string 00 00 2 451 018
.
String and slices
Another approach is to convert the number to a string with the leading zeroes, and then take slices of this string with the desired lengths.
def get_slices(n, sizes, n_digits=11):
# convert to string with the desired number of digits and leading zeroes
s = f'{n:0{n_digits}}'
prev = 0
for size in sizes:
yield s[prev : prev + size]
prev += size
n = 2451018
print(' '.join(get_slices(n, [2, 2, 1, 3, 3]))) # 00 00 2 451 018
I use the slice notation to get pieces of the string, and then join those pieces together using join
.
Of course you could also do it like this:
n = 2451018
s = f'{n:011}'
print(f'{s[0:2]} {s[2:4]} {s[4]} {s[5:8]} {s[8:11]}')
But I think the get_slices
function provides a more flexible way, as you can customize the length and the size of each piece.
Regardless of the solution you choose (math or string slices), you can go further and provide a formatting function that uses the get_slices
function internally. You could also make improvements, such as allowing a different separator:
def get_slices(n, sizes, n_digits=11):
# use one of the solutions above...
def format_number(n, sizes, n_digits=11, separator=' '):
return separator.join(get_slices(n, sizes, n_digits))
n = 2451018
print(format_number(n, [2, 2, 1, 3, 3])) # 00 00 2 451 018
# use a different separator
print(format_number(n, [2, 2, 1, 3, 3], separator='-')) # 00-00-2-451-018
# use different number of digits and sizes
print(format_number(n, [3, 4, 2, 2, 4], n_digits=15)) # 000 0000 02 45 1018
Finally, as I said, if the value is already a string, you can simply skip the first step (convert the number to a string) and use the slice solution.
1 comment thread
I found a way how to fix it after I debug. It was only using the last used row because there was nothing to check that I want to edit only the current row. That's why I decide to use at
for index, row in priceListTest.iterrows():
#print(index,row)
def get_slices(n, sizes, n_digits=11):
for size in sizes:
n_digits -= size
val, n = divmod(n, 10 ** n_digits)
yield f'{val:0{size}}'
n = row['Парт номер']
newVar = (' '.join(get_slices(n, [2, 2, 1, 3, 3])))
priceListTest.at[index,['New Value']] = newVar
print("after changes")
print(priceListTest)
1 comment thread