Comparing two Excel files with Python based on changes

Name	Description	Amount
123	Description123	123
456	Description456	456
789	Description789	666
101	Description777	101
133	Description133	133

Name	Description	Amount
456	Description456	456
789	Description789	789
101	Description101	101
123	Description123	123
102	Description102	102

Name	Description	Amount
123	Description123	123
456	Description456	456
789	Description789	789
101	Description101	101
102	Description102	102
133	Description133	133

Here's what I'd do, hope it still helps someone:

import pandas as pd

t1 = [123,456,789,101,133]
t1_descr = ['Description' + str(i) for i in t1]

table1 = pd.DataFrame({'name': t1, 'description': t1_descr, 'amount': [123,456,666,101,133]})

t2 = [456,789,101,123,102]
t2_descr = ['Description' + str(i) for i in t2]

table2 = pd.DataFrame({'name': t2, 'description': t2_descr, 'amount': t2})

df = table1.merge(table2, on=['name'], how='outer', suffixes=('_t1', '_t2'), indicator=True)

	name	description_t1	amount_t1	description_t2	amount_t2	_merge
0	123	Description123	123.0	Description123	123.0	both
1	456	Description456	456.0	Description456	456.0	both
2	789	Description789	666.0	Description789	789.0	both
3	101	Description101	101.0	Description101	101.0	both
4	133	Description133	133.0	NaN	NaN	left_only
5	102	NaN	NaN	Description102	102.0	right_only
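
Since the question is about changes, the `_merge` column can also be used to report what differs between the two tables, not only to build the combined result. A minimal sketch, assuming the same sample data as above; the names `changed`, `removed` and `added` are just illustrative:

```python
import pandas as pd

# Same sample data as in the answer.
t1 = [123, 456, 789, 101, 133]
table1 = pd.DataFrame({'name': t1,
                       'description': ['Description' + str(i) for i in t1],
                       'amount': [123, 456, 666, 101, 133]})
t2 = [456, 789, 101, 123, 102]
table2 = pd.DataFrame({'name': t2,
                       'description': ['Description' + str(i) for i in t2],
                       'amount': t2})

df = table1.merge(table2, on='name', how='outer',
                  suffixes=('_t1', '_t2'), indicator=True)

in_both = df['_merge'] == 'both'
# A row counts as changed if it is in both tables and any value column differs.
differs = ((df['description_t1'] != df['description_t2'])
           | (df['amount_t1'] != df['amount_t2']))

changed = df.loc[in_both & differs, 'name'].tolist()
removed = df.loc[df['_merge'] == 'left_only', 'name'].tolist()   # only in table1
added = df.loc[df['_merge'] == 'right_only', 'name'].tolist()    # only in table2

print(changed, removed, added)  # prints [789] [133] [102]
```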

# If `name` is in both tables, use the values from table2
df2 = df.copy()
df2.loc[df2._merge=='both', 'description'] = df2.loc[df2._merge=='both', 'description_t2']
df2.loc[df2._merge=='both', 'amount'] = df2.loc[df2._merge=='both', 'amount_t2']
# Rows that only exist in table2 are new: use the values from table2
df2.loc[df2._merge=='right_only', 'description'] = df2.loc[df2._merge=='right_only', 'description_t2']
df2.loc[df2._merge=='right_only', 'amount'] = df2.loc[df2._merge=='right_only', 'amount_t2']
# If `name` not in table2, use table1
df2.loc[df2._merge=='left_only', 'description'] = df2.loc[df2._merge=='left_only', 'description_t1']
df2.loc[df2._merge=='left_only', 'amount'] = df2.loc[df2._merge=='left_only', 'amount_t1']

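# drop() returns a new DataFrame and leaves df2 unchanged, so keep the result
result = df2.drop(columns=['description_t1', 'amount_t1', 'description_t2', 'amount_t2', '_merge'])
print(result)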

	name	description	amount
0	123	Description123	123.0
1	456	Description456	456.0
2	789	Description789	789.0
3	101	Description101	101.0
4	133	Description133	133.0
5	102	Description102	102.0
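
The three cases above all reduce to "take the table2 value if there is one, otherwise keep the table1 value", so a shorter variant with `fillna` is possible. This is only a sketch, and it assumes table2 has no genuinely empty cells (an empty cell in table2 would be filled from table1, which may not be what you want):

```python
import pandas as pd

# Same sample data as in the answer.
t1 = [123, 456, 789, 101, 133]
table1 = pd.DataFrame({'name': t1,
                       'description': ['Description' + str(i) for i in t1],
                       'amount': [123, 456, 666, 101, 133]})
t2 = [456, 789, 101, 123, 102]
table2 = pd.DataFrame({'name': t2,
                       'description': ['Description' + str(i) for i in t2],
                       'amount': t2})

df = table1.merge(table2, on='name', how='outer', suffixes=('_t1', '_t2'))

# Prefer the table2 value; fall back to table1 where table2 has no row.
result = pd.DataFrame({
    'name': df['name'],
    'description': df['description_t2'].fillna(df['description_t1']),
    'amount': df['amount_t2'].fillna(df['amount_t1']),
})
print(result)
```

`amount` comes back as float because the outer merge introduces NaN in the intermediate columns; if all values are whole numbers, `result['amount'].astype(int)` restores integers. The row order of an outer merge can differ between pandas versions, so sort by `name` if the order matters.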

posted over 3 years ago

CC BY-SA 4.0

ibmx
