
Comparing two excel files with Python based on changes

+3
−1

I have two tables:

Table1:

Name  Description     Amount
123   Description123  123
456   Description456  456
789   Description789  666
101   Description777  101
133   Description133  133

Table2:

Name  Description     Amount
456   Description456  456
789   Description789  789
101   Description101  101
123   Description123  123
102   Description102  102

I need to update Table1 with the differences found in Table2. The two Excel files are linked by the Name column. The expected output is: if a row has changed in Table 2, the data from Table 2 must be used, and any new rows in Table 2 must be added to the final result. If nothing has changed, or if Table 2 has no data for a given Name from Table 1 (like 133), the Table 1 row must also be included in the final result.

Expected output:

Name  Description     Amount
123   Description123  123
456   Description456  456
789   Description789  789
101   Description101  101
102   Description102  102
133   Description133  133

Thanks in advance!

Edit1: I'm struggling to find a solution. I understand how to compare the rows of the two Excel files one by one, but that only works if the Name column is in exactly the same order in both files. I don't know how to do it when the order differs, as in the case above.


2 comment threads

Thanks for the proposal Alexei, I will try it! Also I think from here I can try: https://pandas.pyda... (1 comment)
Use some sort of maps or dictionaries (1 comment)

1 answer

+1
−0

Here's what I'd do, hope it still helps someone:

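import pandas as pd

# Table1 from the question (789 has a different amount, 101 a different description)
table1 = pd.DataFrame({
    'name': [123, 456, 789, 101, 133],
    'description': ['Description123', 'Description456', 'Description789', 'Description777', 'Description133'],
    'amount': [123, 456, 666, 101, 133],
})

# Table2 from the question
t2 = [456, 789, 101, 123, 102]
t2_descr = ['Description' + str(i) for i in t2]
table2 = pd.DataFrame({'name': t2, 'description': t2_descr, 'amount': t2})

# An outer merge on `name` keeps every key from both tables, whatever the row order;
# indicator=True adds a `_merge` column telling which table(s) each row came from
df = table1.merge(table2, on=['name'], how='outer', suffixes=('_t1', '_t2'), indicator=True)

# Result of the merge (row order may differ between pandas versions):
   name  description_t1  amount_t1  description_t2  amount_t2      _merge
0   123  Description123      123.0  Description123      123.0        both
1   456  Description456      456.0  Description456      456.0        both
2   789  Description789      666.0  Description789      789.0        both
3   101  Description777      101.0  Description101      101.0        both
4   133  Description133      133.0             NaN        NaN   left_only
5   102             NaN        NaN  Description102      102.0  right_only
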
# If `name` is on both tables, use table2
df2 = df.copy()
df2.loc[df2._merge=='both', 'description'] = df2.loc[df2._merge=='both', 'description_t2']
df2.loc[df2._merge=='both', 'amount'] = df2.loc[df2._merge=='both', 'amount_t2']
# New rows on table2
df2.loc[df2._merge=='right_only', 'description'] = df2.loc[df2._merge=='right_only', 'description_t2']
df2.loc[df2._merge=='right_only', 'amount'] = df2.loc[df2._merge=='right_only', 'amount_t2']
# If `name` not in table2, use table1
df2.loc[df2._merge=='left_only', 'description'] = df2.loc[df2._merge=='left_only', 'description_t1']
df2.loc[df2._merge=='left_only', 'amount'] = df2.loc[df2._merge=='left_only', 'amount_t1']

# drop() returns a new DataFrame, so keep the result instead of discarding it
result = df2.drop(columns=['description_t1', 'amount_t1', 'description_t2', 'amount_t2', '_merge'])
print(result)

   name     description  amount
0   123  Description123   123.0
1   456  Description456   456.0
2   789  Description789   789.0
3   101  Description101   101.0
4   133  Description133   133.0
5   102  Description102   102.0
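
Since Table 2 always wins as a whole row, the same result can probably be reached more compactly by stacking the two tables and keeping the first row seen per `name`. This is only a sketch of an alternative, not the approach above: the helper name `merge_prefer_new` is made up for illustration, and it assumes each `name` appears at most once per table.

```python
import pandas as pd


def merge_prefer_new(old: pd.DataFrame, new: pd.DataFrame, key: str = "name") -> pd.DataFrame:
    """Return the rows of both frames; where a key exists in both, the row from `new` wins.

    Hypothetical helper; assumes `key` values are unique within each frame.
    """
    # Put the `new` rows first so that keep="first" prefers them over `old` rows.
    combined = pd.concat([new, old], ignore_index=True)
    return (
        combined.drop_duplicates(subset=key, keep="first")
        .sort_values(key)
        .reset_index(drop=True)
    )


# Sample data copied from the question.
table1 = pd.DataFrame({
    "name": [123, 456, 789, 101, 133],
    "description": ["Description123", "Description456", "Description789",
                    "Description777", "Description133"],
    "amount": [123, 456, 666, 101, 133],
})
table2 = pd.DataFrame({
    "name": [456, 789, 101, 123, 102],
    "description": ["Description456", "Description789", "Description101",
                    "Description123", "Description102"],
    "amount": [456, 789, 101, 123, 102],
})

result = merge_prefer_new(table1, table2)
print(result.to_string(index=False))
```

No missing values are introduced along the way, so `amount` stays an integer column. With real files, the two frames would presumably be loaded with `pd.read_excel` and the result written back with `DataFrame.to_excel`, using whatever file names apply in your case.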