Combine data

For combining data we can use the methods concat, merge and join. Sometimes all three methods can be applyed to get the same end result, it depends on your data and your preference.

import pandas as pd
import numpy as np
pd.concat?
pd.merge?
pd.DataFrame.join?

Concat

with concat we can combine data. This is especially handy in combining arrays. By default the concat works with axis=0, e.g the row concatenation. It glues the rows of one dataframe or array to another

df_01 = pd.DataFrame(np.random.randn(3, 5))
df_01

0
1
2
3
4

0

0.113879

1.824252

-1.007082

-0.411709

-0.129588

1

0.571164

-1.167565

-1.462957

-0.573230

-0.974223

2

0.136346

-0.870894

-1.320389

1.000776

2.227670

df_02 = pd.DataFrame(np.random.randn(3,5))
df_02

0
1
2
3
4

0

-0.656365

0.998600

-2.124309

0.574141

-1.108821

1

1.082656

0.051674

-1.077717

1.066916

1.290093

2

0.460137

0.015859

0.216606

1.164776

0.050458

0
1
2
3
4

0

0.113879

1.824252

-1.007082

-0.411709

-0.129588

1

0.571164

-1.167565

-1.462957

-0.573230

-0.974223

2

0.136346

-0.870894

-1.320389

1.000776

2.227670

0

-0.656365

0.998600

-2.124309

0.574141

-1.108821

1

1.082656

0.051674

-1.077717

1.066916

1.290093

2

0.460137

0.015859

0.216606

1.164776

0.050458

if you pass axis=1 it will glue the dataframes in the column direction

0
1
2
3
4
0
1
2
3
4

0

0.113879

1.824252

-1.007082

-0.411709

-0.129588

-0.656365

0.998600

-2.124309

0.574141

-1.108821

1

0.571164

-1.167565

-1.462957

-0.573230

-0.974223

1.082656

0.051674

-1.077717

1.066916

1.290093

2

0.136346

-0.870894

-1.320389

1.000776

2.227670

0.460137

0.015859

0.216606

1.164776

0.050458

In case of unequal shape it will fill the gaps with NaN

value

key

A

1

B

2

C

3

value

key

A

3

B

4

X

5

Y

7

Merge

You can also perform a SQL-style join using the .merge() function:

key
left_value

0

A

1

1

B

2

2

C

3

key
right_value

0

A

3

1

B

4

2

D

51,3

key
left_value
right_value

0

A

1

3

1

B

2

4

key
left_value
right_value

0

A

1.0

3

1

B

2.0

4

2

C

3.0

NaN

3

D

NaN

51,3

key
left_value
right_value

0

A

1.0

3

1

B

2.0

4

2

D

NaN

51,3

key
left_value
right_value

0

A

1

3

1

B

2

4

2

C

3

NaN

key
left_value
other_key

0

A

1

X

1

B

2

Y

2

C

3

Z

key
right_value
some_key

0

A

3

W

1

B

53,2

Y

2

D

5

Z

key
left_value
other_key
right_value
some_key

0

B

2

Y

53,2

Y

key
left_value
other_key
right_value
some_key

0

A

1

X

3

W

1

B

2

Y

53,2

Y

2

C

3

Z

NaN

NaN

Join

Pandas DataFrame has als a join function for merging by index. However overlapping columns cannot exist.

key
left_value
other_key

0

A

1

X

1

B

2

Y

2

C

3

Z

key
right_value
some_key

0

A

3

W

1

B

53,2

Y

2

D

5

Z

left_value
other_key
right_value
some_key

key

A

1.0

X

3

W

B

2.0

Y

53,2

Y

C

3.0

Z

NaN

NaN

D

NaN

NaN

5

Z

name
right_value
some_key

0

A

3

W

1

B

53,2

Y

2

D

5

Z

key
left_value
other_key
name
right_value
some_key

0

A

1

X

A

3

W

1

B

2

Y

B

53,2

Y

2

C

3

Z

D

5

Z

With the on= argument you can match indexes with keys. For example:

key
value

0

a

0

1

b

1

2

a

2

3

a

3

4

b

4

5

c

5

group_val

a

3.5

b

7.0

key
value
group_val

0

a

0

3.5

1

b

1

7.0

2

a

2

3.5

3

a

3

3.5

4

b

4

7.0

5

c

5

NaN

Merging on indexes with merge is also possible

value

key

a

0

b

1

a

2

a

3

b

4

c

5

value
group_val

a

0

3.5

a

2

3.5

a

3

3.5

b

1

7.0

b

4

7.0

c

5

NaN

Last updated

Was this helpful?