Combine data
Concat, merge or join
For combining data we can use the methods concat
, merge
and join
. Sometimes all three methods can be applied to get the same end result, it depends on your data and your preference
Concat
with concat we can combine data. This is especcially handy in combining arrays. By default, the concat works with axis=0
, e.g the row concatenation. It glues the rows of one dataframe or array to another
0
1
2
3
4
0
0.799711
0.546198
-1.499229
0.030387
2.282779
1
0.341611
1.417359
-2.053733
0.687576
-0.275441
2
-0.603197
-0.981207
-1.839125
-1.083722
-1.129557
0
1
2
3
4
0
-1.058341
2.681832
0.752952
-0.348513
-0.289674
1
1.101420
1.136794
-0.515453
-0.106401
1.208821
2
-0.295428
1.292897
-0.584524
-2.071780
-1.573926
0
1
2
3
4
0
0.799711
0.546198
-1.499229
0.030387
2.282779
1
0.341611
1.417359
-2.053733
0.687576
-0.275441
2
-0.603197
-0.981207
-1.839125
-1.083722
-1.129557
0
-1.058341
2.681832
0.752952
-0.348513
-0.289674
1
1.101420
1.136794
-0.515453
-0.106401
1.208821
2
-0.295428
1.292897
-0.584524
-2.071780
-1.573926
if you pass axis=1
it will glue the dataframes in the column direction
0
1
2
3
4
0
1
2
3
4
0
0.799711
0.546198
-1.499229
0.030387
2.282779
-1.058341
2.681832
0.752952
-0.348513
-0.289674
1
0.341611
1.417359
-2.053733
0.687576
-0.275441
1.101420
1.136794
-0.515453
-0.106401
1.208821
2
-0.603197
-0.981207
-1.839125
-1.083722
-1.129557
-0.295428
1.292897
-0.584524
-2.071780
-1.573926
Merge
In the JSON study case we saw that we had to combine data sources. Before we look into the ways how we merge data we should consider the difference between outer, inner, left and right joins. The following picture shows the differences:
You can also perform a SQL-style join using the .merge() function:
key
left_value
0
A
1
1
B
2
2
C
3
key
right_value
0
A
3
1
B
4
2
D
51,3
key
left_value
right_value
0
A
1
3
1
B
2
4
key
left_value
right_value
0
A
1.0
3
1
B
2.0
4
2
C
3.0
NaN
3
D
NaN
51,3
key
left_value
right_value
0
A
1.0
3
1
B
2.0
4
2
D
NaN
51,3
key
left_value
right_value
0
A
1
3
1
B
2
4
2
C
3
NaN
key
left_value
other_key
0
A
1
X
1
B
2
Y
2
C
3
Z
key
right_value
some_key
0
A
3
W
1
B
53,2
Y
2
D
5
Z
key
left_value
other_key
right_value
some_key
0
B
2
Y
53,2
Y
key
left_value
other_key
right_value
some_key
0
A
1
X
3
W
1
B
2
Y
53,2
Y
2
C
3
Z
NaN
NaN
Join
Pandas DataFrame has also a join function for merging by index. However overlapping columns cannot exist.
key
left_value
other_key
0
A
1
X
1
B
2
Y
2
C
3
Z
key
right_value
some_key
0
A
3
W
1
B
53,2
Y
2
D
5
Z
left_value
other_key
right_value
some_key
key
A
1.0
X
3
W
B
2.0
Y
53,2
Y
C
3.0
Z
NaN
NaN
D
NaN
NaN
5
Z
name
right_value
some_key
0
A
3
W
1
B
53,2
Y
2
D
5
Z
key
left_value
other_key
name
right_value
some_key
0
A
1
X
A
3
W
1
B
2
Y
B
53,2
Y
2
C
3
Z
D
5
Z
With the on=
argument you can match indexes with keys. For example:
key
value
0
a
0
1
b
1
2
a
2
3
a
3
4
b
4
5
c
5
group_val
a
3.5
b
7.0
key
value
group_val
0
a
0
3.5
1
b
1
7.0
2
a
2
3.5
3
a
3
3.5
4
b
4
7.0
5
c
5
NaN
Merging on indexes with merge
is also possible
value
key
a
0
b
1
a
2
a
3
b
4
c
5
value
group_val
a
0
3.5
a
2
3.5
a
3
3.5
b
1
7.0
b
4
7.0
c
5
NaN
Last updated
Was this helpful?