Reshape with melt

Reshape your data with melt

The pandas.DataFrame.melt() function reshapes a DataFrame from wide to long format. One or more columns act as identifier variables (id_vars), and all other columns are treated as measured variables (value_vars) and "unpivoted" to the row axis. The result keeps the identifier columns plus just two non-identifier columns: 'variable' (the name of the original column) and 'value' (the entry from that column).
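A minimal sketch on a made-up two-row table (the column names y, X1, X2 and all values are invented for illustration) shows what melt() produces. Note that the measured columns are stacked one after another: first every row of X1, then every row of X2.

```python
import pandas as pd

# Tiny wide table (hypothetical): 'y' is the identifier, 'X1' and 'X2' are measurements
wide = pd.DataFrame({
    "y": [1, 2],
    "X1": [10, 20],
    "X2": [11, 21],
})

# Keep 'y'; stack X1 and X2 into the 'variable' / 'value' pair of columns
long = wide.melt(id_vars=["y"])
print(long)
# variable column: X1, X1, X2, X2
# value column:    10, 20, 11, 21
```

The wide table had 2 rows and 2 measured columns, so the long table has 2 × 2 = 4 rows.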

To demonstrate this, we will work with the EEG brain dataset. The measurements are in the X<number> columns, and the variable of interest is the 'y' column.

import pandas as pd

# Load the EEG data; the unnamed first column is a row identifier we do not need
df = pd.read_csv('data/eeg_data.csv').rename(columns={"Unnamed: 0": "ID"})
df = df.drop(columns='ID')
df.head()

Output of df.head(): 5 rows × 179 columns (the X<number> measurement columns and y). The first column of these five rows holds 135, 386, -32, -105 and -9.
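If data/eeg_data.csv is not at hand, the following hypothetical stand-in lets you run the remaining steps. It fills random integers (not real EEG readings) into a frame with the same shape as above; the column names X1 to X178 and the class labels 1 to 5 are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: random integers, NOT real EEG data
rng = np.random.default_rng(seed=0)
n_rows, n_features = 5, 178

df = pd.DataFrame(
    rng.integers(-200, 400, size=(n_rows, n_features)),
    columns=[f"X{i}" for i in range(1, n_features + 1)],  # assumed names X1..X178
)
# Assumed class labels 1..5 (integers() excludes the upper bound 6)
df["y"] = rng.integers(1, 6, size=n_rows)

print(df.shape)  # (5, 179)
```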

We can reshape this with melt(). It keeps 'y' as the identifier column and stacks all the other columns: their names go into the 'variable' column and their readings into the 'value' column.

# Keep 'y'; every X<number> column is stacked into 'variable' / 'value'
dfm = df.melt(id_vars=['y'])
dfm.head()

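Output of dfm.head(): the columns are y, variable and value. The first five values are 135, 386, -32, -105 and -9, i.e. the readings of the first measurement column for the first five rows of df.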
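The same call can spell out value_vars and rename the two output columns through the var_name and value_name parameters of melt(). The sketch below uses a hypothetical three-row table; the names "feature" and "reading" are illustrative choices, not something the dataset requires. It also checks the row count: each (row, measurement column) pair becomes one row of the long table.

```python
import pandas as pd

# Hypothetical miniature table in the same layout: measurements X1, X2 plus label y
df = pd.DataFrame({"X1": [1, 2, 3], "X2": [4, 5, 6], "y": [0, 1, 0]})

# Equivalent to df.melt(id_vars=['y']), but with explicit value_vars and
# custom names instead of the defaults 'variable' and 'value'
measurement_cols = [c for c in df.columns if c != "y"]
dfm = df.melt(
    id_vars=["y"],
    value_vars=measurement_cols,
    var_name="feature",
    value_name="reading",
)

# One long-format row per (original row, measurement column) pair
assert len(dfm) == len(df) * len(measurement_cols)
print(dfm)
```

For the EEG table, with 178 measurement columns next to y, the melted frame therefore has 178 times as many rows as df.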

This long format is handy if, for instance, I want to group by the y value to compare counts, means or standard deviations across classes. I can also make a graphical overview of the value distribution per class.

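dfm['y'].value_counts()            # number of melted rows per class
# Select 'value' so the string 'variable' column is not aggregated
dfm.groupby('y')['value'].std()    # spread of the readings per class
dfm.groupby('y')['value'].mean()   # average reading per class

import seaborn as sns
sns.violinplot(data=dfm, x='y', y='value')  # distribution of readings per class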
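The three statistics can also be computed in a single pass with agg(). The sketch below uses a small hypothetical long-format frame shaped like the melt() output. Selecting the 'value' column first matters: in pandas 2.x, calling mean() or std() on the whole grouped frame tries to aggregate the string 'variable' column as well and raises a TypeError.

```python
import pandas as pd

# Hypothetical long-format frame, shaped like the output of df.melt(id_vars=['y'])
dfm = pd.DataFrame({
    "y": [1, 1, 2, 2, 2],
    "variable": ["X1", "X2", "X1", "X2", "X3"],
    "value": [10.0, 20.0, 1.0, 2.0, 3.0],
})

# Count, mean and sample standard deviation (ddof=1, the pandas default) per class
summary = dfm.groupby("y")["value"].agg(["count", "mean", "std"])
print(summary)
```

Here class 1 has count 2, mean 15.0 and std √50 ≈ 7.07; class 2 has count 3, mean 2.0 and std 1.0.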


Last updated 2 years ago
