Plotting with different data structures

Plotting with NumPy, pandas, and ColumnDataSource

Creating a plot using NumPy

Instead of a plain Python list of data points, we can pass a NumPy array (np.array) to a plot. For demonstration purposes, we generate 10 data points that follow a sine curve. Once the dataset is created, we interpolate it to obtain 50 data points that follow the same pattern. Linear interpolation is a method of curve fitting that uses linear polynomials to construct new data points within the range of a discrete set of known data points. Let us look at a very simple example:

import numpy as np

# original data: 10 evenly spaced x values between 0 and 2*pi
xdata = np.linspace(0, 2*np.pi, 10)
ydata = np.sin(xdata)  # sine of each x value
# points to be interpolated
xs = np.linspace(0, 2*np.pi, 50)  # 50 evenly spaced x values over the same range
ys = np.interp(xs, xdata, ydata)  # estimate ys at xs by linear interpolation of (xdata, ydata)

from bokeh.plotting import figure, output_file, show

output_file("interpolation.html")

p = figure(height=400, width=400, title="interpolation example")

p.circle(xdata,ydata,size=8,color='red',legend_label='data')
p.cross(xs,ys,size=8,color='blue', legend_label='interpolation')
p.line(xs,ys, color='lightgrey')
p.legend.location = "top_right"

show(p)
Figure: plot with NumPy data
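To make the idea of linear interpolation concrete, the following minimal sketch recomputes one interpolated value by hand and compares it with np.interp. The query point x_new = 1.0 and the names x0, x1, y0, y1, y_manual are illustrative assumptions, not part of the example above.

```python
import numpy as np

# Same 10 known points as in the example above
xdata = np.linspace(0, 2 * np.pi, 10)
ydata = np.sin(xdata)

x_new = 1.0  # hypothetical query point, chosen only for illustration

# Find the two known points that bracket x_new
i = np.searchsorted(xdata, x_new) - 1
x0, x1 = xdata[i], xdata[i + 1]
y0, y1 = ydata[i], ydata[i + 1]

# Straight line through (x0, y0) and (x1, y1), evaluated at x_new
y_manual = y0 + (y1 - y0) * (x_new - x0) / (x1 - x0)

# np.interp performs the same calculation
y_numpy = np.interp(x_new, xdata, ydata)

print(y_manual, y_numpy)
print("difference from the true sine:", np.sin(x_new) - y_manual)
```

The two printed values should agree. The last line shows that the interpolated value is only an approximation of the underlying sine curve, which is why the crosses in the plot sit on straight segments between the red circles.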

Creating a plot using Pandas DataFrame

Each column of a pandas DataFrame is a Series backed by a NumPy array and can be used in much the same way as an array. The example below imports the iris dataset from Bokeh's sample data, which is a pandas DataFrame. From that DataFrame, the columns 'petal_length' and 'petal_width' are selected. They can be plotted the same way we plotted the arrays in the scatter plot before.

from bokeh.plotting import figure, show, output_file
from bokeh.sampledata.iris import flowers

colormap = {'setosa': 'red', 'versicolor': 'green', 'virginica': 'blue'}
colors = [colormap[x] for x in flowers['species']]

p = figure(title = "Iris Morphology")
p.xaxis.axis_label = 'Petal Length'
p.yaxis.axis_label = 'Petal Width'

p.circle(flowers["petal_length"], flowers["petal_width"],
         color=colors, fill_alpha=0.2, size=10)

output_file("iris.html", title="iris.py example")

show(p)
Figure: plot from a pandas DataFrame
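The relation between a DataFrame column and a NumPy array can be checked with a small sketch. The three rows below are made-up stand-ins for the iris sample data, so the snippet runs without the Bokeh sample dataset. Only the column names are taken from the example above.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; not the real iris measurements
df = pd.DataFrame({
    "petal_length": [1.4, 4.7, 6.0],
    "petal_width": [0.2, 1.4, 2.5],
    "species": ["setosa", "versicolor", "virginica"],
})

column = df["petal_length"]   # selecting a column gives a pandas Series
values = column.to_numpy()    # the underlying data as a NumPy array

# Same color lookup as above, written with Series.map instead of a list comprehension
colormap = {"setosa": "red", "versicolor": "green", "virginica": "blue"}
colors = df["species"].map(colormap).tolist()

print(type(column).__name__, type(values).__name__, colors)
```

Either the Series or the array can be handed to a glyph method such as p.circle, which is why the iris example works without any explicit conversion.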

Creating a plot using ColumnDataSource

When you pass in data as sequences, either plain Python lists or NumPy arrays, Bokeh works behind the scenes to make a ColumnDataSource for you. Learning to create and use the ColumnDataSource yourself gives you access to more advanced capabilities, such as streaming data, sharing data between plots, and filtering data. At the most basic level, a ColumnDataSource is simply a mapping between column names and lists of data. The ColumnDataSource takes a data parameter, which is a dict with string column names as keys and lists (or arrays) of data values as values. If one positional argument is passed to the ColumnDataSource initializer, it is taken as data. Once the ColumnDataSource has been created, it can be passed to the source parameter of the plotting methods, which lets you pass a column's name as a stand-in for the data values:

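from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

data = {'x_values': [1, 2, 3, 4, 5],
        'y_values': [6, 7, 2, 3, 6]}

source = ColumnDataSource(data=data)
p = figure()
# column names stand in for the data values held by source
p.circle(x='x_values', y='y_values', source=source)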
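One of the capabilities mentioned above is sharing data between plots. A minimal sketch, assuming a recent Bokeh version in which figure accepts width and height: the same source object is passed to glyphs on two separate figures. The names left, right and layout are illustrative.

```python
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

data = {"x_values": [1, 2, 3, 4, 5],
        "y_values": [6, 7, 2, 3, 6]}
source = ColumnDataSource(data=data)

# Two figures whose glyphs read from the same ColumnDataSource
left = figure(width=300, height=300, title="scatter")
right = figure(width=300, height=300, title="line")

scatter_renderer = left.scatter(x="x_values", y="y_values", size=10, source=source)
line_renderer = right.line(x="x_values", y="y_values", source=source)

# Arrange the figures side by side; nothing is rendered until show() is called
layout = gridplot([[left, right]])
```

Calling show(layout) (imported from bokeh.plotting) would render both plots. Because both renderers point at one source, a later change to source.data is reflected in both figures.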

Instead of a dictionary, you can pass a pandas DataFrame as well.

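from bokeh.models import ColumnDataSource

# df is assumed to be an existing DataFrame with 'serum_creatinine' and 'platelets' columns,
# and p an existing figure
source = ColumnDataSource(df)
p.circle(x='serum_creatinine', y='platelets', source=source)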
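To see what a ColumnDataSource built from a DataFrame contains, here is a small sketch with a hypothetical DataFrame (the df above is not defined in this section, so made-up values and column names are used instead). Each DataFrame column becomes a column of the source, and Bokeh also adds the DataFrame index as a column, named index when the index has no name.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Hypothetical DataFrame, for illustration only
df = pd.DataFrame({"x_values": [1, 2, 3, 4, 5],
                   "y_values": [6, 7, 2, 3, 6]})

source = ColumnDataSource(df)

# Inspect the columns available to glyph methods via source=...
print(source.column_names)

# The data is held as a mapping from column name to values
y_values = list(source.data["y_values"])
print(y_values)
```

Any of the listed column names can then be used as the x or y argument of a glyph method, exactly as with a source built from a dictionary.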
