%matplotlib tk
Imagine that you have to clean data for tens or hundreds of sample points (each akin to a row in a 2D matrix). For data cleaning, you'd also need to zoom and pan through the data corresponding to each sample point. Would you create hundreds of static plots? We would lose the ability to zoom and pan. How about writing a simple function and manually changing its argument to reflect the sample number?
In this post, I'll look at a simple Matplotlib widget that lets us sift through the samples while retaining the ability to pan and zoom. This post is heavily inspired by Jake Vanderplas' PyData 2013 Matplotlib tutorial. For illustration, I'll create 15 time series, each recorded daily for a year.
Setting the backend to TK.
For some reason, it works better than the default OSX one.
Customary imports
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sys
Creating the data
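# Fixing the seed for reproducibility
np.random.seed(0)
# pd.date_range is used here because newer pandas versions no longer accept
# pd.DatetimeIndex(start=..., freq=..., periods=...)
df = pd.DataFrame(np.random.randn(365, 15), index=pd.date_range(start='2017', freq='D', periods=365))
# First five rows of the first five columns
df.head()[list(range(5))]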
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 2017-01-01 | 1.764052 | 0.400157 | 0.978738 | 2.240893 | 1.867558 |
| 2017-01-02 | 0.333674 | 1.494079 | -0.205158 | 0.313068 | -0.854096 |
| 2017-01-03 | 0.154947 | 0.378163 | -0.887786 | -1.980796 | -0.347912 |
| 2017-01-04 | -0.438074 | -1.252795 | 0.777490 | -1.613898 | -0.212740 |
| 2017-01-05 | -0.672460 | -0.359553 | -0.813146 | -1.726283 | 0.177426 |
fig, ax = plt.subplots()
df.plot(ax=ax)
Notice that since I used the %matplotlib tk backend, I don't see the plot embedded in the notebook. So I'll save the current figure as an image and then link it here.
plt.savefig("all_data.png")
This sure does not look pretty.
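As an aside, saving a figure to disk does not need a window at all. The following is a minimal sketch, assuming the non-interactive Agg backend instead of TK; the `demo` frame, the temporary directory and the file name `all_data_sketch.png` are hypothetical and chosen only for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same shape as the data in this post: 15 daily series for one year
np.random.seed(0)
demo = pd.DataFrame(
    np.random.randn(365, 15),
    index=pd.date_range(start="2017", freq="D", periods=365),
)

fig, ax = plt.subplots()
demo.plot(ax=ax, legend=False)  # hide the 15-entry legend to reduce clutter

# Hypothetical output path inside a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "all_data_sketch.png")
fig.savefig(out_path, dpi=150, bbox_inches="tight")
plt.close(fig)
```

Calling `fig.savefig` on the figure object, rather than `plt.savefig`, avoids depending on which figure happens to be current.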
Proposed solution
Great. It seems to do the intended job. Let us now look at the individual pieces and how we can tie them together.
Creating the initial frame
In the first frame we would like to plot the data for the first sample.
fig, ax = plt.subplots()
df[0].plot(ax=ax, title="Sample number: 0")
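Since every later frame is drawn the same way as the first one, the drawing step can be wrapped in a small function. This is only a sketch: `plot_sample` and `demo` are hypothetical names that do not appear in the post's code, and the Agg backend is assumed so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_sample(data, sample, ax):
    """Clear `ax` and draw column `sample` of `data` on it."""
    ax.cla()
    data[sample].plot(ax=ax)
    ax.set_title("Sample number: %d" % sample)
    return ax


np.random.seed(0)
demo = pd.DataFrame(
    np.random.randn(365, 15),
    index=pd.date_range(start="2017", freq="D", periods=365),
)

fig, ax = plt.subplots()
plot_sample(demo, 0, ax)  # the initial frame: first sample only
```

Calling `plot_sample(demo, 3, ax)` would redraw the same axes with the fourth series, which is essentially what the button callbacks do.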
Ensuring we do not select data point out of range
If you notice, we simply incremented and decremented the selected data point without checking that it stays within the valid range, from 0 to the number of samples minus one. So we need to change the callback functions to make sure we do not go beyond that range. This requires the following changes to next(), with the changes to prev() being similar.
    data_min = 0
    data_max = data.shape[1]-1
    selected = 0
    def next(self, event):
        if self.selected >= self.data_max:
            self.selected = self.data_max
            ax.set_title('Last sample reached. Cannot go forwards')
        else:
            self.selected += 1
            ax.cla()
            df[self.selected].plot(ax=ax)
            ax.set_title("Sample number: %d" % self.selected)
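The range check itself does not depend on Matplotlib, so it can be sketched as a pure function that is easy to reason about. The name `step_selection` and its return convention are assumptions made for this illustration; the callbacks in this post do the same check inline.

```python
def step_selection(selected, delta, data_min, data_max):
    """Move `selected` by `delta`, keeping it inside [data_min, data_max].

    Returns (new_index, moved), where `moved` is False when the
    request would have gone out of range.
    """
    target = selected + delta
    clamped = max(data_min, min(data_max, target))
    return clamped, clamped != selected


# 15 samples -> valid indices are 0..14
print(step_selection(3, 1, 0, 14))    # (4, True)
print(step_selection(14, 1, 0, 14))   # (14, False): already at the last sample
print(step_selection(0, -1, 0, 14))   # (0, False): already at the first sample
```

A callback could use the `moved` flag to decide between redrawing the plot and showing the "Cannot go forwards" or "Cannot go backwards" title.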
There you go. This was fairly simple and fun to do, and yet can be very helpful!
Complete code
from matplotlib.widgets import Button

fig, ax = plt.subplots()
# Leave room at the bottom of the figure for the buttons
fig.subplots_adjust(bottom=0.2)

df[0].plot(ax=ax, title="Sample number: 0")

class Index:
    data = df
    data_min = 0
    data_max = data.shape[1]-1
    selected = 0
    def next(self, event):
        if self.selected >= self.data_max:
            self.selected = self.data_max
            ax.set_title('Last sample reached. Cannot go forwards')
        else:
            self.selected += 1
            ax.cla()
            df[self.selected].plot(ax=ax)
            ax.set_title("Sample number: %d" % self.selected)

    def prev(self, event):
        if self.selected <= self.data_min:
            self.selected = 0
            ax.set_title('First sample reached. Cannot go backwards')
        else:
            self.selected -= 1
            ax.cla()
            df[self.selected].plot(ax=ax)
            ax.set_title("Sample number: %d" % self.selected)

callback = Index()
# Axes that hold the two buttons: [left, bottom, width, height]
axprev = plt.axes([0.7, 0.05, 0.1, 0.075])
axnext = plt.axes([0.81, 0.05, 0.1, 0.075])

bnext = Button(axnext, '>')
bnext.on_clicked(callback.next)

bprev = Button(axprev, '<')
bprev.on_clicked(callback.prev)
Advanced example
Here is another, slightly more advanced widget use case in action.
I will just put the code up here and leave understanding it to the reader as an exercise.
with pd.HDFStore('temp-store.h5', mode='w') as st:
    # 15 homes -> 2 columns, 365 rows (one reading per day)
    for home in range(15):
        # pd.date_range is used because newer pandas versions no longer accept
        # pd.DatetimeIndex(start=..., freq=..., periods=...)
        df = pd.DataFrame(np.random.randn(365, 2), columns=['fridge','microwave'],
                          index=pd.date_range(start='2017', freq='D', periods=365))
        df = df.abs()
        st['/home_%d' % home] = df
# Reopen the store read-only for plotting
st = pd.HDFStore('temp-store.h5', mode='r')
from matplotlib.widgets import Button, CheckButtons

fig, ax = plt.subplots()
# Leave room for the buttons (bottom) and the check boxes (left)
fig.subplots_adjust(bottom=0.2)
fig.subplots_adjust(left=0.2)

home_0 = st['/home_0']

rax = plt.axes([0.02, 0.4, 0.13, 0.2], aspect='equal')

labels = tuple(home_0.columns)
states = tuple([True]*len(labels))
check = CheckButtons(rax, labels, states)

st['/home_0'].plot(ax=ax, title="Sample number: 0").legend(loc=2)
lines = ax.get_lines()

class Index:
    store = st
    data_min = 0
    data_max = len(store.keys())-1
    selected = 0
    # Visibility of each column, keyed by its label
    states_dict = dict(zip(labels, states))
    def selected_column(self, label):
        self.states_dict[label] = not self.states_dict[label]
        self.plot()

    def plot(self):
        ax.cla()
        st['/home_%d' % self.selected].plot(ax=ax, title="Sample number: %d" % self.selected).legend(loc=2)
        lines = ax.get_lines()
        for i, (l, s) in enumerate(self.states_dict.items()):
            lines[i].set_visible(s)
        plt.legend(loc=1)

    def next(self, event):
        if self.selected >= self.data_max:
            self.selected = self.data_max
            ax.set_title('Last sample reached. Cannot go forwards')
        else:
            self.selected += 1
            self.plot()

    def prev(self, event):
        if self.selected <= self.data_min:
            self.selected = 0
            ax.set_title('First sample reached. Cannot go backwards')
        else:
            self.selected -= 1
            self.plot()

callback = Index()
axprev = plt.axes([0.7, 0.05, 0.1, 0.075])
axnext = plt.axes([0.81, 0.05, 0.1, 0.075])

bnext = Button(axnext, '>')
bnext.on_clicked(callback.next)

bprev = Button(axprev, '<')
bprev.on_clicked(callback.prev)

check.on_clicked(callback.selected_column);
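As a hint for the exercise, the check-box part can be looked at in isolation. The sketch below is not the code above: it uses a single in-memory `home` frame instead of the HDF store, a hypothetical `toggle` callback, and the Agg backend so that it runs without a display. It shows the core idea, which is that `CheckButtons.on_clicked` passes the clicked label to the callback, and the callback flips the visibility of the matching line.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.widgets import CheckButtons

# One hypothetical home with two appliances, daily readings for a year
np.random.seed(0)
home = pd.DataFrame(
    np.abs(np.random.randn(365, 2)),
    columns=["fridge", "microwave"],
    index=pd.date_range(start="2017", freq="D", periods=365),
)

fig, ax = plt.subplots()
fig.subplots_adjust(left=0.25)  # room for the check boxes
home.plot(ax=ax)

# Map each column label to the line pandas drew for it
lines = dict(zip(home.columns, ax.get_lines()))

rax = fig.add_axes([0.02, 0.4, 0.15, 0.2])
check = CheckButtons(rax, list(home.columns), [True] * len(home.columns))


def toggle(label):
    """Flip the visibility of the line that belongs to `label`."""
    line = lines[label]
    line.set_visible(not line.get_visible())
    fig.canvas.draw_idle()


check.on_clicked(toggle)
```

Looking lines up by label, rather than by position, avoids relying on the order in which the dictionary of states is iterated.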