Pandas: Pivoting and plotting workflow

Lemming

Disclaimer: I am very new to Pandas.

I am doing numerical simulations and would like to use Pandas for the final data evaluation. To keep things simple, let's assume the following setup:

My simulations take a few input parameters (e.g. max and size). Each simulation then produces a number of observables as functions of time (e.g. f1(t), f2(t)). In the end, the results of three different simulations could look like this:

import numpy as np
import pandas as pd

# One time axis per simulation; each has a different length or range
t1 = np.linspace(0, 2, 15)
t2 = np.linspace(0, 2, 21)
t3 = np.linspace(0, 1.5, 16)
df1 = pd.DataFrame({'max': t1.max(), 'size': t1.size, 't': t1, 'f1': t1**2+0, 'f2': t1**3+0})
df2 = pd.DataFrame({'max': t2.max(), 'size': t2.size, 't': t2, 'f1': t2**2+1, 'f2': t2**3+1})
df3 = pd.DataFrame({'max': t3.max(), 'size': t3.size, 't': t3, 'f1': t3**2+2, 'f2': t3**3+2})

Here max and size are the parameters of each simulation, t is the time axis, and f1 and f2 are the observables.

Say, as a first task, I would like to plot the values of f1 as a function of t for each set of parameters. After spending some time with the docs, I found that the pivot_table function can rearrange my data in the right way.

df = pd.concat([df1, df2, df3])
df_ms = pd.pivot_table(df, index=['t'], values=['f1', 'f2'], columns=['max', 'size'])

Intermediate question: Is this the best way to do this? I know that DataFrame takes an index argument in its constructor. Would it be better to define t as the index at that point? (I couldn't get it working together with pivot_table)
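One possible take on the intermediate question, as a minimal sketch rather than a recommendation: if t is set as the index when each frame is built, pd.concat along the columns can do the alignment without pivot_table. The helper make_run and the names runs, wide and f1_wide are hypothetical and only used for this illustration.

```python
import numpy as np
import pandas as pd


def make_run(t, offset):
    # Hypothetical helper: t becomes the index instead of a regular column
    return pd.DataFrame({'f1': t**2 + offset, 'f2': t**3 + offset},
                        index=pd.Index(t, name='t'))


# Key each run by its parameters (max, size)
runs = {}
for t, offset in [(np.linspace(0, 2, 15), 0),
                  (np.linspace(0, 2, 21), 1),
                  (np.linspace(0, 1.5, 16), 2)]:
    runs[(t.max(), t.size)] = make_run(t, offset)

# Align all runs on the union of their t values; sort the time axis explicitly
wide = pd.concat(runs, axis=1).sort_index()
wide.columns.names = ['max', 'size', 'observable']

# Select one observable; the remaining column levels are (max, size)
f1_wide = wide.xs('f1', axis=1, level='observable')
print(f1_wide.head())
```

The result has the same kind of gaps as the pivoted table, since the runs still do not share a common time axis.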

Now we can use the plot method to plot the resulting data.

df_ms['f1'].plot()

The result, however, is unexpected. I understand that some data is missing, as pandas is forced to introduce NaNs when aligning the different t axes.
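To see how much of the pivoted table is actually padding, the following sketch rebuilds the data from above and counts valid and missing entries per parameter set. The names frames and summary are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Rebuild the three example runs from the question
frames = []
for t, offset in [(np.linspace(0, 2, 15), 0),
                  (np.linspace(0, 2, 21), 1),
                  (np.linspace(0, 1.5, 16), 2)]:
    frames.append(pd.DataFrame({'max': t.max(), 'size': t.size, 't': t,
                                'f1': t**2 + offset, 'f2': t**3 + offset}))

df = pd.concat(frames)
df_ms = pd.pivot_table(df, index=['t'], values=['f1', 'f2'], columns=['max', 'size'])

# Each column keeps only its own samples; every other row is NaN
f1 = df_ms['f1']
summary = pd.DataFrame({'valid': f1.count(), 'missing': f1.isna().sum()})
print(summary)
```

Every column holds exactly as many valid values as its simulation produced, and the rest of the shared t index is filled with NaN.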

My question: Why doesn't the green curve show up at all? And why are the blue and red patches aligned? Is there a simple way to skip the NaNs in the plot, along the lines of what you would get by simply calling plt.plot(t, f1) in matplotlib?

[Figure: plot with missing data]

I know that it is possible to fill the NaNs by interpolation. For the given case, second-order splines are ideal, since f1 is quadratic in t.

df_ms['f1'].interpolate(method='spline', order=2).plot()

However, I am wondering why this should be necessary for simply plotting the data. Matplotlib's internal linear interpolation would be sufficient...

[Figure: plot with interpolation]
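If straight segments between samples are all that is needed, a pandas-side counterpart could be linear interpolation against the values of the t index. This is a sketch under the assumption that method='index' together with limit_area='inside' matches the intent here; the name f1_lin is made up for the example.

```python
import numpy as np
import pandas as pd

# Rebuild the pivoted table from the question
frames = []
for t, offset in [(np.linspace(0, 2, 15), 0),
                  (np.linspace(0, 2, 21), 1),
                  (np.linspace(0, 1.5, 16), 2)]:
    frames.append(pd.DataFrame({'max': t.max(), 'size': t.size, 't': t,
                                'f1': t**2 + offset, 'f2': t**3 + offset}))

df = pd.concat(frames)
df_ms = pd.pivot_table(df, index=['t'], values=['f1', 'f2'], columns=['max', 'size'])

# method='index' interpolates linearly in t rather than in row position;
# limit_area='inside' leaves NaNs outside each run's own time range untouched
f1_lin = df_ms['f1'].interpolate(method='index', limit_area='inside')
print(f1_lin.isna().sum())
```

Calling f1_lin.plot() afterwards should then draw one continuous line per parameter set, with the shorter run ending at its own last sample instead of being extended.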

DrV

The nans behave logically, but not always very intuitively.

If you plot a continuous line, a nan will naturally remove the line segments on both sides of the nan point. So, if your data (green line) never has two numbers as adjacent elements, it will not be drawn. For example, if f1 is [nan, 1, nan, 1.2, nan, nan, 2.3], no segments can be drawn.
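The rule above can be checked without plotting anything. The sketch below counts how many line segments could be drawn, where a segment needs two adjacent finite values; drawable_segments is a hypothetical helper written for this illustration, not part of matplotlib.

```python
import numpy as np


def drawable_segments(y):
    """Count pairs of adjacent finite values, i.e. segments a line plot can draw."""
    finite = np.isfinite(np.asarray(y, dtype=float))
    # A segment exists between positions i and i+1 only if both are finite
    return int(np.sum(finite[:-1] & finite[1:]))


isolated = [np.nan, 1, np.nan, 1.2, np.nan, np.nan, 2.3]
print(drawable_segments(isolated))          # no neighbouring numbers

partly_connected = [1, 2, np.nan, 3, 4]
print(drawable_segments(partly_connected))  # one segment on each side of the gap
```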

Fix #1: draw points instead of lines (plot(t, f1, 'o')), then you'll at least see all your data.

Fix #2: remove all nans from your data before plotting. Let us assume t has all values but f1 is missing values:

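import numpy as np
import matplotlib.pyplot as plt

# t and f1 are assumed to be NumPy arrays of equal length
nonnans = ~np.isnan(f1)  # boolean mask of valid samples (use ~, not unary minus)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(t[nonnans], f1[nonnans])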

So, just create an array telling which of the samples are good, and use only those samples in plotting. (And in case you are wondering, the ax.plot stuff is equivalent to plt.plot but using the recommended object-oriented interface.)

The way plot treats nans may feel a bit annoying at first, but it is very useful once you grasp it.
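Applied to the pivoted table from the question, Fix #2 could look roughly like the sketch below: each column is stripped of its nans and plotted on a shared axes object. The helper make_run and the label format are assumptions made for this example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def make_run(t, offset):
    # Hypothetical helper reproducing one simulation result from the question
    return pd.DataFrame({'max': t.max(), 'size': t.size, 't': t,
                         'f1': t**2 + offset, 'f2': t**3 + offset})


df = pd.concat([make_run(np.linspace(0, 2, 15), 0),
                make_run(np.linspace(0, 2, 21), 1),
                make_run(np.linspace(0, 1.5, 16), 2)])
df_ms = pd.pivot_table(df, index=['t'], values=['f1', 'f2'], columns=['max', 'size'])

fig, ax = plt.subplots()
for params, column in df_ms['f1'].items():
    clean = column.dropna()  # keep only the samples this run actually has
    ax.plot(clean.index, clean.values,
            label=f"max={params[0]}, size={params[1]}")
ax.set_xlabel('t')
ax.set_ylabel('f1')
ax.legend()
```

Call plt.show() or fig.savefig(...) afterwards to view the result. Each curve is drawn only through its own samples, which is what plt.plot(t, f1) would give for a single run.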

Edited at 2021-02-10
