I have a bunch of data that represents choices from a large collection, and a classification. Something like:
pizzas = [
['ham','cheese','pineapple'],
['bacon','feta','cheese'],
['mushrooms','feta','ham'],
...
]
I want to turn this into a data frame with one column for each topping type, with one row for each pizza. Something like
ham cheese ... feta
1 1 0
0 1 1
1 0 1
...
(Obviously there will be a lot more columns and rows, but you get the general idea.)
What is the best way to do this?
You can first create a DataFrame from the constructor, then use get_dummies, and finally groupby on the column names and sum:
import pandas as pd
pizzas = [
['ham','cheese','pineapple'],
['bacon','feta','cheese'],
['mushrooms','feta','ham']
]
df = pd.DataFrame(pizzas)
print(df)
           0       1          2
0        ham  cheese  pineapple
1      bacon    feta     cheese
2  mushrooms    feta        ham
# dtype=int keeps the indicators as 0/1 (newer pandas returns booleans by default)
df = pd.get_dummies(df, prefix_sep='', prefix='', dtype=int)
print(df)
   bacon  ham  mushrooms  cheese  feta  cheese  ham  pineapple
0      0    1          0       1     0       0    0          1
1      1    0          0       0     1       0    0          0
2      0    0          1       0     1       0    1          0
print(df.groupby(df.columns, axis=1).sum())
   bacon  cheese  feta  ham  mushrooms  pineapple
0      0       1     0    1          0          1
1      1       1     1    0          0          0
2      0       0     1    1          1          0
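On recent pandas releases, grouping with axis=1 is deprecated, so the same three steps can also be written by transposing before and after the groupby. The following is only a sketch of that idea: the function name toppings_to_indicator_frame is made up for illustration, and it assumes each pizza is a list of topping strings.

```python
import pandas as pd


def toppings_to_indicator_frame(pizzas):
    """Return one row per pizza and one 0/1 column per topping (illustrative helper)."""
    # One column per topping position (0, 1, 2, ...); shorter pizzas are padded with None.
    df = pd.DataFrame(pizzas)
    # One-hot encode every position; the empty prefix keeps the bare topping names.
    dummies = pd.get_dummies(df, prefix='', prefix_sep='', dtype=int)
    # A topping can appear in several positions, which yields duplicate column names.
    # Transpose, group the duplicated labels, sum them, and transpose back.
    return dummies.T.groupby(level=0).sum().T


pizzas = [
    ['ham', 'cheese', 'pineapple'],
    ['bacon', 'feta', 'cheese'],
    ['mushrooms', 'feta', 'ham'],
]
result = toppings_to_indicator_frame(pizzas)
print(result)
```

The columns come out sorted by topping name, as in the groupby output above. If a pizza lists the same topping twice, that cell holds 2 rather than 1.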