R dcast equivalent in python pandas

Adriano Almeida

I am trying to do the equivalent of the following R commands in Python:

test <- data.frame(convert_me=c('Convert1','Convert2','Convert3'),
                   values=rnorm(3,45, 12), age_col=c('23','33','44'))
test

library(reshape2)
t <- dcast(test, values ~ convert_me+age_col, length  )
t

That is, this:

convert_me   values     age_col
Convert1     21.71502      23
Convert2     58.35506      33
Convert3     60.41639      44

becomes this:

values     Convert2_33 Convert1_23 Convert3_44
21.71502          0           1           0
58.35506          1           0           0
60.41639          0           0           1

I know that with dummy variables I can take the values of a column and turn them into column names, but is there an easy way to do this for combinations of several columns, as R does?
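The dummy-variable idea can produce the combinations if the columns are concatenated into a single key before encoding. A minimal sketch, assuming the question's data sits in a pandas DataFrame named df (the numbers are copied from the printed table above, and age_col is kept as strings, as in the R code):

```python
import pandas as pd

# The question's data, copied from the printed R output (age_col as strings).
df = pd.DataFrame({
    "convert_me": ["Convert1", "Convert2", "Convert3"],
    "values": [21.71502, 58.35506, 60.41639],
    "age_col": ["23", "33", "44"],
})

# Build one combined label per row, e.g. "Convert1_23", then one-hot encode it.
key = df["convert_me"] + "_" + df["age_col"]
dummies = pd.get_dummies(key, dtype=int)  # dtype=int gives 0/1 instead of booleans

# Put the values column in front, like the left-hand side of the dcast formula.
result = pd.concat([df[["values"]], dummies], axis=1)
print(result)
```

Unlike dcast, this keeps one row per input row. If the same value can occur more than once, result.groupby("values").sum() collapses duplicates into counts, which is what dcast's length aggregation does.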

joris

Assuming df is a pandas DataFrame holding the same data as test, you can use the crosstab function for this:

In [14]: pd.crosstab(index=df['values'], columns=[df['convert_me'], df['age_col']])
Out[14]: 
convert_me  Convert1  Convert2  Convert3
age_col           23        33        44
values                                  
21.71502           1         0         0
58.35506           0         1         0
60.41639           0         0         1

or pivot_table, with len as the aggregating function (here you have to fill the resulting NaNs with zeros manually using fillna):

In [18]: df.pivot_table(index=['values'], columns=['age_col', 'convert_me'], aggfunc=len).fillna(0)
Out[18]: 
age_col           23        33        44
convert_me  Convert1  Convert2  Convert3
values                                  
21.71502           1         0         0
58.35506           0         1         0
60.41639           0         0         1
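The manual fillna step can be avoided by counting rows per group and letting unstack fill in the missing combinations. A minimal sketch that swaps in groupby(...).size() instead of pivot_table (the DataFrame is rebuilt from the question's sample data so the snippet runs on its own):

```python
import pandas as pd

df = pd.DataFrame({
    "convert_me": ["Convert1", "Convert2", "Convert3"],
    "values": [21.71502, 58.35506, 60.41639],
    "age_col": ["23", "33", "44"],
})

# size() counts the rows in each (values, age_col, convert_me) group,
# the same thing dcast's length aggregation computes.
counts = df.groupby(["values", "age_col", "convert_me"]).size()

# Move the two key levels into the columns; absent combinations become 0, not NaN.
table = counts.unstack(["age_col", "convert_me"], fill_value=0)
print(table)
```

Because fill_value=0 is applied while unstacking, the counts stay integers instead of being upcast to float by NaN.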

See the pandas documentation on pivot tables and cross-tabulations: http://pandas.pydata.org/pandas-docs/stable/reshaping.html#pivot-tables-and-cross-tabulations

Both of these return a multi-level (hierarchical) index, in this case for the columns. If you want to flatten it into a single level, like the column names in R, you can do:

In [15]: df_cross = pd.crosstab(index=df['values'], columns=[df['convert_me'], df['age_col']])

In [16]: df_cross.columns = ["{0}_{1}".format(l1, l2) for l1, l2 in df_cross.columns]

In [17]: df_cross
Out[17]: 
          Convert1_23  Convert2_33  Convert3_44
values                                         
21.71502            1            0            0
58.35506            0            1            0
60.41639            0            0            1
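A variant that works for any number of column levels joins each label tuple with "_", and reset_index() then turns values back into an ordinary column, which is closer to the layout of R's dcast output. A minimal sketch (the data and the crosstab call are repeated so it runs on its own):

```python
import pandas as pd

df = pd.DataFrame({
    "convert_me": ["Convert1", "Convert2", "Convert3"],
    "values": [21.71502, 58.35506, 60.41639],
    "age_col": ["23", "33", "44"],
})

df_cross = pd.crosstab(index=df["values"], columns=[df["convert_me"], df["age_col"]])

# Join every level of each column tuple; str() guards against non-string labels.
df_cross.columns = ["_".join(map(str, col)) for col in df_cross.columns]

# Move the "values" index back into a regular column, as in R's output.
r_like = df_cross.reset_index()
print(r_like)
```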

