Avoiding duplicate column names when joining two DataFrames in PySpark

BillyBoy

I have the following code:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
a = spark.createDataFrame([("1","a",1),("2","a",1),("3","a",0),("4","a",0),("5","b",1),("6","b",0),("7","b",1)],["id","group","value1"])
b = spark.createDataFrame([("1","a",8),("2","a",1),("3","a",1),("4","a",2),("5","b",1),("6","b",3),("7","b",4)],["id","group","value2"])
c = a.join(b, "id")
c.select("group")  # fails: "group" exists in both a and b

The select call raises an error:

pyspark.sql.utils.AnalysisException: Reference 'group' is ambiguous, could be: group#1406, group#1409.;

The problem is that c contains the column "group" twice:

>>> c.columns
['id', 'group', 'value1', 'group', 'value2']

I would like to be able to write something like c.select("a.group"), but I don't know how to have the column names adjusted automatically during the join.
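PySpark's DataFrame.join has no suffix option like pandas' merge, so one way to get the automatic adjustment described above is to rename the overlapping non-key columns before joining. The sketch below does that with two hypothetical helpers, `rename_map` and `join_with_suffixes` (the names and the `("_a", "_b")` default are illustrations, not part of PySpark); it relies only on the DataFrame attributes `columns`, `withColumnRenamed` and `join`.

```python
def rename_map(left_cols, right_cols, on, suffixes=("_a", "_b")):
    """Hypothetical helper: compute renames for clashing non-key columns.

    Returns two dicts of {old_name: new_name}, one per input.
    """
    keys = {on} if isinstance(on, str) else set(on)
    right_set = set(right_cols)
    # Columns present on both sides that are not join keys would be duplicated.
    overlap = [col for col in left_cols if col in right_set and col not in keys]
    left_renames = {col: col + suffixes[0] for col in overlap}
    right_renames = {col: col + suffixes[1] for col in overlap}
    return left_renames, right_renames


def join_with_suffixes(left, right, on, how="inner", suffixes=("_a", "_b")):
    """Hypothetical helper: join two PySpark DataFrames, suffixing clashes.

    Only DataFrame.columns, DataFrame.withColumnRenamed and
    DataFrame.join are used, so no extra imports are needed here.
    """
    on_cols = [on] if isinstance(on, str) else list(on)
    left_renames, right_renames = rename_map(
        left.columns, right.columns, on_cols, suffixes
    )
    for old, new in left_renames.items():
        left = left.withColumnRenamed(old, new)
    for old, new in right_renames.items():
        right = right.withColumnRenamed(old, new)
    # Joining on a list of column names keeps a single copy of each key.
    return left.join(right, on=on_cols, how=how)
```

For the frames in the question, `join_with_suffixes(a, b, on="id")` should give the columns `id, group_a, value1, group_b, value2`, so `select("group_a")` is no longer ambiguous. The sketch assumes the suffixed names do not already exist in either input.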

Mariusz

Just remove the quotes: c.select(a.group) selects the group column that came from DataFrame a.
