How to pass schema to create a new Dataframe from existing Dataframe?


BlackBeard

To apply a schema when reading a JSON file, we do this:

from pyspark.sql.types import StructField, StringType, StructType, IntegerType
data_schema = [StructField('age', IntegerType(), True), StructField('name', StringType(), True)]
final_struc = StructType(fields=data_schema)
df = spark.read.json('people.json', schema=final_struc)

The above code works as expected. However, my data is now in a table, which I query with:

df = sqlContext.sql("SELECT * FROM people_json")

But if I try to pass a new schema to it with the following command, it does not work:

df2 = spark.sql("SELECT * FROM people_json", schema=final_struc)

It gives the following error:

sql() got an unexpected keyword argument 'schema'

NOTE: I am using Databricks Community Edition.

What am I missing?
How do I pass the new schema if my data is in a table instead of a JSON file?

koiralo

You cannot apply a new schema to an already created DataFrame. However, you can change the type of an individual column by casting it to another data type, as below:

from pyspark.sql.functions import col
df.withColumn("column_name", col("column_name").cast("new_datatype"))

If you need to apply a whole new schema, convert the DataFrame to an RDD and create a new DataFrame from it, as below:

df = sqlContext.sql("SELECT * FROM people_json")
newDF = spark.createDataFrame(df.rdd, schema=final_struc)

Hope this helps!

Edited 2021-05-31