I am currently using :
+---+-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
|id |sen |attributes |
+---+-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
|1 |Stanford is good college.|[[Stanford,ORGANIZATION,NNP], [is,O,VBZ], [good,O,JJ], [college,O,NN], [.,O,.], [Stanford,ORGANIZATION,NNP], [is,O,VBZ], [good,O,JJ], [college,O,NN], [.,O,.]]|
+---+-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
I want to get above df from :
+----------+--------+--------------------+
|article_id| sen| attribute|
+----------+--------+--------------------+
| 1|example1|[Standford,Organi...|
| 1|example1| [is,O,VP]|
| 1|example1| [good,LOCATION,ADP]|
+----------+--------+--------------------+
using :
df3.registerTempTable("d1")
val df4 = sqlContext.sql("select article_id,sen,collect(attribute) as attributes from d1 group by article_id,sen")
Is there any way that I don't have to register temp table, as while saving dataframe, it is giving lot of garbage!! Something lige df3.Select""??
The only way Spark currently has to run SQL against a dataframe is via a temporary table. However, you can add implicit methods to DataFrame
to automate this, as we have done at Swoop. I can't share all the code as it uses a number of our internal utilities & implicits but the core is in the following gist. The importance of using unique temporary tables is that (at least until Spark 2.0) temporary tables are cluster global.
We use this approach regularly in our work, especially since there are many situations in which SQL is much simpler/easier to write and understand than the Scala DSL.
Hope this helps!
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments