How do I transform an RDD[String] into an RDD[(String, String)]?

fanhk

I have an RDD[String] loaded from a file:

val file = sc.textFile("/path/to/myData.txt")

The format of myData.txt:

>str1_name
ATCGGKFKKVKKFKRLFFVLFLRL
FDJKALGFJVKRIKFKVKFGKLRL
...
FJDLALLLGL //the last line of str1
>str2_name
ATCGGKFKKVKKFKRLFFVLFLRL
FDJKALGFJVKRIKFKVKFGKLRL
...
FJDLALLLGL //the last line of str2
>str3_name
...

How can I transform the data from the file into an RDD[(String, String)]? For instance:

trancRDD(
(str1_name, ATCGGKFKKVKKFKRLFFVLFLRLFDJKALGFJVKRIKFKVKFGKLRL), 
(str2_name, ATCGGKFKKVKKFKRLFFVLFLRLFDJKALGFJVKRIKFKVKFGKLRL),
...
)

maasg

If the records have a well-defined separator, such as the ">" shown above, this can be done by setting a custom record delimiter in the Hadoop configuration:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

val conf = new Configuration
conf.set("textinputformat.record.delimiter", ">")
// genome.txt contains the records provided in the question without the "..."
val dataset = sc.newAPIHadoopFile("./data/genome.txt", classOf[TextInputFormat], classOf[LongWritable], classOf[Text], conf)
val data = dataset.map(x=>x._2.toString)

Let's have a look at the data:

data.collect
res11: Array[String] = 
Array("", "str1_name
ATCGGKFKKVKKFKRLFFVLFLRL
FDJKALGFJVKRIKFKVKFGKLRL
FJDLALLLGL 
", "str2_name
ATCGGKFKKVKKFKRLFFVLFLRL
FDJKALGFJVKRIKFKVKFGKLRL
FJDLALLLGL
")

We can easily make records out of each multi-line String:

val records = data.map { multiLine =>
  val lines = multiLine.split("\n")
  (lines.head, lines.tail) // (record name, remaining sequence lines)
}
records.collect
res14: Array[(String, Array[String])] = Array(("",Array()),
       (str1_name,Array(ATCGGKFKKVKKFKRLFFVLFLRL, FDJKALGFJVKRIKFKVKFGKLRL, FJDLALLLGL)),
       (str2_name,Array(ATCGGKFKKVKKFKRLFFVLFLRL, FDJKALGFJVKRIKFKVKFGKLRL, FJDLALLLGL)))

The first record is empty because the file starts with the delimiter. Use filter to remove it (left as an exercise for the reader).
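To get all the way to the RDD[(String, String)] asked for in the question, the sequence lines still have to be joined and the empty leading record dropped. The following is only a minimal sketch of one way to do that. The helper name toRecord is hypothetical (it is not part of the answer above), and it assumes that whitespace around each line can safely be trimmed.

```scala
// Hypothetical helper (not from the original answer): turns one
// ">"-delimited chunk into an optional (name, sequence) pair.
def toRecord(multiLine: String): Option[(String, String)] = {
  // Split into lines, trim stray whitespace, and drop blank lines.
  val lines = multiLine.split("\n").map(_.trim).filter(_.nonEmpty)
  // An empty chunk (e.g. the text before the first ">") yields None;
  // otherwise the first line is the name and the rest is concatenated.
  lines.headOption.map(name => (name, lines.tail.mkString))
}

// With the `data` RDD from above, this would be applied as:
//   val records = data.flatMap(toRecord(_))   // RDD[(String, String)]

// Local check on plain strings, no Spark needed:
val chunks = Seq("", "str1_name\nATCG\nFDJK\n", "str2_name\nGGKF\n")
val parsed = chunks.flatMap(toRecord(_))
// parsed == Seq(("str1_name", "ATCGFDJK"), ("str2_name", "GGKF"))
```

Using flatMap with an Option filters out the empty record and builds the pair in a single pass, so no separate filter step is needed.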
