I am having difficulty figuring out how to convert some wide data into long format. I have three columns of string data (A1_R00_FillerNP, A1_R01_ADV, and A1_R02_1stEmbV) which I would like to melt into one column (WordCountRegion) in such a way that, for each Subject and item, the correct word is mapped from one of these three columns to the new WordCountRegion column.
Using a simple melt approach, as in the code below, gets me part of the way there.
(Note: the strange characters in df are inconsequential; please ignore them here.)
df <- structure(list(Subject = c(101L, 101L, 101L, 101L, 101L, 101L,
101L, 101L, 101L, 101L, 101L, 101L, 101L, 101L, 101L, 101L, 101L,
101L), condition = structure(c(2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L,
3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L), .Label = c("P", "R",
"S"), class = "factor"), item = c(101L, 102L, 103L, 101L, 102L,
103L, 101L, 102L, 103L, 101L, 102L, 103L, 101L, 102L, 103L, 101L,
102L, 103L), A1_R00_FillerNP = structure(c(3L, 2L, 1L, 3L, 2L,
1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L), .Label = c("SÌÇna d_r allvarliga konsekvenser",
"SÌÇna d_r fina _ppeltr_d", "SÌÇna d_r gamla skottk_rror"
), class = "factor"), A1_R01_ADV = structure(c(1L, 1L, 2L, 1L,
1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L), .Label = c("alltid",
"f_rresten"), class = "factor"), A1_R02_1stEmbV = structure(c(3L,
2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L,
1L), .Label = c("diskuterade", "stod", "tv_ttade"), class = "factor"),
RT = c(0L, 149L, 247L, 272L, 171L, 245L, 317L, 0L, 233L,
0L, 981L, 750L, 272L, 171L, 334L, 317L, 0L, 233L), Region = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L,
3L, 3L), .Label = c("R00", "R01", "R02"), class = "factor"),
RegionType = structure(c(3L, 3L, 3L, 2L, 2L, 2L, 1L, 1L,
1L, 3L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("1stEmbV",
"ADV", "FillerNP"), class = "factor"), DV = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("FIRST_FIXATION_DURATION", "GAZE_DURATION"
), class = "factor")), .Names = c("Subject", "condition",
"item", "A1_R00_FillerNP", "A1_R01_ADV", "A1_R02_1stEmbV", "RT",
"Region", "RegionType", "DV"), class = "data.frame", row.names = c(NA,
-18L))
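library(reshape2)  # assumption: melt() comes from reshape2; the post does not name the package
df1 <- melt(df, measure.vars = c("A1_R00_FillerNP", "A1_R01_ADV", "A1_R02_1stEmbV"), variable.name = "WordCountRegion")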
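The problem is that this code does not split the words correctly across regions. I end up with output like the following, in which the words are not broken up as specified by Region but instead extend across all values of Region, as can be seen in the WordCountRegion and value columns. If I am going to use melt(), it clearly needs some additional specification so that it can split the data correctly. I don't know how to do this (or whether it can be done within melt() at all).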
Subject condition item RT Region RegionType DV WordCountRegion value
1 101 R 101 0 R00 FillerNP FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
2 101 P 102 149 R00 FillerNP FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
3 101 S 103 247 R00 FillerNP FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
4 101 R 101 272 R01 ADV FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
5 101 P 102 171 R01 ADV FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
6 101 S 103 245 R01 ADV FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
7 101 R 101 317 R02 1stEmbV FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
8 101 P 102 0 R02 1stEmbV FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
9 101 S 103 233 R02 1stEmbV FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
10 101 R 101 0 R00 FillerNP GAZE_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
11 101 P 102 981 R00 FillerNP GAZE_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
12 101 S 103 750 R00 FillerNP GAZE_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
13 101 R 101 272 R01 ADV GAZE_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
14 101 P 102 171 R01 ADV GAZE_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
15 101 S 103 334 R01 ADV GAZE_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
16 101 R 101 317 R02 1stEmbV GAZE_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
17 101 P 102 0 R02 1stEmbV GAZE_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
18 101 S 103 233 R02 1stEmbV GAZE_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
19 101 R 101 0 R00 FillerNP FIRST_FIXATION_DURATION A1_R01_ADV alltid
20 101 P 102 149 R00 FillerNP FIRST_FIXATION_DURATION A1_R01_ADV alltid
21 101 S 103 247 R00 FillerNP FIRST_FIXATION_DURATION A1_R01_ADV f_rresten
Is there a way to modify the melt() call so that the words are aligned/matched with Region, as in the sample below?
Subject condition item RT Region RegionType DV WordCountRegion value
1 101 R 101 0 R00 FillerNP FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
2 101 P 102 149 R00 FillerNP FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
3 101 S 103 247 R00 FillerNP FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
4 101 R 101 272 R01 ADV FIRST_FIXATION_DURATION A1_R01_ADV alltid
5 101 P 102 171 R01 ADV FIRST_FIXATION_DURATION A1_R01_ADV alltid
6 101 S 103 245 R01 ADV FIRST_FIXATION_DURATION A1_R01_ADV f_rresten
7 101 R 101 317 R02 1stEmbV FIRST_FIXATION_DURATION A1_R02_1stEmbV tv_ttade
8 101 P 102 0 R02 1stEmbV FIRST_FIXATION_DURATION A1_R02_1stEmbV stod
9 101 S 103 233 R02 1stEmbV FIRST_FIXATION_DURATION A1_R02_1stEmbV diskuterade
10 101 R 101 0 R00 FillerNP GAZE_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
11 101 P 102 981 R00 FillerNP GAZE_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
12 101 S 103 750 R00 FillerNP GAZE_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
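Or, if I am using the wrong function entirely, can someone point me to a better solution? Perhaps I need something that does an actual lookup?
You can create a small lookup table, merge it with the melted data frame, and then use it to filter that data frame. I think this gives you the result you are looking for.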
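# Lookup table: which wide column holds the word for each Region
region_df <- data.frame(var = c("A1_R00_FillerNP", "A1_R01_ADV", "A1_R02_1stEmbV"),
                        Region = c("R00", "R01", "R02"),
                        stringsAsFactors = FALSE)
# Join on the shared Region column; this adds `var` to every melted row
df2 <- merge(df1, region_df, by = "Region")
# Keep only the rows whose melted column is the one that belongs to the row's Region
df3 <- subset(df2, as.character(WordCountRegion) == var)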
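If you would rather do the "actual lookup" the question asks about and skip melt() altogether, here is a minimal base R sketch. It is not part of the answer above. The toy data frame and its values are made up for illustration, and the names word_cols and words are hypothetical helpers; only the column names and Region codes are taken from the question. The idea is to map each Region to the wide column that holds its word, then pull one cell per row with a (row, column) index matrix.

```r
# Toy data shaped like the question's df (the words are invented).
toy <- data.frame(
  item            = c(101L, 102L, 101L, 102L, 101L, 102L),
  A1_R00_FillerNP = c("np_a", "np_b", "np_a", "np_b", "np_a", "np_b"),
  A1_R01_ADV      = c("adv_a", "adv_b", "adv_a", "adv_b", "adv_a", "adv_b"),
  A1_R02_1stEmbV  = c("verb_a", "verb_b", "verb_a", "verb_b", "verb_a", "verb_b"),
  Region          = c("R00", "R00", "R01", "R01", "R02", "R02"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup vector: Region code -> name of the wide column with its word.
word_cols <- c(R00 = "A1_R00_FillerNP", R01 = "A1_R01_ADV", R02 = "A1_R02_1stEmbV")

# For every row, the name of the column its word should come from.
# as.character() also covers the case where Region is a factor, as in the question.
toy$WordCountRegion <- unname(word_cols[as.character(toy$Region)])

# Character matrix of the three word columns (factors would be converted to text).
words <- as.matrix(toy[, unname(word_cols)])

# Select exactly one cell per row using a two-column (row, column) index matrix.
toy$value <- words[cbind(seq_len(nrow(toy)),
                         match(toy$WordCountRegion, colnames(words)))]

# Drop the wide columns now that the matching word lives in `value`.
toy <- toy[, setdiff(names(toy), word_cols)]
print(toy)
```

Unlike the merge-and-filter route, this never creates the tripled intermediate data frame, so the row order and row count of the input are preserved.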