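I have a data frame called test.data with a column named Ethnicity. There are three ethnicities (the real data has more): Adygei, Balochi and Biaka_Pygmies. I want to subset this data frame so that it contains only two random samples (rows) from each ethnicity, giving result. How can I do this in R?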
test.data <- structure(list(Sample = c("1793102418_A", "1793102460_A", "1793102500_A",
"1793102576_A", "1749751113_A", "1749751187_A", "1749751189_A",
"1749751285_A", "1749751356_A", "1749751195_A", "1749751218_A",
"1775705355_A"), Ethnicity = c("Adygei", "Adygei", "Adygei",
"Adygei", "Balochi", "Balochi", "Balochi", "Balochi", "Balochi",
"Biaka_Pygmies", "Biaka_Pygmies", "Biaka_Pygmies"), Height = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("Sample", "Ethnicity",
"Height"), row.names = c("1793102418_A", "1793102460_A", "1793102500_A",
"1793102576_A", "1749751113_A", "1749751187_A", "1749751189_A",
"1749751285_A", "1749751356_A", "1749751195_A", "1749751218_A",
"1775705355_A"), class = "data.frame")
result
                   Sample     Ethnicity Height
1793102418_A 1793102418_A        Adygei      0
1793102460_A 1793102460_A        Adygei      0
1749751189_A 1749751189_A       Balochi      0
1749751285_A 1749751285_A       Balochi      0
1749751195_A 1749751195_A Biaka_Pygmies      0
1775705355_A 1775705355_A Biaka_Pygmies      0
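We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT(test.data), group by 'Ethnicity', then within each group sample two positions from the row sequence 1:.N and use them to subset the rows of that group (.SD).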
setDT(test.data)[, .SD[sample(1:.N,2)], Ethnicity]
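Note that sample(1:.N, 2) stops with an error if a group has fewer than two rows, which may matter for the real data. Below is a minimal sketch of one way to guard against that, assuming the data.table package is installed. The data frame toy and the seed value are made up for illustration and are not part of the question.

```r
library(data.table)

# Hypothetical toy data: group "C" has only one row
toy <- data.frame(
  Sample    = paste0("s", 1:6),
  Ethnicity = c("A", "A", "A", "B", "B", "C"),
  Height    = 0
)

set.seed(42)  # arbitrary seed, only to make the draw reproducible

# Take at most 2 rows per group; min() caps the sample size at the group size
res <- as.data.table(toy)[, .SD[sample(.N, min(.N, 2))], by = Ethnicity]
print(res)
```

Groups with two or more rows contribute exactly two rows, and smaller groups contribute whatever they have.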
Or use tapply from base R:
test.data[ with(test.data, unlist(tapply(seq_len(nrow(test.data)),
Ethnicity, FUN = sample, 2))), ]
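One caveat with the tapply version: when a group holds a single row index x, sample(x, 2) draws from 1:x rather than from x itself. Here is a sketch of a base R variant that avoids this by sampling positions with sample.int(). It rebuilds the question's data in compact form so that it runs on its own. The helper name sample_rows and the seed are assumptions for illustration.

```r
# Rebuild the question's data in compact form
ids <- c("1793102418_A", "1793102460_A", "1793102500_A", "1793102576_A",
         "1749751113_A", "1749751187_A", "1749751189_A", "1749751285_A",
         "1749751356_A", "1749751195_A", "1749751218_A", "1775705355_A")
test.data <- data.frame(
  Sample    = ids,
  Ethnicity = rep(c("Adygei", "Balochi", "Biaka_Pygmies"), times = c(4, 5, 3)),
  Height    = 0,
  row.names = ids,
  stringsAsFactors = FALSE
)

# Hypothetical helper: pick up to n of the given row indices at random
sample_rows <- function(idx, n) {
  idx[sample.int(length(idx), min(length(idx), n))]
}

set.seed(1)  # arbitrary seed, only to make the draw reproducible

# Split the row numbers by group, sample within each group, then subset
idx    <- split(seq_len(nrow(test.data)), test.data$Ethnicity)
keep   <- unlist(lapply(idx, sample_rows, n = 2), use.names = FALSE)
result <- test.data[sort(keep), ]
print(result)
```

Sorting keep preserves the original row order within the result, which matches the layout of the expected output in the question.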