I'm trying to build a correlation matrix from text files, using the correlation values those files contain.
What I have in the text file:
[56] "[1] \"values of the\""
[57] "[1] \"examples\""
[58] "[1] \"dummy lines\""
[59] "[1] \"testing\""
[60] "[1] \"Correlation Values\""
[61] "[1] \"Correlation between XXX and YYY: 0.7054 (0.0429)\""
[62] "[1] \"Correlation between XXX and ZZZ: 0.601 (0.0289)\""
[63] "[1] \"Correlation between YYY and ZZZ: 0.6434 (0.0306)\""
[64] "[1] \"Finished\""
[65] "[1] \"testing line\""
[66] "test"
[67] "test again"
The matrix should look like this:
XXX YYY ZZZ
XXX 1 0.7054 0.601
YYY 0.7054 1 0.6434
ZZZ 0.601 0.6434 1
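I understand this involves some regular-expression techniques, but I think they are too advanced for a newbie like me. With the code below I can pull the lines I need out of the file, but I still can't work out how to extract the numbers and put them into a matrix.
mm[grep("Correlation Values", mm) + 1:3]  ## mm is the file shown above, as I loaded it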
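Once the correlation lines are isolated, one way to pull out the names and numbers is a single regular expression with capture groups, via base R's regexec() and regmatches(). This is a minimal sketch, assuming the lines look exactly like the sample above; `lines`, `pattern` and `pairs` are illustrative names, not part of the original code.

```r
# Sample lines as they appear after the "Correlation Values" header
# (the [1] prefix and the quotes are kept to mimic the file).
lines <- c(
  '[1] "Correlation between XXX and YYY: 0.7054 (0.0429)"',
  '[1] "Correlation between XXX and ZZZ: 0.601 (0.0289)"',
  '[1] "Correlation between YYY and ZZZ: 0.6434 (0.0306)"'
)

# Capture groups: first variable, second variable, correlation estimate.
# The value in parentheses is not captured, so it is ignored.
pattern <- "Correlation between (\\S+) and (\\S+): (-?[0-9.]+)"

# regexec() finds the match positions; regmatches() turns them into strings.
# Each element of `parts` is c(full_match, var1, var2, value).
parts <- regmatches(lines, regexec(pattern, lines))

pairs <- data.frame(
  var1  = vapply(parts, `[`, character(1), 2),
  var2  = vapply(parts, `[`, character(1), 3),
  value = as.numeric(vapply(parts, `[`, character(1), 4)),
  stringsAsFactors = FALSE
)
print(pairs)
```

Because the names come from the text itself, the same pattern works no matter what the variables are called.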
To add complexity, the variables and the numbers change from file to file. Say this is the case for a 4 × 4 matrix:
[95] "[1] \"Correlation Values\""
[96] "[1] \"Correlation between XXX and YYY: 0.7054 (0.0429)\""
[97] "[1] \"Correlation between XXX and ZZZ: 0.601 (0.0289)\""
[98] "[1] \"Correlation between XXX and CCC: 0.0178 (0.0281)\""
[99] "[1] \"Correlation between YYY and ZZZ: 0.6434 (0.0306)\""
[100] "[1] \"Correlation between YYY and CCC: 0.0103 (0.0286)\""
[101] "[1] \"Correlation between ZZZ and CCC: 0.0174 (0.0202)\""
[102] "[1] \"Finished\""
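Since neither the variable names nor their number are fixed, it can help to wrap the whole job in one function that takes the raw lines of a file and returns the matrix. This is a sketch, assuming every file uses the "Correlation Values" and "Finished" marker lines and the "Correlation between A and B: r (se)" wording seen above; `corr_matrix_from_lines` is a hypothetical helper name.

```r
# Hypothetical helper: build a symmetric correlation matrix from the raw
# lines of one file, whatever the variable names and however many there are.
corr_matrix_from_lines <- function(lines,
                                   start = "Correlation Values",
                                   end = "Finished") {
  # Locate the header line and the first end marker after it.
  from <- grep(start, lines, fixed = TRUE)[1]
  to <- grep(end, lines, fixed = TRUE)
  to <- to[to > from][1]
  if (is.na(from) || is.na(to)) stop("marker lines not found")
  block <- lines[from + seq_len(to - from - 1)]

  # Capture groups: first variable, second variable, correlation estimate.
  pattern <- "Correlation between (\\S+) and (\\S+): (-?[0-9.]+)"
  parts <- regmatches(block, regexec(pattern, block))
  parts <- parts[lengths(parts) == 4]  # drop lines that do not match

  v1 <- vapply(parts, `[`, character(1), 2)
  v2 <- vapply(parts, `[`, character(1), 3)
  val <- as.numeric(vapply(parts, `[`, character(1), 4))

  # Variable names in order of first appearance.
  vars <- unique(c(v1, v2))
  mat <- matrix(NA_real_, nrow = length(vars), ncol = length(vars),
                dimnames = list(vars, vars))

  # A two-column character matrix indexes cells by row and column name,
  # so each assignment fills one triangle in a single step.
  mat[cbind(v1, v2)] <- val
  mat[cbind(v2, v1)] <- val
  diag(mat) <- 1
  mat
}

# The 4 x 4 example above, surrounded by dummy lines.
example_lines <- c(
  '[1] "dummy lines"',
  '[1] "Correlation Values"',
  '[1] "Correlation between XXX and YYY: 0.7054 (0.0429)"',
  '[1] "Correlation between XXX and ZZZ: 0.601 (0.0289)"',
  '[1] "Correlation between XXX and CCC: 0.0178 (0.0281)"',
  '[1] "Correlation between YYY and ZZZ: 0.6434 (0.0306)"',
  '[1] "Correlation between YYY and CCC: 0.0103 (0.0286)"',
  '[1] "Correlation between ZZZ and CCC: 0.0174 (0.0202)"',
  '[1] "Finished"',
  'test again'
)
result <- corr_matrix_from_lines(example_lines)
print(result)
```

With a real file, something like `corr_matrix_from_lines(readLines("sofile.txt"))` should give the same kind of result, provided the file uses the same marker lines and wording.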
Well, it's a start anyway. It isn't elegant, but step by step it leaves you with only the relevant information in a list. I put your file in a file called sofile.txt.
# read the messy file
filedata <- readLines("../bugs/sofile.txt", warn = FALSE)
# get rid of lines you don't need.
preline<- grep("Correlation Values", filedata, fixed = TRUE)
postline<- grep("Finished", filedata, fixed = TRUE)
filedata <- filedata[(preline+1):(postline-1)]
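# keep only "XXX and YYY: 0.7054", dropping the prefix and the "(0.0429)" part;
# a regex avoids depending on fixed character positions
filedata <- sub(".*Correlation between (.*) \\(.*", "\\1", filedata)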
filedata <- sub( ":", "", filedata, fixed = TRUE)
filedata <- sub( " and", "", filedata, fixed = TRUE)
# split them up and make a list
filedata_list<- strsplit(filedata, split = " ")
# put it into a matrix
new <- Reduce(rbind, filedata_list)
# extract the variable names
names <- unique(c(new[,1], new[,2]))
#create a matrix of NAs with the right dimensions and names.
corrmat <- matrix(nrow =length(names), ncol = (length(names)), dimnames = list(names, names))
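Then you fill in the NAs. You can do this by looping over the list of pairs and assigning each value to both of its cells.
Ugly again, but it should get you started.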
for (i in seq_along(filedata_list)) {  # one iteration per pair, not per variable
  pair <- filedata_list[[i]]
  corrmat[pair[1], pair[2]] <- as.numeric(pair[3])  # as.numeric keeps the matrix numeric
  corrmat[pair[2], pair[1]] <- as.numeric(pair[3])
}
diag(corrmat) <- 1
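The loop can also be replaced by matrix indexing: a two-column character matrix selects cells by row and column name, so the pairs in `new` can fill each triangle in one assignment. This is a sketch using a hand-built stand-in for the `new` matrix from the code above; `vars` is used instead of `names` here only to avoid masking base R's names() function.

```r
# Stand-in for `new`: one row per pair, columns var1, var2, value
# (all character, as produced by strsplit() followed by rbind()).
new <- rbind(c("XXX", "YYY", "0.7054"),
             c("XXX", "ZZZ", "0.601"),
             c("YYY", "ZZZ", "0.6434"))

vars <- unique(c(new[, 1], new[, 2]))
corrmat <- matrix(NA_real_, nrow = length(vars), ncol = length(vars),
                  dimnames = list(vars, vars))

# new[, 1:2] indexes the upper triangle, new[, 2:1] the mirrored cells.
corrmat[new[, 1:2]] <- as.numeric(new[, 3])
corrmat[new[, 2:1]] <- as.numeric(new[, 3])
diag(corrmat) <- 1
print(corrmat)
```

Converting with as.numeric() before assigning keeps `corrmat` numeric; assigning the raw strings would silently turn the whole matrix into a character matrix.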