Correlation matrix from a text file

Posted by debugcn in Dev

I'm trying to build a correlation matrix from a text file. I want to pull the correlation values out of these files.

What my text file looks like:

[56] "[1] \"values of the\""
[57] "[1] \"examples\""
[58] "[1] \"dummy lines\""
[59] "[1] \"testing\""
[60] "[1] \"Correlation Values\""
[61] "[1] \"Correlation between XXX and YYY: 0.7054 (0.0429)\""
[62] "[1] \"Correlation between XXX and ZZZ: 0.601 (0.0289)\""
[63] "[1] \"Correlation between YYY and ZZZ: 0.6434 (0.0306)\""
[64] "[1] \"Finished\""
[65] "[1] \"testing linne\""
[66] "test"
[67] "test again"

The matrix should look like this:

      XXX       YYY      ZZZ
XXX   1        0.7054    0.601
YYY   0.7054   1         0.6434
ZZZ   0.601    0.6434    1

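I understand that some regular-expression techniques are involved, but they seem too advanced for a beginner like me. I can use the following code to get the lines I need from the file, but I still can't work out how to extract the numbers and put them into a matrix.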

mm[grep("Correlation Values", mm, value = FALSE) + 1:3] ## mm is the above file that I loaded.
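The regular expression involved is shorter than it might look. A minimal sketch, assuming each line reads exactly like the sample above: one pattern with three capture groups picks out the two variable names and the correlation, and base R's `regexec()` and `regmatches()` return them. The names `line`, `pattern`, `parts` and `value` are just for illustration.

```r
# One line of the file, as readLines() would return it
line <- '[61] "[1] \\"Correlation between XXX and YYY: 0.7054 (0.0429)\\""'

# (\\S+) matches a run of non-space characters; the three groups are
# the first variable, the second variable and the correlation value
pattern <- "Correlation between (\\S+) and (\\S+): (\\S+) "

m <- regmatches(line, regexec(pattern, line, perl = TRUE))[[1]]
parts <- m[2:4]               # drop m[1], which is the whole match
value <- as.numeric(parts[3]) # the correlation as a number

print(parts)
print(value)
```

The standard error in parentheses is simply left out of the pattern, so it is ignored.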

To add to the complexity, the variable names and the numbers change from file to file. Say this is the case for a 4 × 4 matrix:

[95] "[1] \"Correlation Values\””                                                                                                                                 
 [96] "[1] \"Correlation between XXX and YYY: 0.7054 (0.0429)\""                                                                                                    
 [97] "[1] \"Correlation between XXX and ZZZ: 0.601 (0.0289)\""                                                                                                     
 [98] "[1] \"Correlation between XXX and CCC: 0.0178 (0.0281)\""                                                                                                    
 [99] "[1] \"Correlation between YYY and ZZZ: 0.6434 (0.0306)\""                                                                                                    
[100] "[1] \"Correlation between YYY and CCC: 0.0103 (0.0286)\""                                                                                                    
[101] "[1] \"Correlation between ZZZ and CCC: 0.0174 (0.0202)\""                                                                                                    
[102] "[1] \”Finished\””
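Because the number of variables changes between files, it helps not to hard-code `+ 1:3`. A minimal sketch, assuming every file has a "Correlation Values" line, a later "Finished" line, and pair lines in the format shown: the hypothetical helper `parse_corr_pairs()` takes everything between the two markers and returns one row per pair, whatever the matrix size.

```r
# Hypothetical helper: parse the pair lines between the two marker lines
parse_corr_pairs <- function(txt) {
  start <- grep("Correlation Values", txt, fixed = TRUE)[1]
  ends  <- grep("Finished", txt, fixed = TRUE)
  end   <- ends[ends > start][1]           # first "Finished" after the header
  block <- txt[(start + 1):(end - 1)]

  # Capture groups: first variable, second variable, correlation value
  pattern <- "Correlation between (\\S+) and (\\S+): (\\S+) "
  hits <- regmatches(block, regexec(pattern, block, perl = TRUE))
  hits <- hits[lengths(hits) > 0]          # drop lines that did not match

  data.frame(
    var1 = vapply(hits, function(h) h[2], character(1)),
    var2 = vapply(hits, function(h) h[3], character(1)),
    r    = as.numeric(vapply(hits, function(h) h[4], character(1))),
    stringsAsFactors = FALSE
  )
}

# The 4 x 4 sample from above, as readLines() would return it
txt <- c(
  ' [95] "[1] \\"Correlation Values\\""',
  ' [96] "[1] \\"Correlation between XXX and YYY: 0.7054 (0.0429)\\""',
  ' [97] "[1] \\"Correlation between XXX and ZZZ: 0.601 (0.0289)\\""',
  ' [98] "[1] \\"Correlation between XXX and CCC: 0.0178 (0.0281)\\""',
  ' [99] "[1] \\"Correlation between YYY and ZZZ: 0.6434 (0.0306)\\""',
  '[100] "[1] \\"Correlation between YYY and CCC: 0.0103 (0.0286)\\""',
  '[101] "[1] \\"Correlation between ZZZ and CCC: 0.0174 (0.0202)\\""',
  '[102] "[1] \\"Finished\\""'
)

pairs <- parse_corr_pairs(txt)
print(pairs)
```

Since the pattern is anchored on the text "Correlation between", the width of the `[n]` prefix (` [96]` versus `[100]`) does not matter.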

艾琳

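Well, it's a start anyway... Not elegant, but it goes step by step and leaves you with only the relevant information in a list. I put your file in a file called sofile.txt.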

# read the messy file
filedata <- readLines("../bugs/sofile.txt", warn = FALSE)
# get rid of lines you don't need.
preline <- grep("Correlation Values", filedata, fixed = TRUE)
postline <- grep("Finished", filedata, fixed = TRUE)
filedata <- filedata[(preline + 1):(postline - 1)]
# just keep the important parts of the strings, e.g. "XXX YYY 0.7054".
# A regex is safer than fixed substr() positions here, because the "[n]"
# prefix changes width (" [96]" vs "[100]") and trailing spaces vary.
filedata <- sub(".*Correlation between (\\S+) and (\\S+): (\\S+) .*", "\\1 \\2 \\3",
                filedata, perl = TRUE)
# split them up and make a list
filedata_list <- strsplit(filedata, split = " ", fixed = TRUE)
# put it into a matrix (do.call keeps a matrix even if there is only one pair)
new <- do.call(rbind, filedata_list)
# extract the variable names
names <- unique(c(new[, 1], new[, 2]))
# create a numeric matrix of NAs with the right dimensions and names.
corrmat <- matrix(NA_real_, nrow = length(names), ncol = length(names),
                  dimnames = list(names, names))

Then you go about replacing the NAs. You can do this by looping through the list and assigning the values.

Ugly again, but it should get you started.

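# one iteration per pair line (3 pairs for 3 variables, 6 pairs for 4)
for (i in seq_along(filedata_list)) {
  pair <- filedata_list[[i]]
  # store the value as a number, in both triangles of the matrix
  corrmat[pair[1], pair[2]] <- as.numeric(pair[3])
  corrmat[pair[2], pair[1]] <- as.numeric(pair[3])
}
diag(corrmat) <- 1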
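The loop can also be replaced by matrix indexing: R lets you index a matrix with a two-column character matrix whose entries are matched against the row and column names. A minimal sketch, assuming the pairs are already in a data frame with hypothetical columns `var1`, `var2` and `r` (the 3 × 3 values from the question serve as sample data); `pairs_to_corrmat()` is likewise a hypothetical name.

```r
# Sample pair table (values from the 3 x 3 example in the question)
pairs <- data.frame(
  var1 = c("XXX", "XXX", "YYY"),
  var2 = c("YYY", "ZZZ", "ZZZ"),
  r    = c(0.7054, 0.601, 0.6434),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build a symmetric correlation matrix from the pairs
pairs_to_corrmat <- function(pairs) {
  vars <- unique(c(pairs$var1, pairs$var2))
  corrmat <- matrix(NA_real_, nrow = length(vars), ncol = length(vars),
                    dimnames = list(vars, vars))
  idx <- cbind(pairs$var1, pairs$var2)          # rows = (row name, col name)
  corrmat[idx] <- pairs$r                       # upper/lower triangle
  corrmat[idx[, 2:1, drop = FALSE]] <- pairs$r  # mirrored triangle
  diag(corrmat) <- 1
  corrmat
}

corrmat <- pairs_to_corrmat(pairs)
print(corrmat)
```

Assigning through both `idx` and its column-swapped copy keeps the matrix symmetric, and `drop = FALSE` keeps the index a matrix even when there is only one pair.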
