Combining multiple CSVs into one data frame

Posted by debugcn in Dev

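I have one folder containing more than 100 CSV files. The files have different column names (15 distinct column names in total) and different file names. For example: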

[1] "Data/Yahoo_2014.csv"   "Data/Yahoo_2015.csv"  
[3] "Data/Yahoo_2016.csv"   "Data/Yahoo_2017.csv"  
[5] "Data/Yahoo_2018.csv"   "Data/Yahoo_2019.csv"  
[7] "Data/Yahoo_2020.csv"   "Data/Google_2014.csv"
[9] "Data/Google_2015.csv"  "Data/Google_2016.csv"

etc

Each CSV has different column names. Example for the Yahoo data:

Date Yahoo

For Google:

Date Google

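The only thing the files have in common is the first column (the date). I want to combine all of this data into a single CSV file in R so that I can continue with my analysis. The result should look like this: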

        Date Yahoo Google
1 2014-01-05    75     50
2 2014-01-12    84      6
3 2014-01-19    81      3
4 2014-01-26    82     35

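I have looked at other questions on StackOverflow but did not find a similar one. I came up with this solution, but it does not work because the files have different column names.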

data <- read.csv(paste0("Data/","Yahoo_2014.csv"),
                       skip=2, 
                       na.strings="<1")

allFileNames <- list.files("Data")
All <- data.frame(matrix(, nrow=0, ncol=3))
names(All) <- c("Date","Yahoo","Google")
for (filename in allFileNames) {
  fullFilename <- paste0("Data/",filename)
  Data <- read.csv(fullFilename,
                         skip=2, 
                         na.strings="<1")
  names(Data) <- c("Date","Yahoo","Google")
  All <- rbind(All,Data)
}
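
To illustrate why a loop like the one above fails, here is a minimal sketch using two made-up data frames (`yahoo` and `google` are hypothetical stand-ins for two of the files, not real data). `rbind()` on data frames matches columns by name, so it stops with an error when the names differ.

```r
# Hypothetical stand-ins for two files that share only the Date column
yahoo  <- data.frame(Date = "2014-01-05", Yahoo = 75)
google <- data.frame(Date = "2014-01-05", Google = 50)

# rbind() needs matching column names, so this call raises an error;
# tryCatch() captures the error message instead of stopping the script
result <- tryCatch(rbind(yahoo, google),
                   error = function(e) conditionMessage(e))
print(result)
```

This is why the data needs to be put into a common shape (for example Date, value, site) before the rows are stacked.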

Dario

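Edit

If this is a script that gets run often, you should really use an alternative that avoids growing an object inside a loop:

This assumes that the first column is always the date, that the second column is the named value column, and that each file has only these two columns.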

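library(dplyr)
library(tidyr)

# Full paths are needed so that read.csv() can find each file
allFileNames <- list.files("Data", full.names = TRUE)

# Read every file into a list, then bind once instead of growing `All`
All <- bind_rows(lapply(allFileNames, function(filename) {
    Data <- read.csv(filename,
               skip=2, 
               na.strings="<1",
               stringsAsFactors=FALSE)

    # Extract the site name (e.g. "Yahoo") from "Data/Yahoo_2014.csv"
    Data$site <- gsub(".*[[:punct:]]([A-Za-z]+)_.*", "\\1", filename)
    names(Data) <- c("Date", "values", "site")
    return(Data)
})) %>%
  # Reshape to one row per date and one column per site
  pivot_wider(names_from=site,
            values_from=values)

All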
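
For anyone who wants to try this without the original files, the following is a rough, self-contained sketch of the same read, stack, then widen idea. Everything in it is made up for illustration: the temporary folder `csv_demo`, the two tiny files, their values, and the two "junk" lines that stand in for whatever `skip=2` skips in the real data. It also assumes that `dplyr` and `tidyr` are installed.

```r
library(dplyr)
library(tidyr)

# Build a throwaway folder with two tiny files (made-up values);
# each file has two junk lines before the header to match skip = 2
demoDir <- file.path(tempdir(), "csv_demo")
dir.create(demoDir, showWarnings = FALSE)
writeLines(c("junk", "junk", "Date,Yahoo", "2014-01-05,75", "2014-01-12,<1"),
           file.path(demoDir, "Yahoo_2014.csv"))
writeLines(c("junk", "junk", "Date,Google", "2014-01-05,50", "2014-01-12,6"),
           file.path(demoDir, "Google_2014.csv"))

demoFiles <- list.files(demoDir, pattern = "\\.csv$", full.names = TRUE)

# Read each file into long format (Date, values, site), stack, then widen
demo <- bind_rows(lapply(demoFiles, function(filename) {
  Data <- read.csv(filename, skip = 2, na.strings = "<1",
                   stringsAsFactors = FALSE)
  # Site name taken from the file path, e.g. ".../Yahoo_2014.csv" -> "Yahoo"
  Data$site <- gsub(".*[[:punct:]]([A-Za-z]+)_.*", "\\1", filename)
  names(Data) <- c("Date", "values", "site")
  Data
})) %>%
  pivot_wider(names_from = site, values_from = values)

print(demo)
```

The result should have one row per date and one column per site, with the "<1" entry read as NA.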

Older answer:

allFileNames <- list.files("Data", full.names = TRUE)

# Read the first file to initialise the long-format data frame
All <- read.csv(allFileNames[1],
                skip=2, 
                na.strings="<1",
                stringsAsFactors=FALSE)
# Extract the site name (e.g. "Yahoo") from the file path
All$site <- gsub(".*[[:punct:]]([A-Za-z]+)_.*", "\\1", allFileNames[1])
names(All) <- c("Date", "values", "site")

# Append the remaining files one by one
for (filename in allFileNames[-1]) {
  Data <- read.csv(filename,
                   skip=2, 
                   na.strings="<1",
                   stringsAsFactors=FALSE)
  
  Data$site <- gsub(".*[[:punct:]]([A-Za-z]+)_.*", "\\1", filename)
  names(Data) <- c("Date", "values", "site")

  All <- rbind(All, Data)
}

library(dplyr)
library(tidyr)
All <- All %>%
  pivot_wider(names_from=site,
              values_from=values)
All
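
Both versions rely on a `gsub()` pattern to pull the site name out of each file path. Below is a small sketch of what that pattern returns for two of the file names from the question, next to a simpler base-R alternative built on `basename()`. The alternative is my own substitution rather than part of the answer, and the variable names `paths`, `viaGsub` and `viaBasename` are only for illustration.

```r
paths <- c("Data/Yahoo_2014.csv", "Data/Google_2016.csv")

# Pattern from the answer, with the letter class written as [A-Za-z]
viaGsub <- gsub(".*[[:punct:]]([A-Za-z]+)_.*", "\\1", paths)

# Alternative: drop the directory, then everything from the first underscore on
viaBasename <- sub("_.*$", "", basename(paths))

print(viaGsub)
print(viaBasename)
```

Both approaches depend on the `Site_Year.csv` naming scheme shown in the question. The `basename()` version may be easier to adapt if the folder path itself contains underscores.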
