How do I read data with values stuck together in R?

Jon

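I have some numeric data separated by spaces. I tried to read it into R with read.table, but some rows are missing the space delimiter, so many of the values are stuck together. How can I read this data correctly? I tried changing some of the read.table arguments, but that wasn't enough.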
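To see why read.table cannot cope, here is a minimal sketch using a made-up two-line sample in the same layout (the sample lines and the use of fill = TRUE are assumptions, not taken from the original file). read.table splits on runs of whitespace, so when a value fills its entire field there is nothing left to split on, and the row comes out one field short:

```r
# Made-up sample: in the second line "-3.5" and "-12.9" touch, because
# "-12.9" fills its whole field and leaves no separating space
txt <- c("  60019660104  4.3 -4.1",
         "  60019660105 -3.5-12.9")

# fill = TRUE pads the short second row with NA instead of stopping with an error
df <- read.table(text = txt, fill = TRUE, stringsAsFactors = FALSE)
str(df)
# V2 comes back as character ("4.3", "-3.5-12.9") and V3[2] is NA
```

This reproduces the same pattern as the NA-padded output in the question: glued values turn a numeric column into character and push NAs onto the end of the row.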

The raw data is here: https://dl.dropboxusercontent.com/u/74190377/data.txt

A sample of the data looks like this:

structure(list(id = c("60019660101", "60019660102", "60019660103", 
"60019660104", "60019660105", "60019660106", "60019660107", "60019660108", 
"60019660109", "60019660110", "60019660111", "60019660112", "60019660113", 
"60019660114", "60019660115", "60019660116", "60019660117", "60019660118", 
"60019660119-10.6-12.4-11.9-11.6"), name1 = c("4.3", "7.4", "5.8", 
"4.3", "-3.5-12.9", "-6.6-13.3", "-5.7", "-5.0-11.4", "-7.5-12.0", 
"-8.8-15.3-11.5-19.5", "-9.8-16.4-13.1-22.3", "-8.9-17.4-10.9-20.0", 
"-7.3", "-5.8-10.5", "-5.4-13.6", "-9.4-20.4-14.4-26.3", "-7.9-15.6-10.3-19.4", 
"-8.7-11.2-10.5-16.0", "1.3"), name2 = c(".7", "3.8", "3.0", 
"-4.1", "-8.6", "-8.6-16.3", "-7.5", "-8.9-11.0", "-9.6-17.6", 
".0", ".6", "2.4", "-9.2", "-6.9", "-8.3", ".0", "1.2", ".8", 
"34-99.0"), name3 = c("3.4", "5.5", "4.2", "-1.9", "-5.6", "6.1", 
"-6.6", "1.8", "1.6", "20-99.0", "18", "17-99.0", "-8.5", "-8.0", 
"-9.1", "33", "33-99.0", "34-99.0", "-.9"), name4 = c("1.0", 
"1.9", "1.8", "-2.4", "1.5", "21-99.0", "-7.9", "25-99.0", "27-99.0", 
"-.9", "1.5", "-.9", "-9.1", "6.1", ".1", "4.6", "-.9", "-.9", 
"-.9"), name5 = c("1.0", "1.6", "10.9", "7.2", "17-99.0", "-.9", 
"1.0", "-.9", "-.9", "-.9", "-.9", "-.9", "2.4", "25-99.0", "33-99.0", 
"-.9", "-.9", "-.9", "-.9"), name6 = c("-9", "-9", "-9", "7-99.0", 
"-.9", "-.9", "27-99.0", "-.9", "-.9", "-.9", "-.9", "-.9", "20-99.0", 
"-.9", "-.9", "-.9", "-.9", "-.9", "-.9"), name7 = c(3.1, 3.7, 
2.7, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, 
-0.9, -0.9, -0.9, -0.9, -0.9, -0.9), name8 = c(-0.9, -0.9, -0.9, 
-0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, 
-0.9, -0.9, -0.9, -0.9, NA), name9 = c(-0.9, -0.9, -0.9, -0.9, 
-0.9, -0.9, -0.9, -0.9, -0.9, NA, -0.9, NA, -0.9, -0.9, -0.9, 
-0.9, NA, NA, NA), name10 = c(-0.9, -0.9, -0.9, -0.9, -0.9, NA, 
-0.9, NA, NA, NA, NA, NA, -0.9, -0.9, -0.9, NA, NA, NA, NA), 
    name11 = c(9.6, 7.8, 9, -0.9, NA, NA, -0.9, NA, NA, NA, NA, 
    NA, -0.9, NA, NA, NA, NA, NA, NA), name12 = c(-0.9, -0.9, 
    -0.9, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA)), .Names = c("id", "name1", "name2", "name3", 
"name4", "name5", "name6", "name7", "name8", "name9", "name10", 
"name11", "name12"), class = "data.frame", row.names = c(NA, 
-19L))
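In the sample above, every broken cell contains a number immediately followed by a minus sign. A quick way to flag such cells is a regular expression; is_glued is a hypothetical helper written for illustration, and the cell values are copied from the sample:

```r
# Hypothetical check: TRUE where a digit or "." is directly followed by "-",
# i.e. two values with no space between them
is_glued <- function(x) grepl("[0-9.]-", x)

cells <- c("4.3", "-3.5-12.9", "7-99.0", "-.9", "60019660119-10.6-12.4-11.9-11.6")
is_glued(cells)
# FALSE TRUE TRUE FALSE TRUE
```

A leading minus sign (as in "-.9") is not flagged, because nothing precedes it.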

Here is my (bad) output:

                                id               name1     name2   name3   name4   name5   name6 name7 name8 name9 name10 name11 name12
1                      60019660101                 4.3        .7     3.4     1.0     1.0      -9   3.1  -0.9  -0.9   -0.9    9.6   -0.9
2                      60019660102                 7.4       3.8     5.5     1.9     1.6      -9   3.7  -0.9  -0.9   -0.9    7.8   -0.9
3                      60019660103                 5.8       3.0     4.2     1.8    10.9      -9   2.7  -0.9  -0.9   -0.9    9.0   -0.9
4                      60019660104                 4.3      -4.1    -1.9    -2.4     7.2  7-99.0  -0.9  -0.9  -0.9   -0.9   -0.9     NA
5                      60019660105           -3.5-12.9      -8.6    -5.6     1.5 17-99.0     -.9  -0.9  -0.9  -0.9   -0.9     NA     NA
6                      60019660106           -6.6-13.3 -8.6-16.3     6.1 21-99.0     -.9     -.9  -0.9  -0.9  -0.9     NA     NA     NA
7                      60019660107                -5.7      -7.5    -6.6    -7.9     1.0 27-99.0  -0.9  -0.9  -0.9   -0.9   -0.9     NA
8                      60019660108           -5.0-11.4 -8.9-11.0     1.8 25-99.0     -.9     -.9  -0.9  -0.9  -0.9     NA     NA     NA
9                      60019660109           -7.5-12.0 -9.6-17.6     1.6 27-99.0     -.9     -.9  -0.9  -0.9  -0.9     NA     NA     NA
10                     60019660110 -8.8-15.3-11.5-19.5        .0 20-99.0     -.9     -.9     -.9  -0.9  -0.9    NA     NA     NA     NA
11                     60019660111 -9.8-16.4-13.1-22.3        .6      18     1.5     -.9     -.9  -0.9  -0.9  -0.9     NA     NA     NA
12                     60019660112 -8.9-17.4-10.9-20.0       2.4 17-99.0     -.9     -.9     -.9  -0.9  -0.9    NA     NA     NA     NA
13                     60019660113                -7.3      -9.2    -8.5    -9.1     2.4 20-99.0  -0.9  -0.9  -0.9   -0.9   -0.9     NA
14                     60019660114           -5.8-10.5      -6.9    -8.0     6.1 25-99.0     -.9  -0.9  -0.9  -0.9   -0.9     NA     NA
15                     60019660115           -5.4-13.6      -8.3    -9.1      .1 33-99.0     -.9  -0.9  -0.9  -0.9   -0.9     NA     NA
16                     60019660116 -9.4-20.4-14.4-26.3        .0      33     4.6     -.9     -.9  -0.9  -0.9  -0.9     NA     NA     NA
17                     60019660117 -7.9-15.6-10.3-19.4       1.2 33-99.0     -.9     -.9     -.9  -0.9  -0.9    NA     NA     NA     NA
18                     60019660118 -8.7-11.2-10.5-16.0        .8 34-99.0     -.9     -.9     -.9  -0.9  -0.9    NA     NA     NA     NA
19 60019660119-10.6-12.4-11.9-11.6                 1.3   34-99.0     -.9     -.9     -.9     -.9  -0.9    NA    NA     NA     NA     NA

This is what the correct data should look like:

  60019660101  4.3    .7     3.4     1.0    1.0   -9     3.1    -.9  -.9  -.9  9.6  -.9
  60019660102  7.4   3.8     5.5     1.9    1.6   -9     3.7    -.9  -.9  -.9  7.8  -.9
  60019660103  5.8   3.0     4.2     1.8    10.9  -9     2.7    -.9  -.9  -.9  9.0  -.9
  60019660104  4.3  -4.1    -1.9    -2.4    7.2      7  -99.0   -.9  -.9  -.9  -.9  -.9
  60019660105 -3.5  -12.9   -8.6    -5.6    1.5     17  -99.0   -.9  -.9  -.9  -.9  -.9
  60019660106 -6.6  -13.3   -8.6    -16.3   6.1     21  -99.0   -.9  -.9  -.9  -.9  -.9
  60019660107 -5.7  -7.5    -6.6    -7.9    1.0     27  -99.0   -.9  -.9  -.9  -.9  -.9
  60019660108 -5.0  -11.4   -8.9    -11.0   1.8     25  -99.0   -.9  -.9  -.9  -.9  -.9
  60019660109 -7.5  -12.0   -9.6    -17.6   1.6     27  -99.0   -.9  -.9  -.9  -.9  -.9
  60019660110 -8.8  -15.3   -11.5   -19.5    .0     20  -99.0   -.9  -.9  -.9  -.9  -.9
  60019660111 -9.8  -16.4   -13.1   -22.3    .6     18    1.5   -.9  -.9  -.9  -.9  -.9
  60019660112 -8.9  -17.4   -10.9   -20.0   2.4     17  -99.0   -.9  -.9  -.9  -.9  -.9
  60019660113 -7.3  -9.2    -8.5    -9.1    2.4     20  -99.0   -.9  -.9  -.9  -.9  -.9
  60019660114 -5.8  -10.5   -6.9    -8.0    6.1     25  -99.0   -.9  -.9  -.9  -.9  -.9
  60019660115 -5.4  -13.6   -8.3    -9.1     .1     33  -99.0   -.9  -.9  -.9  -.9  -.9
  60019660116 -9.4  -20.4   -14.4   -26.3    .0     33    4.6   -.9  -.9  -.9  -.9  -.9
  60019660117 -7.9  -15.6   -10.3   -19.4   1.2     33  -99.0   -.9  -.9  -.9  -.9  -.9
  60019660118 -8.7  -11.2   -10.5   -16.0    .8     34  -99.0   -.9  -.9  -.9  -.9  -.9
  60019660119 -10.6 -12.4   -11.9   -11.6   1.3     34  -99.0   -.9  -.9  -.9  -.9  -.9

Roland

It looks like you have fixed-width formatted data.

# widths gives the number of characters in each of the 13 fields
read.fwf("https://dl.dropboxusercontent.com/u/74190377/data.txt",
         widths=c(13,5,5,5,5,7,4,5,5,5,5,5,5))
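Since the Dropbox link may no longer be reachable, here is a self-contained sketch of the same approach on a tiny made-up file. The sample values are taken from the question, but the file itself and the widths 13, 5, 5 (the first three widths in the call above) are assumptions; col.names is used to give the columns the names from the question:

```r
# Made-up sample file: each field is right-aligned in a fixed number of
# characters (assumed widths 13, 5, 5)
lines <- sprintf("%13s%5s%5s",
                 c("60019660104", "60019660105"),
                 c("4.3", "-3.5"),
                 c("-4.1", "-12.9"))
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# read.fwf cuts each line by character position, so "-3.5-12.9" is split
# correctly even though there is no space between the two values
df <- read.fwf(tmp, widths = c(13, 5, 5), col.names = c("id", "name1", "name2"))
unlink(tmp)
df
```

All three columns come back numeric, because the split no longer depends on whitespace.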

#            V1    V2    V3    V4    V5   V6 V7    V8   V9  V10  V11  V12  V13
#1  60019660101   4.3   0.7   3.4   1.0  1.0 -9   3.1 -0.9 -0.9 -0.9  9.6 -0.9
#2  60019660102   7.4   3.8   5.5   1.9  1.6 -9   3.7 -0.9 -0.9 -0.9  7.8 -0.9
#3  60019660103   5.8   3.0   4.2   1.8 10.9 -9   2.7 -0.9 -0.9 -0.9  9.0 -0.9
#4  60019660104   4.3  -4.1  -1.9  -2.4  7.2  7 -99.0 -0.9 -0.9 -0.9 -0.9 -0.9
#5  60019660105  -3.5 -12.9  -8.6  -5.6  1.5 17 -99.0 -0.9 -0.9 -0.9 -0.9 -0.9
<snip>
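For intuition, fixed-width reading boils down to cutting each line at positions computed from the widths. The following is a simplified illustration with a hypothetical helper split_fixed, not the actual source of read.fwf:

```r
# Hypothetical helper: slice one fixed-width line into trimmed fields
split_fixed <- function(line, widths) {
  ends   <- cumsum(widths)       # last character of each field
  starts <- ends - widths + 1    # first character of each field
  trimws(substring(line, starts, ends))
}

# Made-up line in the same layout: "  60019660105 -3.5-12.9 -8.6"
line <- sprintf("%13s%5s%5s%5s", "60019660105", "-3.5", "-12.9", "-8.6")
fields <- split_fixed(line, c(13, 5, 5, 5))
as.numeric(fields)
```

This also shows why the widths must be exact: a width that is off by one character shifts every later field boundary.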

