Loading .dat and .npy into Python

Posted by Vlad Balanescu in Dev

How can I read an 8D array from a .dat file and store it in Python? My binary file looks like this. I want each string to be one row:

['r 11 1602 24 1622 0\n', 'i 26 1602 36 1631 0\n', 
'v 37 1602 57 1621 0\n', 'e 59 1602 76 1622 0\n', 
'r 77 1602 91 1622 1\n', 'h 106 1602 127 1631 0\n', 
'e 127 1602 144 1622 1\n', 'h 160 1602 181 1631 0\n',
'e 181 1602 198 1622 0\n', 'a 200 1602 218 1622 0\n',
'r 218 1602 232 1622 0\n', 'd 234 1602 254 1631 1\n',
't 268 1602 280 1627 0\n', 'h 280 1602 301 1631 0\n',
'e 302 1602 319 1622 1\n', 'd 335 1602 355 1631 0\n']

When I try this:

file1 = open('data/train1.dat', 'rb')
train1_dat = np.loadtxt(file1.readlines(), delimiter=',')  
print train1_dat

I get this error:

ValueError: could not convert string to float: r 11 1602 24 1622 0

Daedalus

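Assuming your .dat file looks exactly like what is shown in your question, we first create a data string that mimics this format. We read it in as a string and then massage it into a format suitable for loading into numpy: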

from io import StringIO  # Python 3; on Python 2 use "from StringIO import StringIO"

d = StringIO("""['r 11 1602 24 1622 0\n', 'i 26 1602 36 1631 0\n', 
'v 37 1602 57 1621 0\n', 'e 59 1602 76 1622 0\n', 
'r 77 1602 91 1622 1\n', 'h 106 1602 127 1631 0\n', 
'e 127 1602 144 1622 1\n', 'h 160 1602 181 1631 0\n',
'e 181 1602 198 1622 0\n', 'a 200 1602 218 1622 0\n',
'r 218 1602 232 1622 0\n', 'd 234 1602 254 1631 1\n',
't 268 1602 280 1627 0\n', 'h 280 1602 301 1631 0\n',
'e 302 1602 319 1622 1\n', 'd 335 1602 355 1631 0\n'] """)

data = d.read()  # read contents of .dat file
data = data.strip()  # remove leading and trailing whitespace
data = data.replace('\n', '')  # remove all newlines
data = data.replace("', '", "','")  # clean up separators
data = data[2:-2]  # remove leading and trailing delimiters
data = data.split("','")  # convert into a clean list
data = '\n'.join(data)  # re-combine into a string to load into numpy

print(data)  # have a look at the new string format

The resulting .dat string looks like this:

r 11 1602 24 1622 0
i 26 1602 36 1631 0
v 37 1602 57 1621 0
e 59 1602 76 1622 0
r 77 1602 91 1622 1
h 106 1602 127 1631 0
e 127 1602 144 1622 1
h 160 1602 181 1631 0
e 181 1602 198 1622 0
a 200 1602 218 1622 0
r 218 1602 232 1622 0
d 234 1602 254 1631 1
t 268 1602 280 1627 0
h 280 1602 301 1631 0
e 302 1602 319 1622 1
d 335 1602 355 1631 0
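
As an aside, the same cleanup can probably be done with a single regular expression instead of the chain of string replacements. This is only a sketch under a few assumptions: `extract_records` is a hypothetical helper name, no record contains a single quote, and the newline inside each quoted string may be either a real newline or a literal backslash followed by n.

```python
import re


def extract_records(text):
    """Return the quoted records found in a list-like text dump."""
    # Grab everything between each pair of single quotes.
    records = re.findall(r"'([^']*)'", text)
    # Drop a literal backslash-n if present, then any real surrounding whitespace.
    return [record.replace("\\n", "").strip() for record in records]


# A shortened sample in the same shape as the data in the question.
sample = """['r 11 1602 24 1622 0\n', 'i 26 1602 36 1631 0\n',
'v 37 1602 57 1621 0\n']"""

# Re-join into one record per line, ready for np.loadtxt.
print("\n".join(extract_records(sample)))
```

Joining the records with newlines gives the same one-record-per-line layout as the listing above.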

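Silly footnote: the first column looks like an acrostic, which I find amusing: "river he heard the d...", with the 1 in the last column marking the end of each word :-) Anyway, none of my business.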

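On a more serious note, if you can produce the .dat file in this format in the first place, all of the steps above become unnecessary. Now we can easily import it into a numpy array: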

import numpy as np

d = StringIO(data)
# The column names 'a' to 'f' are arbitrary 
# and can be changed to suit
# also the numbers are all arbitrarily imported as floats
data = np.loadtxt(d, dtype={'names': ('a', 'b', 'c', 'd', 'e', 'f'),
                            'formats': ('S1', 'f', 'f', 'f', 'f', 'f')})
print(data)

The result is:

[('r', 11.0, 1602.0, 24.0, 1622.0, 0.0)
 ('i', 26.0, 1602.0, 36.0, 1631.0, 0.0)
 ('v', 37.0, 1602.0, 57.0, 1621.0, 0.0)
 ('e', 59.0, 1602.0, 76.0, 1622.0, 0.0)
 ('r', 77.0, 1602.0, 91.0, 1622.0, 1.0)
 ('h', 106.0, 1602.0, 127.0, 1631.0, 0.0)
 ('e', 127.0, 1602.0, 144.0, 1622.0, 1.0)
 ('h', 160.0, 1602.0, 181.0, 1631.0, 0.0)
 ('e', 181.0, 1602.0, 198.0, 1622.0, 0.0)
 ('a', 200.0, 1602.0, 218.0, 1622.0, 0.0)
 ('r', 218.0, 1602.0, 232.0, 1622.0, 0.0)
 ('d', 234.0, 1602.0, 254.0, 1631.0, 1.0)
 ('t', 268.0, 1602.0, 280.0, 1627.0, 0.0)
 ('h', 280.0, 1602.0, 301.0, 1631.0, 0.0)
 ('e', 302.0, 1602.0, 319.0, 1622.0, 1.0)
 ('d', 335.0, 1602.0, 355.0, 1631.0, 0.0)]
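
Once loaded, the structured array can be indexed by field name, and it should also survive a round trip through the .npy format. A minimal sketch, assuming Python 3 and the same arbitrary field names as above. Two things here are my own choices rather than part of the original code: I use 'U1' instead of 'S1' so the letters come back as str rather than bytes, and an in-memory `BytesIO` buffer stands in for a real .npy file on disk.

```python
from io import BytesIO, StringIO

import numpy as np

# Three records in the cleaned-up, one-record-per-line format.
text = "r 11 1602 24 1622 0\ni 26 1602 36 1631 0\nr 77 1602 91 1622 1\n"

# Field names 'a' to 'f' are arbitrary; 'U1' is an assumption made for Python 3.
record_dtype = {'names': ('a', 'b', 'c', 'd', 'e', 'f'),
                'formats': ('U1', 'f', 'f', 'f', 'f', 'f')}
data = np.loadtxt(StringIO(text), dtype=record_dtype)

# Pull out single columns by field name.
letters = data['a']
word_end = data['f']

# Stack the numeric fields into a plain 2D float array.
numeric = np.column_stack([data[name] for name in ('b', 'c', 'd', 'e', 'f')])
print(letters, numeric.shape)

# Round trip through the .npy format using an in-memory buffer.
buffer = BytesIO()
np.save(buffer, data)
buffer.seek(0)
restored = np.load(buffer)
print(restored.dtype.names)
```

With a real file, `np.save('train1.npy', data)` and `np.load('train1.npy')` would take the place of the buffer; the file name is only an example.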