I want to open a CSV file for reading, but I am running into an exception.
I am using Python 2.7.
main.py:
if __name__ == "__main__":
    f = open('input.csv','r+b')
    m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    reader = csv.DictReader(iter(m.readline, ""))
    for read in reader:
        num = read['time']
        print num
Output:
Traceback (most recent call last):
  File "/home/PycharmProjects/time_gap_Task/main.py", line 22, in <module>
    for read in reader:
  File "/usr/lib/python3.4/csv.py", line 109, in __next__
    self.fieldnames
  File "/usr/lib/python3.4/csv.py", line 96, in fieldnames
    self._fieldnames = next(self.reader)
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
How can I fix this error? And what is a good way to open a CSV file with mmap and csv so that the code runs correctly?
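I know you asked this question a while ago, but I actually built a module for myself to do this, because I work with large CSV files a lot and sometimes need to convert them into dictionaries keyed on some field. Below is the code I have been using. Feel free to modify it as needed.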
import csv
import mmap

def MmapCsvFileIntoDict(csvFilePath, skipHeader=True, transform=lambda row: row, keySelector=lambda o: o):
    """
    Takes a CSV file path, uses mmap to open the file, and returns a dictionary of the contents keyed
    on the result of keySelector. mmap is used because it can be an efficient way to process large files.
    The transform callable converts each row (already parsed into a list) into something else, hence 'transform'.
    If you don't pass one in, the row list itself is used. Note that a list is not hashable, so with the
    default transform you need to pass a keySelector that returns a hashable key (for example row[0]).
    """
    contents = {}
    firstline = True
    try:
        with open(csvFilePath, "rb") as f:
            # memory-map the file read-only; size 0 means the whole file
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                for line in iter(mm.readline, b''):
                    if firstline:
                        firstline = False
                        if skipHeader:
                            continue
                    # mmap yields bytes, but the csv module needs text
                    line = line.decode('utf-8').strip()
                    # skipinitialspace drops the blank that follows each comma in files like the sample below
                    row = next(csv.reader([line], skipinitialspace=True), None)
                    if not row:
                        continue  # skip blank lines
                    value = transform(row) if callable(transform) else row
                    key = keySelector(value) if callable(keySelector) else keySelector
                    contents[key] = value
    except IOError as ie:
        # the original used PrintWithTs, my own timestamped print helper (not shown here)
        print('Error decomposing the companies: {0}'.format(ie))
        return {}
    return contents
There are a few options when calling this method.
Suppose you have a file that looks like this:
Id, Name, PhoneNumber
1, Joe, 7175551212
2, Mary, 4125551212
3, Vince, 2155551212
4, Jane, 8145551212
The simplest way to call it is like this:
data = MmapCsvFileIntoDict('/path/to/file.csv', keySelector = lambda row: row[0])
What you get back is a dictionary that looks like this:
{ '1' : ['1', 'Joe', '7175551212'], '2' : ['2', 'Mary', '4125551212'] ...
One thing I like to do is create a class or a namedtuple to represent my data:
class CsvData:
    def __init__(self, row):
        self.Id = int(row[0])
        self.Name = row[1].upper()
        self.Phone = int(row[2])
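The namedtuple variant mentioned above is not shown in the answer, so here is a minimal sketch of what it might look like. The names CsvRecord and make_record are made up for illustration; the field layout (Id, Name, Phone) is taken from the sample file.

```python
from collections import namedtuple

# Hypothetical namedtuple counterpart of the CsvData class (name chosen for illustration)
CsvRecord = namedtuple("CsvRecord", ["Id", "Name", "Phone"])


def make_record(row):
    """Convert one parsed CSV row (a list of strings) into a CsvRecord."""
    # strip() guards against the blank that follows each comma in the sample file
    return CsvRecord(Id=int(row[0]), Name=row[1].strip().upper(), Phone=int(row[2]))
```

Because make_record takes a row and returns an object, it could be passed directly as the transform argument, with keySelector = lambda o: o.Id as the key. Unlike the class, a namedtuple is immutable and compares by value, which may or may not be what you want.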
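Then, when I call the method, I pass in a second lambda to convert each row of the file into an object I can work with:
data = MmapCsvFileIntoDict('/path/to/file.csv', transform = lambda row: CsvData(row), keySelector = lambda o: o.Id)
What I get back that time looks like: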
{ 1 : <object instance>, 2 : <object instance>...
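Coming back to the script in the question: the traceback paths point at /usr/lib/python3.4, so the code appears to be running under Python 3 rather than 2.7. There, mmap's readline returns bytes while csv.DictReader expects str, and the "" sentinel never matches the b"" that readline returns at end of file. A minimal sketch of a more direct fix follows, assuming the file is UTF-8 encoded; the helper names iter_decoded_lines and read_column are made up for illustration.

```python
import csv
import mmap


def iter_decoded_lines(mm, encoding="utf-8"):
    """Yield each line of a memory-mapped file as text instead of bytes."""
    # b"" is what readline() returns at end of file, so it is the right sentinel
    for raw in iter(mm.readline, b""):
        yield raw.decode(encoding)


def read_column(path, column):
    """Return every value of one named column, using the first line as the header."""
    with open(path, "rb") as f:
        # access=ACCESS_READ is the portable spelling of a read-only mapping
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            reader = csv.DictReader(iter_decoded_lines(mm))
            return [row[column] for row in reader]
```

Calling read_column('input.csv', 'time') would then return the values of the time column as strings. Note that mmap raises ValueError for an empty file, so that case may need separate handling.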
I hope this helps! Good luck.