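OK, so I found part of the answer I needed at the link below. It works fine as long as the first column of my CSV file is in the format 2015-03-01,1,2,3,1,3.
How do I keep this working when the first column changes to 2015-03-01 00:00:00.000?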
import csv
from itertools import groupby

for key, rows in groupby(csv.reader(open("largeFile.csv", "r", encoding='utf-16')),
                         lambda row: row[0]):
    with open("%s.txt" % key, "w") as output:
        for row in rows:
            output.write(",".join(row) + "\n")
So I have a large file with about 1.7 million rows...
2015.01.01,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.01,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.01,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.01,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.01,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.01,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.01,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.02,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.02,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.02,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.02,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.03,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015.01.03,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
The program does create a new text file for each day, which is great!
However, it stops working when the first column looks like this.
2015-03-01 00:00:01.000,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015-03-01 00:00:02.000,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015-03-02 00:00:01.000,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015-03-02 00:00:02.000,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015-03-02 00:00:03.000,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015-03-03 00:00:01.000,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
2015-03-03 00:00:02.000,NULL,NULL,NULL,NULL,NULL,0,1,0,1,0,0,0,1
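It gives me the following error.
Traceback (most recent call last):
  File "C:\Python34\Proj\documents\New folder\dataPullSplit2.py", line 6, in <module>
    with open("%s.txt" % key, "w") as output:
OSError: [Errno 22] Invalid argument: '2015-03-01 00:00:00.000.txt'
Can someone point me in the right direction here?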
Found Temp Solution
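OK, so by changing "w" to "a" I am now appending to the file, and by using key[:-13] I was able to cut the timestamp off the file name. It works, but it is slow. How can I improve this, and why is it so slow?
Here is the code now: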
import csv
from itertools import groupby

for key, rows in groupby(csv.reader(open("asdf2.txt", "r", encoding='utf-16')),
                         lambda row: row[0]):
    with open("%s.txt" % key[:-13], "a") as output:
        for row in rows:
            output.write(",".join(row) + "\n")
Assuming your file names should keep the 2015.01.01 pattern, cleaning up key should work:
key = key.split()[0].replace('-', '.')
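A likely reason the original open() call failed is that Windows does not allow a colon in a file name, and the time part 00:00:00.000 contains two of them. The sketch below illustrates that the cleaned key no longer contains such characters. The helper name forbidden_chars and the constant WINDOWS_FORBIDDEN are made up for this illustration and are not part of the answer's code.

```python
# Characters that Windows rejects in file names.
# (Illustrative constant and helper, not part of the original answer.)
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')

def forbidden_chars(name):
    """Return the sorted list of characters in `name` that Windows does not allow."""
    return sorted(set(name) & WINDOWS_FORBIDDEN)

raw_key = '2015-03-01 00:00:00.000'
clean_key = raw_key.split()[0].replace('-', '.')

print(forbidden_chars(raw_key + '.txt'))    # characters that break the raw name
print(forbidden_chars(clean_key + '.txt'))  # characters left after cleaning
```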
Full code:
import csv
from itertools import groupby

def shorten_key(key):
    return key.split()[0].replace('-', '.')

for key, rows in groupby(csv.reader(open("asdf2.txt", "r", encoding='utf-16')),
                         lambda row: shorten_key(row[0])):
    with open("%s.txt" % shorten_key(key), "a") as output:
        for row in rows:
            output.write(",".join(row) + "\n")
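On the slowness of the key[:-13] version: groupby starts a new group every time the key function's value changes, and that version still grouped on the full timestamp. The output file was therefore presumably reopened in append mode once per distinct timestamp rather than once per day. Grouping on the shortened key, as above, avoids that. A small sketch with made-up sample rows (count_groups is a hypothetical helper) shows the difference in the number of groups:

```python
from itertools import groupby

# Made-up sample rows; the first column is a timestamp, as in the question.
rows = [
    ['2015-03-01 00:00:01.000', 'NULL'],
    ['2015-03-01 00:00:02.000', 'NULL'],
    ['2015-03-02 00:00:01.000', 'NULL'],
    ['2015-03-02 00:00:02.000', 'NULL'],
    ['2015-03-02 00:00:03.000', 'NULL'],
]

def count_groups(rows, keyfunc):
    """Count the groups groupby yields, i.e. how many times a file would be opened."""
    return sum(1 for _ in groupby(rows, keyfunc))

# Grouping on the full timestamp: one group per distinct consecutive timestamp.
by_timestamp = count_groups(rows, lambda row: row[0])
# Grouping on the date part only: one group per consecutive run of the same day.
by_day = count_groups(rows, lambda row: row[0].split()[0])

print(by_timestamp, by_day)
```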
Quick test:
keys = ['2015-03-01 00:00:02.000', '2015.01.01']
for key in keys:
    print(key.split()[0].replace('-', '.'))
Output:
2015.03.01
2015.01.01
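For reuse, the same idea could be wrapped in a function that also closes the input file and leaves quoting to csv.writer. This is only a sketch under assumptions: split_by_day, its parameters, and the out_dir argument are hypothetical names, and every row is assumed to have a non-empty first column.

```python
import csv
import os
from itertools import groupby

def shorten_key(key):
    # '2015-03-01 00:00:02.000' -> '2015.03.01'; '2015.01.01' is left unchanged.
    return key.split()[0].replace('-', '.')

def split_by_day(src_path, out_dir, encoding='utf-16'):
    """Write the rows of src_path into one <day>.txt file per day under out_dir.

    Hypothetical helper for illustration. Returns the day keys, one per group,
    in the order groupby produced them.
    """
    days = []
    # newline='' is what the csv module documentation recommends for its files.
    with open(src_path, 'r', encoding=encoding, newline='') as src:
        for day, rows in groupby(csv.reader(src), lambda row: shorten_key(row[0])):
            days.append(day)
            out_path = os.path.join(out_dir, '%s.txt' % day)
            # Append mode, so a day that reappears later in the input is not overwritten.
            with open(out_path, 'a', newline='') as output:
                csv.writer(output).writerows(rows)
    return days
```

It could be called as split_by_day("asdf2.txt", "."). Because the files are opened in append mode, output left over from an earlier run should be deleted first.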