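I'm trying to merge multiple CSV files into a single large CSV for my dataset. What I want is to take a few columns from each of several CSV files and build one dataset from them. I don't want every column in the final dataset, only a few selected ones. I have used the names parameter in pandas when reading the CSVs, and that part works fine, but I can't create a new CSV from the data I read. What am I doing wrong here? I've added the stack trace at the bottom.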
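For the column-selection part of the question: in pandas.read_csv, names supplies column labels, while usecols is the parameter that restricts which columns are parsed. A minimal sketch, assuming hypothetical column names (id, loan_amnt, term, int_rate, grade) and inline sample data rather than the original files:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for one of the input files.
csv_text = (
    "id,loan_amnt,term,int_rate,grade\n"
    "1,5000,36 months,10.65%,B\n"
    "2,2500,60 months,15.27%,C\n"
)

# usecols parses only the listed columns; the rest are never loaded.
wanted = ["loan_amnt", "term", "int_rate"]
df = pd.read_csv(io.StringIO(csv_text), usecols=wanted)

# usecols ignores element order, so index with the list to fix the column order.
df = df[wanted]
print(df)
```

Indexing with df[wanted] afterwards matters only when the desired order differs from the order in the file.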
import glob
import pandas as pd
import os
import time
from datetime import datetime
import numpy as np

path = "C:\Users\lenovo\Downloads\Compressed\LoanStats3a.csv_2\csv"

class MergeCsvFiles:
    def MergeCsv(self):
        # glob.glob returns a list of matching paths
        allFiles = glob.glob(os.path.join(path, "LoanStats3a.csv"))
        print 'allFiles', allFiles
        for file_ in allFiles:
            print 'file_ ######### ', file_
            # merge_df = pd.DataFrame.from_csv(file_)
            # print merge_df
            fileToSave = glob.glob(os.path.join(path, "merge.csv"))
            print 'fileToSave #### ', fileToSave
            np_array_list = []
            df = pd.read_csv(file_, skipinitialspace=True, low_memory=False, header=0, index_col=None)
            np_array_list.append(df.as_matrix())
            comb_np_array = np.vstack(np_array_list)
            big_frame = pd.DataFrame(comb_np_array)
            # big_frame.columns = fields
            print 'big_frame#### ', big_frame
            big_frame.to_csv(fileToSave)
            # See the keys
            print 'df.keys########', df.keys()
            print 'df @@@@@', df
            frame = pd.DataFrame()
            list_ = []
            list_.append(df)
            frame = pd.concat(list_)
            # print 'frame#### ', frame
            frame.to_csv(fileToSave)

if __name__ == "__main__":
    s = MergeCsvFiles()
    s.MergeCsv()
Stack trace:
Traceback (most recent call last):
  File "C:/Users/lenovo/Downloads/Video/Machine Learning/MLPredictiveAnalysis/MergeCsv.py", line 59, in <module>
    s.MergeCsv()
  File "C:/Users/lenovo/Downloads/Video/Machine Learning/MLPredictiveAnalysis/MergeCsv.py", line 39, in MergeCsv
    big_frame.to_csv(fileToSave)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1344, in to_csv
    formatter.save()
  File "C:\Python27\lib\site-packages\pandas\formats\format.py", line 1526, in save
    compression=self.compression)
  File "C:\Python27\lib\site-packages\pandas\io\common.py", line 426, in _get_handle
    f = open(path, mode)
TypeError: coercing to Unicode: need string or buffer, list found
glob.glob returns a list. You need to pass a single path-name string to big_frame.to_csv. Why do you even need glob here? big_frame.to_csv(os.path.join(path, "merge.csv")) should work.
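To see the type mismatch behind the traceback: glob.glob always returns a list of matching paths (empty if nothing matches), while DataFrame.to_csv expects a single path string or file object. The sketch below uses a temporary directory in place of the original Windows path, which is an assumption made so it runs anywhere:

```python
import glob
import os
import tempfile

import pandas as pd

out_dir = tempfile.mkdtemp()  # stand-in for the original `path`

# glob.glob returns a list even for a literal file name.
matches = glob.glob(os.path.join(out_dir, "merge.csv"))
print(matches)  # [] here, because merge.csv does not exist yet

# Build the path as a plain string and pass that to to_csv instead.
out_path = os.path.join(out_dir, "merge.csv")
big_frame = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
big_frame.to_csv(out_path, index=False)

# The same glob call now finds exactly one file.
found = glob.glob(os.path.join(out_dir, "merge.csv"))
print(found)
```

In the original code, the first glob for merge.csv likewise returns an empty list before the file exists, which is the list that to_csv rejects.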
You also call frame.to_csv(fileToSave) at the bottom of the loop, so every iteration overwrites the file and only the data from the last iteration ends up saved.
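A minimal sketch of the pattern the answer points toward: read each file with only the wanted columns, collect the DataFrames in a list, call pd.concat once after the loop, and write the result once to a string path. It assumes the inputs match a wildcard such as *.csv (the original pattern names a single file), and the function name merge_csvs and the column names are hypothetical. Unlike the np.vstack/as_matrix route in the question, pd.concat keeps the column labels.

```python
import glob
import os
import tempfile

import pandas as pd


def merge_csvs(pattern, columns, out_path):
    """Concatenate the selected columns of every CSV matching pattern into one CSV."""
    frames = []
    for file_ in sorted(glob.glob(pattern)):
        # Read only the wanted columns, then fix their order.
        frames.append(pd.read_csv(file_, usecols=columns)[columns])
    if not frames:
        raise ValueError("no files match %r" % pattern)
    # Concatenate once, after the loop, so no file's data is overwritten.
    merged = pd.concat(frames, ignore_index=True)
    # Write once, to a single string path.
    merged.to_csv(out_path, index=False)
    return merged


# Demo: two small hypothetical input files in a temporary directory.
in_dir = tempfile.mkdtemp()
for name, rows in [("part1.csv", "1,5000,B\n2,2500,C\n"), ("part2.csv", "3,2400,C\n")]:
    with open(os.path.join(in_dir, name), "w") as fh:
        fh.write("id,loan_amnt,grade\n" + rows)

# Write the output outside the input directory so *.csv never picks it up.
out_path = os.path.join(tempfile.mkdtemp(), "merge.csv")
merged = merge_csvs(os.path.join(in_dir, "*.csv"), ["id", "loan_amnt"], out_path)
print(merged)
```

Keeping merge.csv out of the directory that the pattern scans avoids reading the previous output back in on a rerun.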