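I'm working with a dataset in CSV format. It has 22255 observations and 35 variables (columns). The dataset includes three variables that contain dates: "founded_at", "first_funding_at" and "last_funding_at".
Here is a sample of the dataset: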
founded_at first_funding_at last_funding_at
12/1/2005 5/3/2011 5/3/2011
1/1/2007 8/1/2007 3/8/2008
7/1/2007 3/1/2008 3/1/2009
9/1/2007 10/1/2009 8/1/2010
4/2/2009 1/1/2009 6/27/2014
1/1/2010 11/6/2013 11/6/2013
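I want to split all of these dates into "year", "week", "day", "quarter", "day of week", "week of year", "day name" and "weekend". I tried to do this with the code below, but it doesn't work. I get this error: File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item KeyError: 'feat'.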
# Import packages
import pandas as pd
# Read the dataset
df = pd.read_csv("Sales dataset - Vijay.csv", engine='python', parse_dates=['founded_at','first_funding_at','last_funding_at'])
list1 = ["founded_at", "first_funding_at","last_funding_at"]
for feat in list1:
    #print (feat)
    df['Year'] = df[feat].dt.year
    df['Week'] = df[feat].dt.week
    df['Day'] = df[feat].dt.day
    df['quarter'] = df['feat'].dt.quarter
    df['week_of_day'] = df['feat'].dt.dayofweek
    df['year_of_week'] = df['feat'].dt.weekofyear
    df['dayofweek_name'] = df['feat'].dt.day_name()
    df['weekend'] = np.where(df['feat'].isin(['Sunday','Saturday']),1,0)
I really need your help to fix this error.
Based on the code and the error, you are using the string 'feat' instead of the loop variable feat:
for feat in list1:
    #print (feat)
    df['Year'] = df[feat].dt.year # feat variable - correct
    df['Week'] = df[feat].dt.week
    df['Day'] = df[feat].dt.day
    df['quarter'] = df['feat'].dt.quarter # 'feat' string - wrong
    df['week_of_day'] = df['feat'].dt.dayofweek
    df['year_of_week'] = df['feat'].dt.weekofyear
    df['dayofweek_name'] = df['feat'].dt.day_name()
    df['weekend'] = np.where(df['feat'].isin(['Sunday','Saturday']),1,0)
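To see the difference in isolation, here is a minimal sketch on a two-row frame. The tiny DataFrame and the names years and raised are made up for illustration; only the column name comes from the question.

```python
import pandas as pd

# Tiny illustrative frame with one date column named as in the question.
df = pd.DataFrame(
    {"founded_at": pd.to_datetime(["12/1/2005", "1/1/2007"], format="%m/%d/%Y")}
)

feat = "founded_at"

# The variable feat holds the column name, so this lookup succeeds.
years = df[feat].dt.year.tolist()

# The quoted string 'feat' is looked up literally, and no such column exists.
try:
    df["feat"]
    raised = False
except KeyError:
    raised = True
```

The second lookup is what produces the KeyError from the traceback in the question.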
It also looks like you are overwriting the same columns on every pass through the loop.
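A small sketch of that overwriting effect, using a one-row frame invented for illustration (the names overwritten and year_columns are not from the question):

```python
import pandas as pd

# One-row illustrative frame with two of the question's date columns.
df = pd.DataFrame({
    "founded_at": pd.to_datetime(["12/1/2005"], format="%m/%d/%Y"),
    "last_funding_at": pd.to_datetime(["5/3/2011"], format="%m/%d/%Y"),
})

# Fixed column name: each pass replaces the result of the previous pass.
for feat in ["founded_at", "last_funding_at"]:
    df["Year"] = df[feat].dt.year
overwritten = df["Year"].tolist()  # only the year of last_funding_at is left

# Column name built from feat: one result column per date column.
for feat in ["founded_at", "last_funding_at"]:
    df[feat + "_Year"] = df[feat].dt.year
year_columns = [c for c in df.columns if c.endswith("_Year")]
```

With a fixed name, the year of founded_at is lost after the loop; with a prefixed name, both years survive.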
Complete working code:
ss = '''
founded_at first_funding_at last_funding_at
12/1/2005 5/3/2011 5/3/2011
1/1/2007 8/1/2007 3/8/2008
7/1/2007 3/1/2008 3/1/2009
9/1/2007 10/1/2009 8/1/2010
4/2/2009 1/1/2009 6/27/2014
1/1/2010 11/6/2013 11/6/2013
'''.strip()
with open('data.csv', 'w') as f:
    f.write(ss)  # write test file

############### main script ################
import pandas as pd
import numpy as np

# Read the dataset (the test file above is whitespace-delimited)
df = pd.read_csv("data.csv", sep=r"\s+")
list1 = ["founded_at", "first_funding_at", "last_funding_at"]
for feat in list1:
    # Convert to datetime; the sample dates are month/day/year
    df[feat] = pd.to_datetime(df[feat], format="%m/%d/%Y")
    # Prefix each new column with the source column name so nothing is overwritten
    df[feat + '_' + 'Year'] = df[feat].dt.year
    # .dt.week and .dt.weekofyear were removed in pandas 2.0;
    # .dt.isocalendar().week (pandas 1.1+) gives the ISO week number instead
    df[feat + '_' + 'Week'] = df[feat].dt.isocalendar().week
    df[feat + '_' + 'Day'] = df[feat].dt.day
    df[feat + '_' + 'quarter'] = df[feat].dt.quarter
    df[feat + '_' + 'week_of_day'] = df[feat].dt.dayofweek  # Monday=0 ... Sunday=6
    df[feat + '_' + 'year_of_week'] = df[feat].dt.isocalendar().week
    df[feat + '_' + 'dayofweek_name'] = df[feat].dt.day_name()
    # Compare day names, not the raw dates, to flag weekends
    df[feat + '_' + 'weekend'] = np.where(df[feat].dt.day_name().isin(['Sunday', 'Saturday']), 1, 0)
print(df)
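If you prefer to keep the loop body short, the same idea can be wrapped in a function. This is only a sketch: add_date_parts is a hypothetical helper name, it covers just two of the features, and it flags weekends with dayofweek >= 5 (Saturday and Sunday) rather than by comparing day names.

```python
import pandas as pd

def add_date_parts(df, column):
    """Add prefixed week and weekend columns for one datetime column (hypothetical helper)."""
    dates = df[column].dt
    # ISO week number; available as .dt.isocalendar() in pandas 1.1 and later.
    df[column + "_week"] = dates.isocalendar().week
    # dayofweek is Monday=0 ... Sunday=6, so 5 and 6 are the weekend.
    df[column + "_weekend"] = (dates.dayofweek >= 5).astype(int)
    return df

# Two dates taken from the sample data in the question.
df = pd.DataFrame(
    {"last_funding_at": pd.to_datetime(["6/27/2014", "3/8/2008"], format="%m/%d/%Y")}
)
df = add_date_parts(df, "last_funding_at")
```

Calling it once per entry in list1 would give the same prefixed layout as the loop above.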