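I have a CSV file containing 20 days of a user's social media activity, and I want to get the details of the user's activity on day 1. Here is a sample of the entries in the CSV: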
DateTime Instagram Facebook Twitter
(2020,09,01,10,00,00) Y N Y
(2020,09,01,10,01,00) N Y Y
(2020,09,01,10,02,00) N Y N
(2020,09,01,10,03,00) N Y N
(2020,09,01,10,04,00) Y N Y
(2020,09,01,11,00,00) Y N N
(2020,09,02,10,00,00) N Y Y
(2020,09,02,10,00,00) Y N N
(2020,09,02,10,00,00) N N N
(2020,09,03,10,00,00) Y Y Y
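Y means the user was active and N means the user was inactive. I want to display the activity status of all apps for the first day (2020-09-01).
So I'd like the result to look like this (only the datetime values where the user was active (Y) on that app):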
{'Instagram':[(2020,09,01,10,00,00),(2020,09,01,10,04,00),(2020,09,01,11,00,00)],
'Facebook':[(2020,09,01,10,01,00), (2020,09,01,10,02,00), (2020,09,01,10,03,00)],
'Twitter':[(2020,09,01,10,00,00), (2020,09,01,10,01,00), (2020,09,01,10,04,00)]}
I wrote the following code, but it doesn't give me the result I want:
df['DateTime'] = pd.to_datetime(df['DateTime'], format='(%Y,%m,%d,%H,%M,%S)')
for idx, d in df.groupby(df['DateTime'].dt.date):
    print(d.drop('DateTime', axis=1).to_dict('list'))
This was the result I got
{'Instagram': ['Y', 'N', 'N', 'N', 'Y', 'Y'], 'Facebook': ['N', 'Y', 'Y', 'Y', 'N', 'N'], 'Twitter': ['Y', 'Y', 'N', 'N', 'Y', 'N']}
{'Instagram': ['N', 'Y', 'N'], 'Facebook': ['Y', 'N', 'N'], 'Twitter': ['Y', 'N', 'N']}
{'Instagram': ['Y'], 'Facebook': ['Y'], 'Twitter': ['Y']}
The DateTime column holds values in a datetime-object-like format, and I convert them to pandas datetime format.
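Convert the values to a new datetime column, filter the first date with boolean indexing, reshape with DataFrame.melt, keep only the Y rows, and aggregate each app's values into a list: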
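# Parse the tuple-like strings into a helper datetime column
df['d'] = pd.to_datetime(df['DateTime'], format='(%Y,%m,%d,%H,%M,%S)')
# Date of the first row; iloc works regardless of the index labels
day1 = df['d'].dt.date.iloc[0]
df1 = df[df['d'].dt.date.eq(day1)]
# Wide to long: one row per (timestamp, app) pair
df1 = df1.melt(['DateTime','d'])
df1 = df1[df1['value'].eq('Y')]
d = df1.groupby('variable')['DateTime'].agg(list).to_dict()
print(d)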
{'Facebook': ['(2020,09,01,10,01,00)', '(2020,09,01,10,02,00)', '(2020,09,01,10,03,00)'],
'Instagram': ['(2020,09,01,10,00,00)', '(2020,09,01,10,04,00)', '(2020,09,01,11,00,00)'],
'Twitter': ['(2020,09,01,10,00,00)', '(2020,09,01,10,01,00)', '(2020,09,01,10,04,00)']}
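For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. It rebuilds the sample rows from the question with io.StringIO and assumes the real file is whitespace-separated, which may not match your CSV. It also converts each timestamp to a tuple of ints, since the question asked for tuple-like values; the names raw, first_day, active and result are only illustrative.

```python
import io
import pandas as pd

# Sample rows copied from the question. Whitespace as the delimiter is an
# assumption; adjust `sep` to match the real file.
raw = """DateTime Instagram Facebook Twitter
(2020,09,01,10,00,00) Y N Y
(2020,09,01,10,01,00) N Y Y
(2020,09,01,10,02,00) N Y N
(2020,09,01,10,03,00) N Y N
(2020,09,01,10,04,00) Y N Y
(2020,09,01,11,00,00) Y N N
(2020,09,02,10,00,00) N Y Y
(2020,09,02,10,00,00) Y N N
(2020,09,02,10,00,00) N N N
(2020,09,03,10,00,00) Y Y Y
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Parse the "(Y,m,d,H,M,S)" strings into real timestamps
df["d"] = pd.to_datetime(df["DateTime"], format="(%Y,%m,%d,%H,%M,%S)")

# Keep only the rows that fall on the date of the first row
first_day = df["d"].dt.date.iloc[0]
day1 = df[df["d"].dt.date.eq(first_day)]

# Wide to long, then keep only the active (Y) rows
active = day1.melt(id_vars=["DateTime", "d"])
active = active[active["value"].eq("Y")]

# Turn each timestamp into a (year, month, day, hour, minute, second) tuple
active = active.assign(
    parts=active["d"].map(lambda ts: tuple(ts.timetuple()[:6]))
)

# One list of tuples per app
result = active.groupby("variable")["parts"].agg(list).to_dict()
print(result)
```

The tuples hold plain ints, so 09 shows up as 9; a literal such as (2020,09,01,10,00,00) is not valid Python because of the leading zeros.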
If you need a nested dictionary with one entry per date:
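df['d'] = pd.to_datetime(df['DateTime'], format='(%Y,%m,%d,%H,%M,%S)')
# Long format, keeping only the active (Y) rows
df2 = df.melt(['DateTime','d'])
df2 = df2[df2['value'].eq('Y')]
# Group by date string and app name, collecting the original DateTime strings
s = df2.groupby([df2['d'].dt.strftime('%Y-%m-%d'), 'variable'])['DateTime'].agg(list)
print(s)
# Build one inner dict per date (the first index level)
d1 = {level: s.xs(level).to_dict() for level in s.index.levels[0]}
print(d1)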
{'2020-09-01': {'Facebook': ['(2020,09,01,10,01,00)', '(2020,09,01,10,02,00)', '(2020,09,01,10,03,00)'],
'Instagram': ['(2020,09,01,10,00,00)', '(2020,09,01,10,04,00)', '(2020,09,01,11,00,00)'],
'Twitter': ['(2020,09,01,10,00,00)', '(2020,09,01,10,01,00)', '(2020,09,01,10,04,00)']},
'2020-09-02': {'Facebook': ['(2020,09,02,10,00,00)'],
'Instagram': ['(2020,09,02,10,00,00)'],
'Twitter': ['(2020,09,02,10,00,00)']},
'2020-09-03': {'Facebook': ['(2020,09,03,10,00,00)'],
'Instagram': ['(2020,09,03,10,00,00)'],
'Twitter': ['(2020,09,03,10,00,00)']}}
print(d1['2020-09-01'])
{'Facebook': ['(2020,09,01,10,01,00)', '(2020,09,01,10,02,00)', '(2020,09,01,10,03,00)'],
'Instagram': ['(2020,09,01,10,00,00)', '(2020,09,01,10,04,00)', '(2020,09,01,11,00,00)'],
'Twitter': ['(2020,09,01,10,00,00)', '(2020,09,01,10,01,00)', '(2020,09,01,10,04,00)']}
print(d1['2020-09-02'])
{'Facebook': ['(2020,09,02,10,00,00)'], 'Instagram': ['(2020,09,02,10,00,00)'], 'Twitter': ['(2020,09,02,10,00,00)']}
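If you would rather stay close to the groupby loop from the question, a possible variant is to select, inside each daily group, the DateTime values where each app column equals Y. This is only a sketch under the same assumptions as before (sample rows from the question, whitespace-separated input); apps, when and by_day are illustrative names. One difference from the melt approach: an app with no activity on a given day appears with an empty list instead of being left out.

```python
import io
import pandas as pd

# Sample rows copied from the question; the whitespace delimiter is an assumption.
raw = """DateTime Instagram Facebook Twitter
(2020,09,01,10,00,00) Y N Y
(2020,09,01,10,01,00) N Y Y
(2020,09,01,10,02,00) N Y N
(2020,09,01,10,03,00) N Y N
(2020,09,01,10,04,00) Y N Y
(2020,09,01,11,00,00) Y N N
(2020,09,02,10,00,00) N Y Y
(2020,09,02,10,00,00) Y N N
(2020,09,02,10,00,00) N N N
(2020,09,03,10,00,00) Y Y Y
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Parsed timestamps are kept in a separate Series so the original strings
# in the DateTime column stay untouched.
when = pd.to_datetime(df["DateTime"], format="(%Y,%m,%d,%H,%M,%S)")
apps = ["Instagram", "Facebook", "Twitter"]

by_day = {}
for day, group in df.groupby(when.dt.date):
    # For each app, keep the DateTime strings of the rows marked Y
    by_day[str(day)] = {
        app: group.loc[group[app].eq("Y"), "DateTime"].tolist()
        for app in apps
    }

print(by_day["2020-09-01"])
```

Looking up by_day['2020-09-01'] then gives the first day's activity, and the other dates are available under their own keys.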