Given a list of dictionaries, e.g.:
[{'item_id':'string1','feature1': 1, 'feature2': 0, 'feature3':2},
{'item_id':'string2','feature1': 0, 'feature2': 1, 'feature3':0},
{'item_id':'string3','feature1': 2, 'feature2': 0, 'feature3':1},
{'item_id':'string1','feature1': 1, 'feature2': 0, 'feature3':2}]
I'd like to construct and update a DataFrame in which one column holds the item_id, while the remaining columns are initialized from the feature values and incrementally updated whenever a repeated item_id (here 'string1') is encountered.
The following:
import pandas as pd
list_of_dictionaries = [{'item_id':'string1','feature1': 1, 'feature2': 0, 'feature3':2},
{'item_id':'string2','feature1': 0, 'feature2': 1, 'feature3':0},
{'item_id':'string3','feature1': 2, 'feature2': 0, 'feature3':1},
{'item_id':'string1','feature1': 1, 'feature2': 0, 'feature3':2}]
header = ['item_id','feature1','feature2','feature3']
df = pd.DataFrame(columns=header)
for d in list_of_dictionaries:
    df = pd.DataFrame.from_dict([d])
obviously only re-creates the DataFrame from the current dictionary on each iteration, so nothing is accumulated.
Ideally, I'd like to sum the feature values for every 'item_id' that occurs more than once. For the example input 'list_of_dictionaries' this would be:
   item_id  feature1  feature2  feature3
0  string1         2         0         4
1  string2         0         1         0
2  string3         2         0         1
You can use DataFrame.groupby():
In [47]: df = pd.DataFrame.from_dict(list_of_dictionaries)
In [48]: df.groupby('item_id').sum()
Out[48]:
         feature1  feature2  feature3
item_id
string1         2         0         4
string2         0         1         0
string3         2         0         1
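Note that the groupby result above has item_id as the index, whereas the question asks for it as a regular column. A minimal sketch of one way to get that layout is to pass as_index=False; the same concat-and-regroup pattern can also fold in dictionaries that arrive later. The names summed, new_batch and updated are illustrative only, and the extra batch is made-up data, not part of the original question:

```python
import pandas as pd

list_of_dictionaries = [
    {'item_id': 'string1', 'feature1': 1, 'feature2': 0, 'feature3': 2},
    {'item_id': 'string2', 'feature1': 0, 'feature2': 1, 'feature3': 0},
    {'item_id': 'string3', 'feature1': 2, 'feature2': 0, 'feature3': 1},
    {'item_id': 'string1', 'feature1': 1, 'feature2': 0, 'feature3': 2},
]

# Build the frame directly from the list of dictionaries.
df = pd.DataFrame(list_of_dictionaries)

# as_index=False keeps item_id as an ordinary column instead of the index.
summed = df.groupby('item_id', as_index=False).sum()
print(summed)

# Hypothetical later batch (made-up data): append it to the running totals
# and regroup, which adds its values to the matching item_id row.
new_batch = [{'item_id': 'string2', 'feature1': 3, 'feature2': 0, 'feature3': 1}]
updated = (
    pd.concat([summed, pd.DataFrame(new_batch)], ignore_index=True)
    .groupby('item_id', as_index=False)
    .sum()
)
print(updated)
```

Regrouping after each batch is simple but rescans the accumulated rows every time; for many small updates it may be cheaper to collect the dictionaries in a list and group once at the end.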