How to increment Python Pandas DataFrame based on key/values from a dictionary

user3661230 Published at Dev

user3661230

Having a list of dictionaries, e.g.:

[{'item_id':'string1','feature1': 1, 'feature2': 0, 'feature3':2},
 {'item_id':'string2','feature1': 0, 'feature2': 1, 'feature3':0},
 {'item_id':'string3','feature1': 2, 'feature2': 0, 'feature3':1},
 {'item_id':'string1','feature1': 1, 'feature2': 0, 'feature3':2}]

I'd like to build a DataFrame in which one column holds the item_id and the remaining columns are initialized from the features, with their values incremented whenever an item_id repeats (here 'string1').

The following:

import pandas as pd

list_of_dictionaries = [{'item_id':'string1','feature1': 1, 'feature2': 0, 'feature3':2},
     {'item_id':'string2','feature1': 0, 'feature2': 1, 'feature3':0},
     {'item_id':'string3','feature1': 2, 'feature2': 0, 'feature3':1},
     {'item_id':'string1','feature1': 1, 'feature2': 0, 'feature3':2}]


header = ['item_id','feature1','feature2','feature3']
df = pd.DataFrame(columns=header)

for d in list_of_dictionaries:
    df = pd.DataFrame.from_dict([d])

overwrites df on every iteration, so it ends up holding only the last dictionary rather than accumulating anything.

Ideally, I'd like to sum the feature values of every 'item_id' that occurs more than once. For the example input 'list_of_dictionaries', this would be:

   item_id  feature1  feature2  feature3
0  string1         2         0         4
1  string2         0         1         0
2  string3         2         0         1

Sait

You can use DataFrame.groupby():

In [47]: df = pd.DataFrame.from_dict(list_of_dictionaries)

In [48]: df.groupby('item_id').sum()
Out[48]:
         feature1  feature2  feature3
item_id
string1         2         0         4
string2         0         1         0
string3         2         0         1
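
Note that the result above has item_id as the index, whereas the layout asked for in the question keeps it as an ordinary column. A minimal sketch of one way to get that layout, assuming the same list_of_dictionaries as in the question, is to pass as_index=False to groupby(). The add_batch helper is a hypothetical name introduced here only to illustrate folding later records into the existing totals; it is not part of pandas or of the original answer.

```python
import pandas as pd

list_of_dictionaries = [
    {'item_id': 'string1', 'feature1': 1, 'feature2': 0, 'feature3': 2},
    {'item_id': 'string2', 'feature1': 0, 'feature2': 1, 'feature3': 0},
    {'item_id': 'string3', 'feature1': 2, 'feature2': 0, 'feature3': 1},
    {'item_id': 'string1', 'feature1': 1, 'feature2': 0, 'feature3': 2},
]

# The DataFrame constructor accepts a list of dicts directly.
df = pd.DataFrame(list_of_dictionaries)

# as_index=False keeps item_id as a regular column instead of the index.
totals = df.groupby('item_id', as_index=False).sum()
print(totals)


def add_batch(running, new_records):
    """Hypothetical helper: fold a new batch of dicts into the running totals."""
    batch = pd.DataFrame(new_records)
    # Stack old totals and new rows, then re-sum per item_id.
    combined = pd.concat([running, batch], ignore_index=True)
    return combined.groupby('item_id', as_index=False).sum()


# Illustrative extra record (not from the question) for an existing item_id.
updated = add_batch(
    totals,
    [{'item_id': 'string2', 'feature1': 3, 'feature2': 0, 'feature3': 1}],
)
print(updated)
```

Re-grouping after each batch is simple but recomputes the sums every time; for a large stream of records it may be cheaper to collect everything first and call groupby() once.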
