Python - Loop through dataframe and create class objects

hackmyacc

I have the following dataframe (already processed and cleaned to remove special chars, etc.).

parent_id  members_id  item_id  item_name
par_100    member1     item1    t shirt
par_100    member1     item2    denims
par_102    member2     item3    shirt
par_103    member3     item4    shorts
par_103    member3     item5    blouse
par_103    member4     item6    sweater
par_103    member4     item7    hoodie
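To experiment with the approaches in this thread, the sample above can be rebuilt as a pandas DataFrame. This is a minimal sketch: the column names come from the table, and storing every column as a plain string is an assumption.

```python
import pandas as pd

# Sample rows from the table above; item_name may contain spaces ("t shirt")
sorted_data = pd.DataFrame(
    {
        "parent_id": ["par_100", "par_100", "par_102", "par_103", "par_103", "par_103", "par_103"],
        "members_id": ["member1", "member1", "member2", "member3", "member3", "member4", "member4"],
        "item_id": ["item1", "item2", "item3", "item4", "item5", "item6", "item7"],
        "item_name": ["t shirt", "denims", "shirt", "shorts", "blouse", "sweater", "hoodie"],
    }
)

print(sorted_data)
```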

and the following class structure:

class Member:
    
    def __init__(self, id):
        self.member_id = id
        self.items = []
        
class Item:
    
    def __init__(self, id, name):
        self.item_id = id
        self.name = name

The dataframe has 500K+ rows. I want to create a dictionary (or another structure) where "parent_id" is the key and the remaining columns are mapped to the class objects. After creating this data structure, I will perform some actions based on business logic that require looping through all the members.
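For the sample rows, the target structure would look roughly like the following. This is a hand-built sketch assuming the question's Member and Item classes and a plain dict keyed by parent_id; a parent with several members (par_103) maps to a list of Member objects. The make_member helper is hypothetical and exists only to keep the literal short.

```python
class Member:
    def __init__(self, id):
        self.member_id = id
        self.items = []

class Item:
    def __init__(self, id, name):
        self.item_id = id
        self.name = name

def make_member(member_id, items):
    """Hypothetical helper: build a Member and attach (item_id, item_name) pairs."""
    member = Member(member_id)
    member.items.extend(Item(i, n) for i, n in items)
    return member

# Expected result for the seven sample rows, built by hand
parent_dict = {
    "par_100": [make_member("member1", [("item1", "t shirt"), ("item2", "denims")])],
    "par_102": [make_member("member2", [("item3", "shirt")])],
    "par_103": [
        make_member("member3", [("item4", "shorts"), ("item5", "blouse")]),
        make_member("member4", [("item6", "sweater"), ("item7", "hoodie")]),
    ],
}

# The later business logic can then loop over every member of every parent
for parent_id, members in parent_dict.items():
    for member in members:
        print(parent_id, member.member_id, [item.name for item in member.items])
```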

The first step is to build the data structure from the dataframe. I have the following code, which does the job, but it takes around 3 hours to process all 500K+ rows.

from collections import defaultdict

# sorted_data is the dataframe mentioned above
parent_dict = defaultdict(list)  # parent_id -> list of Member objects
parent_key_list = sorted_data['parent_id'].unique().tolist()

for parent_key in parent_key_list:
    # Filter this parent's rows (the mask scans the whole dataframe each time)
    temp_data = sorted_data.loc[sorted_data['parent_id'] == parent_key]
    unique_members = temp_data["members_id"].unique()

    for us in unique_members:
        items = temp_data.loc[temp_data['members_id'] == us]
        temp_member = Member(us)

        # One Item per row belonging to this member
        for _, row in items.iterrows():
            temp_member.items.append(Item(row["item_id"], row["item_name"]))

        parent_dict[parent_key].append(temp_member)

Since .loc is a very time-expensive operation, I tried the same thing with NumPy arrays, but the performance was much worse. Is there a better approach to reduce the processing time?
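Much of the cost likely comes less from .loc itself than from the boolean mask in the outer loop: `sorted_data['parent_id'] == parent_key` compares every row of the frame, and it is evaluated once per distinct parent. The sketch below counts those comparisons on plain Python lists, next to a single pass that buckets each row by key. It illustrates the scaling only and is not a benchmark; the function names are hypothetical.

```python
from collections import defaultdict

def mask_scan_comparisons(parent_ids):
    """Comparisons made by filtering the whole column once per distinct parent."""
    comparisons = 0
    for parent_key in dict.fromkeys(parent_ids):  # unique parents, first-seen order
        mask = [p == parent_key for p in parent_ids]  # one comparison per row, like the mask
        comparisons += len(mask)
    return comparisons

def single_pass_visits(parent_ids):
    """Row visits when every row is appended to its parent's bucket exactly once."""
    buckets = defaultdict(list)
    visits = 0
    for position, parent_key in enumerate(parent_ids):
        buckets[parent_key].append(position)
        visits += 1
    return visits

sample = ["par_100", "par_100", "par_102", "par_103", "par_103", "par_103", "par_103"]
print(mask_scan_comparisons(sample))  # 7 rows x 3 parents = 21
print(single_pass_visits(sample))     # 7
```

Scaled up, 500,000 rows with a hypothetical 100,000 distinct parents would mean 5 × 10^10 comparisons for the parent masks alone, whereas a grouping pass touches each row a constant number of times. The iterrows call adds further overhead, since it builds a Series for every row.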

scespinoza

Try this:

from collections import defaultdict

parent_dict = defaultdict(list)

for (parent_id, members_id), sdf in sorted_data.groupby(['parent_id', 'members_id']):
    member = Member(members_id)
    items = sdf.apply(lambda r: Item(r.item_id, r.item_name), axis=1).to_list()
    member.items.extend(items)
    parent_dict[parent_id].append(member)

It uses .groupby to partition the dataset by (parent_id, members_id), so each sub-dataframe holds the rows of a single member. Calling .apply on each sub-dataframe creates the Item objects, and .to_list() turns the result into a list of Item objects that extends the member's items attribute. The resulting members are stored in a defaultdict, which you can convert to a regular dict with dict() (for reading existing keys, both behave the same way).
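The sketch below runs the answer's groupby/apply loop on the sample rows and, for comparison, a variant that replaces the per-group .apply with a single pass over the four columns using zip. The zip variant is a swapped-in technique, not part of the answer: it avoids calling .apply (which builds a Series per row) once per member group, which is likely to help with many small groups, but timings should be measured on the real data. The classes are copied from the question; build_with_groupby, build_with_zip and summary are hypothetical names.

```python
from collections import defaultdict

import pandas as pd

class Member:
    def __init__(self, id):
        self.member_id = id
        self.items = []

class Item:
    def __init__(self, id, name):
        self.item_id = id
        self.name = name

sorted_data = pd.DataFrame(
    {
        "parent_id": ["par_100", "par_100", "par_102", "par_103", "par_103", "par_103", "par_103"],
        "members_id": ["member1", "member1", "member2", "member3", "member3", "member4", "member4"],
        "item_id": ["item1", "item2", "item3", "item4", "item5", "item6", "item7"],
        "item_name": ["t shirt", "denims", "shirt", "shorts", "blouse", "sweater", "hoodie"],
    }
)

def build_with_groupby(df):
    """The answer's approach: one sub-frame per (parent_id, members_id) group."""
    parent_dict = defaultdict(list)
    for (parent_id, members_id), sdf in df.groupby(["parent_id", "members_id"]):
        member = Member(members_id)
        items = sdf.apply(lambda r: Item(r.item_id, r.item_name), axis=1).to_list()
        member.items.extend(items)
        parent_dict[parent_id].append(member)
    return dict(parent_dict)

def build_with_zip(df):
    """Variant: walk the columns once and look up each member by its key."""
    parent_dict = defaultdict(list)
    members = {}  # (parent_id, members_id) -> Member, so each member is created once
    for parent_id, members_id, item_id, item_name in zip(
        df["parent_id"], df["members_id"], df["item_id"], df["item_name"]
    ):
        key = (parent_id, members_id)
        member = members.get(key)
        if member is None:
            member = Member(members_id)
            members[key] = member
            parent_dict[parent_id].append(member)
        member.items.append(Item(item_id, item_name))
    return dict(parent_dict)

def summary(parent_dict):
    """Plain-data view of the structure, handy for comparing the two builders."""
    return {
        parent_id: [(m.member_id, [(i.item_id, i.name) for i in m.items]) for m in members]
        for parent_id, members in sorted(parent_dict.items())
    }

print(summary(build_with_zip(sorted_data)))
```

One behavioral difference: groupby sorts the group keys by default (sort=True), so members come out ordered by members_id within each parent, while the zip variant keeps first-appearance order. On the sample data both orders coincide.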
