How to calculate unique values in a pandas DataFrame while grouping by the values of a particular column

Peter

I have a pandas DataFrame with columns ip, mac, hostname, and os. I want to convert that DataFrame into a new DataFrame where the ip column is unique and each of the other columns lists all the unique values that appear with the given ip.

For example, a DataFrame like this:

ip       mac          hostname         os
8.8.8.8  00:00:00:ff  dns.google.com   linux
8.8.8.8  00:00:ff:ff  dns2.google.com  windows
8.8.8.8  00:00:ff:ff  dns2.google.com  windows
8.8.4.4  00:00:00:ff  dns.google.com   linux
8.8.4.4  00:00:ff:ff  dns2.google.com  windows
8.8.4.4  00:00:ff:ff  dns2.google.com  windows

should be converted into this:

ip       mac                       hostname                         os
8.8.8.8  00:00:00:ff, 00:00:ff:ff  dns.google.com, dns2.google.com  linux, windows
8.8.4.4  00:00:00:ff, 00:00:ff:ff  dns.google.com, dns2.google.com  linux, windows

I can get the desired result (with each cell holding a set rather than a comma-separated string) by running

df.groupby('ip').agg(set)

but the dataset is very large and this groupby is very memory intensive: a 500MB dataset consumes 3-4GB of memory. Is there a less memory-intensive way to do this?
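One way to see how much of such a footprint comes from repeated strings (the same ip, mac, or os on many rows) is to compare `DataFrame.memory_usage(deep=True)` before and after converting the columns to pandas' `category` dtype, which stores each distinct value once plus a small integer code per row. This is a diagnostic sketch, not from the original thread: the data below is the question's sample repeated 10,000 times as a stand-in, not the asker's real data, and the exact byte counts will vary with the pandas version and platform.

```python
import pandas as pd

# Stand-in data (an assumption): a few of the question's rows, repeated so the
# effect of duplicated strings is visible.
base = pd.DataFrame(
    {
        "ip": ["8.8.8.8", "8.8.8.8", "8.8.4.4"],
        "mac": ["00:00:00:ff", "00:00:ff:ff", "00:00:00:ff"],
        "hostname": ["dns.google.com", "dns2.google.com", "dns.google.com"],
        "os": ["linux", "windows", "linux"],
    }
)
df = pd.concat([base] * 10_000, ignore_index=True)

# deep=True includes the memory of the string values themselves.
plain_bytes = df.memory_usage(deep=True).sum()

# category dtype: each distinct string stored once, one small integer code per row.
df_cat = df.astype("category")
cat_bytes = df_cat.memory_usage(deep=True).sum()

print(f"default string columns: {plain_bytes:,} bytes")
print(f"category columns:       {cat_bytes:,} bytes")
```

If the categorical copy is much smaller, repeated strings dominate the input's memory, and shrinking the data before any Python-level aggregation is likely to matter more than the choice of aggregation function.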

Code for creating the input data:

import pandas as pd

df = pd.DataFrame(
    {
        'ip': ['8.8.8.8', '8.8.8.8', '8.8.8.8', '8.8.4.4', '8.8.4.4', '8.8.4.4'],
        'mac': ['00:00:00:ff', '00:00:ff:ff', '00:00:ff:ff', '00:00:00:ff', '00:00:ff:ff', '00:00:ff:ff'],
        'hostname': ['dns.google.com', 'dns2.google.com', 'dns2.google.com', 'dns.google.com', 'dns2.google.com', 'dns2.google.com',],
        'os': ['linux', 'windows', 'windows', 'linux', 'windows', 'windows'],
    }
)

Filip

df.groupby('ip').agg(lambda x: ', '.join(sorted(set(x))))

Joining the unique values into a single string reduces the memory held by the result, since each cell stores one string object instead of a Python set of strings.
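The one-liner above still calls a Python function on every row of every group, duplicates included, and `sorted(set(x))` raises a `TypeError` if a column contains missing values (NaN is a float and cannot be compared with strings). A variation, not from the original answer and not benchmarked here, first keeps only distinct (ip, value) pairs per column and drops missing values, so the Python-level join runs over fewer rows. It assumes the value columns hold strings; `join_unique` is a hypothetical helper name.

```python
import pandas as pd


def join_unique(df: pd.DataFrame, key: str = "ip", sep: str = ", ") -> pd.DataFrame:
    """For each key, join the distinct non-null values of every other column.

    Hypothetical helper; assumes the non-key columns contain strings.
    """
    # Every non-null key appears in the result, even if some column is all-null for it.
    keys = pd.Index(sorted(df[key].dropna().unique()), name=key)
    columns = {}
    for col in df.columns.drop(key):
        # Distinct (key, value) pairs only, without missing values, so the
        # per-group sorted/join below sees each value once.
        pairs = df[[key, col]].dropna().drop_duplicates()
        columns[col] = pairs.groupby(key)[col].agg(lambda s: sep.join(sorted(s)))
    # Align every column on the full key index; keys with no values get NaN.
    return pd.DataFrame(columns, index=keys)
```

On the question's sample, `join_unique(df)` gives the same strings as the one-liner, indexed by ip; a key whose values in some column are all missing gets NaN in that cell instead of raising an error.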
