Exception during groupby pandas

MaxP Published at Dev

maxp

I am just beginning to learn analytics with Python for network analysis, using the Python for Data Analysis book, and I'm confused by an exception I get while doing some groupbys. Here's my situation.

I have a CSV of NetFlow data that I've imported to pandas. The data looks something like:

dt, srcIP, srcPort, dstIP, dstPort, bytes
2013-06-06 00:00:01.123, 123.123.1.1, 12345, 234.234.1.1, 80, 75

I've imported and indexed the data as follows:

df = pd.read_csv('mycsv.csv')
df.index = pd.to_datetime(df.pop('dt'))

What I want is a count of unique srcIPs which visit my servers per time period (I have data over several days and I'd like the time period to be date and hour). I can obtain an overall traffic graph by grouping and plotting as follows:

df.groupby([lambda t: t.date(), lambda t: t.hour]).srcIP.nunique().plot()

However, I want to know how that overall traffic is split amongst my servers. My intuition was to additionally group by the 'dstIP' column (which only has 5 unique values), but I get errors when I try to aggregate on srcIP.

grouped = df.groupby([lambda t: t.date(), lambda t: t.hour, 'dstIP'])
grouped.srcIP.nunique()
...
Exception: Reindexing only valid with uniquely valued Index objects

So, my specific question is: how can I avoid this exception in order to create a plot where traffic is aggregated over 1 hour blocks and there is a different series for each server?

More generally, please let me know what newb errors I'm making. Also, the data does not have regular frequency timestamps and I don't want sampled data in case that makes any difference in your answer.

EDIT 1: This is my IPython session exactly as input. Output is omitted except for the deepest few calls in the error.

EDIT 2: Upgrading pandas from 0.8.0 to 0.12.0 has yielded a more descriptive exception, shown below.

import numpy as np
import pandas as pd
import time
import datetime

full_set = pd.read_csv('june.csv', parse_dates=True, index_col=0)
full_set.sort_index(inplace=True)
gp = full_set.groupby(lambda t: (t.date(), t.hour, full_set['dip'][t]))
gp['sip'].nunique()
... 
/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in _make_labels(self)
   1239             raise Exception('Should not call this method grouping by level')
   1240         else:
-> 1241             labs, uniques = algos.factorize(self.grouper, sort=self.sort)
   1242             uniques = Index(uniques, name=self.name)
   1243             self._labels = labs

/usr/local/lib/python2.7/dist-packages/pandas/core/algorithms.pyc in factorize(values, sort, order, na_sentinel)
    123     table = hash_klass(len(vals))
    124     uniques = vec_klass()
--> 125     labels = table.get_labels(vals, uniques, 0, na_sentinel)
    126 
    127     labels = com._ensure_platform_int(labels)

/usr/local/lib/python2.7/dist-packages/pandas/hashtable.so in pandas.hashtable.PyObjectHashTable.get_labels (pandas/hashtable.c:12229)()

/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in __hash__(self)
     52     def __hash__(self):
     53         raise TypeError('{0!r} objects are mutable, thus they cannot be'
---> 54                               ' hashed'.format(self.__class__.__name__))
     55 
     56     def __unicode__(self):

TypeError: 'TimeSeries' objects are mutable, thus they cannot be hashed
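
A possible reading of this TypeError, offered as an assumption since june.csv itself is not shown: if several flows share a timestamp, the index is not unique, so looking up full_set['dip'][t] returns a whole Series rather than a single value, and the tuple returned by the lambda then contains an unhashable object. The sketch below uses made-up data and hypothetical names (idx, dip, dup, uniq) to show that behaviour; it uses .loc for the label lookup.

```python
import pandas as pd

# Made-up data: the first two flows share a timestamp, so the index is not unique.
idx = pd.to_datetime(['2013-06-06 00:00:01',
                      '2013-06-06 00:00:01',
                      '2013-06-06 01:15:00'])
dip = pd.Series(['234.234.1.1', '234.234.1.2', '234.234.1.1'], index=idx)

dup = dip.loc[idx[0]]    # duplicated label: a Series with two rows comes back
uniq = dip.loc[idx[2]]   # unique label: a plain string comes back

# A Series is mutable and cannot be hashed, so it cannot be part of a group key.
try:
    hash((idx[0].date(), idx[0].hour, dup))
    hash_failed = False
except TypeError:
    hash_failed = True

print(type(dup).__name__, type(uniq).__name__, hash_failed)
```

If that is what is happening, grouping by the 'dip' column directly, instead of looking it up by timestamp inside the lambda, sidesteps the problem.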

maxp

I ended up solving my problem by adding a new column of hour-truncated datetimes to the original dataframe as follows:

f = lambda i: i.strftime('%Y-%m-%d %H:00:00')
full_set['hours'] = full_set.index.map(f)

Then I can groupby('dip') and loop through each destination IP, creating an hourly grouped plot as I go:

dipgroup = full_set.groupby('dip')
# d is the destination IP, g holds the rows sent to it
for d, g in dipgroup:
    g.groupby('hours').sip.nunique().plot()
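
For comparison, here is a sketch of doing the same thing in a single groupby. It is not what I originally ran: it assumes a recent pandas where pd.Grouper is available, and the frame flows and its contents are made up for illustration.

```python
import pandas as pd

# Made-up flows: timestamp index, source IP (sip), destination IP (dip).
idx = pd.to_datetime(['2013-06-06 00:00:01', '2013-06-06 00:00:01',
                      '2013-06-06 00:30:00', '2013-06-06 00:45:00',
                      '2013-06-06 01:10:00', '2013-06-06 01:20:00'])
flows = pd.DataFrame({
    'sip': ['10.0.0.1', '10.0.0.2', '10.0.0.1', '10.0.0.3', '10.0.0.1', '10.0.0.1'],
    'dip': ['234.234.1.1', '234.234.1.1', '234.234.1.1',
            '234.234.1.2', '234.234.1.2', '234.234.1.2'],
}, index=idx)

# Group by hourly bins of the index and by destination IP at once,
# count distinct source IPs, then pivot so each server is its own column.
hourly = (flows.groupby([pd.Grouper(freq='h'), 'dip'])['sip']
               .nunique()
               .unstack('dip'))

print(hourly)
```

Calling hourly.plot() on the result should then draw one line per destination IP, with the hour on the x axis.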


Edited at 2021-02-3
