Exception during groupby pandas

maxp

I am just beginning to learn analytics with Python for network analysis, using the Python for Data Analysis book, and I'm confused by an exception I get while doing some groupbys. Here's my situation.

I have a CSV of NetFlow data that I've imported to pandas. The data looks something like:

dt, srcIP, srcPort, dstIP, dstPort, bytes
2013-06-06 00:00:01.123, 123.123.1.1, 12345, 234.234.1.1, 80, 75

I've imported and indexed the data as follows:

df = pd.read_csv('mycsv.csv')
df.index = pd.to_datetime(df.pop('dt'))

What I want is a count of the unique srcIPs that visit my servers per time period (I have data covering several days, and I'd like the time period to be date and hour). I can obtain an overall traffic graph by grouping and plotting as follows:

df.groupby([lambda t: t.date(), lambda t: t.hour]).srcIP.nunique().plot()

However, I want to know how that overall traffic is split amongst my servers. My intuition was to additionally group by the 'dstIP' column (which only has 5 unique values), but I get errors when I try to aggregate on srcIP.

grouped = df.groupby([lambda t: t.date(), lambda t: t.hour, 'dstIP'])
grouped.srcIP.nunique()
...
Exception: Reindexing only valid with uniquely valued Index objects

So, my specific question is: how can I avoid this exception in order to create a plot where traffic is aggregated over 1-hour blocks and there is a different series for each server?

More generally, please let me know what newb errors I'm making. Also, the data does not have regular frequency timestamps and I don't want sampled data in case that makes any difference in your answer.

EDIT 1: This is my IPython session exactly as input. Output is omitted except for the deepest few calls in the error.

EDIT 2: Upgrading pandas from 0.8.0 to 0.12.0 has yielded a more descriptive exception, shown below.

import numpy as np
import pandas as pd
import time
import datetime

full_set = pd.read_csv('june.csv', parse_dates=True, index_col=0)
full_set.sort_index(inplace=True)
gp = full_set.groupby(lambda t: (t.date(), t.hour, full_set['dip'][t]))
gp['sip'].nunique()
... 
/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in _make_labels(self)
   1239             raise Exception('Should not call this method grouping by level')
   1240         else:
-> 1241             labs, uniques = algos.factorize(self.grouper, sort=self.sort)
   1242             uniques = Index(uniques, name=self.name)
   1243             self._labels = labs

/usr/local/lib/python2.7/dist-packages/pandas/core/algorithms.pyc in factorize(values, sort, order, na_sentinel)
    123     table = hash_klass(len(vals))
    124     uniques = vec_klass()
--> 125     labels = table.get_labels(vals, uniques, 0, na_sentinel)
    126 
    127     labels = com._ensure_platform_int(labels)

/usr/local/lib/python2.7/dist-packages/pandas/hashtable.so in pandas.hashtable.PyObjectHashTable.get_labels (pandas/hashtable.c:12229)()

/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in __hash__(self)
     52     def __hash__(self):
     53         raise TypeError('{0!r} objects are mutable, thus they cannot be'
---> 54                               ' hashed'.format(self.__class__.__name__))
     55 
     56     def __unicode__(self):

TypeError: 'TimeSeries' objects are mutable, thus they cannot be hashed
maxp

I ended up solving my problem by adding a new column of hour-truncated datetimes to the original dataframe as follows:

f = lambda i: i.strftime('%Y-%m-%d %H:00:00')
full_set['hours'] = full_set.index.map(f)

Then I can groupby('dip') and loop through each destination IP, creating an hourly grouped plot as I go:

dipgroup = full_set.groupby('dip')
for d, g in dipgroup:
    g.groupby('hours').sip.nunique().plot()
