次のDataFrameがあるとします。
+-------+-----+-------+-----+--------+---------------------------+
| DID | CID | Event | OID | Source | TimeStamp |
+-------+-----+-------+-----+--------+---------------------------+
| 25078 | 14 | QBT | 0 | EMS | 2019-10-15 10:54:35 +0000 |
| 25078 | 14 | NDOBT | 0 | EMS | 2019-10-15 10:54:48 +0000 |
| 25078 | 14 | SBT | 0 | EMS | 2019-10-15 10:54:52 +0000 |
| 25078 | 14 | SBT-1 | 0 | ECS | 2019-10-15 11:00:01 +0000 |
| 25078 | 14 | SBT-1 | 0 | ECS | 2019-10-15 11:00:26 +0000 |
| 25078 | 14 | SBT-1 | 0 | ECS | 2019-10-15 11:00:50 +0000 |
| 25078 | 14 | SBT | 0 | EMS | 2019-10-15T14:27:45 |
| 25078 | 14 | SBT | 0 | EMS | 2019-10-15T14:27:45 |
| 25078 | 14 | LSFA | 0 | SPDLS | 2019-10-15T14:28:16 |
| 25078 | 14 | LSFA | 0 | SPDLS | 2019-10-15T14:28:16 |
| 25078 | 14 | FEAR | 0 | CBS | 2019-10-15T14:28:18 |
| 25078 | 14 | FEAR | 0 | CBS | 2019-10-15T14:28:18 |
| 25078 | 14 | SBT | 0 | EMS | 2019-10-15T14:28:44 |
| 25078 | 14 | SBT | 0 | EMS | 2019-10-15T14:28:44 |
| 25078 | 14 | LSFA | 0 | SPDLS | 2019-10-15T14:30:55 |
| 25078 | 14 | LSFA | 0 | SPDLS | 2019-10-15T14:30:55 |
| 25078 | 14 | SBT | 0 | EMS-1 | 2019-10-15T15:28:43 |
| 25078 | 14 | SBT | 0 | EMS-1 | 2019-10-15T15:29:02 |
| 25078 | 14 | FEAR | 0 | CBS | 2019-10-15T15:30:51 |
| 25078 | 14 | FEAR | 0 | CBS | 2019-10-15T15:30:51 |
| 25078 | 14 | DBT | 0 | RS | 2019-10-15T15:44:23 |
| 25078 | 14 | QBT | 0 | EMS-1 | 2019-10-15T16:02:16 |
+-------+-----+-------+-----+--------+---------------------------+
最終的な出力が次のようになるように、いくつかのイベントとソースの最初と最後の発生を取得したいと思います。
+-------+-----+---------------------+--------------------+---------------------+--------------------+---------------------------+---------------------------+---------------------------+---------------------+
| DID | CID | Event-QBT-Last-DT | Event-QBT-First-DT | Event-SBT-Last-DT | Event-SBT-First-DT | Screen-ECS-First-DT | Screen-ECS-Last-DT | FirstTimeUsage | LastTime Usage |
+-------+-----+---------------------+--------------------+---------------------+--------------------+---------------------------+---------------------------+---------------------------+---------------------+
| 25078 | 14 | 2019-10-15T16:02:16 | 10/15/19 10:54 AM | 2019-10-15T15:29:02 | 10/15/19 10:54 AM | 2019-10-15 11:00:01 +0000 | 2019-10-15 11:00:50 +0000 | 2019-10-15 10:54:35 +0000 | 2019-10-15T16:02:16 |
+-------+-----+---------------------+--------------------+---------------------+--------------------+---------------------------+---------------------------+---------------------------+---------------------+
パンダを使用してこれを実現するにはどうすればよいですか。
アイデアはで行をフィルタリングしているboolean indexing
とDataFrame.isin
、いくつかのイベントの取得最初と最後の出現のために、次に使用GroupBy.agg
してfirst
とlast
してリシェイプでDataFrame.unstack
最後のフラット化、MultiIndex
列に:
L = ['QBT','SBT']
df1 = (df[df['Event'].isin(L)]
.groupby(['OID','DID','CID','Event'])['TimeStamp']
.agg([('Last-DT','last'), ('First-DT','first')])
.unstack()
.sort_index(axis=1, level=1))
df1.columns = [f'Event-{b}-{a}' for a, b in df1.columns]
#print (df1)
最初と最後の発生には、フィルターなしEvent
、groupbyなし、およびなしの最初のソリューションを使用しますunstack
。
df2 = (df.groupby(['OID','DID','CID'])['TimeStamp']
.agg([('FirstTimeUsage','first'), ('LastTime Usage','last')]))
#print (df2)
最後にDataFrame.join
一緒に参加する:
df = df1.join(df2).reset_index()
print (df)
OID DID CID Event-QBT-First-DT Event-QBT-Last-DT \
0 0 25078 14 2019-10-15 10:54:35 +0000 2019-10-15T16:02:16
Event-SBT-First-DT Event-SBT-Last-DT FirstTimeUsage \
0 2019-10-15 10:54:52 +0000 2019-10-15T15:29:02 2019-10-15 10:54:35 +0000
LastTime Usage
0 2019-10-15T16:02:16
編集:次の列を処理するために、ビット変更生成df1
:
L = ['QBT','SBT']
df1 = (df[df['Event'].isin(L)]
.groupby(['OID','DID','CID','Event'])['TimeStamp']
.agg([('Last-DT','last'), ('First-DT','first')])
.unstack()
.sort_index(axis=1, level=1))
df1.columns = [f'Event-{b}-{a}' for a, b in df1.columns]
#print (df1)
L2 = ['ECS']
df11 = (df[df['Source'].isin(L2)]
.groupby(['OID','DID','CID','Source'])['TimeStamp']
.agg([('Last-DT','last'), ('First-DT','first')])
.unstack()
.sort_index(axis=1, level=1))
df11.columns = [f'Screen-{b}-{a}' for a, b in df11.columns]
df2 = (df.groupby(['OID','DID','CID'])['TimeStamp']
.agg([('FirstTimeUsage','first'), ('LastTime Usage','last')]))
最後の使用concat
:
df = pd.concat([df1, df11, df2], axis=1).reset_index()
print (df)
OID DID CID Event-QBT-First-DT Event-QBT-Last-DT \
0 0 25078 14 2019-10-15 10:54:35 +0000 2019-10-15T16:02:16
Event-SBT-First-DT Event-SBT-Last-DT Screen-ECS-First-DT \
0 2019-10-15 10:54:52 +0000 2019-10-15T15:29:02 2019-10-15 11:00:01 +0000
Screen-ECS-Last-DT FirstTimeUsage LastTime Usage
0 2019-10-15 11:00:50 +0000 2019-10-15 10:54:35 +0000 2019-10-15T16:02:16
この記事はインターネットから収集されたものであり、転載の際にはソースを示してください。
侵害の場合は、連絡してください[email protected]
コメントを追加