How to add missing rows of time series data to a pandas DataFrame in Python

user3104352

I have a time series dataset of products, shown below.

date    product price   amount
11/17/2019  A   10  20
11/19/2019  A   15  20
11/24/2019  A   20  30
12/01/2019  C   40  50
12/05/2019  C   45  35

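This data has missing days (dates are in "MM/dd/YYYY" format) between the start and end dates of each product's data. I am trying to fill the missing dates with zero rows, converting the table above into the table below.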

date    product price   amount
11/17/2019  A   10  20
11/18/2019  A   0   0
11/19/2019  A   15  20
11/20/2019  A   0   0
11/21/2019  A   0   0
11/22/2019  A   0   0
11/23/2019  A   0   0
11/24/2019  A   20  30
12/01/2019  C   40  50
12/02/2019  C   0   0
12/03/2019  C   0   0
12/04/2019  C   0   0
12/05/2019  C   45  35

To get this transformation, I used the following code:

import pandas as pd
import numpy as np
data=pd.read_csv("test.txt", sep="\t", parse_dates=['date'])
data=data.set_index(["date", "product"])
start=data.first_valid_index()[0]
end=data.last_valid_index()[0]
df=data.set_index("date").reindex(pd.date_range(start,end, freq="1D"), fill_values=0)

However, the code raises an error. Is there an efficient way to do this transformation?

jezrael

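If you need to add the missing datetimes, filled with 0, for each product separately, use GroupBy.apply with a custom function that calls DataFrame.reindex over the range from the group's minimum to its maximum datetime.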

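import pandas as pd

df = pd.read_csv("test.txt", sep="\t", parse_dates=['date'])

# Reindex one product's rows to a full daily range, filling new rows with 0
f = lambda x: x.reindex(pd.date_range(x.index.min(), 
                                      x.index.max(), name='date'), fill_value=0)
# Apply per product, then drop the zero-filled 'product' column that was
# carried through apply (the group key is restored from the index)
df = (df.set_index('date')
        .groupby('product')
        .apply(f)
        .drop('product', axis=1)
        .reset_index())
print(df)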
   product       date  price  amount
0        A 2019-11-17     10      20
1        A 2019-11-18      0       0
2        A 2019-11-19     15      20
3        A 2019-11-20      0       0
4        A 2019-11-21      0       0
5        A 2019-11-22      0       0
6        A 2019-11-23      0       0
7        A 2019-11-24     20      30
8        C 2019-12-01     40      50
9        C 2019-12-02      0       0
10       C 2019-12-03      0       0
11       C 2019-12-04      0       0
12       C 2019-12-05     45      35
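
For reference, the code in the question fails for two reasons: reindex has no fill_values keyword (the keyword is fill_value), and set_index("date") is called after date has already been moved into the index. The following is a minimal, self-contained sketch of the same per-product reindex idea, not a replacement for the answer above. It assumes the sample data is inlined as a string rather than read from test.txt, and the names raw, fill_missing_days and filled are made up for illustration. It selects the value columns before apply, so it does not rely on how a given pandas version treats the grouping column inside apply.

```python
import io

import pandas as pd

# Inline copy of the question's sample data (tab-separated),
# used here instead of test.txt so the sketch runs on its own.
raw = (
    "date\tproduct\tprice\tamount\n"
    "11/17/2019\tA\t10\t20\n"
    "11/19/2019\tA\t15\t20\n"
    "11/24/2019\tA\t20\t30\n"
    "12/01/2019\tC\t40\t50\n"
    "12/05/2019\tC\t45\t35\n"
)
df = pd.read_csv(io.StringIO(raw), sep="\t", parse_dates=["date"])


def fill_missing_days(group):
    """Reindex one product's rows to a full daily range, zero-filling gaps."""
    full_range = pd.date_range(group.index.min(), group.index.max(),
                               freq="D", name="date")
    return group.reindex(full_range, fill_value=0)


filled = (
    df.set_index("date")
      .groupby("product", group_keys=True)[["price", "amount"]]
      .apply(fill_missing_days)
      .reset_index()  # turns the (product, date) index back into columns
)
print(filled)
```

Because each product is reindexed only between its own first and last date, no rows are created for the gap between product A's last day and product C's first day.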
