How to expand a dataframe to match another dataframe based on an index

Diogo Silva

I have the following dataframe A, which contains student information:

student_id  signup_year age
1           2010        18
2           2011        19
3           2015        25

And the following dataframe B, which contains the students' academic grades:

student_id  discipline    grade  finishing_date
1           math          18     5/3/2011
1           science       15     5/3/2011
2           math          14     10/4/2013
2           science       13     10/4/2013
3           math          12     11/5/2016
3           science       11     12/6/2016

In table B, I want to compute each student's first-year grades under the following condition:

grade = 0 if finishing_year - signup_year > 1 else grade

The output (table B) should look like this:

student_id  discipline    grade  finishing_date
1           math          18     5/3/2011
1           science       15     5/3/2011
2           math          0      10/4/2013
2           science       0      10/4/2013
3           math          12     11/5/2016
3           science       11     12/6/2016

The problem is vectorizing this operation (my dataset contains more than 500,000 samples).

What I tried:

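import numpy as np
import pandas as pd

def vectorized(A, B):
    # Fails: A["signup_year"] has one row per student,
    # while B["finishing_date"] has one row per grade.
    B["grade"] = np.where(
        pd.DatetimeIndex(B["finishing_date"]).year - A["signup_year"] > 1,
        B["grade"] * 0,
        B["grade"],
    )
    return B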

However, this does not work because A["signup_year"] does not have the same length as pd.DatetimeIndex(B["finishing_date"]).year. How can I approach this?

jezrael

Use Series.map to get a Series with the same length as B, matched on student_id:

B["grade"] = np.where(
       pd.to_datetime(B["finishing_date"]).dt.year - 
       B["student_id"].map(A.set_index('student_id')['signup_year'])
       > 1,
       0,
       B["grade"])

print (B)
   student_id discipline  grade finishing_date
0           1       math     18       5/3/2011
1           1    science     15       5/3/2011
2           2       math      0      10/4/2013
3           2    science      0      10/4/2013
4           3       math     12      11/5/2016
5           3    science     11      12/6/2016

Details:

print (B["student_id"].map(A.set_index('student_id')['signup_year']))
0    2010
1    2010
2    2011
3    2011
4    2015
5    2015
Name: student_id, dtype: int64
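
For anyone who wants to reproduce this end to end, here is a minimal, self-contained sketch of the Series.map approach. It rebuilds A and B from the sample tables in the question; the intermediate names signup_year and finishing_year are only illustrative, and it assumes student_id is unique in A.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
A = pd.DataFrame({
    "student_id": [1, 2, 3],
    "signup_year": [2010, 2011, 2015],
    "age": [18, 19, 25],
})
B = pd.DataFrame({
    "student_id": [1, 1, 2, 2, 3, 3],
    "discipline": ["math", "science"] * 3,
    "grade": [18, 15, 14, 13, 12, 11],
    "finishing_date": ["5/3/2011", "5/3/2011", "10/4/2013",
                       "10/4/2013", "11/5/2016", "12/6/2016"],
})

# Look up each row's signup_year through student_id; the result has one value per row of B
signup_year = B["student_id"].map(A.set_index("student_id")["signup_year"])

# Only the year of the finishing date is needed for the condition
finishing_year = pd.to_datetime(B["finishing_date"]).dt.year

# Zero out grades finished more than one year after signup
B["grade"] = np.where(finishing_year - signup_year > 1, 0, B["grade"])
print(B)
```

If a student_id in B has no match in A, map returns NaN for that row, the comparison evaluates to False, and the original grade is kept.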

Another idea is to use merge with a left join:

B["grade"] = np.where(
       pd.to_datetime(B["finishing_date"]).dt.year - 
       B.merge(A, on="student_id", how='left')['signup_year']
       > 1,
       0,
       B["grade"])
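
One thing to watch with the merge variant: merge returns a frame with a fresh 0..n-1 index, so subtracting its signup_year column from a Series built on B only lines up when B also has the default index. The sketch below is one way to sidestep that, not the only one. It does all the arithmetic on the merged frame and assigns the resulting NumPy array back by position. The non-default index on B and the name merged are hypothetical, chosen to show the issue, and student_id is assumed to be unique in A so the left join keeps exactly one row per row of B.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame({
    "student_id": [1, 2, 3],
    "signup_year": [2010, 2011, 2015],
    "age": [18, 19, 25],
})
# Hypothetical non-default index, e.g. left over from earlier filtering
B = pd.DataFrame({
    "student_id": [1, 1, 2, 2, 3, 3],
    "discipline": ["math", "science"] * 3,
    "grade": [18, 15, 14, 13, 12, 11],
    "finishing_date": ["5/3/2011", "5/3/2011", "10/4/2013",
                       "10/4/2013", "11/5/2016", "12/6/2016"],
}, index=[10, 11, 12, 13, 14, 15])

# Left join keeps the row order of B and brings signup_year alongside each grade
merged = B.merge(A[["student_id", "signup_year"]], on="student_id", how="left")

# Compute everything on the merged frame so both operands share one index
finishing_year = pd.to_datetime(merged["finishing_date"]).dt.year
new_grade = np.where(finishing_year - merged["signup_year"] > 1, 0, merged["grade"])

# np.where returns a plain array, so this assignment is positional
B["grade"] = new_grade
print(B)
```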
