Transform a dataframe row into multiple rows when there is a cycle in the columns

Monica's Revolution

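I have a dataframe in which I would like to transform some rows into several rows. Each row represents a question in the Questions column, and the Answer_i columns hold the answers to that question. For example, take the following row: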

    QID     Questions   QType   Answer_1    Answer_2    Answer_3    Answer_4    Answer_5    Answer_6    Answer_7    Answer_8    Answer_9    Answer_10   Answer_11   Answer_12   Answer_13   Answer_14   Answer_15
1177    The travel restrictions of COVID-19 have been ...   Likert Scale    Very important consideration    Important consideration     Somewhat consider   Not an important consideration  Do not consider     Discounted flights  Very important consideration    Important consideration     Somewhat consider   Not an important consideration  Do not consider     Baggage policy  Very important consideration    Important consideration     Somewhat consider Not an important considera...     Do not consider

For this row, I would like to obtain the following dataframe:

        QID     Questions    QType    Answer_1    Answer_2    Answer_3    Answer_4 ...
1263    1177    The travel restrictions of COVID-19 have been lifted and you are looking to book a flight. To what extent are the following factors considerations in your choice of flight?    Likert Scale    Very important consideration    Important consideration    Somewhat consider    Not an important consideration    Do not consider
1264    1177_1  Discounted flights    Likert Scale    Very important consideration    Important consideration    Somewhat consider    Not an important consideration    Do not consider
1265    1177_2  Baggage policy    Likert Scale    Very important consideration    Important consideration    Somewhat consider    Not an important consideration    Do not consider

So far I have tried iterating over the answers:

for i, row in df.iterrows():
    passed_items = []
    for cell in row:
        if cell in passed_items:
            print("need to create a new line")
            answers = {f"Answer{i}": passed_items[i] for i in range(0, len(passed_items))} # dynamically allocate to place them in the right columns
            dict_replacing = {'Questions': questions, **answers} # dictionary used to create the new rows
            df1 = pd.DataFrame(dict_replacing)
            df = df1.combine_first(df)
            passed_items = []
        passed_items.append(str(cell))

But it returns:

    Answer_2    Answer_9    Answer_0    Answer_1    Answer_10   Answer_11   Answer_12   Answer_13   Answer_14   Answer_2    Answer_3    Answer_4    Answer_5    Answer_6    Answer_7    Answer_8    QID     QType   Questions
0   NaN     NaN     Very important consideration    Important consideration     NaN     NaN     NaN     NaN     NaN     Somewhat consider   Not an important consideration  Do not consider     Baggage policy  Discounted flights  NaN     NaN     NaN     NaN     The airline/company you fly with
1   NaN     NaN     Very important consideration    Important consideration     NaN     NaN     NaN     NaN     NaN     Somewhat consider   Not an important consideration  Do not consider     Baggage policy  Discounted flights  NaN     NaN     NaN     NaN     The departure airport
2   NaN     NaN     Very important consideration    Important consideration     NaN     NaN     NaN     NaN     NaN     Somewhat consider   Not an important consideration  Do not consider     Baggage policy  Discounted flights  NaN     NaN     NaN     NaN     Duration of flight/route
3   NaN     NaN     Very important consideration    Important consideration     NaN     NaN     NaN     NaN     NaN     Somewhat consider   Not an important consideration  Do not consider     Baggage policy  Discounted flights  NaN     NaN     NaN     NaN     Price
4   NaN     NaN     Very important consideration    Important consideration     NaN     NaN     NaN     NaN     NaN     Somewhat consider   Not an important consideration  Do not consider     Baggage policy  Discounted flights  NaN     NaN     NaN     NaN     Baggage policy
5   NaN     NaN     Very important consideration    Important consideration     NaN     NaN     NaN     NaN     NaN     Somewhat consider   Not an important consideration  Do not consider     Baggage policy  Discounted flights  NaN     NaN     NaN     NaN     Environmental impacts
1263    Important consideration Somewhat consider No...     Do not consider     NaN     Very important consideration    Baggage policy  Very important consideration    Important consideration     Somewhat consider Not an important considera...     Do not consider     NaN     Do not consider     Discounted flights  Very important consideration    Important consideration     Somewhat consider   Not an important consideration  1177.0  Likert Scale    The travel restrictions of COVID-19 have been ...

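The column order is not respected, and some columns are duplicated.

Update

I have been trying to understand Rob Raymond's answer.

What I don't understand:

  • The for loop: for i in range(3, len(r)-len(repeat)): does it iterate over each column up to the last column of the dataframe?
  • The explode function: df.apply(lambda r: getquestions(r), axis=1).explode("Questions")

According to w3resource: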

The explode() function is used to transform each element of a list-like to a row, replicating the index values.
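
To check my reading of that definition, here is a minimal sketch on made-up data (the shortened list of questions below is hypothetical). It only shows what explode() does to a cell that holds a list:

```python
import pandas as pd

# Made-up data: one row whose "Questions" cell holds a list.
df_demo = pd.DataFrame({
    "QID": [1177],
    "Questions": [["Main question", "Discounted flights", "Baggage policy"]],
    "QType": ["Likert Scale"],
})

# explode() emits one row per list element; the other columns and the
# index label are repeated for every new row.
exploded = df_demo.explode("Questions")

# reset_index(drop=True) replaces the repeated index labels with 0, 1, 2.
exploded = exploded.reset_index(drop=True)
print(exploded)
```

So explode() does not build the list itself; it expects the list to already be in the cell and only spreads it over rows.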

Is this what converts r into the list I was looking for?

  • How the new dataframe is created. I think it is related to the previous point.

Here is my code with my comments:

import collections
import pandas as pd
df = pd.DataFrame({"QID":[1177],"Questions":["The travel restrictions of COVID-19 have been lifted and you are looking to book a flight. To what extent are the following factors considerations in your choice of flight?"],"QType":["Likert Scale"],"Answer0":["Very important consideration"],"Answer1":["Important consideration"],"Answer2":["Somewhat consider"],"Answer3":["Not an important consideration"],"Answer4":["Do not consider"],"Answer5":["Discounted flights"],"Answer6":["Very important consideration"],"Answer7":["Important consideration"],"Answer8":["Somewhat consider"],"Answer9":["Not an important consideration"],"Answer10":["Do not consider"],"Answer11":["Baggage policy"],"Answer12":["Very important consideration"],"Answer13":["Important consideration"],"Answer14":["Somewhat consider"],"Answer15":["Not an important consideration"],"Answer16":["Do not consider"],"Answer17":["Price of flights"],"Answer18":["Very important consideration"],"Answer19":["Important consideration"],"Answer20":["Somewhat consider"],"Answer21":["Not an important consideration"],"Answer22":["Do not consider"],"Answer23":["Insurance"],"Answer24":["Very important consideration"],"Answer25":["Important consideration"],"Answer26":["Somewhat consider"],"Answer27":["Not an important consideration"],"Answer28":["Do not consider"],"Answer29":["Airport services"],"Answer30":["Very important consideration"],"Answer31":["Important consideration"],"Answer32":["Somewhat consider"],"Answer33":["Not an important consideration"],"Answer34":["Do not consider"],"Answer35":["Environmental impact"],"Answer36":["Very important consideration"],"Answer37":["Important consideration"],"Answer38":["Somewhat consider"],"Answer39":["Not an important consideration"],"Answer40":["Do not consider"],"Answer41":["In-flight service"],"Answer42":["Very important consideration"],"Answer43":["Important consideration"],"Answer44":["Somewhat consider"],"Answer45":["Not an important consideration"],"Answer46":["Do not consider"],"Answer47":["Customer support"],"Answer48":["Very important consideration"],"Answer49":["Important consideration"],"Answer50":["Somewhat consider"],"Answer51":["Not an important consideration"],"Answer52":["Do not consider"],"Answer53":["Overcrowding on aircraft/airports"],"Answer54":["Very important consideration"],"Answer55":["Important consideration"],"Answer56":["Somewhat consider"],"Answer57":["Not an important consideration"],"Answer58":["Do not consider"],"Answer59":["Airport safety after COVID-19"],"Answer60":["Very important consideration"],"Answer61":["Important consideration"],"Answer62":["Somewhat consider"],"Answer63":["Not an important consideration"],"Answer64":["Do not consider"],"Answer65":["Refund policy"]})
def getquestions(r):
    # counter
    repeat = list({k:v for k,v in collections.Counter(r[3:].values).items() if v>1}) # get all the questions
    questions = []
    firstfound = 0
    # 
    for i in range(3, len(r)-len(repeat)): # I don't get this one
        if r[i:i+len(repeat)].tolist()==repeat: # I think we are trying to get the subset that are repeat
            if r[i+len(repeat):i+len(repeat)+1].values[0] is not None: # here we get the question
                questions.append(r[i+len(repeat):i+len(repeat)+1].values[0]) # we store it
            if firstfound==0: firstfound = i+len(repeat) # so when it's not 0, we do not update? Why? What is this for?
    if len(questions) > 0: # weird cases?
        # something odd, sometimes it's a list other times a str
        newq = r[1] + questions if isinstance(r[1], list) else [r[1]] + questions
        r[1] = newq
        # reset all the questions that have been used by list
        for i in range(firstfound, len(r)):
            if isinstance(r[i], str): r[i] = None
    return r
def fixqid(c):
    return [id if i==0 or c[i-1]!=id else f"{id}_{i}" for i, id in enumerate(c)]

df = df.apply(lambda r: getquestions(r), axis=1).explode("Questions").reset_index(drop=True) # what does explode stand for?
df["QID"] = fixqid(df["QID"].values)   
df
Rob Raymond

A sophisticated transformation

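  1. The core idea is to build Questions as a list of the embedded questions, then call explode()
  2. Use collections.Counter() to get the unique values of the Answer# columns as keys, excluding those that occur only once
  3. With this list, move across the columns and find the positions where it matches. Take the next column as an embedded question and append it to a list
  4. Once the embedded questions are collected, put the list back into the Questions column. Reset all the answers that are now redundant
  5. Post-process to fix QID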
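
Steps 2 and 3 can be sketched on a plain Python list. This is only an illustration: it assumes a hypothetical, shortened row with a three-point scale ("Yes", "Maybe", "No") in place of the real five-point one.

```python
import collections

# Hypothetical, shortened row: the scale is repeated before each sub-question.
answers = ["Yes", "Maybe", "No", "Discounted flights",
           "Yes", "Maybe", "No", "Baggage policy"]

# Step 2: keep only the values that occur more than once (the answer scale).
# Counter keeps first-seen order, so `repeat` is in column order.
counts = collections.Counter(answers)
repeat = [value for value, n in counts.items() if n > 1]

# Step 3: slide a window of len(repeat) across the values; wherever the
# window equals the scale, the value right after it is an embedded question.
embedded = []
for i in range(len(answers) - len(repeat)):
    if answers[i:i + len(repeat)] == repeat:
        embedded.append(answers[i + len(repeat)])

print(repeat)    # ['Yes', 'Maybe', 'No']
print(embedded)  # ['Discounted flights', 'Baggage policy']
```

The loop stops at len(answers) - len(repeat) so that the position right after the window always exists. In the real function the range starts at 3 to skip the QID, Questions and QType columns.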

This actually finds 10 embedded questions in this example. The output is too wide to format sensibly, so only the first ten columns are displayed.

import collections
import numpy as np
import pandas as pd
df = pd.DataFrame({"QID":[1177,"1177R"],"Questions":["The travel restrictions of COVID-19 have been lifted and you are looking to book a flight. To what extent are the following factors considerations in your choice of flight?","How would you like to book your next holiday?"],"QType":["Likert Scale","Likert Scale"],"Answer0":["Very important consideration","Airline XYZ app"],"Answer1":["Important consideration","Airline XYZ website"],"Answer2":["Somewhat consider","Third party website"],"Answer3":["Not an important consideration","Third party app"],"Answer4":["Do not consider","Travel agent"],"Answer5":["Discounted flights","Call"],"Answer6":["Very important consideration",""],"Answer7":["Important consideration",""],"Answer8":["Somewhat consider",""],"Answer9":["Not an important consideration",""],"Answer10":["Do not consider",""],"Answer11":["Baggage policy",""],"Answer12":["Very important consideration",""],"Answer13":["Important consideration",""],"Answer14":["Somewhat consider",""],"Answer15":["Not an important consideration",""],"Answer16":["Do not consider",""],"Answer17":["Price of flights",""],"Answer18":["Very important consideration",""],"Answer19":["Important consideration",""],"Answer20":["Somewhat consider",""],"Answer21":["Not an important consideration",""],"Answer22":["Do not consider",""],"Answer23":["Insurance",""],"Answer24":["Very important consideration",""],"Answer25":["Important consideration",""],"Answer26":["Somewhat consider",""],"Answer27":["Not an important consideration",""],"Answer28":["Do not consider",""],"Answer29":["Airport services",""],"Answer30":["Very important consideration",""],"Answer31":["Important consideration",""],"Answer32":["Somewhat consider",""],"Answer33":["Not an important consideration",""],"Answer34":["Do not consider",""],"Answer35":["Environmental impact",""],"Answer36":["Very important consideration",""],"Answer37":["Important consideration",""],"Answer38":["Somewhat consider",""],"Answer39":["Not an important consideration",""],"Answer40":["Do not consider",""],"Answer41":["In-flight service",""],"Answer42":["Very important consideration",""],"Answer43":["Important consideration",""],"Answer44":["Somewhat consider",""],"Answer45":["Not an important consideration",""],"Answer46":["Do not consider",""],"Answer47":["Customer support",""],"Answer48":["Very important consideration",""],"Answer49":["Important consideration",""],"Answer50":["Somewhat consider",""],"Answer51":["Not an important consideration",""],"Answer52":["Do not consider",""],"Answer53":["Overcrowding on aircraft/airports",""],"Answer54":["Very important consideration",""],"Answer55":["Important consideration",""],"Answer56":["Somewhat consider",""],"Answer57":["Not an important consideration",""],"Answer58":["Do not consider",""],"Answer59":["Airport safety after COVID-19",""],"Answer60":["Very important consideration",""],"Answer61":["Important consideration",""],"Answer62":["Somewhat consider",""],"Answer63":["Not an important consideration",""],"Answer64":["Do not consider",""],"Answer65":["Refund policy",""]})
def getquestions(r):
    repeat = list({k:v for k,v in collections.Counter(r[3:].values).items() if v>1 and isinstance(k, str)})
    if len(repeat)<3: return r
    questions = []
    firstfound = 0
    for i in range(3, len(r)-len(repeat)):
        if r[i:i+len(repeat)].tolist()==repeat:
            if r[i+len(repeat):i+len(repeat)+1].values[0] is not None:
                questions.append(r[i+len(repeat):i+len(repeat)+1].values[0])
            if firstfound==0: firstfound = i+len(repeat)
    if len(questions) > 0:
        # something odd, sometimes it's a list other times a str
        newq = r[1] + questions if isinstance(r[1], list) else [r[1]] + questions
        r[1] = newq
        # reset all the questions that have been used by list
        for i in range(firstfound, len(r)):
            if isinstance(r[i], str): r[i] = np.nan
    return r
def fixqid(c):
    qid = []
    sub = 0
    for i, id in enumerate(c):
        if i==0 or c[i-1]!=id:
            sub=0
            qid.append(id)
        else:
            sub +=1
            qid.append(f"{id}_{sub}")
    return qid 

df = df.apply(lambda r: getquestions(r), axis=1).explode("Questions").reset_index(drop=True)
df["QID"] = fixqid(df["QID"].values)   
df.iloc[:,:10]
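
To see the QID post-processing in isolation, here is a small standalone sketch. It restates the same logic as fixqid above with renamed variables and runs it on a made-up list of IDs, so the input values are assumptions for illustration only.

```python
def fixqid(ids):
    """Suffix consecutive repeats of an ID with _1, _2, ... and keep the first as-is."""
    fixed = []
    sub = 0
    for i, qid in enumerate(ids):
        if i == 0 or ids[i - 1] != qid:
            # A new ID starts: restart the counter and keep the ID unchanged.
            sub = 0
            fixed.append(qid)
        else:
            # Same ID as the previous row: this is an embedded question.
            sub += 1
            fixed.append(f"{qid}_{sub}")
    return fixed

# Made-up IDs: three exploded rows for 1177, then an unrelated question.
result = fixqid([1177, 1177, 1177, "1177R"])
print(result)  # [1177, '1177_1', '1177_2', '1177R']
```

Because the counter restarts for every new ID, the suffixes always begin at _1. The one-line version in the question uses the row position as the suffix, so its numbering only lines up for the first question in the dataframe.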
