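I have a dataframe, and I want to turn parts of some of its rows into several rows. Each row holds a question in the Questions column and the answers to that question in the Answer_i columns. For example, take the following row: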
QID Questions QType Answer_1 Answer_2 Answer_3 Answer_4 Answer_5 Answer_6 Answer_7 Answer_8 Answer_9 Answer_10 Answer_11 Answer_12 Answer_13 Answer_14 Answer_15
1177 The travel restrictions of COVID-19 have been ... Likert Scale Very important consideration Important consideration Somewhat consider Not an important consideration Do not consider Discounted flights Very important consideration Important consideration Somewhat consider Not an important consideration Do not consider Baggage policy Very important consideration Important consideration Somewhat consider Not an important considera... Do not consider
For this row, I would like to get the following dataframe:
QID Questions QType Answer_1 Answer_2 Answer_3 Answer_4 ...
1263 1177 The travel restrictions of COVID-19 have been lifted and you are looking to book a flight. To what extent are the following factors considerations in your choice of flight? Likert Scale Very important consideration Important consideration Somewhat consider Not an important consideration Do not consider
1264 1177_1 Discounted flights Likert Scale Very important consideration Important consideration Somewhat consider Not an important consideration Do not consider
1265 1177_2 Baggage policy Likert Scale Very important consideration Important consideration Somewhat consider Not an important consideration Do not consider
So far I have tried iterating over the answers:
for i, row in df.iterrows():
    passed_items = []
    for cell in row:
        if cell in passed_items:
            print("need to create a new line")
            answers = {f"Answer{i}": passed_items[i] for i in range(0, len(passed_items))}  # dynamically place the collected items in the right columns
            dict_replacing = {'Questions': questions, **answers}  # dictionary used to create the new lines
            df1 = pd.DataFrame(dict_replacing)
            df = df1.combine_first(df)
            passed_items = []
        passed_items.append(str(cell))
But this gives me:
Answer_2 Answer_9 Answer_0 Answer_1 Answer_10 Answer_11 Answer_12 Answer_13 Answer_14 Answer_2 Answer_3 Answer_4 Answer_5 Answer_6 Answer_7 Answer_8 QID QType Questions
0 NaN NaN Very important consideration Important consideration NaN NaN NaN NaN NaN Somewhat consider Not an important consideration Do not consider Baggage policy Discounted flights NaN NaN NaN NaN The airline/company you fly with
1 NaN NaN Very important consideration Important consideration NaN NaN NaN NaN NaN Somewhat consider Not an important consideration Do not consider Baggage policy Discounted flights NaN NaN NaN NaN The departure airport
2 NaN NaN Very important consideration Important consideration NaN NaN NaN NaN NaN Somewhat consider Not an important consideration Do not consider Baggage policy Discounted flights NaN NaN NaN NaN Duration of flight/route
3 NaN NaN Very important consideration Important consideration NaN NaN NaN NaN NaN Somewhat consider Not an important consideration Do not consider Baggage policy Discounted flights NaN NaN NaN NaN Price
4 NaN NaN Very important consideration Important consideration NaN NaN NaN NaN NaN Somewhat consider Not an important consideration Do not consider Baggage policy Discounted flights NaN NaN NaN NaN Baggage policy
5 NaN NaN Very important consideration Important consideration NaN NaN NaN NaN NaN Somewhat consider Not an important consideration Do not consider Baggage policy Discounted flights NaN NaN NaN NaN Environmental impacts
1263 Important consideration Somewhat consider No... Do not consider NaN Very important consideration Baggage policy Very important consideration Important consideration Somewhat consider Not an important considera... Do not consider NaN Do not consider Discounted flights Very important consideration Important consideration Somewhat consider Not an important consideration 1177.0 Likert Scale The travel restrictions of COVID-19 have been ...
The column order is not respected, and some columns are duplicated.
I have been trying to understand Rob Raymond's answer.
What I don't understand:
for i in range(3, len(r)-len(repeat)):
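Does this iterate over every column up to the last column of the dataframe?
df.apply(lambda r: getquestions(r), axis=1).explode("Questions")
According to w3resource:
The explode() function is used to transform each element of a list-like to a row, replicating the index values.
Is this what converts r into the list I was looking for?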
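To make the explode() behaviour concrete, here is a minimal sketch on a made-up two-row frame (the values are illustrative and not taken from the survey data). As far as I can tell, explode() does not build the list itself: it only unpacks a cell that already holds a list into one row per element, repeating the other columns and the index value. In the code discussed here, the list would be the one that getquestions stores in the Questions cell.

```python
import pandas as pd

# Toy frame: the first "Questions" cell holds a list, the second a plain string.
toy = pd.DataFrame({
    "QID": [1, 2],
    "Questions": [
        ["main question", "sub-question A", "sub-question B"],
        "single question",
    ],
    "QType": ["Likert Scale", "Likert Scale"],
})

# One row per list element; the three new rows all keep index 0.
exploded = toy.explode("Questions")
print(exploded)

# Replace the duplicated index with a fresh 0..n-1 index.
exploded = exploded.reset_index(drop=True)
print(exploded)
```

The scalar cell in the second row is left as a single row, which is why rows without embedded questions pass through unchanged.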
Here is my code with comments:
import collections
import pandas as pd

df = pd.DataFrame({"QID":[1177],"Questions":["The travel restrictions of COVID-19 have been lifted and you are looking to book a flight. To what extent are the following factors considerations in your choice of flight?"],"QType":["Likert Scale"],"Answer0":["Very important consideration"],"Answer1":["Important consideration"],"Answer2":["Somewhat consider"],"Answer3":["Not an important consideration"],"Answer4":["Do not consider"],"Answer5":["Discounted flights"],"Answer6":["Very important consideration"],"Answer7":["Important consideration"],"Answer8":["Somewhat consider"],"Answer9":["Not an important consideration"],"Answer10":["Do not consider"],"Answer11":["Baggage policy"],"Answer12":["Very important consideration"],"Answer13":["Important consideration"],"Answer14":["Somewhat consider"],"Answer15":["Not an important consideration"],"Answer16":["Do not consider"],"Answer17":["Price of flights"],"Answer18":["Very important consideration"],"Answer19":["Important consideration"],"Answer20":["Somewhat consider"],"Answer21":["Not an important consideration"],"Answer22":["Do not consider"],"Answer23":["Insurance"],"Answer24":["Very important consideration"],"Answer25":["Important consideration"],"Answer26":["Somewhat consider"],"Answer27":["Not an important consideration"],"Answer28":["Do not consider"],"Answer29":["Airport services"],"Answer30":["Very important consideration"],"Answer31":["Important consideration"],"Answer32":["Somewhat consider"],"Answer33":["Not an important consideration"],"Answer34":["Do not consider"],"Answer35":["Environmental impact"],"Answer36":["Very important consideration"],"Answer37":["Important consideration"],"Answer38":["Somewhat consider"],"Answer39":["Not an important consideration"],"Answer40":["Do not consider"],"Answer41":["In-flight service"],"Answer42":["Very important consideration"],"Answer43":["Important consideration"],"Answer44":["Somewhat consider"],"Answer45":["Not an important consideration"],"Answer46":["Do not consider"],"Answer47":["Customer support"],"Answer48":["Very important consideration"],"Answer49":["Important consideration"],"Answer50":["Somewhat consider"],"Answer51":["Not an important consideration"],"Answer52":["Do not consider"],"Answer53":["Overcrowding on aircraft/airports"],"Answer54":["Very important consideration"],"Answer55":["Important consideration"],"Answer56":["Somewhat consider"],"Answer57":["Not an important consideration"],"Answer58":["Do not consider"],"Answer59":["Airport safety after COVID-19"],"Answer60":["Very important consideration"],"Answer61":["Important consideration"],"Answer62":["Somewhat consider"],"Answer63":["Not an important consideration"],"Answer64":["Do not consider"],"Answer65":["Refund policy"]})

def getquestions(r):
    # counter
    repeat = list({k:v for k,v in collections.Counter(r.iloc[3:].values).items() if v>1}) # get all the questions
    questions = []
    firstfound = 0
    #
    for i in range(3, len(r)-len(repeat)): # I don't get this one
        if r.iloc[i:i+len(repeat)].tolist()==repeat: # I think we are trying to get the subset that matches repeat
            if r.iloc[i+len(repeat):i+len(repeat)+1].values[0] is not None: # here we get the question
                questions.append(r.iloc[i+len(repeat):i+len(repeat)+1].values[0]) # we store it
                if firstfound==0: firstfound = i+len(repeat) # so when it's not 0, we do not update? Why? What is this for?
    if len(questions) > 0: # weird cases?
        # something odd, sometimes it's a list, other times a str
        newq = r["Questions"] + questions if isinstance(r["Questions"], list) else [r["Questions"]] + questions
        r["Questions"] = newq
        # reset all the questions that have been moved into the list
        for i in range(firstfound, len(r)):
            if isinstance(r.iloc[i], str): r.iloc[i] = None
    return r

def fixqid(c):
    return [id if i==0 or c[i-1]!=id else f"{id}_{i}" for i, id in enumerate(c)]

df = df.apply(lambda r: getquestions(r), axis=1).explode("Questions").reset_index(drop=True) # what does explode do?
df["QID"] = fixqid(df["QID"].values)
df
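A more refined transformation
It relies on explode() and collections.Counter().
Counter() keys the unique values in the answer columns, and anything that occurs only once is eliminated, which leaves the repeated answer scale. By this logic the example actually contains 11 embedded questions. The result is too wide to format sensibly here, so the code displays only the first ten columns.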
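As a rough illustration of the two ideas above, the sketch below runs the same logic on a plain Python list instead of a dataframe row. The cell values are hypothetical, and the loop starts at 0 rather than 3 because the toy list has no QID, Questions and QType columns in front of the answers. The upper bound len(cells) - len(repeat) is there so that the cell right after the matched window always exists.

```python
import collections

# Hypothetical answer cells for one row: a 3-point scale repeated
# before each embedded sub-question.
cells = ["Agree", "Neutral", "Disagree", "Sub-question A",
         "Agree", "Neutral", "Disagree", "Sub-question B"]

counts = collections.Counter(cells)
# Keep only the values seen more than once; Counter preserves the order
# of first appearance, so this reproduces the scale in column order.
repeat = [value for value, n in counts.items() if n > 1]

questions = []
# Slide a window of len(repeat) cells over the row. Whenever the window
# equals the scale, the next cell is taken as an embedded question.
for i in range(len(cells) - len(repeat)):
    if cells[i:i + len(repeat)] == repeat:
        questions.append(cells[i + len(repeat)])

print(repeat)
print(questions)
```

Values that occur only once (the sub-questions) never enter repeat, which is what lets the window match pick them out as the cell following each copy of the scale.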
import collections
import numpy as np
import pandas as pd

df = pd.DataFrame({"QID":[1177,"1177R"],"Questions":["The travel restrictions of COVID-19 have been lifted and you are looking to book a flight. To what extent are the following factors considerations in your choice of flight?","How would you like to book your next holiday?"],"QType":["Likert Scale","Likert Scale"],"Answer0":["Very important consideration","Airline XYZ app"],"Answer1":["Important consideration","Airline XYZ website"],"Answer2":["Somewhat consider","Third party website"],"Answer3":["Not an important consideration","Third party app"],"Answer4":["Do not consider","Travel agent"],"Answer5":["Discounted flights","Call"],"Answer6":["Very important consideration",""],"Answer7":["Important consideration",""],"Answer8":["Somewhat consider",""],"Answer9":["Not an important consideration",""],"Answer10":["Do not consider",""],"Answer11":["Baggage policy",""],"Answer12":["Very important consideration",""],"Answer13":["Important consideration",""],"Answer14":["Somewhat consider",""],"Answer15":["Not an important consideration",""],"Answer16":["Do not consider",""],"Answer17":["Price of flights",""],"Answer18":["Very important consideration",""],"Answer19":["Important consideration",""],"Answer20":["Somewhat consider",""],"Answer21":["Not an important consideration",""],"Answer22":["Do not consider",""],"Answer23":["Insurance",""],"Answer24":["Very important consideration",""],"Answer25":["Important consideration",""],"Answer26":["Somewhat consider",""],"Answer27":["Not an important consideration",""],"Answer28":["Do not consider",""],"Answer29":["Airport services",""],"Answer30":["Very important consideration",""],"Answer31":["Important consideration",""],"Answer32":["Somewhat consider",""],"Answer33":["Not an important consideration",""],"Answer34":["Do not consider",""],"Answer35":["Environmental impact",""],"Answer36":["Very important consideration",""],"Answer37":["Important consideration",""],"Answer38":["Somewhat consider",""],"Answer39":["Not an important consideration",""],"Answer40":["Do not consider",""],"Answer41":["In-flight service",""],"Answer42":["Very important consideration",""],"Answer43":["Important consideration",""],"Answer44":["Somewhat consider",""],"Answer45":["Not an important consideration",""],"Answer46":["Do not consider",""],"Answer47":["Customer support",""],"Answer48":["Very important consideration",""],"Answer49":["Important consideration",""],"Answer50":["Somewhat consider",""],"Answer51":["Not an important consideration",""],"Answer52":["Do not consider",""],"Answer53":["Overcrowding on aircraft/airports",""],"Answer54":["Very important consideration",""],"Answer55":["Important consideration",""],"Answer56":["Somewhat consider",""],"Answer57":["Not an important consideration",""],"Answer58":["Do not consider",""],"Answer59":["Airport safety after COVID-19",""],"Answer60":["Very important consideration",""],"Answer61":["Important consideration",""],"Answer62":["Somewhat consider",""],"Answer63":["Not an important consideration",""],"Answer64":["Do not consider",""],"Answer65":["Refund policy",""]})

def getquestions(r):
    # answer values (strings only) that occur more than once in this row: the repeated answer scale
    repeat = list({k:v for k,v in collections.Counter(r.iloc[3:].values).items() if v>1 and isinstance(k, str)})
    # fewer than three repeated values: treat the row as having no embedded questions
    if len(repeat)<3: return r
    questions = []
    firstfound = 0
    # slide a window of len(repeat) cells over the answer columns
    for i in range(3, len(r)-len(repeat)):
        if r.iloc[i:i+len(repeat)].tolist()==repeat:
            # the cell right after a copy of the scale is an embedded question
            if r.iloc[i+len(repeat):i+len(repeat)+1].values[0] is not None:
                questions.append(r.iloc[i+len(repeat):i+len(repeat)+1].values[0])
                # remember where the first embedded question sits
                if firstfound==0: firstfound = i+len(repeat)
    if len(questions) > 0:
        # something odd, sometimes it's a list, other times a str
        newq = r["Questions"] + questions if isinstance(r["Questions"], list) else [r["Questions"]] + questions
        r["Questions"] = newq
        # reset all the questions that have been moved into the list
        for i in range(firstfound, len(r)):
            if isinstance(r.iloc[i], str): r.iloc[i] = np.nan
    return r

def fixqid(c):
    qid = []
    sub = 0
    for i, id in enumerate(c):
        if i==0 or c[i-1]!=id:
            # first row of a new QID keeps the original id
            sub=0
            qid.append(id)
        else:
            # following rows of the same QID get a _1, _2, ... suffix
            sub +=1
            qid.append(f"{id}_{sub}")
    return qid

df = df.apply(lambda r: getquestions(r), axis=1).explode("Questions").reset_index(drop=True)
df["QID"] = fixqid(df["QID"].values)
df.iloc[:,:10]
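As a side note, the suffixing done by fixqid can also be sketched with groupby().cumcount(). This is a swapped-in alternative, not the approach used in the code above, and the toy frame below is a made-up stand-in for the exploded result. One difference to keep in mind: cumcount() numbers rows within each QID across the whole frame, whereas fixqid only looks at consecutive rows; the two agree when all rows of a QID are adjacent, as they are right after explode().

```python
import pandas as pd

# Toy stand-in for the exploded frame; values are illustrative only.
exploded = pd.DataFrame({
    "QID": [1177, 1177, 1177, "1177R"],
    "Questions": ["main question", "sub-question A",
                  "sub-question B", "other question"],
})

# Position of each row within its QID group: 0 for the first row, then 1, 2, ...
# sort=False keeps the groups in order of appearance and avoids sorting mixed int/str keys.
position = exploded.groupby("QID", sort=False).cumcount()

# Keep the original id for the first row, append _1, _2, ... for the rest.
exploded["QID"] = [
    qid if k == 0 else f"{qid}_{k}"
    for qid, k in zip(exploded["QID"], position)
]
print(exploded)
```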