I have a DataFrame named "bus_rev". I want to subset it so that I have an equal number of records where good_reviews==True and good_reviews==False. Can anyone suggest a neat way to do this?
Sample Data:
print(bus_rev[1:3])
user_id business_id stars_x \
1 CxDOIDnH8gp9KXzpBHJYXw XSiqtcVEsP6dLOL7ZA9OxA 4
2 CxDOIDnH8gp9KXzpBHJYXw v95ot_TNwTk1iJ5n56dR0g 3
address attributes \
1 522 Yonge Street {u'BusinessParking': {u'garage': False, u'stre...
2 1661 Denison Street {u'BusinessParking': {u'garage': False, u'stre...
categories city \
1 [Restaurants, Ramen, Japanese] Toronto
2 [Chinese, Seafood, Restaurants] Markham
hours is_open latitude \
1 {u'Monday': u'11:00-22:00', u'Tuesday': u'11:0... 1 43.663689
2 {} 0 43.834295
longitude name neighborhood postal_code \
1 -79.384200 Kenzo Ramen Downtown Core M4Y 1X9
2 -79.305282 Vince Seafood Restaurant & BBQ Milliken L3R 6E4
review_count stars_y state good_reviews
1 76 3.5 ON True
2 23 3.5 ON False
Code:
bus_rev['good_reviews'].value_counts()
Output:
False 482
True 168
Name: good_reviews, dtype: int64
To create a DataFrame with an equal number of each value, you can use:
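import pandas as pd
bus_revs_false = bus_rev[bus_rev['good_reviews'] == False]
# Keep the first 168 False rows to match the 168 True rows
bus_revs_false = bus_revs_false.iloc[:168, :]
bus_revs_true = bus_rev[bus_rev['good_reviews'] == True]
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
bus_revs_new = pd.concat([bus_revs_true, bus_revs_false])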
In this case, bus_revs_new will be your new DataFrame with the same number of Trues and Falses.
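If the rows are ordered in some meaningful way, taking the first 168 False rows may bias the subset. A minimal sketch of a randomized alternative follows. It assumes you want to downsample every class to the size of the smallest one; the helper name balance_classes, the seed parameter, and the toy DataFrame are made up for illustration and are not part of the original question.

```python
import pandas as pd


def balance_classes(df, column, seed=0):
    """Randomly downsample each class in `column` to the smallest class size."""
    counts = df[column].value_counts()
    n = counts.min()  # size of the smallest class
    # Draw n rows at random from each class, then stack the pieces.
    parts = [
        df[df[column] == value].sample(n=n, random_state=seed)
        for value in counts.index
    ]
    return pd.concat(parts)


# Toy stand-in for bus_rev (hypothetical data): 3 True rows, 7 False rows.
toy = pd.DataFrame({
    "stars_x": range(10),
    "good_reviews": [True] * 3 + [False] * 7,
})

balanced = balance_classes(toy, "good_reviews")
print(balanced["good_reviews"].value_counts())
```

With the real data this would be called as balance_classes(bus_rev, 'good_reviews'). Fixing random_state makes the sample reproducible between runs.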