我有一个凌乱的数据框,我试图在其中“标记”该列中包含特定数字的ids
行。此列中的值表示一个包含范围:例如,“第4行”包含以下数字:2409,2410,2411,2412,2413,2414,2377,2378,1478,1479,1480,1481,1482,1483, 1484
在“第0行”和“第1行”中,其中一组的范围向后(1931,1930,1929)
例如,如果我想知道哪些行的集合包含“ 2340”和“ 1930”,我该怎么做?我认为需要循环,有时需要查询的不仅仅是两个数字。使用Python 3.8。
示例数据框
x = ['1331:1332,1552:1551,1931:1928,1965:1973,1831:1811,1927:1920',
'1331:1332,1552:1551,1931:1929,180:178,1966:1973,1831:1811,1927:1920',
'2340:2341,1142:1143,1594:1593,1597:1596,1310,1311',
'2339:2341,1142:1143,1594:1593,1597:1596,1310:1318,1977:1974',
'2409:2414,2377:2378,1478:1484',
'2474:2476',
]
y = [6.48,7.02,7.02,6.55,5.99,6.39,]
df = pd.DataFrame(list(zip(x, y)), columns =['ids', 'val'])
display(df)
我将编写一个执行2个步骤的函数:
ids_num_list
query_id
存在ids_num_list
def check_num_in_ids_string(ids_string, query_id):
# Convert ids_string to ids_num_list
ids_range_list = ids_string.split(',')
ids_num_list = set()
for ids_range in ids_range_list:
if ':' in ids_range:
lower, upper = sorted(ids_range.split(":"))
num_list = list(range(int(lower), int(upper)+ 1))
ids_num_list.update(num_list)
else:
ids_num_list.add(int(ids_range))
# Check if query number is in the list
if int(query_id) in ids_num_list:
return 1
else:
return 0
# Example usage
query_id_list = ['2340', '1930']
for query_id in query_id_list:
df[f'n{query_id}'] = (
df['ids']
.apply(lambda x : check_num_in_ids_string(x, query_id))
)
这将返回您的要求:
ids val n2340 n1930
0 1331:1332,1552:1551,1931:1928,1965:1973,1831:1... 6.48 0 1
1 1331:1332,1552:1551,1931:1929,180:178,1966:197... 7.02 0 1
2 2340:2341,1142:1143,1594:1593,1597:1596,1310,1311 7.02 1 0
3 2339:2341,1142:1143,1594:1593,1597:1596,1310:1... 6.55 1 0
4 2409:2414,2377:2378,1478:1484 5.99 0 0
5 2474:2476 6.39 0 0
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句