Basically, I have two text files.
Text file A (contains duplicate strings):
hg17_chr2_74388709_74389
hg17_chr5_137023651_1370
hg17_chr7_137880501_1378
hg17_chr5_137023651_1370
Text file B:
hg17_chrX_52804801_52805856
hg17_chr15_79056833_79057564
hg17_chr2_74388709_74389559
hg17_chr1_120098891_120099441
hg17_chr5_137023651_137024301
hg17_chr11_85997073_85997627
hg17_chr7_137880501_137881251
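File A was trimmed with a tool, so each string in file A is identical to the first 24 characters of its counterpart in file B. How can I match the two files and write the result to a new file with the following desired content: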
hg17_chr2_74388709_74389559
hg17_chr5_137023651_137024301
hg17_chr7_137880501_137881251
hg17_chr5_137023651_137024301
This is easy to solve, and each file only needs to be opened once:
with open('file_a', 'r') as fa:  # open file A --> read its lines into a list
    list_a = fa.read().splitlines()
with open('file_b', 'r') as fb:  # open file B --> read its lines into a list
    list_b = fb.read().splitlines()
# keep each line of list_b whose first 24 characters appear in list_a
# (results follow file B's order; each line of file B appears at most once)
match_list = [n for n in list_b if n[:24] in list_a]
with open('file_c', 'w') as fc:  # write the matching lines to the new file
    fc.write('\n'.join(match_list))
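Note that the list comprehension above iterates over file B, so a string that appears twice in file A (such as the chr5 entry) is written only once, and the output follows file B's order. If the output should mirror file A line by line, as in the desired result in the question, one possible variant is sketched below. The function names `expand_trimmed` and `match_files` and the `prefix_len` parameter are made up for illustration, and the sketch assumes that no two strings in file B share the same first 24 characters.

```python
from pathlib import Path


def expand_trimmed(trimmed_lines, full_lines, prefix_len=24):
    """Return the full string for each trimmed line, in the trimmed file's order."""
    # Map the first `prefix_len` characters of every full string to that string.
    # Assumption: prefixes are unique in file B; a later duplicate prefix
    # would silently overwrite an earlier one.
    prefix_to_full = {line[:prefix_len]: line for line in full_lines}
    # Walk file A in order so that repeated entries yield repeated output lines.
    # Trimmed lines with no counterpart in file B are skipped.
    return [
        prefix_to_full[t[:prefix_len]]
        for t in trimmed_lines
        if t[:prefix_len] in prefix_to_full
    ]


def match_files(path_a, path_b, path_out, prefix_len=24):
    """Read files A and B, then write the expanded lines of A to path_out."""
    trimmed = Path(path_a).read_text().splitlines()
    full = Path(path_b).read_text().splitlines()
    matches = expand_trimmed(trimmed, full, prefix_len)
    Path(path_out).write_text('\n'.join(matches))
```

Calling `match_files('file_a', 'file_b', 'file_c')` should then write one full-length line per line of file A. The dictionary lookup also avoids scanning `list_a` once for every line of file B, which may matter for large files.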