这可能存在于其他地方,但我找不到它。我的目标是从爆炸搜索中删除多余的数字,以提取序列数据,同时保留数字序列ID。例如
原始:
>k141_100041 flag=0 multi=242.9841 len=43238
Sbjct 16375 MSEELTQNSGSNYSASSIQVLEGLEAVRKRPAMYIGDISEKGLHHLVYEVVDNSIDEALA 16196
Sbjct 16195 GYCTHIEVTINEDNSITVQDNGRGIPVDFHEKEKKSALEVVMTVLHAGGKFDKGSYKVSG 16016
Sbjct 16015 GLHGVGVSCVNALSTHMTTNVFRNGKIYQQEYECGKPLYAVKEVGTTDITGTRQTFWPDG 15836
Sbjct 15835 SIFTVTEYKYSILQARMRELAYLNKGITITLTDKRVKEEDGSYKQEKFHSEEGVKEFVRF 15656
Sbjct 15655 LNSNNTPLIDDVIYLNTEKQGIPIECAIMYNTGFRENLHSYVNNINTIEGGTHEAGFRMA 15476
Sbjct 15475 LTRVLKKYAEESKALEKAKVEISGEDFREGLIAVISVKVSEPQFEGQTKTKLGNNEVSGA 15296
Sbjct 15295 VNQAVGEALTYYLEEHPKEAKIIVDKVVLAATARVAARKARESVQRKSPMGGGGLPGKLA 15116
Sbjct 15115 DCSSRVAEECELFLVEGDSAGGSAKQGRSRQFQAILPLRGKILNVEKAMWHKAFESDDVN 14936
Sbjct 14935 NIIQALGVRFGVDGEEDSKKANIDKLRYHKVIIMTDADVDGSHIDTLIMTLFYRYMPEVI 14756
Sbjct 14755 QGGHLYIATPPLYKCSKGKISEYCYTDEARQAFIQKYGEGNEQGIHTQRYKGLGEMNPEQ 14576
Sbjct 14575 LWETTMNPETRILKQVNIENAAEADYIFSMLMGDDVGPRREFIEKNATYANIDA 14414
目标:
>k141_112817 flag=0 multi=66.5284 len=335023
MSEELTQNSGSNYSASSIQVLEGLEAVRKRPAMYIGDISEKGLHHLVYEVVDNSIDEALA
GYCTHIEVTINEDNSITVQDNGRGIPVDFHEKEKKSALEVVMTVLHAGGKFDKGSYKVSG
GLHGVGVSCVNALSTHMTTNVFRNGKIYQQEYECGKPLYAVKEVGTTDITGTRQTFWPDG
SIFTVTEYKYSILQARMRELAYLNKGITITLTDKRVKEEDGSYKQEKFHSEEGVKEFVRF
LNSNNTPLIDDVIYLNTEKQGIPIECAIMYNTGFRENLHSYVNNINTIEGGTHEAGFRMA
LTRVLKKYAEESKALEKAKVEISGEDFREGLIAVISVKVSEPQFEGQTKTKLGNNEVSGA
VNQAVGEALTYYLEEHPKEAKIIVDKVVLAATARVAARKARESVQRKSPMGGGGLPGKLA
DCSSRVAEECELFLVEGDSAGGSAKQGRSRQFQAILPLRGKILNVEKAMWHKAFESDDVN
NIIQALGVRFGVDGEEDSKKANIDKLRYHKVIIMTDADVDGSHIDTLIMTLFYRYMPEVI
QGGHLYIATPPLYKCSKGKISEYCYTDEARQAFIQKYGEGNEQGIHTQRYKGLGEMNPEQ
LWETTMNPETRILKQVNIENAAEADYIFSMLMGDDVGPRREFIEKNATYANIDA
我可以使用sed命令轻松删除'Sbjct'行和数字,但是我不知道如何从sed命令中免除id行(k141_112817 ...)。任何帮助都将被申请。
我认为这sed
是错误的工具,因为您似乎想要:
awk '/^Sbjct/{$0 = $3}1' input-file
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句