给定带有八列的制表符分隔文件:
22 51244237 rs575160859 C T 100 PASS AC=19;AF=0.00379393;AN=5008;NS=2504;DP=13345;EAS_AF=0;AMR_AF=0.0043;AFR_AF=0;EUR_AF=0.0099;SAS_AF=0.0061;AA=.|||;VT=SNP
我如何使用 bash 从第八列中的信息创建一个新的制表符分隔的文件:AF; EAS_AF; AMR_AF; AFR_AF; EUR_AF; SAS_AF 和相应的数值?
IE:
#AF EAS_AF AMR_AF AFR_AF EUR_AF SAS_AF
0.00379393 0 0.0043 0 0.0099 0.0061
我知道我可以用“;”分割第八列 (https://unix.stackexchange.com/questions/156919/splitting-a-column-using-awk)然后删除不需要的文本列和文本字符串(即“AF =”),但有没有更有效的方法去做这个?
谢谢
你能不能试试以下。
awk '
{
match($0,/AF[^;]*/)
af=substr($0,RSTART,RLENGTH)
match($0,/EAS_AF[^;]*/)
eas=substr($0,RSTART,RLENGTH)
match($0,/AMR_AF[^;]*/)
amr=substr($0,RSTART,RLENGTH)
match($0,/AFR_AF[^;]*/)
afr=substr($0,RSTART,RLENGTH)
match($0,/EUR_AF[^;]*/)
eur=substr($0,RSTART,RLENGTH)
match($0,/SAS_AF[^;]*/)
sas=substr($0,RSTART,RLENGTH)
VAL=af OFS ac OFS eas OFS amr OFS afr OFS eur OFS sas
split(VAL,array,"[= ]")
print array[1],array[4],array[6],array[8],array[10],array[12] ORS array[2],array[5],array[7],array[9],array[11],array[13]
}' Input_file | column -t
说明:在此处也添加对上述代码的说明。
awk '
{
match($0,/AF[^;]*/) ##Using match out of the box awk function for matching AF string till semi colon.
af=substr($0,RSTART,RLENGTH) ##creating variable named af whose value is substring of indexes of RSTART to till value of RLENGTH.
match($0,/EAS_AF[^;]*/) ##Using match out of the box awk function for matching EAS_AF string till semi colon.
eas=substr($0,RSTART,RLENGTH) ##creating variable named eas whose value is substring of indexes of RSTART to till value of RLENGTH.
match($0,/AMR_AF[^;]*/) ##Using match out of the box awk function for matching AMR_AF string till semi colon.
amr=substr($0,RSTART,RLENGTH) ##creating variable named amr whose value is substring of indexes of RSTART to till value of RLENGTH.
match($0,/AFR_AF[^;]*/) ##Using match out of the box awk function for matching AFR_AF string till semi colon.
afr=substr($0,RSTART,RLENGTH) ##creating variable named afr whose value is substring of indexes of RSTART to till value of RLENGTH.
match($0,/EUR_AF[^;]*/) ##Using match out of the box awk function for matching EUR_AF string till semi colon.
eur=substr($0,RSTART,RLENGTH) ##creating variable named eur whose value is substring of indexes of RSTART to till value of RLENGTH.
match($0,/SAS_AF[^;]*/) ##Using match out of the box awk function for matching SAS_AF string till semi colon.
sas=substr($0,RSTART,RLENGTH) ##creating variable named sas whose value is substring of indexes of RSTART to till value of RLENGTH.
VAL=af OFS ac OFS eas OFS amr OFS afr OFS eur OFS sas ##Creating variable VAL whose value is values of all above mentioned variables.
split(VAL,array,"[= ]") ##Using split function of awk to split it into array named array with delimiter space OR =.
print array[1],array[4],array[6],array[8],array[10],array[12] ORS array[2],array[5],array[7],array[9],array[11],array[13] ##Printing all array values as per OP.
af=ac=eas=amr=afr=eur=sas="" ##Nullifying all variables mentioned above.
}' Input_file | column -t ##Mentioning Input_file name here and passing awk output to column command to take output in TAB format.
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句