I have a large number of files, all in the same format:
line 1: Gene ID
line 2: chromosomal position
line 3 - x: names of genetic variants
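I only want to select files that contain at least 5 variants (that is, files with at least 7 lines in total: 2 header lines plus 5 variants). If a file has at least 5 variants, I want to write its contents, minus the first two lines, to a new file. Below are two example input files, foo1 and foo2.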
foo1:
echo {885743,4:139381:3783883,rs93487,rs82727,rs111} | tr " " "\n" > foo1
foo2:
echo {10432,1:3747548:2192993,rs10204,rs262222,rs436363,rs3636,rs9878,rs11856} | tr " " "\n" > foo2
The desired output (only one file in this example; in practice there will be several separate output files) is foo2.checked, which looks like:
rs10204
rs262222
rs436363
rs3636
rs9878
rs11856
# Loop over every file in the current directory. Narrow the glob to match
# only the files you want, or pass the file list in as script arguments.
for file in *
do
    # Skip output from an earlier run and anything that is not a regular file.
    [[ -f $file && $file != *.checked ]] || continue
    # Select files with at least 7 lines: 2 header lines plus 5 variants.
    if (( $(wc -l < "${file}") >= 7 )); then
        # Print line 3 through the end of the file into a file with the
        # same name as the input plus a .checked suffix.
        sed -n '3,$p' -- "${file}" > "${file}.checked"
    fi
done
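To try the filter on the two example files without touching real data, something like the following sketch may help. It assumes that "at least 5 variants" means at least 7 lines (2 header lines plus 5 variants), and it works in a scratch directory. The names workdir and min_lines are hypothetical and only used for illustration. It also uses printf and tail instead of echo/tr and sed, which is a substitution, not what the answer above does.

```shell
#!/usr/bin/env bash
# Sketch: recreate foo1 and foo2 in a scratch directory and apply the filter.
# Assumption: "at least 5 variants" = at least 7 lines (2 header lines + 5 variants).
set -eu

workdir=$(mktemp -d)   # hypothetical scratch directory
min_lines=7            # hypothetical name for the line-count threshold
cd "$workdir"

# Same content as the example input files, one value per line.
printf '%s\n' 885743 4:139381:3783883 rs93487 rs82727 rs111 > foo1
printf '%s\n' 10432 1:3747548:2192993 rs10204 rs262222 rs436363 rs3636 rs9878 rs11856 > foo2

for file in foo*; do
    # Skip earlier output files and anything that is not a regular file.
    [[ -f $file && $file != *.checked ]] || continue
    if (( $(wc -l < "$file") >= min_lines )); then
        # Drop the two header lines and keep only the variant names.
        tail -n +3 "$file" > "$file.checked"
    fi
done

# List the scratch directory to see which output files were created.
ls
```

With the example inputs, foo1 has 5 lines and is skipped, while foo2 has 8 lines, so only foo2.checked is written.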