我有一堆用制表符分隔的文本文件,例如以下文件
"gene_id" "Pattern1" "Pattern2" "Pattern3" "Pattern4" "Pattern5" "MAP" "PPDE"
"ENSG00000119771.13" 3.11528786599051e-18 2.52650109640992e-13 6.25109524320237e-09 0.345846257420197 0.654153736328455 "Pattern5" 1
"ENSG00000123700.4" 1.75016991626305e-36 3.98804090894939e-19 0.63423772228367 3.8159144080782e-21 0.36576227771633 "Pattern3" 1
"ENSG00000128567.15" 1.10722918612618e-23 7.62691311068806e-07 5.77031364194955e-06 5.13675840911147e-21 0.999993466995047 "Pattern5" 1
"ENSG00000130182.6" 9.75717082221716e-22 1.27675651077242e-12 0.469972541094369 1.13677117238758e-12 0.530027458903217 "Pattern5" 1
"ENSG00000131914.9" 3.1627489688037e-41 1.00274706758683e-22 0.0578584524816503 6.98718794692175e-22 0.94214154751835 "Pattern5" 1
现在我想将它们加入到一个文件中,这样我就可以
"gene_id" "Pattern5" "Pattern5" "Pattern5" "Pattern5" "Pattern5"
其中每一Pattern5
列都来自一个文件。
我尝试了一些东西
cut -f 6 <file>
和
paste <file1> <file2> ...
但我无法正确组合。
谢谢您的帮助!
更新:我尝试为您提供一个可测试的示例,作为此处的输入:
<file1>
gene_id Pattern1 Pattern2 Pattern3 Pattern4 Pattern5
ENSG00000119771 1 2 3 4 5
ENSG00000123700 1 2 3 4 5
ENSG00000128567 1 2 3 4 5
ENSG00000130182 1 2 3 4 5
ENSG00000131914 1 2 3 4 5
<file2>
gene_id Pattern1 Pattern2 Pattern3 Pattern4 Pattern5
ENSG00000119771 6 7 8 9 10
ENSG00000123700 6 7 8 9 10
ENSG00000128567 6 7 8 9 10
ENSG00000130182 6 7 8 9 10
ENSG00000131914 6 7 8 9 10
<file3>
gene_id Pattern1 Pattern2 Pattern3 Pattern4 Pattern5
ENSG00000119771 11 12 13 14 15
ENSG00000123700 11 12 13 14 15
ENSG00000128567 11 12 13 14 15
ENSG00000130182 11 12 13 14 15
ENSG00000131914 11 12 13 14 15
和所需的输出将是
gene_id Pattern5_file1 Pattern5_file2 Pattern5_file3
ENSG00000119771 5 10 15
ENSG00000123700 5 10 15
ENSG00000128567 5 10 15
ENSG00000130182 5 10 15
ENSG00000131914 5 10 15
UPDATE2:我尝试了Ed Morton的方法:
awk '
BEGIN { FS=OFS="\t" } FNR==1{ARGIND++}
{ genes[$1]; val[$1,ARGIND] = $5 }
END {
for (gene in genes) {
printf "%s%s", gene, OFS
for (file=1; file<=ARGIND; file++) {
printf "%s%s", val[gene,file], (file<ARGIND?OFS:ORS)
}
}
} ' $files
但是输出格式不正确:
ENSG00000128567 4 9 14
ENSG00000130182 4 9 14
ENSG00000119771 4 9 14
gene_id Pattern4 Pattern4 Pattern4
ENSG00000131914 4 9 14
ENSG00000123700 4 9 14
试试这个
#!/bin/bash
paste file1 file2 file3 | awk -v patternIdx=6 '
function printPattern(idx, isFirstLine) {
for (i = 1; i <= NF; ++i) {
if (i == 1)
printf "%s ", $i;
else if (isFirstLine && i % patternIdx == 0)
printf "%s_file%d ", $i, i / patternIdx;
else if (i % patternIdx == 0)
printf "%d ", $i;
}
printf "\n"
}
{
if (NR == 1)
printPattern(patternIdx, 1);
else
printPattern(patternIdx, 0);
}'
patternIdx是以下内容的列索引 Pattern5
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句