I posted something similar a while ago and thought the code provided there could help solve my problem, but unfortunately I am not able to adapt it to my needs: awk - compare files and print lines from both files
So, once again I have two tab-separated files.
file_1.txt
apple 2.5 5 7.2
great 3.8 10 3.6
see 7.6 3 4.9
tree 5.4 11 5
back 8.9 2 2.1
file_2.txt
apple :::N
back :::ADJ
back :::N
around :::ADV
great :::ADJ
bee :::N
see :::V
tree :::N
The output should look like:
apple :::N 2.5 5 7.2
great :::ADJ 3.8 10 3.6
back :::ADJ 8.9 2 2.1
back :::N 8.9 2 2.1
see :::V 7.6 3 4.9
tree :::N 5.4 11 5
The difference from the other post is that I only want to compare the first columns of file_1.txt and file_2.txt, and then print the whole line of file_1.txt together with column 2 of file_2.txt to the outfile. I do not care where $2 of file_2.txt ends up in the output line, so the outfile could just as well look like:
back 8.9 2 2.1 :::N
back 8.9 2 2.1 :::ADJ
etc.
The problem is the duplicate keys in column 1, such as back here; otherwise I could of course just use paste. The problem with this awk command is that it never stores column 2 of file_2.txt in the array a (a[$1] only creates the key), and a[$2] looks up a key that does not exist, so there is nothing useful to print:
awk 'NR==FNR {a[$1]; next} $1 in a {print $0, a[$2]}' OFS='\t' file_2.txt file_1.txt > outfile.txt
Any help is greatly appreciated! Sorry if this is a silly question; I seem to be completely stumped.
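For comparison, here is a minimal sketch of the closest fix to the command in the question: store $2 as the value (a[$1] = $2) and look it up with $1 instead of $2. This does print column 2 of file_2.txt, but an awk array holds only one value per key, so the second back line overwrites the first and back :::ADJ is lost. The two printf lines only recreate the sample files shown above; outfile.txt is the name used in the question.

```sh
#!/bin/sh
# Recreate the tab-separated sample files from the question.
printf 'apple\t2.5\t5\t7.2\ngreat\t3.8\t10\t3.6\nsee\t7.6\t3\t4.9\ntree\t5.4\t11\t5\nback\t8.9\t2\t2.1\n' > file_1.txt
printf 'apple\t:::N\nback\t:::ADJ\nback\t:::N\naround\t:::ADV\ngreat\t:::ADJ\nbee\t:::N\nsee\t:::V\ntree\t:::N\n' > file_2.txt

# Store column 2 of file_2.txt under its key and look it up with $1 (not $2).
# Each key holds a single value, so the later "back :::N" overwrites "back :::ADJ".
awk 'NR==FNR {a[$1] = $2; next} $1 in a {print $0, a[$1]}' OFS='\t' file_2.txt file_1.txt > outfile.txt
cat outfile.txt
```

Only one back line comes out, which is exactly the duplicate-key problem described above.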
If you have GNU awk (available from your distribution's repositories as the gawk package), which supports arrays of arrays, you could do:
gawk 'NR==FNR {a[$1][$2]++; next} $1 in a {for (x in a[$1]) print $0, x}' OFS="\t" file_2.txt file_1.txt
Example:
$ gawk 'NR==FNR {a[$1][$2]++; next} $1 in a {for (x in a[$1]) print $0, x}' OFS="\t" file_2.txt file_1.txt
apple 2.5 5 7.2 :::N
great 3.8 10 3.6 :::ADJ
see 7.6 3 4.9 :::V
tree 5.4 11 5 :::N
back 8.9 2 2.1 :::ADJ
back 8.9 2 2.1 :::N
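Otherwise, if output order is not important, the easiest solution is probably to use the join command instead. join expects both inputs to be sorted on the join field, which is why each file is passed through sort first: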
$ join -t $'\t' <(sort file_1.txt) <(sort file_2.txt)
apple 2.5 5 7.2 :::N
back 8.9 2 2.1 :::ADJ
back 8.9 2 2.1 :::N
great 3.8 10 3.6 :::ADJ
see 7.6 3 4.9 :::V
tree 5.4 11 5 :::N
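The $'\t' quoting and the <(...) process substitution above are bash/ksh/zsh features. A minimal sketch for a plain POSIX sh, assuming mktemp is available (it is on common Linux and BSD systems): sort each file into a temporary file under the same locale, then pass join a literal tab obtained from printf. The variable names tab, s1 and s2 are hypothetical.

```sh
#!/bin/sh
# Recreate the tab-separated sample files from the question.
printf 'apple\t2.5\t5\t7.2\ngreat\t3.8\t10\t3.6\nsee\t7.6\t3\t4.9\ntree\t5.4\t11\t5\nback\t8.9\t2\t2.1\n' > file_1.txt
printf 'apple\t:::N\nback\t:::ADJ\nback\t:::N\naround\t:::ADV\ngreat\t:::ADJ\nbee\t:::N\nsee\t:::V\ntree\t:::N\n' > file_2.txt

tab=$(printf '\t')                      # a literal tab character for -t
s1=$(mktemp) && s2=$(mktemp) || exit 1  # temporary sorted copies

# join needs both inputs sorted on the join field with the same collation.
LC_ALL=C sort -t "$tab" -k1,1 file_1.txt > "$s1"
LC_ALL=C sort -t "$tab" -k1,1 file_2.txt > "$s2"
LC_ALL=C join -t "$tab" "$s1" "$s2" > outfile.txt

rm -f "$s1" "$s2"
cat outfile.txt
```

Setting LC_ALL=C for both sort and join keeps their idea of "sorted" identical, which avoids join complaining about unsorted input.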