How to compare two columns from two files and print the number of matches with awk

nstatam

I have a data file, A.tsv (field separator = \t):

id  clade   mutation
243 40A titi,toto,lala
254     
267 40B lala,jiji,jojo

and a template file, B.tsv (field separator = \t):

40A lala,toto,xixi,xaxa
40B xaxa,jojo,huhu
40C sasa,sisi,lala

Based on their common column (clade), I want to compare the mutations in A.tsv against the template B.tsv and record the number of matches found in a new column of a new file (C.tsv), like this:

id  clade   mutation    number
243 40A titi,toto,lala  2
254     
267 40B lala,jiji,jojo  1

I know how to compare two files like this:

awk -F'\t' -v OFS='\t' '
    NR==FNR {
        a[$1]=$2
        next
    }

    { print $0, a[$2] }
' B.tsv A.tsv > C.tsv

but I don't know how to count the matches. Do you have an idea?

A SECOND QUESTION:

I'm also wondering how to add a column that gives only the number of mutations listed in B.tsv for the clade. Example for the column total_mut in C.tsv:

id  clade   mutation    number  total_mut
243 40A titi,toto,lala  2   4
254     
267 40B lala,jiji,jojo  1   3

Paul_Pedant

The method is to build an array from the B file, indexed by clade and mutation, then iterate over the mutations in each row of the A file and count those that are present in the array.
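As a small illustration of that lookup, here is a sketch using values copied from the sample B.tsv. The clade "40A" is hard-coded and the array names `V` and `Mut` are only illustrative. It shows that split() returns the number of elements and that a composite "clade,mutation" key can be tested with `in`.

```shell
out=$(awk 'BEGIN {
    n = split("lala,toto,xixi,xaxa", V, ",")    # n is the number of mutations
    for (j in V) Mut["40A" "," V[j]]            # referencing an element creates the key
    # "in" yields 1 when the key exists and 0 when it does not
    print n, (("40A,toto") in Mut), (("40A,titi") in Mut)
}')
echo "$out"    # prints: 4 1 0
```

Testing with `in` rather than reading `Mut[key]` avoids creating empty entries for mutations that are not in the template.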

Dealing with a tab-separated file is somewhat tricky, especially keeping the number of columns consistent on rows where there is no clade.
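The column-count problem can be seen with a one-row sketch (the row is the clade-less line from the sample A.tsv; the variable names are only illustrative). With a tab as the field separator the empty fields are kept, whereas awk's default separator collapses them.

```shell
# A row with an id but an empty clade and an empty mutation field.
row='254\t\t\n'

with_tab=$(printf "$row" | awk -F '\t' '{ print NF }')   # tab FS keeps empty fields
default_fs=$(printf "$row" | awk '{ print NF }')         # default FS collapses whitespace

echo "tab FS: $with_tab fields, default FS: $default_fs field(s)"
# prints: tab FS: 3 fields, default FS: 1 field(s)
```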

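We define the necessary column numbers for the A file as cClade and cMut, and have changed these to match the full data format (20 and 41 in the script). For the three-column sample shown in the question they would be 2 and 3.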

For the follow-up question, we save nMut (the number of mutations per clade), which split() already returns, and add it to both printf statements (header and detail). This version has been tested too.

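#! /bin/bash

Match () {  #:: (data, template)

    Awk='
# cClade and cMut are the clade and mutation columns of the A file.
# 20 and 41 fit the full data; use 2 and 3 for the sample in the question.
BEGIN { FS = "\t"; Sep = ","; cClade = 20; cMut = 41; }
# B file: store the mutation count per clade and one key per clade/mutation pair.
F == "B" {
    nMut[$1] = split ($2, V, Sep);
    for (j in V) Mut[$1 Sep V[j]];
    next;
}
# A file, no clade: append two empty cells to keep the column count.
$cClade == "" { printf ("%s%s%s\n", $0, FS, FS); next; }
# A file, header line.
FNR == 1 { printf ("%s%s%s%s%s\n", $0, FS, "number", FS, "total_mut"); next; }
# A file, data line: count the mutations that the template lists for this clade.
{
    n = 0;
    split ($cMut, V, Sep);
    for (j in V) if (($cClade Sep V[j]) in Mut) ++n;
    printf ("%s%s%s%s%s\n", $0, FS, n, FS, nMut[$cClade]);
}
'
    awk -f <( printf '%s' "${Awk}" ) F="B" "${2}" F="A" "${1}"
}

Match useTemplate.A.tsv useTemplate.B.tsv > useTemplate.C.tsv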
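
To try the same idea on the sample data from the question, here is a condensed sketch rather than the script above. It assumes clade in column 2 and mutations in column 3, recreates A.tsv and B.tsv with printf, and uses the NR == FNR idiom in place of the F="B" / F="A" assignments.

```shell
# Recreate the sample files from the question (tab-separated).
printf 'id\tclade\tmutation\n243\t40A\ttiti,toto,lala\n254\t\t\n267\t40B\tlala,jiji,jojo\n' > A.tsv
printf '40A\tlala,toto,xixi,xaxa\n40B\txaxa,jojo,huhu\n40C\tsasa,sisi,lala\n' > B.tsv

awk '
BEGIN { FS = OFS = "\t"; Sep = ","; cClade = 2; cMut = 3 }
# First file (B.tsv): remember every clade/mutation pair and the per-clade total.
NR == FNR {
    nMut[$1] = split($2, V, Sep)
    for (j in V) Mut[$1 Sep V[j]]
    next
}
FNR == 1      { print $0, "number", "total_mut"; next }   # header
$cClade == "" { print $0, "", ""; next }                  # no clade: empty cells
{
    n = 0
    split($cMut, V, Sep)
    for (j in V) if (($cClade Sep V[j]) in Mut) n++
    print $0, n, nMut[$cClade]
}
' B.tsv A.tsv > C.tsv

cat C.tsv
```

On this sample, row 243 gets number 2 and total_mut 4, and row 267 gets 1 and 3, which matches the output requested in the question.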
