How to check value of a column lies between values of two columns in other file and print corresponding value from column in Unix?

BioFreak

I have two files to be compared. I found how to compare columns and print according to condition. My problem at hand is that I have to check if the value of column[2] in file1 lies between value in file2 defined as a range in two columns, col [2] col[3]. If that is true, then I should print column[4] of file 2 in my file1.

file1:

scaffold1_size11    12
scaffold2_size22    26
scaffold3_size33    67

file2:

scaffold1_size11    1   10  Os01
scaffold1_size11    12  20  Os08
scaffold1_size11    29  59  Os07
scaffold2_size22    17  24  Os09
scaffold2_size22    27  38  Os09
scaffold2_size22    39  60  Os10
scaffold2_size22    67  78  Os10
scaffold3_size33    15  27  Os03
scaffold3_size33    29  62  Os08
scaffold3_size33    64  78  Os02
scaffold3_size33    80  98  Os01

desired output:

scaffold1_size11    12  Os08
scaffold2_size22    26
scaffold3_size33    67  Os02

How should this be done?

Jonathan Leffler

There's a standard idiom in awk which uses FNR (file record number) and NR (overall record number) to detect when you're reading the first file. You read and save the values of the first file in arrays, and then use the arrays while reading the second file.

In this context, you want to read file1 first, saving the records based on the value in column 1 ($1). This assumes that the keys in file1 (the first field) are unique. Then, when reading the second file,

awk 'FNR == NR { val[$1] = $2 }
     FNR != NR { if ($1 in val && val[$1] >= $2 && val[$1] <= $3)
                     print $1, val[$1], $4
               }' file1 file2

Sample output:

scaffold1_size11 12 Os08
scaffold2_size22 26 Os09
scaffold3_size33 67 Os02

Note that this is different from the sample output in the question, which is:

scaffold1_size11    12  Os08
scaffold2_size22    26
scaffold3_size33    67  Os02

I assume that's a typo in the question since none of the rows in file2 is missing the fourth column.

You'll also see the idiom used like:

awk 'FNR == NR { …save…; next }
     { …process… }'

The next skips the second block of code while reading the first file. It might be marginally more efficient, but I tend to like the explicit clarity of the two inverted conditions.

If spacing in the output is an issue, use an appropriate printf statement in place of the print.

이 기사는 인터넷에서 수집됩니다. 재 인쇄 할 때 출처를 알려주십시오.

침해가 발생한 경우 연락 주시기 바랍니다[email protected] 삭제

에서 수정
0

몇 마디 만하겠습니다

0리뷰
로그인참여 후 검토

관련 기사

분류에서Dev

Reading thru CSV file, finding columns with same value and appending their other column values

분류에서Dev

Unix: Print extra column of values to file

분류에서Dev

Select column value that matches a combination of other columns values on the same table

분류에서Dev

Group a dataframe on one column and take max from one column and its corresponding value from the other col

분류에서Dev

column with value as Concatenate other columns' value

분류에서Dev

Need to print value from 12th column of a particular row that matches two variables in a CSV file

분류에서Dev

Ranking - Select corresponding column values to a MAX(column) value

분류에서Dev

Return the occurance of a value in a column based on other values

분류에서Dev

Merge two columns in R while replacing when value is not present in other column

분류에서Dev

How to change value of column with different percentage values depending on the categories of other column

분류에서Dev

How to get count of particular column value from total number of records and display difference in two different columns in SQL Server

분류에서Dev

Search for column values in another column and assign a value from the next column from the row found to another column

분류에서Dev

Copy unique values from two columns into a third column

분류에서Dev

SQL Update a column dependent on other two columns

분류에서Dev

How to get dataset column value from objectdatasource

분류에서Dev

SQL SELECT Data from other table based on column value

분류에서Dev

delete three columns of two grouped columns according to common value of two column

분류에서Dev

Getting id values from a column and then get value from another table

분류에서Dev

Searching an array for a value, then printing corresponding values from other arrays/ndarrays of the same length

분류에서Dev

Summing values from a column based on match in another column and first distinct occurrence of value in a third column

분류에서Dev

What is wrong with this Numpy/Pandas code to construct new boolean column based on the values in two other boolean columns?

분류에서Dev

How to select a column value based on two row conditions?

분류에서Dev

Seperate value in a Column to different Segments or Columns in SSMS

분류에서Dev

How to reduce the value of a column in a row with?

분류에서Dev

how to get the average of values for one column based on another column value in python (pandas, jupyter)

분류에서Dev

Group Values in excel according duplicate value in column

분류에서Dev

Add column to Data Frame based on values of other columns

분류에서Dev

How to find a value in one column and replace a value in different column in excel?

분류에서Dev

How can I get a column of value from hashmap

Related 관련 기사

  1. 1

    Reading thru CSV file, finding columns with same value and appending their other column values

  2. 2

    Unix: Print extra column of values to file

  3. 3

    Select column value that matches a combination of other columns values on the same table

  4. 4

    Group a dataframe on one column and take max from one column and its corresponding value from the other col

  5. 5

    column with value as Concatenate other columns' value

  6. 6

    Need to print value from 12th column of a particular row that matches two variables in a CSV file

  7. 7

    Ranking - Select corresponding column values to a MAX(column) value

  8. 8

    Return the occurance of a value in a column based on other values

  9. 9

    Merge two columns in R while replacing when value is not present in other column

  10. 10

    How to change value of column with different percentage values depending on the categories of other column

  11. 11

    How to get count of particular column value from total number of records and display difference in two different columns in SQL Server

  12. 12

    Search for column values in another column and assign a value from the next column from the row found to another column

  13. 13

    Copy unique values from two columns into a third column

  14. 14

    SQL Update a column dependent on other two columns

  15. 15

    How to get dataset column value from objectdatasource

  16. 16

    SQL SELECT Data from other table based on column value

  17. 17

    delete three columns of two grouped columns according to common value of two column

  18. 18

    Getting id values from a column and then get value from another table

  19. 19

    Searching an array for a value, then printing corresponding values from other arrays/ndarrays of the same length

  20. 20

    Summing values from a column based on match in another column and first distinct occurrence of value in a third column

  21. 21

    What is wrong with this Numpy/Pandas code to construct new boolean column based on the values in two other boolean columns?

  22. 22

    How to select a column value based on two row conditions?

  23. 23

    Seperate value in a Column to different Segments or Columns in SSMS

  24. 24

    How to reduce the value of a column in a row with?

  25. 25

    how to get the average of values for one column based on another column value in python (pandas, jupyter)

  26. 26

    Group Values in excel according duplicate value in column

  27. 27

    Add column to Data Frame based on values of other columns

  28. 28

    How to find a value in one column and replace a value in different column in excel?

  29. 29

    How can I get a column of value from hashmap

뜨겁다태그

보관