awk: replace whitespace with tabs, taking header names and empty fields into account

DaniCee

I have a space-separated file that I want to turn into a tab-separated file. The file looks like this:

pos    peptide      logscore affinity(nM) Bind Level    Protein Name     Allele
0   GPSGGQPX         0.075        22266                          1 HLA-A11:01
0   PSGGQPXA         0.071        23285                          2 HLA-A11:01
0   SGGQPXAL         0.076        21945                          3 HLA-A11:01
0   GGQPXALD         0.076        21858                          4 HLA-A11:01
0   GQPXALDS         0.075        22237                          5 HLA-A11:01
0   QPXALDSG         0.073        22748                          6 HLA-A11:01
0   PXALDSGY         0.072        22962                          7 HLA-A11:01
0   XALDSGYD         0.080        21133                          8 HLA-A11:01
0   DTSMKDMH         0.093        18194                          9 HLA-A11:01
0   TSMKDMHK         0.732           18         SB              10 HLA-A11:01
0   SMKDMHKV         0.099        17148                         11 HLA-A11:01
0   MKDMHKVL         0.071        23175                         12 HLA-A11:01
0   KDMHKVLR         0.135        11550                         13 HLA-A11:01
0   DMHKVLRT         0.074        22537                         14 HLA-A11:01
0   MHKVLRTL         0.072        23056                         15 HLA-A11:01
0   HKVLRTLQ         0.069        23819                         16 HLA-A11:01
0   DTSMKDMH         0.093        18194                         17 HLA-A11:01
0   TSMKDMHK         0.732           18         SB              18 HLA-A11:01
0   SMKDMHKV         0.099        17148                         19 HLA-A11:01
0   MKDMHKVL         0.071        23175                         20 HLA-A11:01

I need to replace each run of whitespace with a single tab, taking into account the following:

  1. There should be no tab inside "Bind Level" and "Protein Name" in the header line; these should be renamed "Bind.Level" and "Protein.Name" instead.
  2. In entries where Bind.Level is empty, there should be 2 tabs instead of 1 (or a "-" or NA in between) between the affinity field and the Protein.Name field, so that the empty Bind.Level entry is preserved.

Hence, the following alone isn't enough:

 awk '{$1=$1}1' OFS="\t" file
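To see why the plain {$1=$1}1 conversion falls short, it helps to count the fields awk sees on each kind of line. A minimal sketch using two lines copied from the sample above (the file names sample.txt and naive.tsv are only for illustration):

```shell
# Two lines from the sample: the first has an empty Bind.Level column,
# the second has "SB" in it.
cat > sample.txt <<'EOF'
0   GPSGGQPX         0.075        22266                          1 HLA-A11:01
0   TSMKDMHK         0.732           18         SB              10 HLA-A11:01
EOF

# Default field splitting treats any run of blanks as one separator,
# so the empty column leaves no trace: this prints 6, then 7.
awk '{ print NF }' sample.txt

# The naive conversion therefore puts the Protein.Name value ("1")
# of the first line into the fifth (Bind.Level) column.
awk '{$1=$1}1' OFS="\t" sample.txt > naive.tsv
```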

Is there a simple way to accomplish this with a one-liner, preferably awk?

EDIT:

This is what the output should look like. Note "Bind.Level" and "Protein.Name" in the header, and "-" (which could also be NA or "") in the records where Bind.Level is empty:

pos peptide logscore    affinity(nM)    Bind.Level  Protein.Name    Allele
0   GPSGGQPX    0.075   22266   -   1   HLA-A11:01
0   PSGGQPXA    0.071   23285   -   2   HLA-A11:01
0   SGGQPXAL    0.076   21945   -   3   HLA-A11:01
0   GGQPXALD    0.076   21858   -   4   HLA-A11:01
0   GQPXALDS    0.075   22237   -   5   HLA-A11:01
0   QPXALDSG    0.073   22748   -   6   HLA-A11:01
0   PXALDSGY    0.072   22962   -   7   HLA-A11:01
0   XALDSGYD    0.080   21133   -   8   HLA-A11:01
0   DTSMKDMH    0.093   18194   -   9   HLA-A11:01
0   TSMKDMHK    0.732   18  SB  10  HLA-A11:01
0   SMKDMHKV    0.099   17148   -   11  HLA-A11:01
0   MKDMHKVL    0.071   23175   -   12  HLA-A11:01
0   KDMHKVLR    0.135   11550   -   13  HLA-A11:01
0   DMHKVLRT    0.074   22537   -   14  HLA-A11:01
0   MHKVLRTL    0.072   23056   -   15  HLA-A11:01
0   HKVLRTLQ    0.069   23819   -   16  HLA-A11:01
0   DTSMKDMH    0.093   18194   -   17  HLA-A11:01
0   TSMKDMHK    0.732   18  SB  18  HLA-A11:01
0   SMKDMHKV    0.099   17148   -   19  HLA-A11:01
0   MKDMHKVL    0.071   23175   -   20  HLA-A11:01

Note that non-empty Bind.Level records might take values other than "SB", but all of them are alphabetic. Protein.Name, on the other hand, might not always be numeric.

It would be something like splitting the fields on \s+; then, if there are 7 fields, print them as they are (separated by tabs), and if there are 6 (Bind.Level empty), print $1, $2, $3, $4, "-", $5, $6. Protein names could potentially contain spaces, but I'm going to make sure they don't (I control the input). That should be simple, but I don't know how to do it... anyone?
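The field-count idea described above can be written as a single awk call. A sketch, assuming the header is always the first line and that no field other than Bind.Level is ever empty or contains spaces (the file names peptides.txt and peptides.tsv are hypothetical):

```shell
# Recreate a few lines of the sample input.
cat > peptides.txt <<'EOF'
pos    peptide      logscore affinity(nM) Bind Level    Protein Name     Allele
0   GPSGGQPX         0.075        22266                          1 HLA-A11:01
0   TSMKDMHK         0.732           18         SB              10 HLA-A11:01
EOF

awk -v OFS='\t' '
NR == 1 {
    # Header: "Bind Level" and "Protein Name" are split into two tokens each,
    # giving 9 tokens; glue each pair back together with a dot.
    print $1, $2, $3, $4, $5 "." $6, $7 "." $8, $9
    next
}
NF == 7 {
    # Bind.Level present: assigning a field rebuilds the record with OFS (tab).
    $1 = $1
    print
    next
}
NF == 6 {
    # Bind.Level empty: insert "-" as a placeholder (could be "NA" or "").
    print $1, $2, $3, $4, "-", $5, $6
    next
}
{
    # Any other field count (e.g. a Protein.Name with spaces) is reported on stderr.
    print "line " NR ": unexpected field count " NF | "cat 1>&2"
}
' peptides.txt > peptides.tsv
```

Keying on NF rather than on whether the fifth field is alphabetic matters here: since Protein.Name is not always numeric, an alphabetic fifth field could be either a Bind.Level value or a Protein.Name.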

DaniCee

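Got it in 2 steps: a first step that adds "-" to the empty Bind.Level records and replaces the spaces inside the header names with ".", and a second step that changes the whitespace to tabs. With FS="" every character is its own field, so the first step relies on fixed character positions: $50 is blank only when Bind.Level is empty, and $47 and $64 are the spaces inside "Bind Level" and "Protein Name" in the header. (Splitting on an empty FS works in gawk and mawk, but POSIX leaves it unspecified.)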

 awk 'BEGIN{FS="";OFS=FS};($50==" "){$50="-"};(NR==1){$47="."; $64="."}{print}' file > out1
 awk '{$1=$1}1' OFS="\t" out1 > out2
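Because the two-step answer depends on the exact column alignment of the input, a quick check that every output line has exactly 7 tab-separated fields can catch a misplaced or missing "-". A sketch on a small made-up file (check.tsv and report.txt are hypothetical names; on real data, run the awk line on out2 instead):

```shell
# Hypothetical tab-separated file: the second line is missing its "-" placeholder.
printf 'pos\tpeptide\tlogscore\taffinity(nM)\tBind.Level\tProtein.Name\tAllele\n' > check.tsv
printf '0\tGPSGGQPX\t0.075\t22266\t1\tHLA-A11:01\n' >> check.tsv
printf '0\tTSMKDMHK\t0.732\t18\tSB\t10\tHLA-A11:01\n' >> check.tsv

# Report every line that does not have exactly 7 tab-separated fields;
# awk exits with status 1 if any line was reported, 0 otherwise.
status=0
awk -F'\t' 'NF != 7 { printf "line %d: %d fields\n", NR, NF; bad = 1 } END { exit bad }' check.tsv > report.txt || status=$?
```

Here report.txt contains "line 2: 6 fields" and status is 1.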
