awk: replace whitespace with tabs, taking titles and empty fields into account


DaniCee

I have a space-separated file that I want to turn into a tab-separated file. The file looks like this:

pos    peptide      logscore affinity(nM) Bind Level    Protein Name     Allele
0   GPSGGQPX         0.075        22266                          1 HLA-A11:01
0   PSGGQPXA         0.071        23285                          2 HLA-A11:01
0   SGGQPXAL         0.076        21945                          3 HLA-A11:01
0   GGQPXALD         0.076        21858                          4 HLA-A11:01
0   GQPXALDS         0.075        22237                          5 HLA-A11:01
0   QPXALDSG         0.073        22748                          6 HLA-A11:01
0   PXALDSGY         0.072        22962                          7 HLA-A11:01
0   XALDSGYD         0.080        21133                          8 HLA-A11:01
0   DTSMKDMH         0.093        18194                          9 HLA-A11:01
0   TSMKDMHK         0.732           18         SB              10 HLA-A11:01
0   SMKDMHKV         0.099        17148                         11 HLA-A11:01
0   MKDMHKVL         0.071        23175                         12 HLA-A11:01
0   KDMHKVLR         0.135        11550                         13 HLA-A11:01
0   DMHKVLRT         0.074        22537                         14 HLA-A11:01
0   MHKVLRTL         0.072        23056                         15 HLA-A11:01
0   HKVLRTLQ         0.069        23819                         16 HLA-A11:01
0   DTSMKDMH         0.093        18194                         17 HLA-A11:01
0   TSMKDMHK         0.732           18         SB              18 HLA-A11:01
0   SMKDMHKV         0.099        17148                         19 HLA-A11:01
0   MKDMHKVL         0.071        23175                         20 HLA-A11:01

I need to replace each run of whitespace with a single tab, while taking the following into account:

No tab inside "Bind Level" or "Protein Name" in the title line; they should be renamed "Bind.Level" and "Protein.Name" instead.
Two tabs instead of one (or a "-" or NA in between) between the affinity field and the Protein.Name field in records where Bind.Level is empty, so that the empty entry in that field is preserved.

Hence, just the following isn't enough:

 awk '{$1=$1}1' OFS="\t" file

Is there a simple way to accomplish this with a one-liner, preferably awk?

EDIT:

This is how the output should look; note "Bind.Level" and "Protein.Name" in the title, and "-" (which could also be NA or "") in the records where Bind.Level is empty:

pos peptide logscore    affinity(nM)    Bind.Level  Protein.Name    Allele
0   GPSGGQPX    0.075   22266   -   1   HLA-A11:01
0   PSGGQPXA    0.071   23285   -   2   HLA-A11:01
0   SGGQPXAL    0.076   21945   -   3   HLA-A11:01
0   GGQPXALD    0.076   21858   -   4   HLA-A11:01
0   GQPXALDS    0.075   22237   -   5   HLA-A11:01
0   QPXALDSG    0.073   22748   -   6   HLA-A11:01
0   PXALDSGY    0.072   22962   -   7   HLA-A11:01
0   XALDSGYD    0.080   21133   -   8   HLA-A11:01
0   DTSMKDMH    0.093   18194   -   9   HLA-A11:01
0   TSMKDMHK    0.732   18  SB  10  HLA-A11:01
0   SMKDMHKV    0.099   17148   -   11  HLA-A11:01
0   MKDMHKVL    0.071   23175   -   12  HLA-A11:01
0   KDMHKVLR    0.135   11550   -   13  HLA-A11:01
0   DMHKVLRT    0.074   22537   -   14  HLA-A11:01
0   MHKVLRTL    0.072   23056   -   15  HLA-A11:01
0   HKVLRTLQ    0.069   23819   -   16  HLA-A11:01
0   DTSMKDMH    0.093   18194   -   17  HLA-A11:01
0   TSMKDMHK    0.732   18  SB  18  HLA-A11:01
0   SMKDMHKV    0.099   17148   -   19  HLA-A11:01
0   MKDMHKVL    0.071   23175   -   20  HLA-A11:01

Note that non-empty Bind.Level records can take values other than "SB", but all of them are alphabetic. Protein.Name might not always be numeric, though.

The idea would be to split each record on \s+; then, if there are 7 fields, print them as they are (separated by tabs), and if there are 6 (Bind.Level empty), print $1, $2, $3, $4, "-", $5, $6. Protein names could potentially contain spaces, but I'm going to make sure they don't (they are part of the input). That should be very simple, but I don't know how to do it. Any suggestions?
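The field-count idea described above can be sketched roughly as follows. This is only a sketch under the assumptions stated in the question: Protein.Name never contains spaces, Bind.Level is the only field that can be empty, and the title line always consists of the same nine words. The file names sample.txt and sample.tsv are made up for the example.

```shell
# Hypothetical sample input: the title line and two records from the question
cat > sample.txt <<'EOF'
pos    peptide      logscore affinity(nM) Bind Level    Protein Name     Allele
0   GPSGGQPX         0.075        22266                          1 HLA-A11:01
0   TSMKDMHK         0.732           18         SB              10 HLA-A11:01
EOF

awk -v OFS='\t' '
NR == 1 {
    # Title line: nine words, two pairs are joined with a dot
    print $1, $2, $3, $4, $5 "." $6, $7 "." $8, $9
    next
}
NF == 6 {
    # Bind.Level is empty: put a placeholder in its position
    print $1, $2, $3, $4, "-", $5, $6
    next
}
{
    # Seven fields: touching $1 rebuilds the record with OFS (tab)
    $1 = $1
    print
}
' sample.txt > sample.tsv
```

Replacing "-" with "NA" or "" in the NF == 6 rule gives the other two variants mentioned in the question. Records with any other field count would pass through the last rule unchanged apart from the tabs, so it may be worth checking NF explicitly on real data.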

DaniCee

Got it in 2 steps: a first step to put "-" in the empty Bind.Level records and "." in the two-word title names, and a second step to convert whitespace to tabs:

 awk 'BEGIN{FS="";OFS=FS};($50==" "){$50="-"};(NR==1){$47="."; $64="."}{print}' file > out1
 awk '{$1=$1}1' OFS="\t" out1 > out2
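The first command relies on FS="" splitting each record into single characters, which gawk does but POSIX leaves unspecified. As a rough alternative, the same column positions (47 and 64 in the title, 50 in the records) can be handled with substr in a single pass. This is a sketch, not a tested replacement: the printf widths below are my reading of the fixed-width layout in the question, and fixed.txt and fixed.tsv are hypothetical file names.

```shell
# Rebuild a small fixed-width sample; the column widths are an assumption
# taken from the layout shown in the question
{
  printf '%-7s%-13s%-9s%-13s%-14s%-17s%s\n' \
      pos peptide logscore 'affinity(nM)' 'Bind Level' 'Protein Name' Allele
  printf '%-4s%-8s%14s%13s%11s%16s %s\n' 0 GPSGGQPX 0.075 22266 '' 1 HLA-A11:01
  printf '%-4s%-8s%14s%13s%11s%16s %s\n' 0 TSMKDMHK 0.732 18 SB 10 HLA-A11:01
} > fixed.txt

awk -v OFS='\t' '
NR == 1 {
    # Title line: replace the spaces at columns 47 and 64 with dots
    $0 = substr($0, 1, 46) "." substr($0, 48, 16) "." substr($0, 65)
}
NR > 1 && substr($0, 50, 1) == " " {
    # Column 50 is blank, so Bind.Level is empty: write a placeholder there
    $0 = substr($0, 1, 49) "-" substr($0, 51)
}
{
    # Assigning to $0 re-split the fields; $1 = $1 rejoins them with tabs
    $1 = $1
    print
}
' fixed.txt > fixed.tsv
```

Like the original two-step version, this only works while the input keeps exactly this column layout and every non-empty Bind.Level value covers column 50.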
