Synchronize the columns of two matrices

Posted by debugcn in Dev

夏森

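Please advise how to solve this problem. I have several pairs of files that I need to modify so that the two files in each pair keep only the columns they have in common, in the same order.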

Suppose my files are File1 and File2, as shown below:

File1:

  R1 C1 C2 C3 C4
  R2 1 2 3 4
  R3 5 6 7 8

File2:

  R6 C4 C3 C6 C7
  R7 9 10 11 12
  R8 13 14 15 16

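I want to produce mod_File1 and mod_File2: File1 and File2 reduced to the columns they have in common (C3 and C4 here), with those columns in the same order in both files.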
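The target is easier to see once the common columns are written out: they are the header names that appear in both files, excluding the first field, which labels the rows. A minimal sketch, assuming whitespace-separated files named File1 and File2 (sample copies are created in a scratch directory so real files are not overwritten), computes that intersection in File1's column order:

```shell
#!/bin/sh
# Work in a scratch directory so the sample files do not clobber real ones.
cd "$(mktemp -d)" || exit 1

# Sample data from the question (the file names File1/File2 are assumed).
printf '%s\n' 'R1 C1 C2 C3 C4' 'R2 1 2 3 4' 'R3 5 6 7 8' > File1
printf '%s\n' 'R6 C4 C3 C6 C7' 'R7 9 10 11 12' 'R8 13 14 15 16' > File2

# Intersect the two header rows, skipping field 1 (the row label).
# FNR == NR holds only while awk is reading its first operand, File2.
common=$(awk '
    FNR == 1 && FNR == NR { for (i = 2; i <= NF; i++) seen[$i]; next }
    FNR == 1 {
        out = ""
        for (i = 2; i <= NF; i++)
            if ($i in seen) out = out (out == "" ? "" : " ") $i
        print out
    }
' File2 File1)

echo "$common"    # C3 C4
```

Once the positions of these names are known in each header, cutting every data row down to them is a simple per-line loop.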

Here is what I have tried:

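awk '
  FNR==1 { F++ }          # F counts input files: 1 = File2, 2 = File1, 3 = File2
  NF==0  { next }         # skip blank lines

  # Pass 1 (File2): record the position of each column name in the header
  F==1 {
      if (FNR==1) for (i=2; i<=NF; i++) col2[$i] = i
      next
  }

  # Pass 2 (File1): from the header, keep the columns that also occur in
  # File2, in File1 order, then print the row label plus those columns
  F==2 {
      if (FNR==1) {
          n = 0
          for (i=2; i<=NF; i++)
              if ($i in col2) { n++; c1[n] = i; c2[n] = col2[$i] }
      }
      row = $1
      for (k=1; k<=n; k++) row = row " " $(c1[k])
      print row > "mod_File1"
      next
  }

  # Pass 3 (File2 again): print the same columns in the same order
  F==3 {
      row = $1
      for (k=1; k<=n; k++) row = row " " $(c2[k])
      print row > "mod_File2"
  }
' File2 File1 File2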

清醒

It's a bit more complicated than it looks. There may well be a library that does this better (Perl has plenty of math libraries).

But this should do what you need:

#!/usr/bin/perl

use strict;
use warnings;


#read file 1
open( my $file1, "<", "data1.txt" ) or die $!;

my $header_line = <$file1>;
chomp($header_line);
my ( $column1, @headers1 ) = split( ' ', $header_line );

my %results;
my %headers_in_file1 = map { $_ => 1 } @headers1;

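# Store each data row as { column name => value }, keyed by its row label
for (<$file1>) {
    next unless /\S/;    # skip blank lines
    my ( $column, @values ) = split;
    my %these_results;
    @these_results{@headers1} = @values;
    $results{$column}         = \%these_results;
}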
close ( $file1);


#read file 2
open( my $file2, "<", "data2.txt" ) or die $!;
$header_line = <$file2>;
chomp($header_line);
my ( $column2, @headers2 ) = split( ' ', $header_line );

my %results2;
my %headers_in_file2 = map { $_ => 1 } @headers2;

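# Store each data row as { column name => value }, keyed by its row label
for (<$file2>) {
    next unless /\S/;    # skip blank lines
    my ( $column, @values ) = split;
    my %these_results;
    @these_results{@headers2} = @values;
    $results2{$column}        = \%these_results;
}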
close ( $file2 );

#figure out the columns in both
my %in_both;
foreach my $header ( @headers1, @headers2 ) {
    if (    $headers_in_file1{$header}
        and $headers_in_file2{$header} )
    {
        $in_both{$header}++;
    }
}

# Sort the common headers alphabetically so both outputs use the same column order
my @output_headers = sort keys %in_both;

print join( " ", $column1, @output_headers ), "\n";
foreach my $row ( sort keys %results ) {
    print $row, " ";
    for my $header (@output_headers) {
        print $results{$row}{$header}, " ";
    }
    print "\n";
}

print "Second\n";
print join( " ", $column2, @output_headers ), "\n";
foreach my $row ( sort keys %results2 ) {
    print $row, " ";
    for my $header (@output_headers) {
        print $results2{$row}{$header}, " ";
    }
    print "\n";
}
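The script writes both tables to standard output, separated by a line reading Second, whereas the question asks for two files. One way to get mod_File1 and mod_File2 is to pipe that output through awk and switch the destination file at the separator. In this sketch, split_output is a hypothetical helper name and sync_columns.pl is an assumed name for the saved Perl script:

```shell
#!/bin/sh
# Hypothetical helper: split the Perl script's standard output into two files.
# Lines before the "Second" separator go to mod_File1, lines after it to mod_File2.
split_output() {
    awk '
        BEGIN            { out = "mod_File1" }
        $0 == "Second"   { out = "mod_File2"; next }   # switch files at the separator
                         { print > out }
    '
}

# Usage (the script name is an assumption):
#   perl sync_columns.pl | split_output
```

Note that the script sorts both the common column names and the row labels alphabetically, so the resulting files follow that sorted order rather than the original order in File1.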