Perl and Regex - Parsing values in a .csv

Posted by ZeldaElf in Dev

ZeldaElf

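I need to create a Perl script that reads the most recently modified file in a given folder (it is always a .csv file) and parses the values in its columns so they can be inserted into a MySQL database.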

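The main problem is that I need to separate the date from the hour, and the country from the name (CHN, DEU and JPN stand for China, Germany and Japan respectively).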

They are grouped together as in the example below:

"02/12/2014 09:00:00","3600","1","CHN - NAME1","0%","0%"
"02/12/2014 09:00:00","3600","1","DEU - NAME2","10%","75.04%"
"02/12/2014 09:00:00","3600","1","JPN - NAME3","0%","100%"

So far I can split the lines, but how do I put each value that is enclosed in "" and separated by , into an array?

my %date;
my %hour;
my %country;
my %name;
my %percentage_one;
my %percentage_two;

# Selects lastest file in the given directory
my $files = File::DirList::list('/home/cvna/IN/SCRIPTS/zabbix/roaming/tratamento_IAS/GPRS_IN', 'M');
my $file = $files->[0]->[13];

open(CONFIG_FILE,$file);
while (<CONFIG_FILE>){
    # Splits the file into various lines
    @lines = split(/\n/,$_);
    # For each line that i get...
    foreach my $line (@lines){
        # I need to split the values between , without the ""
        # And separating Hour from Date, and Name from Country
        @aux = split(/......./,$line)
    }
}
close(CONFIG_FILE);

David W.

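Looking at your code, it seems you're new to Perl. The Text::CSV module is a good solution, but unfortunately it isn't a standard module, so you need to install it from CPAN. That isn't difficult, but it may require administrator rights on your machine.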

The Text::ParseWords module is a standard module, and it can handle quoted words much like Text::CSV does.

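Basically, you need to split the line (I use the parse_line function). The first argument is the delimiter, ",", that I want to split the line on. Unlike split itself, parse_line doesn't split inside quoted fields, and it handles backslash-escaped characters. With the second argument set to 0, it also removes the quotes from the fields it returns. This is very similar to what Text::CSV does.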
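To see what parse_line actually returns, here is a minimal sketch run on one line from the question. It calls parse_line twice on the same line: once with the second argument (the keep flag) set to 0, as the answer does, and once set to 1, which leaves the quotes in each field. The chomp is an addition so that the line's trailing newline doesn't stay attached to the last field.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# One sample line from the question, including the newline that <FH> would return.
my $line = qq("02/12/2014 09:00:00","3600","1","DEU - NAME2","10%","75.04%"\n);
chomp $line;    # otherwise the last field would be "75.04%\n"

# keep = 0: split on commas outside quotes and strip the quotes.
my @fields = parse_line(',', 0, $line);

# keep = 1: split the same way but keep the quotes in each field.
my @raw = parse_line(',', 1, $line);

print join('|', @fields), "\n";    # 02/12/2014 09:00:00|3600|1|DEU - NAME2|10%|75.04%
print join('|', @raw), "\n";       # "02/12/2014 09:00:00"|"3600"|"1"|"DEU - NAME2"|"10%"|"75.04%"
```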

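After splitting the line, you need to separate the date from the time, and the name from the country. In my example I show two ways: one using split and the other using a regular expression match. Either will work.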
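Since the split and regex approaches are interchangeable here, this small sketch applies each approach to both fields so they can be compared side by side; the sample values come from the question's data and the variable names are only for illustration. Note that split with a limit of 2 and the non-greedy (.+?) both break on the first " - ", so a name that itself contains " - " stays in one piece, whereas the greedy (.+) used in the answer's code splits on the last one.

```perl
use strict;
use warnings;

# Sample values taken from the question's data.
my $date_time    = '02/12/2014 09:00:00';
my $country_name = 'JPN - NAME3';

# Approach 1: split. The limit of 2 stops after the first " - ".
my ($date_s, $time_s)    = split /\s+/, $date_time;
my ($country_s, $name_s) = split /\s+-\s+/, $country_name, 2;

# Approach 2: a regex match with capture groups.
my ($date_m, $time_m)    = $date_time =~ m/^(\S+)\s+(\S+)$/;
my ($country_m, $name_m) = $country_name =~ m/^(.+?) - (.*)$/;

print "$date_s | $time_s | $country_s | $name_s\n";    # 02/12/2014 | 09:00:00 | JPN | NAME3
print "$date_m | $time_m | $country_m | $name_m\n";    # same output
```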

use strict;             # Lets you know when you misspell variable names
use warnings;           # Warns of issues (using undefined variables, etc.)
use feature qw(say);    # Lets you use 'say' instead of 'print' (no \n needed)
use Text::ParseWords;

while ( my $line = <DATA> ) {
    chomp $line;        # Remove the newline so it doesn't end up in the last field
    my ($date_time, $foo, $bar, $country_name, $percent1, $percent2)
            = parse_line ',', 0, $line;
    my ($date, $time) = split /\s+/, $date_time;
    my ($country, $name) = $country_name =~ m/(.+) - (.*)/;
    say "$date, $time, $country, $name";
}

__DATA__
"02/12/2014 09:00:00","3600","1","CHN - NAME1","0%","0%"
"02/12/2014 09:00:00","3600","1","DEU - NAME2","10%","75.04%"
"02/12/2014 09:00:00","3600","1","JPN - NAME3","0%","100%"

In your actual program, you'll open the file and need to make sure it actually opened. You can test for that yourself, or use autodie:

use strict;             # Lets you know when you misspell variable names
use warnings;           # Warns of issues (using undefined variables, etc.)
use feature qw(say);    # Lets you use 'say' instead of 'print' (no \n needed)
use Text::ParseWords;
use autodie;

# Hypothetical for this snippet: in your script, $file is the newest file you already found.
my $file = shift @ARGV;

open my $config_file, "<", $file;  # No need for testing thanks to use autodie!

# What you need to do if you don't use autodie
# open my $config_file, "<", $file or die qq(Can't open "$file" for reading: $!);

while ( my $line = <$config_file> ) {
    chomp $line;                          # Keep the newline out of the last field
    my ($date_time, $foo, $bar, $country_name, $percent1, $percent2)
            = parse_line ',', 0, $line;
    my ($date, $time) = split /\s+/, $date_time;
    my ($country, $name) = $country_name =~ m/(.+) - (.*)/;
    say "$date, $time, $country, $name";  # Show fields were correctly parsed.
}
close $config_file;
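If the script should keep going (for example, to log the problem and move on) instead of stopping when a file can't be opened, autodie's exception can be trapped with eval. This is a minimal sketch; the path is hypothetical and is assumed not to exist.

```perl
use strict;
use warnings;
use autodie;

# Hypothetical path that is assumed not to exist.
my $missing = '/no/such/dir/report.csv';

# With autodie, a failed open dies; eval catches that instead of ending the script.
my $ok = eval {
    open my $fh, '<', $missing;
    1;    # only reached if the open succeeded
};

if ( !$ok ) {
    # $@ holds an autodie::exception object, which stringifies to a readable message.
    print "Skipping this file: $@\n";
}
```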

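It looks like you want to store the data. I see you have several hashes, and I bet you're trying to keep them in parallel. Take a look at how you can use references to build more complex structures: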

my %data;   # Where I'll be storing the data...
$data{$key}->{DATE} = $date;
$data{$key}->{HOUR} = $hour;
$data{$key}->{COUNTRY} = $country;
...

Now all of your data is in %data. You can pass it around your program without worrying about whether you've updated every hash.
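A fuller sketch of that idea fills %data from the question's three sample lines and then reads it back. The key used here (date, hour and country joined with spaces) is only an assumption for illustration; use whatever uniquely identifies a row in your data. The field names mirror the question's six separate hashes.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# The question's sample lines.
my @lines = (
    '"02/12/2014 09:00:00","3600","1","CHN - NAME1","0%","0%"',
    '"02/12/2014 09:00:00","3600","1","DEU - NAME2","10%","75.04%"',
    '"02/12/2014 09:00:00","3600","1","JPN - NAME3","0%","100%"',
);

my %data;    # one hash of hashes instead of six parallel hashes
for my $line (@lines) {
    my ($date_time, $foo, $bar, $country_name, $percent1, $percent2)
        = parse_line(',', 0, $line);
    my ($date, $hour)    = split /\s+/, $date_time;
    my ($country, $name) = $country_name =~ m/(.+) - (.*)/;

    # Hypothetical key: assumes date, hour and country identify a row.
    my $key = "$date $hour $country";
    $data{$key} = {
        DATE           => $date,
        HOUR           => $hour,
        COUNTRY        => $country,
        NAME           => $name,
        PERCENTAGE_ONE => $percent1,
        PERCENTAGE_TWO => $percent2,
    };
}

# Each value in %data is a hash reference, so fields are reached with ->.
for my $key ( sort keys %data ) {
    my $row = $data{$key};
    print "$row->{COUNTRY}: $row->{NAME} ($row->{PERCENTAGE_TWO})\n";
}
# CHN: NAME1 (0%)
# DEU: NAME2 (75.04%)
# JPN: NAME3 (100%)
```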

Once you've mastered references, you can start writing object-oriented Perl.

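It would also be worth getting a good book on Modern Perl. Perl coding techniques have changed a lot since Perl 5 was released. Unfortunately, most people never learned how Perl should be written, because they learned from old books lying around, or from old code written (badly) in the style of Perl 3 and Perl 4 (pun intended). Perl is a flexible and powerful language, and it lets you quickly produce enough rope to hang yourself. Learning good programming techniques will let you write larger, more complete programs that are actually easier to read and maintain.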

An almost complete program...

Here's the complete program. It finds the newest file in a particular directory, then reads that file and parses its lines.

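I'm using the -M file test. It returns the file's last modification time expressed as the file's age in days, measured from when the program started running. For example, a file last modified 2 1/2 days ago returns 2.5, and a file last modified 1 day 4 hours ago returns 1.16666667. You can use it to compare the ages of several files.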
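Since the question says the file is always a .csv, an alternative to walking the directory with readdir is to let glob collect the .csv files and sort them by -M: the smallest age is the newest file. This is a sketch, not the program that follows; the directory name temp and the sub name newest_csv are hypothetical. glob splits its pattern on whitespace, so this version assumes the directory path contains no spaces.

```perl
use strict;
use warnings;

# Hypothetical directory; put your real path here.
my $dir = 'temp';

# Return the most recently modified *.csv file in $directory, or undef if there is none.
sub newest_csv {
    my ($directory) = @_;
    my @csv = grep { -f $_ } glob("$directory/*.csv");    # plain files only
    return unless @csv;
    # -M is the age in days, so sorting ascending puts the newest file first.
    my ($newest) = sort { -M $a <=> -M $b } @csv;
    return $newest;
}

my $file = newest_csv($dir);
print defined $file ? "Newest file: $file\n" : "No .csv files in $dir\n";
```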

This program runs on Perl 5.8.8 without installing any new modules, and I've tested it with data I made up myself.

As you can see, I used open ... or die ...; without any problems. Are you getting some other error? Do you have use strict; and use warnings; set in your program?

#! /usr/bin/env perl
#

use strict;             # Lets you know when you misspell variable names
use warnings;           # Warns of issues (using undefined variables, etc.)
use Text::ParseWords;

use constant {
    DATA_FILE_DIR => "temp",
};

#
# Find newest file in the directory
#

opendir my $data_dir, DATA_FILE_DIR
        or die "Cannot open directory " . DATA_FILE_DIR . " for reading: $!";

my $newest_file;
while ( defined( my $file = readdir $data_dir ) ) {
    next if $file eq "." or $file eq "..";
    my $full_name = DATA_FILE_DIR . "/" . $file;
    next unless -f $full_name;      # Skip subdirectories and other non-plain files
    if ( not defined $newest_file
            or -M $full_name < -M $newest_file ) {
        $newest_file = $full_name;
    }
}
closedir $data_dir;
die "No files found in " . DATA_FILE_DIR . "\n" unless defined $newest_file;
print qq(Using file "$newest_file"\n);

open my $file, "<", $newest_file
        or die qq(Cannot open file "$newest_file" for reading: $!);
while ( my $line = <$file> ) {
    chomp $line;    # Drop the newline so it doesn't end up in the last field

    # Split the line on commas, removing the quotes around each field
    my ($date_time, $foo, $bar, $country_name, $percent1, $percent2)
            = parse_line ',', 0, $line;
    # Split the DATE/TIME field
    my ($date, $time) = split /\s+/, $date_time;

    # Split the Country/Name field
    my ($country, $name) = $country_name =~ m/(.+) - (.*)/;

    # Print statement merely shows that these four fields are truly split.
    print "$date, $time, $country, $name\n";
}
close $file;


Edited on 2021-02-16
