Filtering with grep using a pattern file and an input file

马马扎夫

I have an input file that looks like this:

$Interesting line
$Interesting line 2
#Also interesting line
Non interesting line - filter out
$another interesting line
Interesting line contains FiRsT pattern
Another non interesting line
Interesting line contains sec"o^nd pattern
#Interesting line

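I also have a pattern file containing the patterns I want to filter by (note that the pattern file may contain problematic characters; I want them treated as literal characters, not as wildcards or regular expressions):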

FiRsT
sec"o^nd

I want to get the following result:

$Interesting line
$Interesting line 2
#Also interesting line
$another interesting line
Interesting line contains FiRsT pattern
Interesting line contains sec"o^nd pattern
#Interesting line

That is, the following two lines are filtered out:

Non interesting line - filter out
Another non interesting line

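More precisely, I want the result file to contain every line that either contains any string from the pattern file or starts with # or $ (order matters).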

I know how to filter by the strings in the pattern file:

grep -F -f pattern_file.txt input_file.txt

And I know how to select all lines that start with $ or #:

grep '^\$\|^#' input_file.txt

But how do I combine the two? Is writing a short script the only way, or can I still do it with plain grep / sed / any standard Linux command?

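Again, please keep in mind:

The order of the lines matters and must match the order of the original input file.
The pattern file may contain problematic characters, which I want treated as regular characters (not as wildcards or regular expressions).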

Edit: consider the following case:

The input file also contains

Interesting line with ^third pattern

and the pattern file contains

^third

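Of course, I want that line to appear in the result file. That is why I cannot read the pattern file without the -F flag, and why I cannot simply add the lines ^\$ and ^# to it.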

迈克尔·维尔斯

You can use awk:

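# First file (the pattern file): store each line as a literal string.
NR == FNR { pattern[NR] = $0; count++; next }
# Second file (the input): lines starting with $ or # always pass.
/^[$#]/ { print; next }
# Otherwise print the line if it contains any pattern as a plain substring.
{
    for (i = 1; i <= count; i++) {
        if (index($0, pattern[i]) > 0) {
            print; next
        }
    }
}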
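
As a usage sketch, the script above could be saved to a file and run with the pattern file as the first argument, so that NR == FNR selects it. The file name filter.awk and the temporary working directory are assumptions made for this illustration; the sample data is taken from the question.

```shell
#!/bin/sh
# Sketch: run the awk filter on small sample files.
# filter.awk is a hypothetical name for the awk script shown above.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > pattern_file.txt <<'EOF'
FiRsT
sec"o^nd
^third
EOF

cat > input_file.txt <<'EOF'
$Interesting line
Non interesting line - filter out
Interesting line contains FiRsT pattern
Another non interesting line
Interesting line contains sec"o^nd pattern
Interesting line with ^third pattern
#Interesting line
EOF

cat > filter.awk <<'EOF'
NR == FNR { pattern[NR] = $0; count++; next }
/^[$#]/ { print; next }
{
    for (i = 1; i <= count; i++)
        if (index($0, pattern[i]) > 0) { print; next }
}
EOF

# The pattern file must come first; the input file is read second.
result=$(awk -f filter.awk pattern_file.txt input_file.txt)
printf '%s\n' "$result"
```

Because awk reads the input file once from top to bottom, the surviving lines keep their original order, and index() compares plain substrings, so characters such as ^ or " in the pattern file need no escaping.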

Alternatively, you could preprocess your pattern file and escape all regular-expression metacharacters.
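
The escaping alternative might look roughly like the sketch below. It assumes grep's default basic regular expression syntax, where the characters to escape are \ . * [ ] ^ and $. The helper name escape_bre and the file combined_patterns.txt are hypothetical names introduced only for this example.

```shell
#!/bin/sh
# Sketch: escape regex metacharacters in the pattern file, add one real
# regex for lines starting with $ or #, then run a single grep.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > pattern_file.txt <<'EOF'
FiRsT
sec"o^nd
^third
EOF

cat > input_file.txt <<'EOF'
$Interesting line
Non interesting line - filter out
Interesting line contains FiRsT pattern
Another non interesting line
Interesting line contains sec"o^nd pattern
Interesting line with ^third pattern
#Interesting line
EOF

# Hypothetical helper: prefix each of \ . * [ ] ^ $ with a backslash so
# that grep (basic regex mode, no -F) matches them literally.
escape_bre() {
    sed 's/[][\.*^$]/\\&/g'
}

# combined_patterns.txt (assumed name) holds the anchor regex plus the
# escaped literal patterns, one per line.
{
    printf '%s\n' '^[$#]'
    escape_bre < pattern_file.txt
} > combined_patterns.txt

result=$(grep -f combined_patterns.txt input_file.txt)
printf '%s\n' "$result"
```

Since grep prints matching lines in the order they appear in the input, the original line order is preserved here as well.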
