[^.]+\.(txt|html)
I am learning regex, and am trying to parse this.
[^.] The ^ means "not", and the dot is a wildcard that means any character, so this means find a match with "not any character"? I still don't understand this. Can anyone explain?
The plus is a Kleene Plus which means "1 or more". So now it's "one or more" "not any character".
I get \.
, it means a period.
(txt|html) means match with a txt file or html file. I think I understand everything after the plus sign. What I don't understand is why it doesn't look something the DOS equivalent where I can just do this: *.txt or *.(txt|html) where * means everything that ends in the file extension .txt or .html?
Is [^.] the equivalent of * in DOS?
The dot (.
) has no special meaning when it's inside a character class, and doesn't require to be escaped.
[^.]
means "any character that is not a literal .
character". [^.]+
matches one or more occurrences of any character that is not a dot.
From regular-expressions.info:
In most regex flavors, the only special characters or meta-characters inside a character class are the closing bracket (
]
), the backslash (\
), the caret (^
), and the hyphen (-
). The usual meta-characters are normal characters inside a character class, and do not need to be escaped by a backslash. Your regex will work fine if you escape the regular metacharacters inside a character class, but doing so significantly reduces readability.
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments