I need to break down a regular expression into its basic parts. For instance, given the regex [a-d]+[r-z]*, I need to split it into [a-d]+ and [r-z]*. This is of course a very simple example, and regex syntax can get very complex...
Is there a (relatively) simple way to achieve this, or am I doomed to reverse-engineer a regex parser?
I need this to find out whether a given string is part of an input that would match a given regular expression.
You can brute-force it this way:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexSplitter {

    // Returns true if the string compiles as a regex on its own.
    private static boolean tryRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException pse) {
            return false;
        }
    }

    public static void main(String[] args) {
        String input = "[a-d]+[r-z]*";
        List<String> results = new ArrayList<>();
        int start = 0;
        int end = 1;
        boolean good = false; // was the previous candidate a valid regex?

        // Grow the candidate input[start, end) one character at a time.
        while (end < input.length()) {
            String part = input.substring(start, end);
            if (!tryRegex(part)) {
                if (good) {
                    // The candidate just stopped compiling, so the last valid
                    // piece ended one character earlier. Cut there.
                    good = false;
                    results.add(input.substring(start, end - 1));
                    start = end - 1;
                }
            } else {
                good = true;
            }
            ++end;
        }

        // Whatever is left after the last cut is the final piece,
        // provided the regex as a whole is valid.
        if (tryRegex(input)) {
            results.add(input.substring(start));
        }
        System.out.println(results);
    }
}
// Output: [[a-d]+, [r-z]*]
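It's hacky and heuristic, but it may work for your purposes. Note that it only cuts where extending the current piece makes it stop compiling (for example at an opening bracket), so something like a+b* comes back as a single piece.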
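If the underlying goal is only to check whether a string could be the beginning of a matching input, splitting the regex may not be necessary. Here is a minimal sketch that relies on java.util.regex.Matcher.hitEnd(), which reports whether the matcher ran into the end of the input during the last match attempt. The class name PartialMatch and the method isPrefixOfMatch are made up for illustration, and I'm assuming that "part of matching input" means a prefix of it:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class PartialMatch {

    // Returns true if candidate matches the regex completely, or if the
    // matcher ran out of input while trying, which suggests that appending
    // more characters might still produce a match.
    public static boolean isPrefixOfMatch(String regex, String candidate) {
        Matcher m = Pattern.compile(regex).matcher(candidate);
        if (m.matches()) {
            return true;
        }
        // hitEnd() refers to the matches() attempt that just failed.
        return m.hitEnd();
    }

    public static void main(String[] args) {
        String regex = "[a-d]+[r-z]*";
        System.out.println(isPrefixOfMatch(regex, "abc"));  // full match
        System.out.println(isPrefixOfMatch(regex, ""));     // needs more input
        System.out.println(isPrefixOfMatch(regex, "xab"));  // fails on the first character
        System.out.println(isPrefixOfMatch(regex, "abx!")); // fails before the end of input
    }
}
```

With the regex from the question, the first two calls should report true and the last two false, since no extra input can repair a mismatch that happens before the end of the string. The documentation for hitEnd() only promises that more input could change the result, so treat a true here as "possibly", not "definitely".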