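I am trying to write a spoiler recognition system so that any spoiler in a string is replaced with a specified spoiler character.
I want to match a string enclosed in square brackets, so that the content inside the brackets is capture group 1 and the whole string, including the surrounding brackets, is the match.
I am currently using \[(.*?]*)\]
This is a slight modification of the expression found in this answer, because I also want nested square brackets to be part of capture group 1.
The problem with this expression is that, although it works and matches the following: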
Jim ate a [sandwich]
    match:   [sandwich]
    group 1: sandwich

Jim ate a [sandwich with [pickles and onions]]
    match:   [sandwich with [pickles and onions]]
    group 1: sandwich with [pickles and onions]

[[[[]
    match:   [[[[]
    group 1: [[[

[]]]]
    match:   []]]]
    group 1: ]]]

it does not work as intended when I try to match the following:
Jim ate a [sandwich with [pickles] and [onions]]
It produces two matches:
    match:   [sandwich with [pickles]
    group 1: sandwich with [pickles

    match:   [onions]]
    group 1: onions]

What expression should I use so that the match is [sandwich with [pickles] and [onions]], with sandwich with [pickles] and [onions] as group 1?
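The behaviour described above can be reproduced with a short Java sketch. Only the pattern comes from the question; the class and method names (BracketRegexDemo, describe) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketRegexDemo {
    // The pattern from the question, written as a Java string literal.
    static final Pattern SPOILER = Pattern.compile("\\[(.*?]*)\\]");

    // Lists every match as "match -> group 1", separated by "; ".
    static String describe(String input) {
        Matcher m = SPOILER.matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(m.group()).append(" -> ").append(m.group(1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One match, as intended.
        System.out.println(describe("Jim ate a [sandwich with [pickles and onions]]"));
        // Two partial matches: the lazy .*? stops at the first ']' it can use.
        System.out.println(describe("Jim ate a [sandwich with [pickles] and [onions]]"));
    }
}
```

The lazy quantifier has no notion of nesting depth, so it closes the match at the first closing bracket that satisfies the pattern, which is why sibling groups inside one spoiler split the match in two.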
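Edit:
Since this does not seem to be achievable with a regular expression in Java, is there an alternative solution?
Edit 2:
I would also like to be able to split the string on every match found. Because String.split(regex) is so convenient, an alternative to regular expressions would be harder to implement. Here is an example: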
Jim ate a [sandwich] with [pickles] and [dried [onions]]
It should produce three matches:
    match:   [sandwich]
    group 1: sandwich

    match:   [pickles]
    group 1: pickles

    match:   [dried [onions]]
    group 1: dried [onions]

Splitting the sentence should then give:
Jim ate a
with
and
This solution omits empty and whitespace-only substrings:
// Requires: import java.util.ArrayList; import java.util.List;
public static List<String> getStrsBetweenBalancedSubstrings(String s, Character markStart, Character markEnd) {
    List<String> subTreeList = new ArrayList<String>();
    int level = 0;             // current nesting depth
    int lastCloseBracket = 0;  // index just past the most recent matched closing marker
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c == markStart) {
            level++;
            // A top-level opening marker ends the text collected since the last close.
            if (level == 1 && i != 0 && i != lastCloseBracket &&
                    !s.substring(lastCloseBracket, i).trim().isEmpty()) {
                subTreeList.add(s.substring(lastCloseBracket, i).trim());
            }
        } else if (c == markEnd) {
            // A closing marker without a matching opener is treated as plain text.
            if (level > 0) {
                level--;
                lastCloseBracket = i + 1;
            }
        }
    }
    // Add whatever follows the last balanced substring.
    if (lastCloseBracket != s.length() && !s.substring(lastCloseBracket).trim().isEmpty()) {
        subTreeList.add(s.substring(lastCloseBracket).trim());
    }
    return subTreeList;
}
Then use it like this:
String input = "Jim ate a [sandwich][ooh] with [pickles] and [dried [onions]] and ] [an[other] match] and more here";
List<String> between_balanced = getStrsBetweenBalancedSubstrings(input, '[', ']');
System.out.println("Result: " + between_balanced);
// => Result: [Jim ate a, with, and, and ], and more here]
You can also extract all the balanced bracketed substrings first and then split on them:
String input = "Jim ate a [sandwich] with [pickles] and [dried [onions]] and ] [an[other] match]";
List<String> balanced = getBalancedSubstrings(input, '[', ']', true);
System.out.println("Balanced ones: " + balanced);
List<String> rx_split = new ArrayList<String>();
for (String item : balanced) {
    rx_split.add("\\s*" + Pattern.quote(item) + "\\s*");
}
String rx = String.join("|", rx_split);
System.out.println("In-betweens: " + Arrays.toString(input.split(rx)));
This function finds all balanced [] substrings:
public static List<String> getBalancedSubstrings(String s, Character markStart,
        Character markEnd, Boolean includeMarkers) {
    List<String> subTreeList = new ArrayList<String>();
    int level = 0;              // current nesting depth
    int lastOpenBracket = -1;   // start index of the current top-level substring
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c == markStart) {
            level++;
            if (level == 1) {
                lastOpenBracket = (includeMarkers ? i : i + 1);
            }
        }
        else if (c == markEnd) {
            // Closing the outermost marker completes one balanced substring.
            if (level == 1) {
                subTreeList.add(s.substring(lastOpenBracket, (includeMarkers ? i + 1 : i)));
            }
            if (level > 0) level--;
        }
    }
    return subTreeList;
}
Output of the code:
Balanced ones: [[sandwich], [pickles], [dried [onions]], [an[other] match]]
In-betweens: [Jim ate a, with, and, and ]]
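Returning to the original goal of hiding spoilers, the same depth-counting idea can be applied directly, without building a regex. What follows is only a sketch under assumptions: the name SpoilerMasker.mask and the decision to overwrite every character of a balanced span (brackets included) with the mask character are illustrative choices, not part of the answer above.

```java
class SpoilerMasker {
    // Overwrites each top-level balanced open...close span with the mask character.
    // Unmatched closing markers and unclosed spans are left untouched.
    static String mask(String s, char open, char close, char maskChar) {
        StringBuilder out = new StringBuilder(s);
        int level = 0;   // current nesting depth
        int start = -1;  // index of the current top-level opening marker
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == open) {
                if (level == 0) {
                    start = i;
                }
                level++;
            } else if (c == close && level > 0) {
                level--;
                if (level == 0) {
                    // The outermost marker just closed: mask the whole span.
                    for (int j = start; j <= i; j++) {
                        out.setCharAt(j, maskChar);
                    }
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("Jim ate a [sandwich with [pickles] and [onions]]", '[', ']', '*'));
    }
}
```

Masking in place keeps the string length unchanged. If a single replacement token per spoiler is preferred, the loop can append to a fresh StringBuilder instead.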
Credit: getBalancedSubstrings is based on peter.murray.rust's answer to "How to split this 'tree-like' string in Java regex?".
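If the nesting depth is known to be bounded, String.split can still be used with an unrolled pattern, since java.util.regex has no recursion support. The sketch below assumes at most one level of nesting inside a spoiler; deeper nesting and unbalanced input are not handled the way the functions above handle them. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundedDepthSplit {
    // A bracketed span whose content is made of non-bracket characters
    // and complete inner [...] pairs that contain no further brackets.
    static final String SPOILER = "\\[((?:[^\\[\\]]|\\[[^\\[\\]]*\\])*)\\]";

    // Splits on every spoiler, swallowing the whitespace around it.
    static String[] splitOnSpoilers(String input) {
        return input.split("\\s*" + SPOILER + "\\s*");
    }

    // Returns capture group 1 of the first spoiler, or null if there is none.
    static String firstGroup(String input) {
        Matcher m = Pattern.compile(SPOILER).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "Jim ate a [sandwich] with [pickles] and [dried [onions]]";
        System.out.println(Arrays.toString(splitOnSpoilers(input)));
        // => [Jim ate a, with, and]
        System.out.println(firstGroup("[dried [onions]]"));
        // => dried [onions]
    }
}
```

Each extra level of nesting requires repeating the inner alternative once more, so this only stays readable for small, fixed depths. The counter-based functions above have no such limit.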