Backward capture group concatenated with forward capture group

user4002542

I think the title says it all. I'm trying to get groups and concatenate them together.

I have this text:

GPX 10.802.123/3843- 1 -- IDENTIFIER 48

And I want this output:

IDENTIFIER 10.802.123/3843-48

To be explicit: I want to capture one group before the word IDENTIFIER and one group after it, then concatenate the two, using only regex. Is this possible?

I can already extract the 48 like this:

var text = 'GPX 10.802.123/3843- 1 -- IDENTIFIER 48';
var reg = new RegExp('IDENTIFIER' + '.*?(\\d\\S*)', 'i');
var match = reg.exec(text);  // match[1] holds the captured group

Output:

48

Can it be done?

I'm offering 200 points.

Michael Laszlo

You must precisely define the groups that you want to extract before and after the word. If you define the group before the word as four or more non-whitespace characters, and the group after the word as one or more non-whitespace characters, you can use the following regular expression.

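var word = 'IDENTIFIER';  // the word to search around; text is the string from the question
var re = new RegExp('(\\S{4,})\\s+(?:\\S{1,3}\\s+)*?' + word + '.*?(\\S+)', 'i');
var groups = re.exec(text);
if (groups !== null) {
  // groups[1] is the group before the word, groups[2] the group after it.
  var result = groups[1] + groups[2];
}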
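
The output requested in the question also includes the word itself. Here is a minimal sketch of one way to get it, assuming the same group definitions as above: capture the word as an extra group and reassemble the pieces. The names joinAroundWord and escapeRegExp are hypothetical and not part of the snippet above. The helper is there because word is concatenated into the pattern, so any regex metacharacters in it would otherwise be interpreted as pattern syntax.

```javascript
// Hypothetical helper: escape regex metacharacters so `word` is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical function: returns "<word> <before><after>", or null if nothing matches.
function joinAroundWord(word, text) {
  var re = new RegExp(
    '(\\S{4,})\\s+(?:\\S{1,3}\\s+)*?(' + escapeRegExp(word) + ').*?(\\S+)', 'i');
  var groups = re.exec(text);
  if (groups === null) {
    return null;
  }
  // groups[1]: group before the word, groups[2]: the word as it appears
  // in the text, groups[3]: group after the word.
  return groups[2] + ' ' + groups[1] + groups[3];
}

console.log(joinAroundWord('IDENTIFIER', 'GPX 10.802.123/3843- 1 -- IDENTIFIER 48'));
// IDENTIFIER 10.802.123/3843-48
```

Because of the 'i' flag, the word in the result keeps whatever casing it has in the text.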

Let me break down the regular expression. Note that we have to escape the backslashes because we're writing a regular expression inside a string.

  • (\\S{4,}) captures a group of four or more non-whitespace characters
  • \\s+ matches one or more whitespace characters
  • (?: indicates the start of a non-capturing group
  • \\S{1,3} matches one to three non-whitespace characters
  • \\s+ matches one or more whitespace characters
  • )*? makes the non-capturing group match zero or more times, as few times as possible
  • word matches whatever was in the variable word when the regular expression was compiled
  • .*? matches any character except a line break zero or more times, as few times as possible
  • (\\S+) captures one or more non-whitespace characters
  • the 'i' flag makes this a case-insensitive regular expression

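Observe that the lazy quantifier .*? makes the second group the nearest run of non-whitespace characters after the word. The first group is the nearest one before the word because only tokens of one to three characters are allowed between it and the word.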
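
The note about doubled backslashes can be checked directly. The sketch below builds the same pattern twice, once from a string and once as a regex literal, with the word fixed to IDENTIFIER purely for illustration. The variable names are my own.

```javascript
// Built from a string: every backslash must be doubled.
var fromString = new RegExp('(\\S{4,})\\s+(?:\\S{1,3}\\s+)*?IDENTIFIER.*?(\\S+)', 'i');

// Written as a literal: single backslashes, but the word cannot come from a variable.
var fromLiteral = /(\S{4,})\s+(?:\S{1,3}\s+)*?IDENTIFIER.*?(\S+)/i;

// Both forms compile to the same pattern source.
console.log(fromString.source === fromLiteral.source); // true

var m = fromLiteral.exec('GPX 10.802.123/3843- 1 -- IDENTIFIER 48');
console.log(m[1] + m[2]); // 10.802.123/3843-48
```

The literal form is easier to read, so the string form is only worth it when the word has to be supplied at run time.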

You can match the regular expression globally in the text by adding the g flag. The snippet below demonstrates how to extract all matches.

function forward_and_backward(word, text) {
  var re = new RegExp('(\\S{4,})\\s+(?:\\S{1,3}\\s+)*?' + word + '.*?(\\S+)', 'ig');
  // Find all matches and make an array of results.
  var results = [];
  while (true) {
    var groups = re.exec(text);
    if (groups === null) {
      return results;
    }
    var result = groups[1] + groups[2];
    results.push(result);
  }
}

var sampleText = "  GPX 10.802.123/3843- 1 -- IDENTIFIER 48   A BC 444.2345.1.1/99x 28 - - Identifier 580 X Y Z 9.22.16.1043/73+ 0  ***  identifier 6800";

var results = forward_and_backward('IDENTIFIER', sampleText);
for (var i = 0; i < results.length; ++i) {
  document.write('result ' + i + ': "' + results[i] + '"<br><br>');
}
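
If your environment supports String.prototype.matchAll (ES2020), the exec loop can be written more compactly. This is only a sketch of an alternative, not a change to the function above. The names forwardAndBackwardAll and sample are hypothetical, and console.log stands in for document.write so it also runs outside a browser.

```javascript
// Hypothetical variant of forward_and_backward built on matchAll.
function forwardAndBackwardAll(word, text) {
  var re = new RegExp('(\\S{4,})\\s+(?:\\S{1,3}\\s+)*?' + word + '.*?(\\S+)', 'ig');
  // matchAll requires the g flag and yields one match array per occurrence.
  return Array.from(text.matchAll(re), function (groups) {
    return groups[1] + groups[2];
  });
}

var sample = "  GPX 10.802.123/3843- 1 -- IDENTIFIER 48   A BC 444.2345.1.1/99x 28 - - Identifier 580 X Y Z 9.22.16.1043/73+ 0  ***  identifier 6800";

console.log(forwardAndBackwardAll('IDENTIFIER', sample));
// [ '10.802.123/3843-48', '444.2345.1.1/99x580', '9.22.16.1043/73+6800' ]
```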
