Java PatternSyntaxException: Unmatched closing ')'

RazorAlliance192

I need to remove all URLs from Twitter messages. I have a file with around 200,000 such messages, so speed is crucial. I am using Java; here is my code:

public String performStrip(){

    String tweet = this.getRawTweet();
    String urlPattern = "((https?|http)://(bit\\.ly|t\\.co|lnkd\\.in|tcrn\\.ch)\\S*)\\b";

    Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(tweet);

    int i = 0;

    while (m.find()) {
        tweet = tweet.replaceAll(m.group(i),"").trim();
        i++;
    }

    return tweet;
}

This works fine in the following cases:

http://t.co/nhWp9hldEH        -> (empty string)
http://t.co/nhWp9hldEH"       -> "
http://t.co/nhWp9hldEH)aaa"   -> aaa"
aaa(http://t.co/nhWp9hldEH"   -> aaa("
aaa(http://t.co/nhWp9hldEH)"  -> aaa()"

However, for an input such as:

http://t.co/nhWp9hldEH)aaa"

I get this error:

java.util.regex.PatternSyntaxException: Unmatched closing ')' near index 21

http://t.co/nhWp9hldEH)aa

at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.compile(Pattern.java:1669)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at java.lang.String.replaceAll(String.java:2210)
at com.anturo.preprocess.url.UrlStripper.performStrip(UrlStripper.java:47)
at com.anturo.preprocess.testing.ReadIn.<init>(ReadIn.java:35)
at com.anturo.preprocess.testing.Main.main(Main.java:6)

I have already looked at several similar questions about this error, but none of the suggested fixes has worked so far. I hope someone can help me out.

fge

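The problem is that String.replaceAll() treats its first argument as a regex, and the text of a matched URL may contain regex special characters, such as the ')' in your failing input.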

Short solution: use Pattern.quote(). Your code would then be:

tweet = tweet.replaceAll(Pattern.quote(m.group(i)),"").trim();

Note: Pattern.quote() is only available since JDK 1.5, but you are using that or something newer, right?
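As a rough illustration of what Pattern.quote() buys you, here is a minimal sketch. The class name QuoteDemo and the helper removeLiteral are made up for this example; the input strings are the failing case from the question.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the asker's code.
class QuoteDemo {
    // Removes every occurrence of `literal` from `text`.
    // Pattern.quote() turns the literal into a pattern that matches it
    // character for character, so ')' and friends lose their regex meaning.
    static String removeLiteral(String text, String literal) {
        return text.replaceAll(Pattern.quote(literal), "");
    }

    public static void main(String[] args) {
        String tweet = "http://t.co/nhWp9hldEH)aaa\"";
        // The text the asker's regex matches; it contains an unbalanced ')'.
        String matched = "http://t.co/nhWp9hldEH)aaa";
        System.out.println(removeLiteral(tweet, matched)); // prints a lone double quote
    }
}
```

Passing the same matched string to replaceAll() without quoting it is what triggers the PatternSyntaxException from the question.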

Another solution is to simply use .replace():

tweet = tweet.replace(m.group(i), "").trim();

Despite what the contrast with .replaceAll() suggests, .replace() does replace all occurrences; the difference is simply that it takes a literal string, not a regex, as its search argument. See also .replaceFirst().
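To make the difference between the three methods concrete, here is a small sketch. The class ReplaceDemo, the helper removeAllLiteral and the sample string are invented for illustration only.

```java
// Hypothetical demo class, not part of the asker's code.
class ReplaceDemo {
    // Literal removal of every occurrence; `target` is never parsed as a regex.
    static String removeAllLiteral(String text, String target) {
        return text.replace(target, "");
    }

    public static void main(String[] args) {
        String text = "a.b a.b axb";
        // replace(): literal search, so both "a.b" go and "axb" stays.
        System.out.println(removeAllLiteral(text, "a.b"));
        // replaceAll(): regex search, '.' matches any character, so "axb" goes too.
        System.out.println(text.replaceAll("a.b", ""));
        // replaceFirst(): regex search, only the first match is removed.
        System.out.println(text.replaceFirst("a.b", ""));
    }
}
```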

Last but not least, you seem to be misusing .group()! Your loop should be:

while (m.find())
    tweet = tweet.replace(m.group(), "").trim();

No need for the i variable here; m.group(), which is equivalent to m.group(0), returns the whole current match, whereas m.group(i) returns, for that one match, what was matched by capturing group i in your regex.
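Since speed matters for 200,000 messages, a different route is worth considering: skip the loop and let the Matcher remove the matches itself. The following is only a sketch under some assumptions: the class name UrlStripperSketch, the constant SHORT_URL and the method stripUrls are invented here, the pattern is the one from the question with the redundant alternation and capturing groups dropped, and Matcher.replaceAll("") stands in for the find()/replace() loop.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the asker's UrlStripper class.
class UrlStripperSketch {
    // Compiled once and reused for every tweet; Pattern objects are
    // immutable and safe to share between threads.
    private static final Pattern SHORT_URL = Pattern.compile(
            "https?://(?:bit\\.ly|t\\.co|lnkd\\.in|tcrn\\.ch)\\S*\\b",
            Pattern.CASE_INSENSITIVE);

    // Removes every match of SHORT_URL in one pass. The matched text is
    // never re-compiled as a regex, so a ')' in a URL cannot cause a
    // PatternSyntaxException here.
    static String stripUrls(String rawTweet) {
        return SHORT_URL.matcher(rawTweet).replaceAll("").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripUrls("aaa(http://t.co/nhWp9hldEH)\"")); // prints aaa()"
    }
}
```

The pattern is compiled a single time instead of once per tweet, and each tweet is scanned once instead of once per match, which should help with a file of this size.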
