Tokenizing special characters in a string

Crown43

I am working on some code and have run into trouble splitting certain characters out of a string. Given the string below, I can split it into separate tokens:

String line = "hello world ; how are you ;";

such as hello, world, and ;

But when the code looks like:

String line2 = "hello world; how are you;"

I get tokens such as "world;" and "you;" instead, whereas I want each semicolon to be a token of its own. Thanks in advance for the help.
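The question does not show the code that produces these tokens. Assuming it splits on runs of whitespace with String.split("\\s+") (an assumption, as is the class name WhitespaceSplitDemo), this sketch reproduces both results:

```java
import java.util.Arrays;

public class WhitespaceSplitDemo {
    // Assumed reconstruction of the original approach: split on runs of whitespace.
    static String[] splitOnWhitespace(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String line = "hello world ; how are you ;";
        String line2 = "hello world; how are you;";
        // Works: every ";" is surrounded by spaces, so it becomes its own token.
        System.out.println(Arrays.toString(splitOnWhitespace(line)));
        // Fails: each ";" stays glued to the word before it.
        System.out.println(Arrays.toString(splitOnWhitespace(line2)));
    }
}
```

The first line prints [hello, world, ;, how, are, you, ;] and the second prints [hello, world;, how, are, you;].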

Alex Rudenko

You can split the second line on word boundaries (\b) and then use filter to drop the pieces that consist only of whitespace:

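import java.util.Arrays;

String line2 = "hello world; how are you;";

// Split at every word boundary (\b), then drop the whitespace-only pieces
String[] arr = Arrays.stream(line2.split("\\b"))
      .filter(s -> !s.matches("\\s+"))
      .toArray(String[]::new);

System.out.println(Arrays.toString(arr));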

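Output (the first semicolon token is "; " with a trailing space, because there is no word boundary between ";" and the space after it):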

[hello, world, ; , how, are, you, ;]
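If the trailing space in the "; " token matters, one possible tweak to the word-boundary approach (my addition, not part of the answer above) is to trim every piece and then drop the empty ones. This is a minimal sketch assuming Java 8 or later, since earlier versions of String.split can return a leading empty string for a zero-width match at the start. The class name BoundarySplitTrim is hypothetical:

```java
import java.util.Arrays;

public class BoundarySplitTrim {
    // Split on word boundaries, trim each piece, and drop the empty ones.
    // Trimming turns "; " into ";" and whitespace-only pieces into "".
    static String[] tokenize(String line) {
        return Arrays.stream(line.split("\\b"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("hello world; how are you;")));
        System.out.println(Arrays.toString(tokenize("hello world ; how are you ;")));
    }
}
```

Both lines print [hello, world, ;, how, are, you, ;]. One limitation: there is no word boundary between two adjacent punctuation marks, so "a;;b" becomes a, ;;, b.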

Another option is to match the tokens themselves instead of splitting on a delimiter. The regular expression can be \w+|\S+, that is, one or more word characters [0-9A-Za-z_] OR one or more non-whitespace characters. Because the \w+ alternative is tried first, "world;" is matched as "world" followed by ";":

import java.util.regex.Pattern;

// Matcher.results() requires Java 9 or later
String[] arr2 = Pattern.compile("\\w+|\\S+")
                      .matcher(line2)
                      .results()
                      .map(mr -> mr.group(0))
                      .toArray(String[]::new);

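The tokens are the same, except that whitespace is never matched, so the first semicolon token is ";" rather than "; ": [hello, world, ;, how, are, you, ;]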
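Matcher.results() only exists since Java 9. On Java 8 the same idea can be written as a Matcher.find() loop. The sketch below also swaps \S+ for [^\w\s] (my change, not the answer's regex): because \S+ matches word characters too, an input such as "foo (bar);" would otherwise yield "(bar);" as a single token. The class name MatchTokenizer is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchTokenizer {
    // A run of word characters, OR a single character that is neither a word
    // character nor whitespace (so each punctuation mark is its own token).
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    // Matcher.find() loop: works on Java 8, which lacks Matcher.results().
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("hello world; how are you;"));
        System.out.println(tokenize("foo (bar);"));
    }
}
```

This prints [hello, world, ;, how, are, you, ;] and [foo, (, bar, ), ;]. If runs of punctuation such as ";;" should stay together, [^\w\s]+ would be the variant to try.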

