Java regex to split along words, punctuation, and whitespace, and keep all in an array



I am trying to split a sentence into a group of strings. I want to keep all words, punctuation and whitespace in an array.

For example:

"Hello! My name is John Doe."

Would be split into:

["Hello", "!", " ", "My", " ", "name", " ", "is", " ", "John", " ", "Doe", "."]

I currently have the following line of code breaking my sentence:

String[] fragments = sentence.split("(?<!^)\\b");

However, this does not throw an error; it just gives the wrong result: a punctuation mark followed by whitespace (such as "! ") comes back as a single string. How do I modify my regex so that the punctuation mark and the whitespace are separate elements?
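To make the problem concrete, here is a minimal reproduction of what the split above returns. The class and method names are only illustrative and are not part of my real code.

```java
import java.util.Arrays;

class OriginalSplitDemo {
    // Splits at every word boundary except the one at the very start of the input.
    static String[] originalSplit(String sentence) {
        return sentence.split("(?<!^)\\b");
    }

    public static void main(String[] args) {
        String[] fragments = originalSplit("Hello! My name is John Doe.");
        // Print each fragment in quotes so the whitespace is visible.
        for (String fragment : fragments) {
            System.out.println("\"" + fragment + "\"");
        }
        // There is no word boundary between "!" and the following space,
        // so the second fragment is "! " instead of "!" followed by " ".
    }
}
```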

1 Answer: 

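You can try the following regular expression. The lookbehind matches at any position that is a word boundary or that directly follows a non-letter character, so a punctuation mark and the whitespace after it become separate elements. The limit of 0 discards the empty string that would otherwise trail the final ".":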

"Hello! My name is John Doe.".split("(?<=\\b|[^\\p{L}])", 0) 
// ⇒ ["Hello", "!", " ", "My", " ", "name", " ", "is", " ", "John", " ", "Doe", "."]
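For something that can be compiled and run as-is, here is a minimal sketch, assuming Java 8 or later (older versions keep a leading empty string when the first match is zero-width at the start of the input). The class and method names are hypothetical. `matchTokens` is an alternative that is not part of the regex above: it collects tokens with a `Matcher` instead of splitting, which some people find easier to read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceTokens {
    // The split from this answer: cut after every word boundary
    // or after any single non-letter character.
    static String[] splitTokens(String sentence) {
        return sentence.split("(?<=\\b|[^\\p{L}])", 0);
    }

    // Alternative sketch (an assumption, not the regex above): a token is
    // either a run of letters or exactly one non-letter character.
    private static final Pattern TOKEN = Pattern.compile("\\p{L}+|[^\\p{L}]");

    static List<String> matchTokens(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(sentence);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sentence = "Hello! My name is John Doe.";
        System.out.println(Arrays.asList(splitTokens(sentence)));
        System.out.println(matchTokens(sentence));
    }
}
```

Both methods give the same thirteen elements for the example sentence. They can differ on input that contains digits or underscores, because `\b` treats those as word characters while `\p{L}` does not, so check against your real data before picking one.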