regex - Split sentences in R without splitting email IDs or decimal numbers


Keywords:r 


Question: 

I want to split a paragraph into sentences at each full stop (period). But when I do this, decimal numbers and email IDs are also split into separate elements. Can anyone help me split the data into sentences only?

Eg:

aa = "For Important Disclosure information, please visit our website at 0.5%  https://javatar.bluematrix.com/sellside/Disclosures.action or call 1.888.JEFFERIES. An organization. 0.5% have an analysis."

this should be split into

  1. For Important Disclosure information, please visit our website at 0.5% https://javatar.bluematrix.com/sellside/Disclosures.action or call 1.888.JEFFERIES.
  2. An organization.
  3. 0.5% have an analysis.

code:

sentences = as.matrix(unlist(strsplit(aa,"\\.")))

1 Answer: 

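Splitting on the literal string ". " (a period followed by a space) instead of on every period seems to work for this example. Note that the separator itself is dropped, so only the last sentence keeps its period: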

strsplit(aa, '. ', fixed = TRUE)
#[[1]]
#[1] "For Important Disclosure information, please visit our website at 0.5% https://javatar.bluematrix.com/sellside/Disclosures.action or call 1.888.JEFFERIES"
#[2] "An organization"                                                                                                                                          
#[3] "0.5% have an analysis."
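
If you also want to keep the period at the end of each sentence, one possible variation is to split on the whitespace that follows a period, using a Perl-style lookbehind. This is only a sketch under the assumption that every sentence boundary is a period followed by whitespace; it would still split wrongly after abbreviations such as "e.g. " or "Dr. ". The variable name `sentences` is taken from the question.

```r
aa <- "For Important Disclosure information, please visit our website at 0.5%  https://javatar.bluematrix.com/sellside/Disclosures.action or call 1.888.JEFFERIES. An organization. 0.5% have an analysis."

# Split on one or more whitespace characters that are preceded by a period.
# The lookbehind (?<=\\.) checks for the period without consuming it,
# so the period stays attached to the sentence it ends.
# Periods inside "0.5%", the URL and "1.888.JEFFERIES" are not followed
# by whitespace, so they are left alone.
sentences <- strsplit(aa, "(?<=\\.)\\s+", perl = TRUE)[[1]]

print(sentences)
```

For the example string this gives three elements, each ending with its own period. Wrap the result in as.matrix() if you need the matrix shape from the original code.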