I would like to split strings on anything that is not a digit. In this particular case the strings are dates and times read in from an external .csv file, and they are still plain character strings rather than a date-time class. Ideally I would like to split the strings using a regex, but if there is a simpler way to convert them to six columns of numbers using a time function, that would be of interest as well.
I have already succeeded in creating a regex that splits the strings into six columns, but this regex is not general.
Here are the data:
my.data <- read.csv(text = '
Date_Time
18/05/2011 07:32:40
19/05/2011 13:26:02
19/05/2011 13:32:47
19/05/2011 13:45:24
19/05/2011 14:57:27
19/05/2011 15:03:18
', header = TRUE, stringsAsFactors = FALSE, na.strings = 'NA', strip.white = TRUE)
Here is a regex statement that splits the strings into six columns:
my.date.time <- data.frame(do.call(rbind, strsplit(my.data$Date_Time,"[/|:|[:space:]]+") ))
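Inside a bracket expression, `|` is an ordinary character rather than alternation, so the class above also lists a literal pipe (twice). A small sketch, using a hypothetical two-string vector `x` in the same `dd/mm/yyyy hh:mm:ss` layout, suggesting the pipes can be dropped without changing the result:

```r
# Hypothetical sample values in the same layout as my.data$Date_Time
x <- c("18/05/2011 07:32:40", "19/05/2011 13:26:02")

# Original class: the '|' characters are literals inside [...], not "or"
a <- strsplit(x, "[/|:|[:space:]]+")

# Same class without the redundant pipes
b <- strsplit(x, "[/:[:space:]]+")

identical(a, b)  # both split on '/', ':' and whitespace
a[[1]]           # six character fields: day, month, year, hour, minute, second
```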
The above statement is not general. Here is an unsuccessful attempt at making the regex general by specifying a split on anything that is not a digit:
data.frame(do.call(rbind, strsplit(my.data$Date_Time,"[^\\d]+") ))
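A likely reason the attempt fails: unless `perl = TRUE` is given, `strsplit()` uses R's default POSIX-extended (TRE) engine, and in a TRE bracket expression a backslash is an ordinary character. So `[^\\d]` means "neither a backslash nor the letter d" rather than "not a digit". The sketch below (with `x` as a hypothetical stand-in for `my.data$Date_Time`) shows three patterns that should split on any run of non-digits; which one to prefer is largely a matter of taste.

```r
# Hypothetical sample values in the same layout as my.data$Date_Time
x <- c("18/05/2011 07:32:40", "19/05/2011 13:26:02")

# POSIX character class inside the brackets (default TRE engine)
s1 <- strsplit(x, "[^[:digit:]]+")

# Explicit digit range (default TRE engine)
s2 <- strsplit(x, "[^0-9]+")

# Original pattern, but with the PCRE engine, where \d is honoured inside [...]
s3 <- strsplit(x, "[^\\d]+", perl = TRUE)

# Bind the pieces into a six-column character matrix
do.call(rbind, s1)
```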
After I split the strings into six columns, I still need what seems like an excessive number of statements to convert the columns into numeric format:
colnames(my.date.time) <- c('my.day', 'my.month', 'my.year', 'my.hour', 'my.minute', 'my.second')
revised.data <- data.frame(my.data, my.date.time, stringsAsFactors = FALSE)
revised.data$my.day <- as.numeric(as.character(revised.data$my.day))
revised.data$my.month <- as.numeric(as.character(revised.data$my.month))
revised.data$my.year <- as.numeric(as.character(revised.data$my.year))
revised.data$my.hour <- as.numeric(as.character(revised.data$my.hour))
revised.data$my.minute <- as.numeric(as.character(revised.data$my.minute))
revised.data$my.second <- as.numeric(as.character(revised.data$my.second))
revised.data
str(revised.data)
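The six conversion statements can be collapsed into one `lapply()` call over the new columns. This is a sketch, not the only option: it builds a small hypothetical `my.data` inline so it runs on its own, uses `[^0-9]+` as the split pattern, and keeps the `as.character()` step because `data.frame()` turned character columns into factors by default before R 4.0.0.

```r
# Hypothetical two-row stand-in for the data read with read.csv() in the question
my.data <- data.frame(Date_Time = c("18/05/2011 07:32:40", "19/05/2011 13:26:02"),
                      stringsAsFactors = FALSE)

my.cols <- c("my.day", "my.month", "my.year", "my.hour", "my.minute", "my.second")

# Split on any run of non-digits and bind into a six-column data frame
my.date.time <- data.frame(do.call(rbind, strsplit(my.data$Date_Time, "[^0-9]+")))
colnames(my.date.time) <- my.cols

revised.data <- data.frame(my.data, my.date.time, stringsAsFactors = FALSE)

# One lapply() replaces the six column-by-column conversions;
# as.character() guards against the pieces having been stored as factors
revised.data[my.cols] <- lapply(revised.data[my.cols],
                                function(v) as.numeric(as.character(v)))
str(revised.data)
```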
Thank you for any assistance in generalizing the above regex (or streamlining the procedure using time functions). A function from the apply family could probably eliminate most of the as.numeric(as.character()) statements, although that is a relatively minor issue.
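For the time-function route, one option is to parse the strings once with `strptime()` and read the six fields off the resulting `POSIXlt` object, which skips the regex entirely. This sketch assumes every string follows `dd/mm/yyyy hh:mm:ss`; `tz = "UTC"` is an assumption used only to avoid depending on the local time zone, and `x` is a hypothetical stand-in for `my.data$Date_Time`.

```r
# Hypothetical sample values in the same layout as my.data$Date_Time
x <- c("18/05/2011 07:32:40", "19/05/2011 13:26:02")

# Parse day/month/year hour:minute:second into a POSIXlt object
lt <- strptime(x, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

parts <- data.frame(
  my.day    = lt$mday,
  my.month  = lt$mon + 1,     # POSIXlt months run 0-11
  my.year   = lt$year + 1900, # POSIXlt years count from 1900
  my.hour   = lt$hour,
  my.minute = lt$min,
  my.second = lt$sec
)
str(parts)
```

Strings that do not match the format come back as `NA` fields rather than raising an error, so it may be worth checking `anyNA(parts)` on real data.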