Using R, I would like to take a single CSV file and pull out its most common two- and three-word phrases. I have searched Google and Stack Overflow but have not found a simple way to do this.
I know how to read a CSV into R, but I have not worked out how to extract the data into an appropriate data type and perform the operations on it that I need. Specifically, I would like to:
- Remove all non-alphanumeric text from the CSV
- Replace words using a synonym list
- Remove words with no meaning (at, the, etc.)
- Get a count of the most common phrases, for both two-word and three-word phrases
- Make all text lowercase
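
To make the steps above concrete, here is a rough base-R sketch of the pipeline I have in mind. It is only an illustration under assumptions: the sample strings in `txt`, the `synonyms` vector, the `stop_words` vector, and the helper names `clean_tokens`, `ngrams`, and `count_phrases` are all made up for this example, and it lowercases first so the later steps only have to deal with lowercase words. I do not know whether this approach is sensible for a large file.

```r
# Made-up sample text standing in for one text column of the CSV (assumption).
txt <- c("The printer is not working at the office!",
         "Printer not working, and the laptop will not boot.",
         "My notebook is not working at home")

# Hypothetical synonym list: names are the words to replace, values the replacements.
synonyms <- c(notebook = "laptop")

# Hypothetical stop-word list (a real one would be much longer).
stop_words <- c("at", "the", "is", "my")

# Clean one string and return its remaining words in order.
clean_tokens <- function(x) {
  x <- tolower(x)                                  # make all text lowercase
  x <- gsub("[^a-z0-9 ]+", " ", x)                 # drop non-alphanumeric characters
  words <- strsplit(trimws(x), " +")[[1]]          # split on runs of spaces
  hit <- words %in% names(synonyms)
  words[hit] <- unname(synonyms[words[hit]])       # replace words using the synonym list
  words[!words %in% stop_words]                    # remove stop words
}

# Build every run of n consecutive words from a word vector.
ngrams <- function(words, n) {
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Count n-word phrases across all strings; phrases never span two rows.
count_phrases <- function(texts, n) {
  tokens <- lapply(texts, clean_tokens)
  phrases <- unlist(lapply(tokens, ngrams, n = n))
  sort(table(phrases), decreasing = TRUE)
}

bigrams  <- count_phrases(txt, 2)
trigrams <- count_phrases(txt, 3)
head(bigrams)
head(trigrams)
```

With my real file I assume the call would look something like `count_phrases(as.character(My_SRs$text), 2)`, where `text` is a placeholder for whatever the column is actually called. Note that because stop words are removed before the phrases are built, a phrase can join words that were not adjacent in the original text.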
Also, which data types are best suited to this kind of analysis: a data frame, a corpus from the tm package, or something else?
This is how I am reading the file so far:

    My_SRs <- read.csv("C:/example_folder/username/Documents/my_data.csv")

Thanks in advance!