I want to find patterns and extract useful information from a large amount of survey data. The data is stored in an .xlsx spreadsheet with 4 columns, one per survey question, and each row holds one respondent's free-text answers.
How can I use Python and openpyxl to extract patterns from the data, such as the frequency of words or phrases, or connections between answers across the four questions? Is there anything else I should look for?
I have limited experience in data/text mining, so if there is documentation, a useful tutorial, or another Stack Overflow question I should look at, please let me know. I did a fair amount of searching here and elsewhere, but haven't found what I'm looking for.
So far I have tried counting word frequency per survey question, but I have found it difficult to work out from the openpyxl documentation how to do this. Is there an easy way to do this in Python?
Sample array [600x4]:
[['this is an example of an answer to question 1 by respondent 1', 'answer to Q2 by R1', 'ans to Q3 by R1', 'ans to Q4 by R1'],
 ['ans to Q1 by R2', 'ans to Q2 by R2', 'ans to Q3 by R2', 'ans to Q4 by R2'],
 [etc, etc, etc, etc...]]
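A minimal sketch of the kind of per-question word and phrase count described above, run on the first two rows of the sample array. The helper names (load_rows, tokenize, ngram_counts) are hypothetical, and the sketch assumes the sheet has no header row, that the answers sit in the first four columns of the active sheet, and that splitting on letters, digits and apostrophes is a good enough tokenizer. It is one possible starting point, not a full text-mining approach.

```python
import re
from collections import Counter


def load_rows(path):
    """Read the first four columns of the active sheet into a list of rows.

    Assumes no header row; empty cells become empty strings.
    """
    # openpyxl is third-party; importing it here lets the rest run without it.
    from openpyxl import load_workbook

    wb = load_workbook(path, read_only=True)
    ws = wb.active
    return [
        ["" if cell is None else str(cell) for cell in row[:4]]
        for row in ws.iter_rows(values_only=True)
    ]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_counts(rows, column, n=1):
    """Count n-word phrases in one question column (n=1 counts single words)."""
    counts = Counter()
    for row in rows:
        words = tokenize(row[column])
        counts.update(
            " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
        )
    return counts


# First two rows of the sample array; with a real file this would be
# rows = load_rows("survey.xlsx")  (hypothetical file name).
rows = [
    ['this is an example of an answer to question 1 by respondent 1',
     'answer to Q2 by R1', 'ans to Q3 by R1', 'ans to Q4 by R1'],
    ['ans to Q1 by R2', 'ans to Q2 by R2', 'ans to Q3 by R2', 'ans to Q4 by R2'],
]

q1_words = ngram_counts(rows, column=0)       # single-word counts for question 1
q1_pairs = ngram_counts(rows, column=0, n=2)  # two-word phrase counts for question 1
print(q1_words.most_common(5))
print(q1_pairs.most_common(5))
```

Calling ngram_counts with column=1, 2 or 3 gives the same counts for the other questions. Very common words such as "to" or "by" would dominate real results, so filtering them out before counting is probably worthwhile.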