machine learning - Weka 3.7 Explorer cannot classify text

Keywords: machine learning


I am trying to do text classification using the Weka 3.7 Explorer. I converted 2 text files (separated into two directories, class1 and class2) into ARFF using the text loader. Before doing so, I converted all the text to lowercase. Now when I load the file into Weka and apply the StringToWordVector filter (with options such as stopwords, word counts, useStoplist, and the SnowballStemmer stemmer), I do not see any change in my list of attributes. All the attributes (words) are shown as 1 or 0 against each class.

Please help me.

Here is my filter command:

weka.filters.unsupervised.attribute.StringToWordVector -R first-last -W 1000 -prune-rate -1.0 -C -N 0 -S -stemmer weka.core.stemmers.SnowballStemmer -M 1 -tokenizer "weka.core.tokenizers.WordTokenizer -delimiters \" \r\n\t.,;:\\'\\"()?!\""

1 Answer: 

That happened to me when I wanted to read from a .csv file and use StringToWordVector.

My problem was that the text attribute was of type nominal and not string. StringToWordVector only processes string attributes, so it left the nominal attribute untouched. I used the class "NominalToString" to change the values to string, and then it worked.
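
To check whether you have the same problem, look at how the text attribute is declared in the ARFF header. Below is a minimal sketch in plain Java (no Weka dependency) that lists each attribute's declared type. The class name `ArffAttributeTypes`, the method `attributeTypes`, and the sample header are all made up for illustration, and the parsing is simplified: it assumes attribute names contain no spaces.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Weka: reads an ARFF header and reports
// the declared type of every attribute.
final class ArffAttributeTypes {

    // Maps attribute name -> declared type ("nominal", "string", "numeric", ...).
    // Simplification: quoted attribute names containing spaces are not handled.
    static Map<String, String> attributeTypes(String arffText) {
        Map<String, String> types = new LinkedHashMap<>();
        for (String rawLine : arffText.split("\\R")) {
            String line = rawLine.trim();
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("@data")) {
                break; // the header ends where the data section starts
            }
            if (!lower.startsWith("@attribute")) {
                continue; // skip @relation, comments, blank lines
            }
            String[] parts = line.substring("@attribute".length()).trim().split("\\s+", 2);
            if (parts.length < 2) {
                continue; // malformed declaration, ignore it
            }
            String declared = parts[1].trim();
            // A declaration of the form {a,b,c} is a nominal attribute.
            String type = declared.startsWith("{")
                    ? "nominal"
                    : declared.split("\\s+")[0].toLowerCase(Locale.ROOT);
            types.put(parts[0], type);
        }
        return types;
    }

    public static void main(String[] args) {
        // Made-up header resembling a CSV import where the text became nominal.
        String header = String.join("\n",
                "@relation example",
                "@attribute text {'first document','second document'}",
                "@attribute label {class1,class2}",
                "@data",
                "'first document',class1");
        attributeTypes(header).forEach((name, type) ->
                System.out.println(name + " -> " + type));
    }
}
```

If the text attribute shows up as nominal rather than string, apply NominalToString to it first and then run StringToWordVector.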