text mining - Error using "TermDocumentMatrix" and "Dist" functions in R


Keywords:r 


Question: 

I have been trying to replicate the example here: but I have had some problems along the way.

Everything worked fine until here:

docsTDM <- TermDocumentMatrix(docs8)

Error in UseMethod("meta", x) : no applicable method for 'meta' applied to an object of class "character"
In addition: Warning message:
In mclapply(unname(content(x)), termFreq, control) :
all scheduled cores encountered errors in user code

I was able to fix that error by changing the previous step from this:

docs8 <- tm_map(docs7, tolower)

To this:

docs8 <- tm_map(docs7, content_transformer(tolower))

But then I ran into trouble again with:

docsdissim <- dissimilarity(docsTDM, method = "cosine")

Error: could not find function "dissimilarity"

Then I learned that the "dissimilarity" function was replaced by the dist function, so I did:

docsdissim <- dist(docsTDM, method = "cosine")

Error in crossprod(x, y)/sqrt(crossprod(x) * crossprod(y)) : non-conformable arrays

And that is where I'm stuck.

By the way, my R version is:

R version 3.2.2 (2015-08-14) running on CentOS 7


1 Answer: 

Change

docsdissim <- proxy::dist(docsTDM, method = "cosine")

to

docsdissim <- dist(as.matrix(docsTDM), method = "cosine")

dist requires a numeric matrix, data frame or "dist" object as input. Even though a TermDocumentMatrix is matrix-like, it is stored in a sparse format, so it has to be converted to a regular matrix with as.matrix first.
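
For reference, here is a minimal end-to-end sketch of the conversion. It assumes the tm and proxy packages are installed (the "cosine" method comes from proxy, not from base R's stats::dist). The toy corpus and the names texts, docs, m, termdissim and docdissim are made up for illustration and are not from the original tutorial. Note that proxy::dist compares rows, and the rows of a TermDocumentMatrix are terms, so you may want to transpose the matrix if what you are after is document-to-document dissimilarity.

```r
library(tm)     # VCorpus, tm_map, TermDocumentMatrix
library(proxy)  # masks stats::dist and provides method = "cosine"

# Hypothetical toy corpus standing in for the asker's docs8
texts <- c("text mining in r", "mining text with r", "cats chase mice")
docs <- VCorpus(VectorSource(texts))

# Wrap base functions in content_transformer() so the corpus structure is kept
docs <- tm_map(docs, content_transformer(tolower))

docsTDM <- TermDocumentMatrix(docs)

# Convert the sparse term-document matrix to a dense numeric matrix
m <- as.matrix(docsTDM)

# Rows of m are terms, so this gives term-to-term dissimilarity
termdissim <- proxy::dist(m, method = "cosine")

# Transpose so rows are documents: document-to-document dissimilarity
docdissim <- proxy::dist(t(m), method = "cosine")

print(docdissim)
```

If the matrix is large, keep in mind that as.matrix produces a dense copy, which can use a lot of memory.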