As may or may not be evident from the question, I'm pretty new to R and I could do with a bit of help on this.
I've been building topic models in two ways; the code is in (A) and (B) below. (A) uses LDA() from the topicmodels package, which lets me extract the posterior probability of each topic occurring in each document of my corpus. I have used those probabilities to run regressions against variables from other datasets. (B) fits the model with the collapsed Gibbs sampler from the lda package and visualises it with LDAvis. It generates 'better', more coherent topics than (A), but I haven't been able to work out how to get the posterior probabilities of the topics occurring in a given document with this approach, or whether that is simply not possible.
All advice greatly appreciated.
(A)

    library(topicmodels)

    set.seed(1)
    P5LDA4 <- LDA(P592dfm, k = 23, control = list(seed = 1))
    terms(P5LDA4, k = 30)

    # find posterior probability of each topic in each document
    postTopics <- data.frame(posterior(P5LDA4)$topics)
    postTopics
(B)

    # MCMC and model tuning parameters:
    K <- 23
    G <- 5000
    alpha <- 0.02
    eta <- 0.02

    # convert to lda format
    dfmlda <- convert(newdfm, to = "lda")

    # fit the model
    library(lda)
    set.seed(1)
    t1 <- Sys.time()
    fit <- lda.collapsed.gibbs.sampler(documents = dfmlda$documents, K = K,
                                       vocab = dfmlda$vocab,
                                       num.iterations = G, alpha = alpha,
                                       eta = eta, initial = NULL, burnin = 0,
                                       compute.log.likelihood = TRUE)
    t2 <- Sys.time()
    t2 - t1
    # Time difference of 3.13337 mins

    save(fit, file = "./fit.RData")
    load("./fit.RData")

    library(LDAvis)
    set.seed(1)
    json <- createJSON(phi = t(apply(t(fit$topics) + eta, 2, function(x) x/sum(x))),
                       theta = t(apply(fit$document_sums + alpha, 2, function(x) x/sum(x))),
                       doc.length = ntoken(newdfm),
                       vocab = features(newdfm),
                       term.frequency = colSums(newdfm))
    serVis(json, out.dir = "./visColl", open.browser = TRUE)
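
One possibility, offered as a sketch rather than a confirmed answer: the theta argument built for createJSON in (B) is a documents-by-topics matrix of smoothed topic proportions, so it may play the same role as posterior(P5LDA4)$topics in (A). The example below uses a small made-up count matrix in place of fit$document_sums so that it runs on its own; the name postTopicsGibbs and the toy values are assumptions for illustration only. As far as I can tell, fit$document_sums reflects the topic assignments at the sampler's final iteration, so this would be an estimate from a single Gibbs state rather than an average over samples.

```r
# Toy stand-in for fit$document_sums: a K x D matrix of topic assignment
# counts (3 topics, 4 documents). The values are invented for illustration.
document_sums <- matrix(c(5, 0, 1,
                          0, 7, 2,
                          3, 3, 3,
                          0, 0, 9),
                        nrow = 3, ncol = 4)
alpha <- 0.02  # same Dirichlet prior value as passed to the sampler in (B)

# Add the prior, normalise each document (column), then transpose to D x K.
# This is the same expression used for theta in the createJSON call.
theta <- t(apply(document_sums + alpha, 2, function(x) x / sum(x)))

# Hypothetical data frame analogous to postTopics in (A)
postTopicsGibbs <- data.frame(theta)
colnames(postTopicsGibbs) <- paste0("topic", seq_len(ncol(theta)))

rowSums(theta)  # each document's topic proportions should sum to 1
postTopicsGibbs
```

With the real model, replacing document_sums with fit$document_sums should give one row per document and one column per topic, which could then be merged with the other datasets for regression in the same way as postTopics.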