#!/usr/bin/env python
# coding: utf-8

# ## Demonstration of `u_mass` topic coherence using the topic coherence pipeline

# In[1]:

import numpy as np
import logging
import pyLDAvis.gensim
import json
import warnings
warnings.filterwarnings('ignore')  # Ignore all warnings that arise here, to enhance clarity

from gensim.models.coherencemodel import CoherenceModel
from gensim.models.ldamodel import LdaModel
from gensim.corpora.dictionary import Dictionary
from numpy import array

# ### Set up logging

# In[2]:

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
logging.debug("test")

# ### Set up corpus
# As stated in Table 2 of [this](http://www.cs.bham.ac.uk/~pxt/IDA/lsa_ind.pdf) paper, this corpus essentially contains two classes of documents: the first five are about human-computer interaction and the other four are about graphs. Let's see how our LDA models interpret them.

# In[3]:

texts = [['human', 'interface', 'computer'],
         ['survey', 'user', 'computer', 'system', 'response', 'time'],
         ['eps', 'user', 'interface', 'system'],
         ['system', 'human', 'system', 'eps'],
         ['user', 'response', 'time'],
         ['trees'],
         ['graph', 'trees'],
         ['graph', 'minors', 'trees'],
         ['graph', 'minors', 'survey']]

# In[4]:

dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# ### Set up two topic models
# We'll set up two different LDA topic models: a good one and a bad one. To build the "good" model, we simply train it for more iterations than the bad one. The `u_mass` coherence should therefore, in theory, be better for the good model than for the bad one, since the good model produces more "human-interpretable" topics.

# In[5]:

goodLdaModel = LdaModel(corpus=corpus, id2word=dictionary, iterations=50, num_topics=2)
badLdaModel = LdaModel(corpus=corpus, id2word=dictionary, iterations=1, num_topics=2)

# ### Using U_Mass coherence

# In[14]:

goodcm = CoherenceModel(model=goodLdaModel, corpus=corpus, dictionary=dictionary, coherence='u_mass')

# In[15]:

badcm = CoherenceModel(model=badLdaModel, corpus=corpus, dictionary=dictionary, coherence='u_mass')

# ### View the pipeline parameters for one coherence model
# The following are the pipeline parameters for `u_mass` coherence. By pipeline parameters, we mean the functions used to calculate segmentation, probability estimation, confirmation measure and aggregation, as shown in Figure 1 of [this](http://svn.aksw.org/papers/2015/WSDM_Topic_Evaluation/public.pdf) paper.

# In[16]:

print(goodcm)

# ### Visualize the topic models

# In[17]:

pyLDAvis.enable_notebook()

# In[18]:

pyLDAvis.gensim.prepare(goodLdaModel, corpus, dictionary)

# In[19]:

pyLDAvis.gensim.prepare(badLdaModel, corpus, dictionary)

# In[20]:

print(goodcm.get_coherence())

# In[21]:

print(badcm.get_coherence())

# ### Using C_V coherence

# In[25]:

goodcm = CoherenceModel(model=goodLdaModel, texts=texts, dictionary=dictionary, coherence='c_v')

# In[26]:

badcm = CoherenceModel(model=badLdaModel, texts=texts, dictionary=dictionary, coherence='c_v')

# ### Pipeline parameters for C_V coherence

# In[27]:

print(goodcm)

# ### Print coherence values

# In[28]:

print(goodcm.get_coherence())

# In[29]:

print(badcm.get_coherence())
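# ### Compare the two models side by side
# Before drawing conclusions, it can help to see both measures for both models in one place. The cell below is a small sketch added for this demonstration (it is not part of the original pipeline description); it only reuses the `CoherenceModel` API and the `corpus`, `texts` and `dictionary` objects defined above.

# In[30]:

for name, model in [('goodLdaModel', goodLdaModel), ('badLdaModel', badLdaModel)]:
    # u_mass works directly on the bag-of-words corpus...
    u_mass = CoherenceModel(model=model, corpus=corpus, dictionary=dictionary,
                            coherence='u_mass').get_coherence()
    # ...while c_v needs the tokenized texts for its sliding-window probability estimation.
    c_v = CoherenceModel(model=model, texts=texts, dictionary=dictionary,
                         coherence='c_v').get_coherence()
    print('%s: u_mass = %.4f, c_v = %.4f' % (name, u_mass, c_v))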
# ### Conclusion
# As we can see, both the `u_mass` and `c_v` coherence scores are higher (better) for the good LDA model than for the bad one. This is simply because the good model usually comes up with topics that are more human-interpretable.
# For the first topic, the goodLdaModel rightly puts emphasis on "graph", "trees" and "user", matching the second class of documents. For the second topic, it emphasizes words such as "system", "eps", "interface" and "human", which signify human-computer interaction.
# The badLdaModel, however, fails to distinguish between these two topics: it comes up with topics that are both mostly graph-based and are not clear to a human reader. The `u_mass` and `c_v` topic coherences capture this wonderfully by assigning each topic's interpretability a number, as we saw above. This coherence measure can therefore be used to compare different topic models based on their human-interpretability.
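# As a practical illustration of that last point, the cell below is a minimal sketch (an addition for this demonstration, not from the original notebook) that uses `u_mass` coherence to pick among candidate models trained with different iteration counts. `u_mass` scores are sums of log conditional probabilities and thus typically negative, so the highest (least negative) value wins.

# In[31]:

best_model, best_coherence = None, None
for iterations in [1, 10, 50]:
    candidate = LdaModel(corpus=corpus, id2word=dictionary, iterations=iterations, num_topics=2)
    coherence = CoherenceModel(model=candidate, corpus=corpus, dictionary=dictionary,
                               coherence='u_mass').get_coherence()
    print('iterations = %d: u_mass = %.4f' % (iterations, coherence))
    if best_coherence is None or coherence > best_coherence:
        best_model, best_coherence = candidate, coherence

# Note that LDA training is stochastic, so individual runs can vary; in practice you
# would average coherence over several runs (or fix random_state) before choosing.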