Content tagged with "Topic Model"

As part of my PhD I’m currently interested in topic models that can take into account the dialect of the writing. That is, how can we build a model that can compare topics discussed in different dialectical styles, such as scientific papers versus newspaper articles. If you’re new to the concept of topic modelling then this article can give you a quick primer.

Vanilla LDA

A diagram of how latent variables in LDA model are connected

Vanilla topic models such as Blei’s LDA are great but start to fall down when the wording around one particular concept varies too much. In a scientific paper you might expect to find words like “gastroenteritis”, “stomach” and “virus” whereas in newspapers discussing the same topic you might find “tummy”, “sick” and “bug”.  A vanilla LDA implementation might struggle to understand that these concepts are linked unless the contextual information around the words is similar (e.g. both articles have “uncooked meat” and “symptoms last 24 hours”).

Read more...