Content tagged with "Open Source"

(Cover image “Unlocked” by Sean Hobson)

If you’re an academic or you’ve got an interest in reading scientific papers, you’ve probably run into paywalls that demand tens or even hundreds of pounds just to read a single paper. That’s fine if you’re affiliated with a university that has access to the journal, but it can be the luck of the draw as to whether your institution has access, and even when it does, the SAML login process sometimes fails and you still can’t see the paper. Thankfully, the guys at Unpaywall (actually built by Impact Story) have been doing a fantastic job of making open access papers much more easily available in the browser to interested academics. If you end up at a publisher paywall and Unpaywall knows about a legitimate free copy of the paper you’re trying to read, it’ll link you straight to it for direct download. Problem solved.


As part of my PhD I’m currently interested in topic models that can take into account the dialect of the writing. That is, how can we build a model that can compare topics discussed in different dialectal styles, such as scientific papers versus newspaper articles? If you’re new to the concept of topic modelling then this article can give you a quick primer.

Vanilla LDA

A diagram of how the latent variables in an LDA model are connected

Vanilla topic models such as Blei’s LDA are great but start to fall down when the wording around one particular concept varies too much. In a scientific paper you might expect to find words like “gastroenteritis”, “stomach” and “virus” whereas in newspapers discussing the same topic you might find “tummy”, “sick” and “bug”. A vanilla LDA implementation might struggle to understand that these concepts are linked unless the contextual information around the words is similar (e.g. both articles have “uncooked meat” and “symptoms last 24 hours”).
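To make the vocabulary problem concrete, here is a rough sketch in plain Python. It is not LDA itself, just a toy bag-of-words comparison under the assumption that a model like LDA only sees which words co-occur in a document. The helper names (`bag_of_words`, `jaccard`) and the two tiny "documents" are made up for illustration, built from the example words above.

```python
# Toy illustration only: this is NOT an LDA implementation. It measures how much
# vocabulary two texts share, which is roughly the signal a bag-of-words model
# has to work with. Helper names and example texts are hypothetical.

def bag_of_words(text):
    """Lower-case a text and return its set of distinct word tokens."""
    return set(text.lower().split())

def jaccard(a, b):
    """Jaccard similarity of two word sets (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

scientific = "gastroenteritis stomach virus"
newspaper = "tummy sick bug"
context = "uncooked meat symptoms last 24 hours"

# Without any shared context the two texts have no words in common.
no_context = jaccard(bag_of_words(scientific), bag_of_words(newspaper))

# Adding the same contextual phrases to both texts gives them common ground.
with_context = jaccard(
    bag_of_words(scientific + " " + context),
    bag_of_words(newspaper + " " + context),
)

print(f"overlap without shared context: {no_context:.2f}")
print(f"overlap with shared context:    {with_context:.2f}")
```

With nothing but the dialect-specific words, the overlap is zero, so there is nothing to tie “tummy” to “stomach”. Once both texts mention the same contextual phrases, they share enough vocabulary for a co-occurrence-based model to have a chance of linking them.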


I’ve written a small command line application for tracking my time on my PhD and other projects. We use Harvest at Filament, which is great if you’ve got a huge team and want the complexity (and of course the license charges) of an online cloud solution for time tracking.

If, like me, you’re just interested in seeing how much time you are spending on your different projects and you don’t have any requirement for fancy web interfaces or client billing, then timetrack might be for you. For me personally, I was wondering how much of my week is spent on my PhD as opposed to Filament client work. I know it’s a fair amount but I want some clear-cut numbers.