Content tagged with "Machine Learning"

As part of my PhD I’m currently interested in topic models that can take into account the dialect of the writing. That is, how can we build a model that compares topics discussed in different dialectal styles, such as scientific papers versus newspaper articles? If you’re new to the concept of topic modelling, this article can give you a quick primer.

Vanilla LDA

A diagram of how the latent variables in an LDA model are connected

Vanilla topic models such as Blei’s LDA are great, but they start to fall down when the wording around one particular concept varies too much. In a scientific paper you might expect to find words like “gastroenteritis”, “stomach” and “virus”, whereas in newspapers discussing the same topic you might find “tummy”, “sick” and “bug”. A vanilla LDA implementation might struggle to understand that these concepts are linked unless the contextual information around the words is similar (e.g. both articles mention “uncooked meat” and “symptoms last 24 hours”).
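To make the limitation concrete, here is a minimal sketch (assuming scikit-learn is installed; the toy documents are my own illustrative examples, not from the original post) that fits a vanilla LDA model over the two registers and prints the top words per topic. Because the model has no idea that “stomach” and “tummy” are related, it can only group them when they share context words.

# Vanilla LDA on toy documents: a "scientific" and a "newspaper" description
# of the same illness, plus one unrelated document.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "gastroenteritis stomach virus uncooked meat symptoms last 24 hours",  # scientific register
    "tummy bug sick uncooked meat symptoms last 24 hours",                 # newspaper register
    "election vote parliament minister policy debate",                     # unrelated topic
]

vectoriser = CountVectorizer()
counts = vectoriser.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Print the five highest-weighted terms for each learned topic.
terms = vectoriser.get_feature_names_out()
for topic_idx, weights in enumerate(lda.components_):
    top_terms = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {topic_idx}: {', '.join(top_terms)}")

With so little shared vocabulary, the two registers only end up in the same topic because of the overlapping context terms (“uncooked”, “meat”, “symptoms”); remove those and the link disappears.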

Read more...

Thomas Hobbes, perhaps most famous for his thinking on Western politics, was also thinking about how the human mind “computes things” nearly 400 years ago.

A recent opinion piece I read on Wired called for us to stop labelling our current, task-specific machine learning models as AI because they are not intelligent. I respectfully disagree.

Read more...

It is a testament to marketers around the world that the myth persists that their AI platform X, Y or Z can solve all your problems with no effort. Perhaps it is this, combined with developers and data scientists often being hidden out of sight and out of mind, that leads people to think this way.

Unfortunately, the truth of the matter is that ML and AI involve blood, sweat and tears – especially if you are building things from scratch rather than using APIs. Even if you are using third-party APIs, there are still challenges. The biggest players in the API space also have large pools of money – money that can be spent on marketing literature to convince you that their product will solve all your problems with no effort required. I think this is dishonest, and it is one of the reasons I have so many conversations like the one below.

Read more...

EDIT: Hello readers, these articles are now 4 years old and many of the Watson services and APIs have moved or been changed. The concepts discussed in these articles are still relevant but I am working on 2nd editions of them.

Last time we discussed some good practices for collecting data and then splitting it into training and test sets in order to create a ground truth for your machine learning system. We then talked about calculating accuracy using test and blind data sets.
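As a rough sketch of that split (assuming scikit-learn; the tiny example data and the 60/20/20 proportions are illustrative assumptions, not figures from the article), you would hold back a test set for tuning and a further blind set that is never inspected during development:

from sklearn.model_selection import train_test_split

# Tiny illustrative ground truth: question texts and their intent labels.
texts = ["reset my password", "forgot password", "update billing address",
         "change payment card", "cancel my account", "close account",
         "password not working", "card declined", "delete my profile",
         "how do I pay"]
labels = ["password", "password", "billing", "billing", "account", "account",
          "password", "billing", "account", "billing"]

# First split off 40% of the data, then halve that into test and blind sets,
# giving roughly 60% train / 20% test / 20% blind.
train_x, rest_x, train_y, rest_y = train_test_split(
    texts, labels, test_size=0.4, random_state=42)
test_x, blind_x, test_y, blind_y = train_test_split(
    rest_x, rest_y, test_size=0.5, random_state=42)

print(len(train_x), len(test_x), len(blind_x))  # 6 2 2

The blind set is only scored once you believe the system is finished, which keeps that accuracy figure honest.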

Read more...

EDIT: Hello readers, these articles are now 4 years old and many of the Watson services and APIs have moved or been changed. The concepts discussed in these articles are still relevant but I am working on 2nd editions of them.


This article has a slant towards the IBM Watson Developer Cloud Services, but the principles and rules of thumb expressed here are applicable to most cognitive/machine learning problems.

Introduction

Quality assurance is arguably one of the most important parts of the software development lifecycle. In order to release a product that is production ready, it must be put through, and pass, a number of tests – these include unit testing, boundary testing, stress testing and other practices that many software testers are no doubt familiar with. The ways in which traditional software is tested are relatively clear. In a normal system, developers write deterministic functions; that is, if you put an input parameter in then, unless there is a bug, you will always get the same output back. This principle makes it... well, not easy... but less difficult to write good test scripts and to know that there is a bug or regression in your system if these scripts get a different answer back than usual.
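As a minimal sketch of that distinction (the function name, test names and the 85% accuracy threshold below are illustrative assumptions, not anything from the original articles), a deterministic function can be pinned down with exact assertions, whereas a machine learning component can usually only be held to an aggregate score over a test set:

# Deterministic function: the same input always yields the same output,
# so an exact-match assertion is a reliable regression test.
def add_vat(net_price: float, rate: float = 0.2) -> float:
    return round(net_price * (1 + rate), 2)

def test_add_vat():
    assert add_vat(100.0) == 120.0

# Machine learning component: individual answers may legitimately change
# after retraining, so we assert an accuracy threshold over a test set
# rather than exact outputs for every example.
def test_classifier_accuracy(classify, test_set, threshold=0.85):
    correct = sum(1 for text, expected in test_set if classify(text) == expected)
    assert correct / len(test_set) >= threshold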

Read more...