Content tagged with "Work"

Why Weeknote?

One of my favourite Fosstodon people, Doug Belshaw, publishes a blog post every week about what he got up to, which is always interesting to read. Revisiting your notes on a weekly basis is a great idea from a learning standpoint, since it helps cement the things you learned and wrote about, and from an appreciation and gratitude standpoint, since it makes you take stock of the things you did this week, the volume of which may surprise you. Doing this on your public-facing blog has the side effect of making you write in a focused way, as if someone other than yourself might read it, and that will help you make sense of your own writing when you come back to review it in the future. It can also help hold you accountable, since it's much easier to convince yourself not to bother writing a weeknote, or to put it off, if you are the only one who will ever see it. Finally, of course, there’s the possibility that others might read it and find it interesting or learn something.


Introduction

Explainability of machine learning models is a hot topic right now - particularly in deep learning, where models are that bit harder to reason about and understand. These models are often called ‘black boxes’ because you put something in, you get something out and you don’t really know how that outcome was achieved. The ability to explain a machine learning model’s decisions in terms of the features passed in is useful from a debugging standpoint (identifying features with weird weights). With legislation like GDPR’s Right to an Explanation, it is also becoming important in a commercial setting to be able to explain why models behave a certain way.


A harrowing tale of trying to solve the impossible and failing. Episode 5 in this year’s run at the #100DaysToOffload challenge. See the full series here

Photo by Tim Mossholder from Pexels


That’s So Random: Randomness in Machine Learning

Training machine learning models, and deep learning models in particular, generally involves a lot of random number generation. If we’re training a supervised classifier or regressor, we tend to randomly split our annotated data into a training set and a test set. Also, if you are training a new neural network, it is fairly standard practice to initialize the connections between the neurons (the weights) with random numbers (here’s why).
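To make those two sources of randomness concrete, here is a minimal sketch using only the Python standard library. The helper names `train_test_split` and `init_weights`, the seed value and the weight range are all illustrative assumptions, not taken from any particular framework. It simply shows that seeding the generator makes both the split and the initial weights repeatable.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` with a seeded generator, then split it.

    Hypothetical helper for illustration; real projects usually rely on
    their ML library's own splitting utility.
    """
    rng = random.Random(seed)  # private generator, global state untouched
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def init_weights(n_inputs, n_outputs, seed=42):
    """Return an n_inputs x n_outputs grid of small random weights."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-0.1, 0.1) for _ in range(n_outputs)]
        for _ in range(n_inputs)
    ]


data = list(range(10))
train, test = train_test_split(data)
weights = init_weights(3, 2)

# Re-running with the same seed reproduces the same split and weights.
same_split = train_test_split(data) == (train, test)
same_weights = init_weights(3, 2) == weights
print(same_split, same_weights)
```

Change the seed and you would most likely get a different split and different starting weights, which is exactly why two "identical" training runs can disagree.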


A person overwhelmed by boxes by Cottonbro


Note: If you don’t want to read the blah-blah context and history stuff then you can jump to the recommendations

The Problem

The need for virtual Python environments becomes fairly obvious early in most Python developers’ careers, when they switch between two projects and realise that they have incompatible dependencies (e.g. project1 needs scikit-learn-0.21 and project2 needs scikit-learn-0.24). Unlike other mainstream languages such as JavaScript (Node.js) and Java (with Maven), where dependencies are stored locally to the project, Python dependencies are installed at system or environment level and affect all projects that use the same environment.
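As a rough illustration of what "one environment per project" means, the sketch below uses the standard library's `venv` module to create two separate environments in a temporary directory. The directory names `project1-env` and `project2-env` are made up for the example, and pip is skipped to keep it quick. This is only one way to get isolation, not a recommendation of a particular tool.

```python
import tempfile
import venv
from pathlib import Path


def make_env(parent: Path, name: str) -> Path:
    """Create a bare virtual environment at parent/name and return its path."""
    env_dir = parent / name
    # with_pip=False keeps the example fast; a real environment would
    # normally want pip so that dependencies can be installed into it.
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env1 = make_env(Path(tmp), "project1-env")  # hypothetical project names
    env2 = make_env(Path(tmp), "project2-env")
    # Each environment gets its own pyvenv.cfg and its own site-packages,
    # so packages installed into one are not visible from the other.
    created = [(env / "pyvenv.cfg").exists() for env in (env1, env2)]
    print(created)
```

With an environment per project, project1 could pin one scikit-learn version and project2 another without either install affecting the other.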


A beige analog compass by Ylanite Koppens


Introduction

Open machine learning research is undergoing something of a reproducibility crisis. In fairness it’s not usually the authors’ fault - or at least not entirely. We’re a fickle industry, and the tools and frameworks that were ‘in vogue’ and state of the art a couple of years ago are now obsolete. Furthermore, academics and open source contributors are under no obligation to keep their code up to date. It is often left up to the reproducer to figure out how to breathe life back into older work.


A jar of pickles by Ksenia Charnaya


I recently came across an infuriating problem where an MLFlow Python model I had trained on one system using Python 3.6 would not load on another system with an identical version of Python.

The exact problem was that when I ran mlflow models serve -m <url/to/model/in/bucket> the service would crash saying that the model could not be unserialized because ValueError: unsupported pickle protocol: 5.
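For context on that error message, here is a small sketch of how you might check which pickle protocol a serialized blob declares. The `pickle_protocol` helper and the sample payload are illustrative and not part of MLFlow. It relies on the fact that pickles written with protocol 2 or later start with a `PROTO` opcode followed by the protocol number, and that protocol 5 only became part of the standard `pickle` module in Python 3.8.

```python
import pickle


def pickle_protocol(data: bytes):
    """Return the protocol number declared at the start of a pickle.

    Protocols 0 and 1 do not write a PROTO opcode, so None is returned
    for them. Hypothetical helper for illustration only.
    """
    if len(data) >= 2 and data[0:1] == pickle.PROTO:
        return data[1]
    return None


payload = {"weights": [0.1, 0.2]}  # stand-in for a trained model object

# Pin the protocol explicitly instead of relying on the writer's default.
blob = pickle.dumps(payload, protocol=4)
declared = pickle_protocol(blob)

# The reader can only load protocols up to its own HIGHEST_PROTOCOL.
loadable = declared is None or declared <= pickle.HIGHEST_PROTOCOL
print(declared, loadable)
```

If the declared protocol is higher than the reading interpreter's `pickle.HIGHEST_PROTOCOL`, loading fails with an "unsupported pickle protocol" error like the one above.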


MLFlow is a powerful open source MLOps platform with a built-in framework for serving your trained ML models as REST APIs. The REST framework will load data provided in a JSON or CSV format compatible with pandas and pass this directly into your model. This can be handy when your model is expecting a tabular list of numerical and categorical features. However, it is less clear how to serve models and pipelines that expect unstructured text data as their primary input. In this post we will explore how to train and then serve an NLP model using MLFlow, scikit-learn and spaCy.
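To give a feel for what a pandas-compatible JSON body carrying text might look like, here is a hedged sketch that wraps a list of documents in a single-column, split-oriented table. The column name `text` and the helper `build_payload` are assumptions made for illustration. The `dataframe_split` wrapper key is what recent MLFlow versions expect as far as I know, while older versions took the bare split-oriented dict with a dedicated content type, so check the documentation for the version you are running.

```python
import json


def build_payload(texts, column="text"):
    """Wrap raw strings as a one-column, split-oriented table in JSON.

    Assumptions: the model reads a column called `column`, and the server
    accepts a `dataframe_split` envelope. Both may differ in your setup.
    """
    table = {
        "columns": [column],
        "data": [[t] for t in texts],  # one row per document
    }
    return json.dumps({"dataframe_split": table})


body = build_payload(["MLFlow makes serving easy", "spaCy parses text"])
print(body)
```

The point is only that unstructured text can still travel as a table: one row per document and a single string column.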


Introduction

When you’re working with large datasets, storing them in git alongside your source code is usually not an optimal solution. Git is famously not well suited to large files, and whilst general purpose solutions exist (Git LFS being perhaps the most famous and widely adopted), DVC is a powerful alternative that does not require a dedicated LFS server. It can be used directly with a range of cloud storage systems as well as traditional NFS and SFTP-backed filestores, all listed out here.


In an ever-growing digital landscape filled with more content than a person can consume in their lifetime, recommendation engines are a blessing but can also be a curse. Understanding their strengths and weaknesses is a vital skill as part of a balanced media diet.

If you remember when connecting to the internet involved a squawking modem and images that took 5 minutes to load, then you probably discovered your favourite musician after hearing them on the radio, reading about them in NME or being told about them by a friend. Likewise, you probably discovered your favourite TV show by watching live terrestrial TV, your favourite book by taking a chance at your local library and your favourite movie at a cinema. You only saw the movies that had cool TV ads or rave reviews; you couldn’t afford to take a chance on a dud when one ticket, plus bus fare, plus popcorn and a drink cost more than two weeks’ pocket money.
