Brainsteam

Weeknote Week 11 2022

Published on March 20, 2022 by James Ravenscroft

Why Weeknote?

One of my favourite Fosstodon people, Doug Belshaw, publishes a blog post every week about what he got up to during the week, which is always interesting to read. Revisiting your notes on a weekly basis is a great idea from a learning standpoint, since it helps cement the things you learned and wrote about in your mind, and from an appreciation and gratitude standpoint, since it makes you take stock and look back at the things you did this week, the volume of which may surprise you. Doing this on your public-facing blog has the side effect of making you write in a focused way, as if someone other than yourself might read it, and that will help you make sense of your own writing when you come back to review it in the future. It can also help hold you accountable, since it’s much easier to convince yourself not to bother writing a weeknote, or to put it off, if you are the only one who will ever see it. Finally, of course, there’s the possibility that others might read it and find it interesting or learn something.

Painless Explainability for NLP/Text Models with LIME and ELI5

Published on March 14, 2022 by James Ravenscroft

#machine-learning #work #explainability

Introduction

Explainability of machine learning models is a hot topic right now - particularly in deep learning where models are that bit harder to reason about and understand. These models are often called ‘black boxes’ because you put something in, you get something out and you don’t really know how that outcome was achieved. The ability to explain a machine learning model’s decisions in terms of the features passed in is useful from a debugging standpoint (identifying features with weird weights). It is also becoming important in a commercial setting: with legislation like GDPR’s Right to an Explanation, you need to be able to explain why models behave a certain way.

Here be Dragons: Deep Learning Reproducibility

Published on January 22, 2022 by James Ravenscroft

#machine-learning #work #phd

A harrowing tale of trying to solve the impossible and failing. Episode 5 in this year’s run at the #100DaysToOffload challenge. See the full series here

Photo by Tim Mossholder from Pexels

That’s So Random: Randomness in Machine Learning

Training machine learning models, and deep learning models in particular, generally involves a lot of random number generation. If we’re training a supervised classifier or regressor, we tend to randomly split our annotated data into a training set and a test set. Also, if you are training a new neural network, it is fairly standard practice to initialize the connections between the neurons (the weights) with random numbers (here’s why).
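Both sources of randomness mentioned above can be pinned down with a seed. The following is a minimal sketch using only the standard library; the function names, the 20% test fraction and the weight range are assumptions chosen for illustration, not taken from the post.

```python
import random


def split_train_test(examples, test_fraction=0.2, seed=42):
    """Shuffle a copy of the data with a seeded generator, then cut it in two."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def init_weights(n_inputs, n_outputs, seed=42):
    """Draw small random starting weights for one fully connected layer."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
            for _ in range(n_outputs)]


data = list(range(10))
train_a, test_a = split_train_test(data, seed=1)
train_b, test_b = split_train_test(data, seed=1)

# The same seed gives the same split and the same starting weights.
print(train_a == train_b and test_a == test_b)
print(init_weights(3, 2, seed=7) == init_weights(3, 2, seed=7))
```

Real frameworks keep their own random number generators, so each one typically needs to be seeded separately.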

goldbergyoni/nodebestpractices: The Node.js best practices list (December 2021)

Published on January 8, 2022 by James Ravenscroft

#software engineering #work #phd

An opinionated guide to Python environments in 2021

Published on April 12, 2021 by James Ravenscroft

#python #devops #work #open-source

A person overwhelmed by boxes by Cottonbro

Note: If you don’t want to read the blah-blah context and history stuff then you can jump to the recommendations

The Problem

The need for virtual Python environments becomes fairly obvious early in most Python developers’ careers when they switch between two projects and realise that they have incompatible dependencies (e.g. project1 needs scikit-learn-0.21 and project2 needs scikit-learn-0.24). Unlike other mainstream languages like JavaScript (Node.js) and Java (with Maven), where dependencies are stored locally to the project, Python dependencies are installed at system or environment level and affect all projects that are using the same environment.
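To see which environment a given interpreter is installing into, you can ask Python directly. This is a small sketch using the standard library; the helper names are made up for illustration, and the check applies to venv-style environments (other environment managers may behave differently).

```python
import sys
import sysconfig


def in_virtual_environment():
    """True when the interpreter is running inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


def site_packages_dir():
    """Directory where pure-Python packages are installed for this interpreter."""
    return sysconfig.get_paths()["purelib"]


print("virtual environment:", in_virtual_environment())
print("packages install to:", site_packages_dir())
```

Two projects that report the same directory here share their installed packages, which is exactly the situation that leads to conflicting versions.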

Reproducing 'ancient' experiments with Pytorch inside docker

Published on March 1, 2021 by James Ravenscroft

#machine-learning #python #ai #devops #mlops #work #phd #open source

A beige analog compass by Ylanite Koppens

Introduction

Open machine learning research is undergoing something of a reproducibility crisis. In fairness it’s not usually the authors’ fault - or at least not entirely. We’re a fickle industry and the tools and frameworks that were ‘in vogue’ and state of the art a couple of years ago are now obsolete. Furthermore, academics and open source contributors are under no obligation to keep their code up to date. It is often left up to the reproducer to figure out how to breathe life back into older work.

Pickle 5 Madness with MLFlow and Python 3.6/3.7

Published on January 14, 2021 by James Ravenscroft

#machine-learning #python #ai #devops #mlops #work #open source

I recently came across an infuriating problem where an MLFlow python model I had trained on one system using Python 3.6 would not load on another system with an identical version of Python.

The exact problem was that when I ran mlflow models serve -m <url/to/model/in/bucket> the service would crash, saying that the model could not be deserialized because of ValueError: unsupported pickle protocol: 5.
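For context on that error message, a pickle written with protocol 2 or later declares its protocol number in its first two bytes, and protocol 5 only exists from Python 3.8 onwards. The sketch below shows how to inspect a payload and how to write one with protocol 4 instead. It is a general illustration, not necessarily the fix described in the post, and the `model` dictionary is a hypothetical stand-in for a trained model.

```python
import pickle


def pickle_protocol(payload: bytes):
    """Return the protocol number declared in a pickle's header, if any."""
    # Protocol 2 and later begin with the PROTO opcode (0x80) followed by
    # a single byte holding the protocol number.
    if len(payload) >= 2 and payload[0] == 0x80:
        return payload[1]
    return None  # protocols 0 and 1 carry no header


model = {"weights": [0.1, 0.2, 0.3]}  # hypothetical stand-in for a trained model

# Protocol 4 can be read by Python 3.4 and later, including 3.6 and 3.7.
portable = pickle.dumps(model, protocol=4)

print(pickle_protocol(portable))
print(pickle.HIGHEST_PROTOCOL)  # highest protocol this interpreter can write
```

Checking the header of a stored model file this way tells you which protocol it was written with before you try to load it on an older interpreter.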

Serving NLP Models with MLflow

Published on December 29, 2020 by James Ravenscroft

#machine-learning #python #ai #devops #mlops #nlp #spacy #work #open source

MLFlow is a powerful open source MLOps platform with a built-in framework for serving your trained ML models as REST APIs. The REST framework will load data provided in a JSON or CSV format compatible with pandas and pass this directly into your model. This can be handy when your model is expecting a tabular list of numerical and categorical features. However, it is less clear how to serve models and pipelines that expect unstructured text data as their primary input. In this post we will explore how to train and then serve an NLP model using MLFlow, scikit-learn and spaCy.
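One way to fit free text into a pandas-compatible request is to treat each document as a row in a one-column table. The sketch below builds such a JSON body in pandas "split" orientation using only the standard library. The column name `text` and the helper name are assumptions, and the exact envelope and content type the server expects depend on the MLFlow version, so check the documentation for the release you are running.

```python
import json


def to_split_payload(texts, column="text"):
    """Encode raw documents as a one-column table in pandas 'split' orientation."""
    # Each document becomes one row; the single column name is hypothetical
    # and must match whatever the served model expects.
    return json.dumps({"columns": [column], "data": [[t] for t in texts]})


payload = to_split_payload(["I loved this film", "Terrible service"])
print(payload)
```

The model on the other side then receives a DataFrame with one string column and can pass that column to its text pipeline.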

DVC and Backblaze B2 for Reliable & Reproducible Data Science

Published on November 27, 2020 by James Ravenscroft

#data science #devops #machine learning #work

Introduction

When you’re working with large datasets, storing them in git alongside your source code is usually not an optimal solution. Git is famously not really suited to large files, and whilst general purpose solutions exist (Git LFS being perhaps the most famous and widely adopted), DVC is a powerful alternative that does not require a dedicated LFS server and can be used directly with a range of cloud storage systems as well as traditional NFS and SFTP-backed filestores, all listed out here.

‘Dark’ Recommendation Engines: Algorithmic curation as part of a ‘healthy’ information diet.

Published on September 4, 2020 by James Ravenscroft

#machine-learning #ai #work

In an ever-growing digital landscape filled with more content than a person can consume in their lifetime, recommendation engines are a blessing but can also be a curse. Understanding their strengths and weaknesses is a vital skill as part of a balanced media diet.

If you remember when connecting to the internet involved a squawking modem and images that took 5 minutes to load, then you probably discovered your favourite musician after hearing them on the radio, reading about them in NME or being told about them by a friend. Likewise you probably discovered your favourite TV show by watching live terrestrial TV, your favourite book by taking a chance at your local library and your favourite movie at a cinema. You only saw the movies that had cool TV ads or rave reviews – you couldn’t afford to take a chance on a dud when one ticket, plus bus fare, plus popcorn and a drink cost more than two weeks’ pocket money.

Content tagged with "Work"
