As everyone else seems to at this time of year, I thought I would write a quick post about how my year’s gone. I will follow up with some ambitions for 2022 tomorrow.

💒 Getting Married

We got married

My biggest personal achievement this year was getting married. My wife and I have been together since 2015 and we’d set our sights on a big traditional wedding. Due to COVID we realised that was unlikely to happen any time soon and it also made us reprioritise what we wanted to spend money on - big weddings are notoriously expensive. Instead, we opted for a small family wedding in Winchester which was a lovely experience and meant we got to interact with all of our guests.

📔 Publishing at EACL

In January I found out that my paper “CD2CR: Co-reference Resolution Across Documents and Domains” had been accepted at EACL 2021.

Co-reference resolution is basically knowing that the “He” in “James is an IT professional. He lives in England” refers to James. The whole premise of my work was that it allows you to resolve co-references between different types of documents. For example, if a news article says “new species of dinosaur discovered” and it links to a scientific paper that says “we discovered Triceratops horridus in a fossil on the coast of Dorset”, then the task would be to know that “dinosaur” and “Triceratops horridus” refer to the same thing.

This work has applications in fact checking and understanding when news articles paraphrase scientific work which could change the meaning.
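To make the task a bit more concrete, here is a minimal sketch of how co-referring mentions can be represented as clusters. The `Mention` class, the helper functions and the document ids are all hypothetical names made up for illustration, and the clusters are hand-labelled toy data based on the examples above. This is not the CD2CR model, just the shape of the output such a model is trying to produce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A span of text that refers to something, plus the document it came from."""
    doc_id: str
    text: str


def build_index(clusters):
    """Map every mention to the id of the cluster (entity) it belongs to."""
    return {
        mention: cluster_id
        for cluster_id, cluster in enumerate(clusters)
        for mention in cluster
    }


def corefer(index, a, b):
    """Two mentions co-refer when they sit in the same cluster."""
    return a in index and b in index and index[a] == index[b]


# Hand-labelled toy data (an assumption for illustration, not model output).
james = Mention("bio", "James")
he = Mention("bio", "He")
dinosaur = Mention("news_article", "dinosaur")
triceratops = Mention("scientific_paper", "Triceratops horridus")

clusters = [
    {james, he},              # co-reference within a single document
    {dinosaur, triceratops},  # co-reference across documents and domains
]
index = build_index(clusters)

print(corefer(index, james, he))              # True
print(corefer(index, dinosaur, triceratops))  # True
print(corefer(index, he, triceratops))        # False
```

The hard part, of course, is predicting those clusters automatically when the two documents use very different vocabulary.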

⚗️🪶 Revitalising SAPIENTA & Partridge

SAPIENTA was a project by my PhD supervisor Maria Liakata which uses machine learning to identify different sections in a scientific paper (e.g. background, methodology, objectives, conclusions). I built my undergraduate degree project Partridge on top of SAPIENTA. It was a sort of prototype Semantic Scholar that makes scientific papers searchable via their sections (technical name: Core Scientific Concepts) as identified via SAPIENTA.

Earlier this year I took some time to get SAPIENTA and Partridge running again on a cheap VPS over at OVH. As part of this work I rewrote the old Python 2 code in Python 3 compatible syntax and modernised some of the processing pipelines (I replaced my home-grown XML-RPC-based background workers with Dramatiq). I also created a new, simplified command-line interface for using SAPIENTA locally. You can also run the whole stack locally via a Docker image, which is probably overkill for one or two papers but worthwhile for a large collection. SAPIENTA is available as an API here.

I’ve also rebuilt and modernised the backend of Partridge (although the frontend could do with some love) - an instance is running here.

🏆 Winning Best KTP Award

In September, my colleague Cynthia won the Best KTP Award for her collaboration with the University of Essex on CIELO - a tool that tries to train the best machine learning model via parameter optimisation or, as she aptly writes, it’s like trying to bake the perfect cake. As Cyn’s team leader & manager I was excited to go along with her to the KTP event at Essex and share in the glory, but she did all of the hard work and rightly deserves the lion’s share of the credit.

🤖 MLflow Adoption & Python Environment Standards

At work I led the adoption of MLflow for storing all of our machine learning experiments and results. This was a huge win in terms of productivity, reproducibility and transparency for the data science team as it means that we always know which models were trained, when, by whom, with which data, where that data is, what parameters were used and what performance was achieved. I wrote a post about some of the challenges of using MLflow with NLP models earlier in the year.

We’ve also adopted DVC for tracking large data files (i.e. training data sets) without committing the data itself to git. This means that we know exactly which data was used for running a given script/model, but that data is not clogging up our git repositories (which slows down checking projects out). It is also secure (even if you have access to our git server, you also need credentials to access the data bucket) and access to the data is auditable in a pinch (we can use S3 buckets with paranoid logging). At the end of last year I also wrote a little about using DVC with Backblaze, which is something I do for personal projects and my PhD work. I’ve started using DVC for tracking and reproducing script runs as well but I’ve still got to write that up into a blog post and some guidelines for my team.
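The general idea behind keeping data out of git can be sketched in a few lines of standard-library Python: commit a tiny pointer file containing a content hash, and keep the real data elsewhere. To be clear, this is not how DVC is implemented and it doesn’t use DVC’s file format. The function names, the `.pointer.json` suffix and the choice of SHA-256 are all assumptions made purely for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_hash(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data sets don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path):
    """Write a small pointer file next to the data; only this would go into git."""
    data_path = Path(data_path)
    pointer = {
        "path": data_path.name,
        "sha256": file_hash(data_path),
        "size": data_path.stat().st_size,
    }
    pointer_path = data_path.with_name(data_path.name + ".pointer.json")
    pointer_path.write_text(json.dumps(pointer, indent=2))
    return pointer_path


def is_unchanged(pointer_path):
    """Check that the data on disk still matches what the pointer recorded."""
    pointer_path = Path(pointer_path)
    pointer = json.loads(pointer_path.read_text())
    data_path = pointer_path.with_name(pointer["path"])
    return file_hash(data_path) == pointer["sha256"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "train.csv"  # hypothetical training data file
        data.write_text("text,label\nhello,1\n")
        pointer = write_pointer(data)
        print(is_unchanged(pointer))  # True
        data.write_text("text,label\nhello,0\n")
        print(is_unchanged(pointer))  # False
```

Because the pointer is tiny and changes whenever the data changes, the git history tells you exactly which version of the data a given commit used, while the data itself lives in a bucket with its own access controls.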

I also formalised some guidelines on best practices for Python development within the data science team at work. Python dependency management can be a real PITA. I’ve been doing Python dev since 2005 and things have really come on leaps and bounds in the last few years with the introduction of tools like Poetry and pipenv. Earlier in the year I published some of my thoughts on how best to handle Python environments and dependencies, which we’ve now adopted within Filament.

🌳♻️ Environmental Efforts

I’ve been putting a lot more conscious effort into environmental stuff this year.

  • Firstly, I try to reduce what we consume by buying less stuff and buying “eco-friendly” where possible. I’ve been using our local refill and eco shop, which opened this year, for store cupboard staples and cleaning products. If you’re in the South Hampshire/Solent area I can’t rate Nina and her shop highly enough.
  • Our local council only collects cardboard and some types of plastic at the kerbside, but I’ve found that the recycling bins in supermarket entrances now take all soft plastics, including crisp packets and cat food pouches, so I take ours along whenever I need to nip into town for something.
  • I try to be mindful about replacing/upgrading stuff - do I really need to do it or is what I have “good enough” already? I recently and reluctantly replaced my Pixel 3A because I was finding it sluggish and I didn’t want to root/re-image it and endure lots of headaches with banking apps etc. My mum’s had my 3A off me, factory reset it and is using it as her main phone so it won’t end up as e-waste just yet.
  • We had a go at growing our own food again. This year the harvest wasn’t great but we got a few potatoes, onions and strawberries out of the garden.

📚🕹️📺 Entertainment

I’ve consumed a lot of books, TV shows and video games this year.

📚 Reading

  • The biggest chunk of reading I’ve done this year has been books from the Malazan Book of the Fallen series - an epic high-fantasy series spanning 10 volumes. Its narrative style is infamously divisive but I love it. This year I’ve read books 4, 5 and 6 and I’m about halfway through volume 7.
  • In January I finished Brandon Sanderson’s latest Stormlight Archive offering: Rhythm of War, which my wife got for me in signed hardback last Christmas.
  • In March I read Brandon Sanderson’s Warbreaker - a standalone book within his bigger Cosmere universe.
  • In April I read Brandon Sanderson’s Arcanum Unbounded - a collection of short stories set in the Cosmere universe. My favourite short story in the collection was Shadows for Silence in the Forests of Hell - it was a bit different to his usual writing style and it was a tense, thrilling read - I couldn’t put the book down until I finished it.

I’ve also read a couple of smaller non-fiction books in between longer novels this year.

🕹️ Gaming

  • I’ve discovered and played over 190 hours of Dyson Sphere Program - a factory builder set in space with really pretty graphics, an inspiring and uplifting soundtrack and peaceful, stress-relieving gameplay.
  • I’ve played about 40 hours of Satisfactory - a 3D factory builder with a beautiful 3D planet to explore and build factories and trains across.
  • I’ve played about 18 hours of The Ascent - a top-down sci-fi shooter set on a dystopian space station where you’re caught in the cross-fire between some squabbling mega-corporations.
  • I’ve pumped a few hours into Control on my Nintendo Switch. It’s a sci-fi/noir game where you’re exploring a supernatural government facility that somehow feels like a cross between The X-Files and SCP. The Switch version of the game streams gameplay to your device from a cloud server, which works surprisingly well and means that the full beauty of the game and its ray-tracing capabilities can be experienced on the Switch without taxing its hardware too much.
  • I’ve played a few hours of Dragon Quest XI - the latest in the Dragon Quest JRPG series. I’ve enjoyed what I’ve played of it so far but I found it got a bit samey.

📺 TV

Although this year has been a bit better than 2020 we did spend a lot of it locked down so there was plenty of opportunity to watch TV box sets. Some of my highlights were:

  • Upload - a sci-fi comedy/drama from Greg Daniels of Office, Parks & Rec fame, about what it would be like if you could upload your consciousness to what is essentially an MMORPG after you die and live forever in virtual reality. It’s really well done and has some pseudo-political points to make about the poor/rich divide. It reminded me a lot of the uncharacteristically uplifting Black Mirror episode San Junipero, which followed a similar premise.
  • Motherland - a British sitcom about the perils of middle-class motherhood in London. You don’t have to be a parent to appreciate the humour - it’s full of those oh-so-cringy, overtly British, passive-aggressive social interactions that many of us can relate to. The UK Government COVID-19 spoof with the head lice was spot on.
  • Ted Lasso - has received a lot of media attention as of late. It’s basically about an American Football coach brought over to train a UK Football (soccer) team in an act of post-divorce sabotage by the former club-owner’s wife who won ownership in the split. Weirdly you don’t have to be a fan of football to appreciate the show (I’m not). I’d describe the show as aggressively wholesome in the sense that they force the warm and fuzzy feelings down your throat and you don’t have a choice but to feel optimistic and happy whilst watching. Football is life.
  • Taskmaster - a comedy “game show” where contestants - usually celebrities or comedians - get recorded completing weird and wonderful tasks and then they all watch the footage back together in the studio and Greg Davies critiques them. It may sound like an odd premise for a show but it’s highly entertaining. It’s been on for a while but we only really discovered and got into it this year. There are some great highlights in this video

🏠🚗🌴 Misc

  • We’ve been working on the house a fair bit this year. We re-gravelled our driveway and replaced the rotten old decking in the garden with new composite decking made from recycled plastic and reclaimed timber. It should last years and years with minimal maintenance, and we were able to recycle/mulch the old deck.
  • We took a mini-break after our wedding in the summer during which we stayed at home but took a series of day trips to local eateries and attractions, and even to the zoo at Longleat.
  • We ended up going to a number of other weddings after COVID restrictions started to ease in the UK which was super fun and it was nice to not be in the hot-seat so soon after our own wedding.
  • Not so much an achievement but I turned 30 this year. We were in lockdown on my birthday but outdoor attractions were open so we went to the zoo.