Content tagged with "LLMs"

The recently released Phi 3.5 model series includes a mixture-of-experts model featuring 16 expert models of 3.3 billion parameters each. It activates these experts two at a time, giving pretty good performance while only using around 6.6 billion active parameters per token. I recently wanted to try running Phi 3.5 MoE on my MacBook but was blocked from doing so with my usual method whilst support is still being added to llama.cpp and then Ollama.
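
To make the routing idea concrete, here is a minimal, illustrative sketch of top-2 mixture-of-experts routing in PyTorch. It is not Phi 3.5's actual implementation; the layer sizes and expert count are made up for the example.

```python
import torch
import torch.nn as nn


class Top2MoELayer(nn.Module):
    """Toy mixture-of-experts layer that routes each token to its top-2 experts.

    Illustrative only: the dimensions and expert count are arbitrary and this
    is not Phi 3.5's actual architecture.
    """

    def __init__(self, d_model: int = 64, n_experts: int = 16, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores every expert for each token
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (tokens, d_model)
        gate_logits = self.router(x)                        # (tokens, n_experts)
        weights, chosen = gate_logits.topk(self.k, dim=-1)  # keep only the top-2 experts per token
        weights = weights.softmax(dim=-1)                   # normalise over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for idx, expert in enumerate(self.experts):
                mask = chosen[:, slot] == idx
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = Top2MoELayer()
    tokens = torch.randn(8, 64)
    print(layer(tokens).shape)  # torch.Size([8, 64])
```

Only the two chosen experts run for each token, which is why the compute cost scales with the active parameters rather than the full 16-expert total.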

Read more...

Introduction

In the world of AI assistants, subscription services like ChatGPT Plus, Claude Pro and Google One AI have become increasingly popular amongst knowledge workers. However, these subscription services may not be the most cost-effective or flexible solution for everyone, and the not-insignificant fees encourage users to stick to the one model they've already paid for rather than trying out different options.

Read more...

Knowing when to fine-tune LLMs and when to use an off-the-shelf model is a tricky question. New research can help shed light on when each approach makes more sense and how to eke more performance out of off-the-shelf models without fine-tuning them.

When Fine-Tuning Beats GPT-4

Modern LLMs show impressive out-of-the-box performance on a range of tasks. Even small models like the recent Llama 3 8B perform remarkably well on unseen tasks. However, a recent preprint from the research team at Predibase shows that small models can match and even outperform GPT-4 when they are fine-tuned for specific tasks. Figure 5 from the paper lists the tasks that were evaluated and the relative performance difference versus GPT-4.

Read more...

"Small Large Language Model" might sound like a bit of an oxymoron. However, I think it perfectly describes the class of LLMs in the 1-10 billion parameter range like Llama 3 and Phi 3. In the last few days, Meta and Microsoft have both released these open(ish) models that can happily run on normal hardware. Both models perform surprisingly well for their size, competing with much larger models like GPT 3.5 and Mixtral. However, how well do they generalise to new, unseen tasks? Can they do biology?

Read more...

Self-hosting Llama 3 as your own ChatGPT replacement service using a 10-year-old graphics card and open-source components.

Last week Meta launched Llama 3, the latest in their open-source LLM series. Llama 3 is particularly interesting because the 8-billion-parameter model, which is small enough to run on a laptop, performs as well as models ten times its size. For many use cases, the responses it provides are as good as GPT-4's.
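
For flavour, here is a minimal sketch of how you might talk to a self-hosted Llama 3 once it is running behind an OpenAI-compatible endpoint (for example via an Ollama-style local server). The URL, port and model tag below are assumptions for illustration and may not match the exact setup described in the full post.

```python
# Minimal sketch: chat with a locally hosted Llama 3 over an
# OpenAI-compatible API. The base_url, port and model tag are assumed
# (Ollama-style defaults) and may differ from the setup in the full post.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",   # hypothetical local server address
    api_key="not-needed-for-local-models",  # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="llama3:8b",  # assumed model tag; use whatever your server exposes
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise the plot of Hamlet in two sentences."},
    ],
)

print(response.choices[0].message.content)
```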

Read more...

Quite unusually for me, this essay started its life as a scribble in a notebook rather than something I typed into a markdown editor. A few weeks ago, Tiago Forte made a video suggesting that people can use GPT-4 to capture their handwritten notes digitally. I've been looking for a "smart" OCR that can process my terribly scratchy, spidery handwriting for many years, but none has quite cut the mustard. I thought, why not give it a go? To my absolute surprise, GPT did a reasonable job of parsing my scrawling and capturing the text. I was seriously impressed.
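
If you want to try the same trick yourself, the gist is to send a photo of the page to a vision-capable model and ask for a transcription. Here is a rough sketch using the OpenAI Python client; the model name, prompt and file path are illustrative assumptions rather than the exact settings I used.

```python
# Rough sketch: ask a vision-capable model to transcribe a handwritten page.
# The model name, prompt and file path are illustrative assumptions.
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

with open("notebook_page.jpg", "rb") as f:  # hypothetical scan of a notebook page
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model should work here
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe the handwritten text in this image as plain markdown."},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```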

Read more...

This week has been jam-packed with travelling, meetings, events and all sorts! For an introvert like me, it's been pretty hard going pretending to be extroverted and interacting with lots of folks.

The biggest news this week was that my company won another award. A few weeks ago, in September, we won the CogX award for Best Fintech Company 2023. On Thursday I attended an awards ceremony with my colleague to accept the award for Best Emerging Tech Company (on the south coast) 2023.

Read more...

As of today, I am deprecating/archiving turbopilot, my experimental LLM runtime for code-assistant-style models. In this post I'm going to dive a little bit into why I built it, why I'm stopping work on it, and what you can do now.

If you just want a TL;DR of alternatives, read this bit.

Why did I build Turbopilot?

In April I caught COVID over the Easter break and had to stay home for a bit. After the first couple of days I started to get restless; I needed a project to dive into while I was cooped up at home. It just so happened that people were starting to get excited about running large language models on their home computers after ggerganov published llama.cpp. Lots of people were experimenting with asking Llama to generate funny stories, but I wanted to do something more practical and useful to me.

Read more...