Brainsteam

Introduction

In the world of AI assistants, subscription services like ChatGPT Plus, Claude Pro and Google One AI have become increasingly popular amongst knowledge workers. However these subscription services may not be the most cost-effective or flexible solution for everyone and the not-insignificant fees encourage users to stick to one model that the've already paid for rather than trying out different options.

I previously wrote about how you can self-host Llama3 on a machine with an older graphics card using open source tools. In this post, I demonstrate how to expand this setup to allow you to interact with OpenAI, Anthropic, Google and others alongside your local models via a single, self-hosted UI.

Why ditch my subscription?

Hosting your own AI user interface allows for more granular cost control, potentially saving money for lighter users of commercial models (are you really getting $20/mo of usage out of ChatGPT?). It grants you greater flexibility in choosing and comparing different AI models without worrying about which subscription to spend your $20 on this month or forking out for more than one at a time. Additionally, commercial AI providers usually provide a more stable experience via their APIs since business users, the target audience of the API offerings, tend to have more leverage than consumers when it comes to service-level-agreements. API terms & conditions around data usage are normally better too (within ChatGPT, data collection is opt-out and you lose functionality when you do it) .

Self-hosting your chat interface with Open Web UI brings enhtanced privacy through a local UI, the capability to run prompts through multiple models simultaneously, and the freedom to use different models for different tasks without being locked into a single provider. You can build your own bespoke catalogue of commercial and local models (like Llama-3) and access them and their outputs all from one place. If you're already familiar with docker and docker-compose you can have something up and running in 15 minutes.

When might this not work for me?

If you are a particularly heavy user of the service that you're subscribed to and you're not keen on trying other models you may find that moving to pay-as-you-go is more expensive for you. The average length of a fiction book is something like 60-90k words. It costs about $5 to have GPT-4o read 16 books (input/prompt) and $15 to have it write 16 books worth of output. If you're spending all day every day chatting to Jippy then you might find that you end up spending more than the $20/mo on API usage.

If cost is your primary driver, you should factor in the cost of hosting your server too. If you are already running a homelab or a cheap VPS you might be able to run the web UI at "no extra charge" but if you need to spin up a new server just for hosting your web UI that's going to cut into your subscription savings.

If privacy is your primary driver, you should be aware that this approach still involves sending data to third parties. If you want to avoid that all together you'll need to go fully self hosted.

Finally, fair warning: this process is technical, and you'll need to be familiar (or willing to learn about) Docker, yaml config files and APIs.

The Setup

This setup builds on my previous post about Llama 3. Previously we used Open Web UI to provide a web app that allows you to talk to the LLM and we used Ollama to host Llama 3 locally and provide a system that Open Web UI could send requests to.