Activity

  • This week has been pretty busy as we attended a friend’s wedding up in the North West of the country. We took Friday off to make the six-hour road trip up to Tarporley (pronounced Tar-plea, or Tar-pleh with a Cheshire accent), where we stayed in a cute old-timey hotel called The Swan. The wedding itself was at Peckforton Castle, a stately home built on a Cheshire hillside in the 1800s but in the style of a Gothic, medieval castle.
  • Now that spring is starting to kick in and the sun is rising earlier in the morning, things are starting to feel less dark and gloomy day-to-day. I’m starting to think about and make plans for our garden this year and what food I plan to grow. I don’t have the room (or time + energy) to grow enough food for us to be self-sufficient, but I hope to supplement our regular food shop with some home-grown goodies later in the year.
  • I finally finished Cytonic by Brandon Sanderson. For some reason it’s taken me a little while to read, even though I normally love a good brando-sando. I have his new novel, “The Lost Metal”, to start reading next, set in Era 2 of the Mistborn saga. I’m also making my way through Kris Nova’s Hacking Capitalism.
  • I spent most of Wednesday and Thursday trying to get the contents of a Google BigQuery table out and into a MySQL database. The reason is that BigQuery’s pricing model is based on how much data your query processes, as a proxy for how much compute you use. This makes it quite efficient if you run single queries over large batches of results, and very inefficient if you want to look up a single value (e.g. for a given company in this table, what was their revenue last year?). Each time BigQuery reads the full table we were essentially processing 10 GiB of data - and at $5 per TiB it would cost $5000 to extract the information we wanted for 600k companies via individual lookups. Even with indexing this kind of operation is quite expensive in BigQuery - it would still have cost $189 to extract the information we wanted by querying row-by-row. BigQuery has a CSV export function and Google’s CloudSQL (MySQL and PostgreSQL hosting) has a CSV import function, so I figured: how hard can it be? As it turns out, hard. I might write a longer post about what I got up to (there’s a rough sketch of the pipeline below), but suffice it to say that after 2 full days of my time we can now repeat this process in an automated way and host the full dataset in a traditional DB for about $45/month (plus the extraction of the data from BigQuery, which costs fractions of a penny if we do it once a month).
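For anyone curious, here’s roughly the shape of the pipeline: export the table from BigQuery to Cloud Storage as CSV, then point CloudSQL’s CSV import at the resulting files. This is a simplified sketch rather than the real code - the project, bucket, table and instance names are all placeholders, and it skips the schema setup and error handling:

```python
# Sketch of the BigQuery -> Cloud Storage -> CloudSQL (MySQL) pipeline.
# All names below are made-up placeholders, not the real ones from work.
import subprocess

from google.cloud import bigquery

PROJECT = "my-project"                                  # placeholder
SOURCE_TABLE = "my-project.analytics.companies"         # placeholder
BUCKET_URI = "gs://my-export-bucket/companies-*.csv"    # wildcard lets BigQuery shard large exports
CLOUDSQL_INSTANCE = "my-mysql-instance"                 # placeholder
DATABASE = "companies"
TABLE = "company_financials"

# Step 1: export the BigQuery table to Cloud Storage as CSV.
# Extract jobs don't incur the per-TiB query charges - you only pay for storage.
client = bigquery.Client(project=PROJECT)
job_config = bigquery.ExtractJobConfig(
    destination_format="CSV",
    print_header=False,  # CloudSQL's CSV import doesn't expect a header row
)
extract_job = client.extract_table(SOURCE_TABLE, BUCKET_URI, job_config=job_config)
extract_job.result()  # block until the export finishes

# Step 2: import an exported CSV shard into the CloudSQL instance.
# Shelling out to gcloud keeps this short; the sqladmin REST API works too.
# Note: the CloudSQL instance's service account needs read access to the bucket.
subprocess.run(
    [
        "gcloud", "sql", "import", "csv", CLOUDSQL_INSTANCE,
        BUCKET_URI.replace("*", "000000000000"),  # first shard; loop over the rest
        f"--database={DATABASE}",
        f"--table={TABLE}",
        "--quiet",  # skip the interactive confirmation prompt
    ],
    check=True,
)
```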

Blog Posts

This week I posted about how science journalism is a bit like a blurry photo of science - a kind of spiritual response to Ted Chiang’s recent piece for The New Yorker, which I thought was a great metaphor but which received some stick from the scientific community for being inaccurate.

  • GitHub - mhgolkar/Weasel: WebSocket Client / Console - a client for testing WebSockets that I saw Aral recommend on Mastodon. I don’t do a huge amount with WebSockets, but an upcoming product feature at work is likely to make use of them, so I’ve bookmarked it for future reference and added it to my digital garden.
  • Some really interesting case studies about how Bing’s GPT assistant Sydney cannot be trusted
  • As I mentioned above, I spent far too long this week trying to move data between two database systems. I found this article about efficient ways to bulk-insert data into MySQL; there’s a rough sketch of the classic approach below.
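For reference, the classic fast path for this sort of thing is MySQL’s LOAD DATA INFILE rather than row-by-row INSERTs. Here’s a minimal sketch of what that looks like from Python - the connection details, file and table names are made up for illustration, not what we actually run:

```python
# Bulk-load a CSV into MySQL with LOAD DATA LOCAL INFILE.
# Connection details and names below are placeholders.
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(
    host="127.0.0.1",
    user="loader",
    password="example",       # placeholder credentials
    database="companies",
    allow_local_infile=True,  # LOCAL INFILE must be enabled on both client and server
)

cursor = conn.cursor()
cursor.execute(
    """
    LOAD DATA LOCAL INFILE '/tmp/company_financials.csv'
    INTO TABLE company_financials
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\\n'
    """
)
conn.commit()
cursor.close()
conn.close()
```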