In recent weeks and months the impending global climate catastrophe has been at the forefront of many people’s minds. Thanks to movements like Extinction Rebellion and high-profile environmentalists like Greta Thunberg and David Attenborough, as well as damning reports from the IPCC, it finally feels like momentum is building behind a significant reduction of carbon emissions. That said, knowing how we can help on an individual level, beyond driving and flying less, still feels very overwhelming.

The Energy Issue

A recent study by Strubell et al. (2019) gave insight into exactly how much energy certain neural architectures require to train. Their findings show that training some of the largest and most complex neural models, and running neural architecture search (in which multiple models are trained and measured against a fitness function to find the most performant model for a given task), consumes huge amounts of energy. Assuming that the energy comes from fossil-fuel power plants (a fair assumption, since most researchers use cloud providers like AWS and GCP, which rely largely on carbon-generated electricity), the most energy-hungry of these training runs produce more CO2 pollution than a car produces in its lifetime.

Predictably, mainstream media misconstrued the findings, and articles proposing the abandonment of deep learning as a field started to surface (see Charles Radclyffe’s Forbes article, AI’s Dirty Secret: “if AI really does burn this much electricity, then maybe we should just pull the plug if we’re serious about climate change?”).

My biggest objection to this conclusion is that it rests on the notion that all AI is this power hungry. As I said above, Strubell’s study is based on some of the biggest and most complex models in the field today. My intuition is that most data scientists and AI researchers are not training models anywhere near this big, and for many data problems it is not even necessary to use deep learning (as I discuss below).

My second objection to the notion that we should scrap AI is that we would necessarily dismiss any and all potential benefits of continuing to develop models that reduce energy consumption by optimising data centres, logistics routes and even energy grids. In the future, the mass adoption of self-driving tech could save vast amounts of energy by removing erratic human drivers, with their fuel-hungry acceleration and braking behaviours, from the road. No more human drivers? Less need for traffic control measures that force millions of us to slow down and speed up every day, burning large amounts of fuel that wouldn’t be needed if we maintained a steady speed. None of this will be possible if we simply stop trying to improve deep learning approaches overnight.

The BERT language model, one of Strubell’s worst offenders, is, at the time of writing, the state-of-the-art approach for a number of natural language processing tasks. What if BERT-based models powering chatbots and smart speakers could help consumers to make better purchasing decisions and prevent thousands of packages from being shipped and then returned on gas-guzzling lorries, planes and cargo ships?

20 years ago most of us had power-hungry CRT monitors and TVs that we’ve since replaced with more efficient LCD and LED displays. We were using incandescent lightbulbs that use 6x more electricity than a modern bulb and need replacing an order of magnitude more frequently. Our renewable generation technology has come on in leaps and bounds, with solar panels becoming significantly cheaper and more efficient over the last 20 years. My point here is that humans are pretty good at improving the energy efficiency of our inventions. I’m sure most readers who frequently sit in their electrically lit living room at 10pm watching a flat-screen TV or scrolling on their smartphone’s OLED touch screen are glad that we didn’t give up on these technologies because CRT screens and incandescent bulbs were too energy hungry.

What can the AI community do?

There are a number of things that the AI community can do to help reduce its carbon footprint. Some are simple and straightforward, others are a little more involved.

KISS – Keep it simple stupid!

When you’re building ML models, always start with a simple model. It may be tempting to charge in with a deep learning model immediately, but these models are slow to train, prone to overfitting due to their complexity and, of course, energy hungry. Aside from appeasing the marketing department, there is absolutely no advantage to using a deep learning model before you’ve even tried logistic regression or, whoa don’t go too crazy now, a random decision forest!

Even if you train a few different ‘simple’ models with different data folds and hyperparameters, you’ll probably find it a quicker and less energy-hungry starting point. Of course, if simple models don’t work, deep learning is a good option.
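
One way to make the "simple first" habit concrete is to try candidate models from cheapest to most expensive and stop at the first one that is good enough. The sketch below is only an illustration of that idea: the helper name pick_cheapest_adequate_model, the candidate names and the hard-coded scores are all hypothetical stand-ins, and in real code each callable would train and validate an actual model.

```python
from typing import Callable, List, Optional, Tuple

# (name, function that trains + validates a model and returns a score in [0, 1])
Candidate = Tuple[str, Callable[[], float]]


def pick_cheapest_adequate_model(
    candidates: List[Candidate], target_score: float
) -> Tuple[Optional[str], List[Tuple[str, float]]]:
    """Try candidates in the order given (cheapest first) and stop early.

    Returns the name of the first model reaching target_score (or None)
    and the (name, score) pairs that were actually evaluated.
    """
    tried = []
    for name, train_and_score in candidates:
        score = train_and_score()
        tried.append((name, score))
        if score >= target_score:
            return name, tried  # more expensive models are never trained
    return None, tried


# Hypothetical stand-ins: the scores are made up, not real results.
candidates = [
    ("logistic_regression", lambda: 0.86),
    ("random_forest", lambda: 0.91),
    ("deep_network", lambda: 0.93),
]

chosen, tried = pick_cheapest_adequate_model(candidates, target_score=0.90)
print(chosen, tried)
```

With these made-up scores the deep network is never trained, which is the whole point: the expensive option only runs when the cheap ones have demonstrably fallen short.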

Pre-trained models and transfer learning

This could apply to both simple models (well kinda) and deep learning models.

It is well known by now that one of the most reliable ways to get near state-of-the-art performance for classification tasks in NLP and computer vision is to take a pre-trained model like BERT or ResNet and “continue” training (fine-tuning) by updating the last few layers of the neural model with new weights.

Unless you’re a multinational or a top-tier research institute with lots of money and data to throw at training, trying to train one of these systems from scratch may be a waste of time and energy anyway (I said ‘may be’, not ‘always’. If you’re working on new state-of-the-art models then I salute you! We should always strive to better ourselves!).

You can also combine the KISS approach with pre-trained weights. You can achieve some really great text classification results by using pre-trained word embeddings like GloVe, word2vec or fastText with a linear classifier such as a linear SVM.
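
As a rough sketch of the "embeddings plus linear model" idea, the example below averages word vectors into a document vector and fits a linear classifier on top. Several things here are assumptions made purely for illustration: the two-dimensional EMBEDDINGS table is made up (real GloVe, word2vec or fastText vectors have hundreds of dimensions and are loaded from file), and a simple perceptron stands in for the SVM so that the example needs nothing beyond the standard library.

```python
from typing import Dict, List

# Toy 2-dimensional vectors standing in for real pre-trained embeddings;
# the values are invented for illustration only.
EMBEDDINGS: Dict[str, List[float]] = {
    "great": [1.0, 0.2],
    "love": [0.9, 0.1],
    "awful": [-1.0, 0.3],
    "hate": [-0.8, 0.2],
    "film": [0.0, 1.0],
}


def embed(text: str) -> List[float]:
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def train_perceptron(features, labels, epochs: int = 10):
    """Fit a linear classifier (weights + bias) with the perceptron rule."""
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):  # y is +1 or -1
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # misclassified: nudge the boundary
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
    return weights, bias


def predict(weights, bias, text: str) -> int:
    """Return +1 or -1 depending on which side of the boundary text falls."""
    x = embed(text)
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1


train_texts = ["great film", "love film", "awful film", "hate film"]
train_labels = [1, 1, -1, -1]
weights, bias = train_perceptron([embed(t) for t in train_texts], train_labels)
print(predict(weights, bias, "love great"), predict(weights, bias, "awful hate"))
```

The expensive part (learning the embeddings) has already been paid for by somebody else; the only thing trained here is a handful of weights.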

Scale down big data

If you’re developing a model and working with a massive dataset, you might consider training on a small but representative subset of the data. You’ll need to be very careful about this, especially if your dataset is not well balanced or has very rare features (in NLP this could be words that are important but only occur in a tiny proportion of documents). However, if you know that you’re likely to need to change the model 10 more times before you calculate your final performance metrics, it might (but won’t always) make sense to train it on 10,000 samples instead of 100,000 samples.
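
A minimal sketch of one way to take such a subset without losing rare classes is stratified sampling with a per-class floor. The function name stratified_subsample, its parameters and the synthetic dataset below are all hypothetical, and the sketch only protects rare labels, not rare features such as infrequent words.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_subsample(
    samples: Sequence[Tuple[object, Hashable]],
    fraction: float,
    min_per_class: int = 1,
    seed: int = 0,
) -> List[Tuple[object, Hashable]]:
    """Keep roughly `fraction` of each class, but never fewer than
    `min_per_class` examples (or the whole class if it is smaller)."""
    by_label = defaultdict(list)
    for item in samples:
        by_label[item[1]].append(item)

    rng = random.Random(seed)  # fixed seed so development runs are repeatable
    subset = []
    for items in by_label.values():
        keep = max(min_per_class, round(len(items) * fraction))
        keep = min(keep, len(items))
        subset.extend(rng.sample(items, keep))
    rng.shuffle(subset)
    return subset


# Hypothetical imbalanced dataset: 9,900 "common" and 100 "rare" examples.
data = [(f"doc{i}", "common") for i in range(9900)]
data += [(f"rare{i}", "rare") for i in range(100)]

small = stratified_subsample(data, fraction=0.1, min_per_class=50)
counts = {
    label: sum(1 for _, lab in small if lab == label)
    for label in ("common", "rare")
}
print(len(small), counts)
```

A plain random 10% sample would leave only around ten rare examples; the floor keeps enough of them around for development runs to remain meaningful.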

If you’re building models that use a gradient descent or evolutionary training approach then you could also limit the number of epochs during development of your model.
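
A sketch of what an epoch budget might look like in a training loop is shown below. The hard cap (max_epochs) is the idea from the paragraph above; the patience-based early stopping is an extra that I have added on top. The function name train_with_budget and the list of validation losses are hypothetical, with the losses standing in for a real training step.

```python
from typing import Callable, List, Tuple


def train_with_budget(
    run_epoch: Callable[[int], float],
    max_epochs: int,
    patience: int = 2,
) -> Tuple[int, List[float]]:
    """Run at most max_epochs, stopping early once the validation loss has
    failed to improve for `patience` consecutive epochs."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    history = []
    for epoch in range(max_epochs):
        loss = run_epoch(epoch)  # one pass over the data, returns validation loss
        history.append(loss)
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return len(history), history


# Made-up validation losses standing in for a real training loop.
fake_losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.59, 0.58]

capped_epochs, _ = train_with_budget(lambda e: fake_losses[e], max_epochs=3)
stopped_epochs, _ = train_with_budget(lambda e: fake_losses[e], max_epochs=7)
print(capped_epochs, stopped_epochs)
```

Note the trade-off the made-up numbers illustrate: the second run stops before the later, slightly better epochs. That is usually fine while you are still iterating on the model, and the budget can be lifted for the final training run.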

Give patronage to “green” hosting providers

Big companies are not always the most transparent, so this suggestion could be trickier to follow. That said, taking your money to where the ethical hosting is could be a good way to reduce your model’s carbon footprint, especially if you are one of the pioneers working on massive models that use a lot of electricity. Hardware is an important consideration too. GPUs have been a key tool in the evolution of deep learning over the last 10 years, but it turns out that TPUs are better suited to deep learning and much less energy hungry at that.

Controversial Suggestion: Carbon Reporting in AI and ML Scientific Publications

This one’s probably going to be a divisive suggestion, but what if we could get all the big ML academic conferences to require some basic calculation of energy usage with all new model architecture submissions? The idea is to introduce a race to the bottom for AI model power consumption. A model that uses 100x less electricity and achieves near state-of-the-art performance would be much more interesting than one that improves state-of-the-art performance by 0.1%.

I’m well aware that this solution is far from perfect given cloud hosting transparency concerns (see above) and conference organisers would have to think carefully about how to set up peer reviews in a way that avoids always rewarding energy efficiency at the expense of model task performance.
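
To give a feel for what a "basic calculation of energy usage" might involve, here is a back-of-envelope sketch: energy is average power draw multiplied by training time and a data-centre overhead factor (PUE), and emissions are that energy multiplied by the carbon intensity of the electricity supply. The function name and every number passed to it are illustrative placeholders rather than measurements, and a real report would need measured power draw and provider-specific figures.

```python
def estimate_training_footprint(
    avg_power_watts: float,
    hours: float,
    pue: float,
    kg_co2_per_kwh: float,
) -> dict:
    """Back-of-envelope estimate of training energy and emissions.

    avg_power_watts: average draw of the training hardware (CPU + GPU + RAM)
    hours: wall-clock training time
    pue: data-centre power usage effectiveness (overhead multiplier, >= 1)
    kg_co2_per_kwh: carbon intensity of the electricity supply
    """
    energy_kwh = avg_power_watts * hours * pue / 1000.0
    return {
        "energy_kwh": energy_kwh,
        "co2_kg": energy_kwh * kg_co2_per_kwh,
    }


# All inputs below are illustrative placeholders, not measurements.
report = estimate_training_footprint(
    avg_power_watts=1000.0, hours=24.0, pue=1.5, kg_co2_per_kwh=0.4
)
print(report)
```

The carbon intensity term is exactly where the cloud transparency problem bites: without it, authors could still report energy in kWh, which is comparable across papers even when the emissions figure is not.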

I guess another approach could be an international conference for energy-efficient machine learning systems. If there’s enough interest from the academic community, I’d seriously consider organising such an event. And if one already exists, I’d be interested in participating.

If you’d like to discuss the above, I’m on Twitter @jamesravey

Conclusion

In closing, I’m really glad that Strubell et al. have brought this issue to the forefront of our minds and that the work has picked up so much attention. Rather than panicking and downing our tools, I think it’s important that we remain optimistic about AI and the huge advantages that it can bring, and that we try to be as considerate as possible of environmental factors whenever we develop new approaches.