Narrowing Down
I started on this journey after reading news articles about Microsoft partnering with Constellation Energy on a $1.6 billion plan to reopen Three Mile Island, exclusively to power Microsoft's future AI endeavors. Amazon and Google are doing the same, with Amazon funding the development of small modular nuclear reactors. I grew ever more interested in the sheer scale of energy these new AI models require; we are talking gigawatts at this point, something quite frankly ridiculous to even fathom. Though, when diving into how large these new artificial intelligence models are--we're talking on the scale of tens of billions of parameters for ChatGPT, Llama, and the like--it slowly begins to make sense.
The new generation of AI models, powered by algorithms like transformers and neural networks, require upwards of tens of thousands of GPUs (Graphics Processing Units) running in giant data centers--think any Google data center you may have heard about--for days, weeks, or months at a time to process and train on a particular dataset. Plus, according to Goldman Sachs, an average ChatGPT query consumes ten times more energy than the equivalent Google search.
So, what can we do to mitigate this? Well, there is a subset of AI research, known as Green AI and coined by Roy Schwartz et al., that seeks to pivot the current research space toward treating energy usage as one of the objectives AI models are trained against, in addition to the state-of-the-art accuracy results most commonly optimized for. The hope is to flatten out the exponential growth in training cost seen since the monumental release of AlexNet in 2012.
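To make that concrete, here is a minimal sketch of what folding energy into the training objective could look like: a composite loss that penalizes an estimated energy cost alongside the usual task loss. The energy estimator and the weighting factor here are my own illustrative assumptions, not something prescribed by Schwartz et al.

```python
# Hedged sketch: an energy-aware training objective. The estimator and
# the energy_weight value are invented placeholders for illustration.

def estimate_energy_cost(num_parameters: int, num_flops: int) -> float:
    """Toy proxy for energy: a weighted blend of model size and compute."""
    return 1e-9 * num_parameters + 1e-12 * num_flops

def green_objective(task_loss: float, num_parameters: int,
                    num_flops: int, energy_weight: float = 0.1) -> float:
    """Composite objective: accuracy loss plus a penalty on estimated energy."""
    return task_loss + energy_weight * estimate_energy_cost(num_parameters, num_flops)

# Example: a 10M-parameter model needing 2 GFLOPs per forward pass.
print(green_objective(task_loss=0.35, num_parameters=10_000_000,
                      num_flops=2_000_000_000))
```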
There are various ways to go about this. Most of the research I have come across optimizes particular components of AI models, such as training the model with a more efficient activation function (the function that dictates when a neuron in an AI model will fire; see the sketch below). A good portion of the research within the Green AI space, though, is cataloging other papers and developments into categories and trends. While this was extremely helpful in the initial stages of finding a gap to attack, these trend-analysis papers offered little concrete material to build on.
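As one concrete example of a component-level swap, compare the cost of two common activation functions: the sigmoid needs a transcendental exp() call per neuron, while ReLU is a single comparison. A trivial sketch, not tied to any particular paper:

```python
import math

# Two common activation functions. ReLU is far cheaper to compute than
# the sigmoid, which is one flavor of component swap this research explores.

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def relu(x: float) -> float:
    return max(0.0, x)

for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.3f}  relu={relu(x):.1f}")
```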
The project really came into focus when I discovered the NeuralPower paper by Ermao Cai et al. NeuralPower, as described by Cai, is a mathematical model and program for predicting a neural network's energy usage and runtime during the inference stage of development, where the fully trained model is repeatedly queried and tested at scale, often millions of times. Through the NeuralPower paper I discovered yet another gem in the form of Paleo by Hang Qi et al., which models a neural network's performance during the training stage, where a newly conceived model is fed input data and trained to correctly guess or formulate an output.
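To give a flavor of the approach, NeuralPower fits regression models that predict a layer's power and runtime from its hyperparameters, and total energy falls out as the sum of power times runtime across layers. The sketch below mirrors that shape with invented features and coefficients; the real fitted models live in the paper.

```python
# Hedged sketch of layer-wise energy modeling in the spirit of NeuralPower.
# The feature set and coefficients are placeholders, not fitted values.

LAYERS = [
    # (layer_type, fan_in, fan_out)
    ("conv", 224 * 224 * 3, 112 * 112 * 64),
    ("fc",   4096,          1000),
]

def predict_runtime_ms(layer_type: str, fan_in: int, fan_out: int) -> float:
    """Toy stand-in for a fitted per-layer runtime regressor."""
    work = fan_in * fan_out
    return 1e-9 * work if layer_type == "conv" else 5e-9 * work

def predict_power_w(layer_type: str, fan_in: int, fan_out: int) -> float:
    """Toy stand-in for a fitted per-layer power regressor."""
    return 30.0 if layer_type == "conv" else 15.0

# Energy = sum over layers of (power in W) * (runtime in ms) = millijoules.
total_mj = sum(predict_power_w(t, i, o) * predict_runtime_ms(t, i, o)
               for t, i, o in LAYERS)
print(f"estimated inference energy: {total_mj:.1f} mJ")
```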
The final piece of the puzzle--the part that could actually let me realize my goal of generating new AI models instead of purely running given ones--came at the hands of the Neural Architecture Optimization paper by Renqian Luo et al. Neural Architecture Optimization falls into the category of automatic neural network design, the broader space where an optimized AI network gradually takes shape as a developer-given algorithm searches through the discrete space of AI architectures.
Finally, I had fully formulated a project and methodology for generating new, energy-efficiency-optimized neural networks.
The Gap
While spending time peering around the space, I have not seen any program or project that generates new AI architectures optimized for energy efficiency.
The two programs that did model an AI's energy usage--Paleo and NeuralPower--only worked within the scope of a single developer-given architecture. Neither generates new architectures.
On the other, generative, side of the project, the automatic architecture design algorithms I have seen, including Neural Architecture Optimization's, use their generative capabilities purely to optimize for state-of-the-art, accurate AI models. There does not exist, to my knowledge, an architecture-optimizing algorithm that has been used to produce energy-efficient models rather than accurate ones.
These two aspects combine into a seemingly novel concept within the space, where I can contribute my results to the field.
Methodology
This methodology is simple, yet so very complex at the same time. The simple part comes from the fact that the entire project consists of me shoving various written AI architectures through a program, seeing what comes out, and comparing the structure and data between the two. Repeat this ad nauseam for a whole two months and I should have a plethora of data, good or bad, to sort through for analysis. The hard and complex part is actually implementing the program.
Starting at the beginning of the program's span, we receive a text-based representation of an artificial intelligence network and must parse the text into a usable code structure; I will be using the representation outlined in the code that accompanies the Paleo paper. Next, we take this code structure and run it through both the Paleo and NeuralPower models to generate some initial numbers. Here comes the fun part: we take the given starting architecture and run it through Neural Architecture Optimization's algorithm.
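For illustration, here is roughly what the parsing step could look like. The JSON-ish format below is a simplified stand-in of my own invention; the actual representation will follow the spec format in Paleo's code.

```python
import json

# Hedged sketch of the parsing step. This schema is a made-up stand-in,
# not Paleo's real network spec.

SPEC = """
{
  "name": "tiny_net",
  "layers": [
    {"type": "conv", "filters": 64, "kernel": 3},
    {"type": "pool", "kernel": 2},
    {"type": "fc",   "units": 10}
  ]
}
"""

class Layer:
    """In-memory node of the parsed architecture."""
    def __init__(self, raw: dict):
        self.type = raw["type"]
        self.params = {k: v for k, v in raw.items() if k != "type"}

    def __repr__(self):
        return f"Layer({self.type}, {self.params})"

def parse_architecture(text: str) -> list:
    """Turn the text-based representation into a usable code structure."""
    return [Layer(raw) for raw in json.loads(text)["layers"]]

layers = parse_architecture(SPEC)
print(layers)  # next stop: Paleo and NeuralPower for the initial numbers
```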
Breaking Neural Architecture Optimization's algorithm down, we start with the full structure of the AI network. Then we encode this structure into a continuous mathematical representation. Next, we take that representation and begin the searching process with a performance predictor. The performance predictor is where the optimization happens, and where I will inject the Paleo and NeuralPower models to specifically optimize for energy usage. In more detail, the performance predictor is a function that maps the continuous representation of an architecture to its measured performance. Once we have generated an optimized network, we decode the tokens back out of the continuous representation, and we have mostly reached the end.
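Here is a toy rendition of that encode, predict, optimize, decode loop, with the performance predictor swapped for an energy predictor as the project proposes. The two-dimensional embedding, linear predictor, and nearest-token decoder are all stand-ins for NAO's learned components, not its real implementation.

```python
# Toy encode -> predict -> optimize -> decode loop in the spirit of NAO.
# Every component here is an invented simplification.

TOKENS = {"conv3x3": (0.0, 1.0), "conv5x5": (1.0, 1.0), "fc": (1.0, 0.0)}

def encode(arch):
    """Map discrete layer tokens onto points in a continuous space."""
    return [TOKENS[t] for t in arch]

def predicted_energy(embedding):
    """Toy energy predictor: pretend the first coordinate tracks cost."""
    return sum(x for x, _ in embedding)

def optimize(embedding, lr=0.6):
    """One gradient-descent step against the predictor (d(energy)/dx = 1)."""
    return [(x - lr, y) for x, y in embedding]

def decode(embedding):
    """Snap each continuous point back to the nearest discrete token."""
    def nearest(p):
        return min(TOKENS, key=lambda t: (TOKENS[t][0] - p[0]) ** 2
                                       + (TOKENS[t][1] - p[1]) ** 2)
    return [nearest(p) for p in embedding]

arch = ["conv5x5", "conv5x5", "fc"]
emb = encode(arch)
new_emb = optimize(emb)
print(predicted_energy(emb), "->", predicted_energy(new_emb))  # roughly 3.0 -> 1.2
print(decode(new_emb))  # expensive conv5x5 tokens drift toward cheaper conv3x3
```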
Here's a more concise overview of the project, with a skeletal code sketch after the list:
--- given AI architecture as input
--- convert architecture into code
--- run initial measurements
--- convert architecture into mathematical representation
--- use the math representation to optimize for energy-efficiency
--- convert architecture back into code
--- run final tests
--- convert architecture back into text
--- return final AI architecture as output
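And the same pipeline as a skeletal Python driver. Every function below is a stub named after a step in the list, not a real API from Paleo, NeuralPower, or NAO.

```python
# Skeletal driver mirroring the list above; all stubs are placeholders.

def parse_architecture(text):  return ["conv", "fc"]            # text -> code
def measure(arch):             return {"energy_mj": 12.0}       # Paleo + NeuralPower
def encode(arch):              return [0.7, 0.2]                # code -> math
def search(embedding):         return [v - 0.1 for v in embedding]  # optimize
def decode(embedding):         return ["conv", "fc"]            # math -> code
def serialize(arch):           return " -> ".join(arch)         # code -> text

def optimize_for_energy(spec_text: str) -> str:
    arch = parse_architecture(spec_text)     # given architecture as input
    baseline = measure(arch)                 # run initial measurements
    new_arch = decode(search(encode(arch)))  # optimize in continuous space
    final = measure(new_arch)                # run final tests
    print("energy:", baseline, "->", final)
    return serialize(new_arch)               # final architecture as output

print(optimize_for_energy("conv -> fc"))
```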
Relevance
The general goal of making artificial intelligence networks more energy conscious is to enable cheaper industrial models, as exponential energy costs coupled with ever-expanding physical resource demands strain conventional electrical grids. Equally important, making AI networks more energy efficient will greatly improve the ability of the average person, with the much less powerful laptops and smartphones at their disposal, to build more powerful AI models in their own bedroom, lowering the barrier to entry for AI development and research.
On the flip side, I just think it would be really funny to one day train an AI with a toaster oven.