Distilled Giants: Why we need to rethink the development of small AI

The race to develop ever-larger artificial intelligence models has captivated the tech industry in recent years. These models, with their trillions of parameters, promise revolutionary advances in fields ranging from natural language processing to image recognition. However, this relentless pursuit of scale comes with significant drawbacks: high costs and a heavy environmental footprint. Small AI offers a promising alternative, delivering efficiency and lower power consumption, yet the current approach to building it still requires significant resources. As we strive for smaller and more sustainable AI, it is critical to explore new strategies that effectively address these limitations.

Small AI: A sustainable solution to high costs and energy demands

Developing and maintaining large AI models is expensive. Estimates suggest that GPT-3 cost over $4 million to train, with more advanced models potentially reaching the high single-digit millions. These costs, which cover the necessary hardware, storage, computing power, and human expertise, are prohibitive for many organizations, especially smaller businesses and research institutions. This financial barrier creates an uneven playing field, limits access to cutting-edge AI technology, and hinders innovation.

Moreover, the energy demands of training large AI models are staggering. Training a large language model like GPT-3, for example, is estimated to consume nearly 1,300 megawatt-hours (MWh) of electricity, equivalent to the annual energy use of 130 US homes. And the cost does not end at training: each ChatGPT request is estimated to carry an inference cost of about 2.9 watt-hours. The IEA estimates that the collective energy demand of AI, data centers, and cryptocurrencies accounted for nearly 2 percent of global electricity demand, and that this demand could double by 2026, approaching Japan's total electricity consumption. High energy consumption not only increases operating costs but also adds to the carbon footprint, exacerbating the environmental crisis. To put this into perspective, researchers estimate that training a single large AI model can emit more than 626,000 pounds of CO2, equivalent to the lifetime emissions of five cars.
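As a rough sanity check on these figures, the short calculation below (a sketch in Python, assuming an average US household consumption of about 10,000 kWh per year, which is not a number from this article) translates the training and inference estimates into everyday terms.

    # Back-of-the-envelope check of the figures above (rounded estimates).
    training_energy_mwh = 1300        # reported GPT-3 training energy
    us_home_annual_kwh = 10000        # assumed average US household use per year
    homes_equivalent = training_energy_mwh * 1000 / us_home_annual_kwh
    print(f"Training ~= annual use of {homes_equivalent:.0f} US homes")  # ~130 homes

    per_query_wh = 2.9                # estimated energy per ChatGPT request
    queries_per_home_year = us_home_annual_kwh * 1000 / per_query_wh
    print(f"One home's yearly energy ~= {queries_per_home_year:,.0f} requests")  # ~3.4 million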

Amid these challenges, small AI offers a practical path forward. Small AI models are designed to be more efficient and scalable, requiring far less data and computing power. This lowers overall costs and makes advanced AI technologies accessible to smaller organizations and research teams. Their lower energy requirements also cut operating costs and reduce environmental impact. By using optimized algorithms and methods such as transfer learning (see the sketch below), small AI models can achieve strong performance with fewer resources. This approach not only makes AI more affordable, but also promotes sustainability by minimizing energy consumption and carbon emissions.
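To illustrate one such method, the snippet below is a minimal transfer-learning sketch in PyTorch: a pretrained ResNet-18 backbone from torchvision is frozen and only a small task-specific head is trained. The class count and learning rate are placeholder assumptions, not a prescription.

    import torch
    import torch.nn as nn
    from torchvision import models

    # Reuse a pretrained backbone and train only a small task-specific head,
    # so far less data and compute are needed than training from scratch.
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for param in backbone.parameters():
        param.requires_grad = False                 # freeze the pretrained weights

    num_classes = 5                                 # placeholder for the target task
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)

    # Only the new head's parameters are updated during fine-tuning.
    optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)

Because only the small head is optimized, the bulk of the network's knowledge comes for free from pretraining, which is precisely how small models can reach strong performance without large-scale training runs.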

How small AI models are built today

Big tech companies like Google, OpenAI, and Meta have recognized the benefits of small AI and are increasingly focusing on developing compact models. This shift has produced models such as Gemini Flash, GPT-4o Mini, and Llama 7B. These smaller models are built primarily using a technique called knowledge distillation.

At its core, distillation involves transferring the knowledge of a large, complex model into a smaller, more efficient version. In this process, a “teacher” model – a large AI model – is trained on large datasets to learn complex patterns and nuances. This model then generates predictions or “soft labels” that encapsulate its deep understanding.

A “student” model, which is a smaller AI model, is then trained to replicate these soft labels. By imitating the teacher’s behavior, the student captures much of the teacher’s knowledge and performance while working with significantly fewer parameters.
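The snippet below is a minimal sketch of this soft-label training objective in PyTorch. The tiny teacher and student networks, the temperature, and the mixing weight alpha are illustrative assumptions; in practice the teacher would be a large pretrained model.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Illustrative teacher/student pair; real teachers are large pretrained models.
    teacher = nn.Sequential(nn.Linear(784, 1024), nn.ReLU(), nn.Linear(1024, 10))
    student = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=2.0, alpha=0.5):
        # Soft labels: the teacher's probabilities at a raised temperature.
        soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
        log_student = F.log_softmax(student_logits / temperature, dim=-1)
        # KL divergence pushes the student toward the teacher's distribution
        # (scaled by T^2, following Hinton et al.'s formulation).
        kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
        # Standard cross-entropy on the hard labels keeps the student grounded.
        ce = F.cross_entropy(student_logits, labels)
        return alpha * kd + (1.0 - alpha) * ce

    # One illustrative training step on random data.
    x = torch.randn(32, 784)
    labels = torch.randint(0, 10, (32,))
    with torch.no_grad():                           # the teacher is frozen
        teacher_logits = teacher(x)
    loss = distillation_loss(student(x), teacher_logits, labels)
    loss.backward()

The temperature controls how much of the teacher's information about near-miss classes is exposed to the student, which is what allows a much smaller network to approximate the teacher's behavior.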

Why we need to go beyond distilling large AI models

While distilling large AI models into smaller, more manageable versions has become a popular way to build small AI, there are several compelling reasons why this approach may not answer every challenge in small AI development.

  • Permanent dependence on large models: While distillation creates smaller, more efficient AI models and improves computational and energy efficiency at inference time, it still relies on training large AI models in the first place. Creating small AI models this way therefore requires significant computing resources and energy, with the attendant costs and environmental impact, before distillation even begins. The need to repeatedly train large models for distillation shifts the resource burden rather than removing it: the substantial up-front cost of training the large “teacher” models remains, which is particularly challenging for smaller organizations and research groups, and the carbon footprint of that initial training phase can negate much of the benefit of running smaller, more efficient models.
  • Limited scope for innovation: Reliance on distillation can narrow innovation by focusing on replicating existing large models rather than exploring new approaches. This can slow the development of novel AI architectures or methods that might solve specific problems better. It also concentrates small AI development in the hands of the few resource-rich companies able to train large models. As a result, the benefits of small AI are not shared equally, which can hinder broader technological progress and limit opportunities for innovation.
  • Generalization and adaptation challenges: Small AI models created through distillation often struggle with new, unseen data, because the distillation process may not fully transfer the larger model’s ability to generalize. As a result, while these smaller models perform familiar tasks well, they often falter in new situations. Moreover, adapting distilled models to new modalities or datasets typically requires retraining or fine-tuning the larger model first. This iterative process is complex and resource-intensive, making it hard to adapt small AI models quickly to rapidly evolving technological needs or new applications.

Bottom Line

While distilling large AI models into smaller ones may seem like a practical solution, it still depends on the high cost of training those large models in the first place. To make real progress in small AI, we need to explore more innovative and sustainable practices: building models designed for specific applications, making training methods more cost- and energy-efficient, and keeping environmental sustainability in focus. By pursuing these strategies, we can advance AI development in a way that is responsible and beneficial to both industry and the planet.