SLM vs LLM: Why smaller Gen AI models are better

In the realm of artificial intelligence, especially Generative AI, we’ve all been familiar with the term LLM or Large Language Model for some time now. For the uninitiated, “SLM” might sound unfamiliar, but these Small Language Models are playing an increasingly vital role in various technological applications.

LLMs like OpenAI’s GPT-4 or Meta’s Llama 3.1 are trained on vast amounts of data with the end goal of performing a wide range of tasks across various domains. However, as the AI landscape evolves, a new paradigm of specialised artificial intelligence is emerging, represented by Small Language Models (SLMs), according to Dr Aakash Patil, Deep Learning Researcher at Stanford University. 

“SLMs are designed to be more compact and efficient, typically containing fewer parameters than LLMs. This smaller size doesn’t necessarily mean reduced capability; rather, it often translates to faster processing and lower computational costs, especially in resource-constrained environments,” says Dr Patil.
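
To see that compactness in practice, here is a minimal sketch of loading and running a small model locally. It assumes the Hugging Face transformers library (with accelerate installed for device placement), and uses the public microsoft/phi-2 checkpoint purely as an example of an SLM:

```python
# Minimal sketch: running a ~2.7B-parameter SLM on a single device.
# Assumes the `transformers` and `accelerate` libraries; "microsoft/phi-2"
# is just one publicly available small model, used here for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"  # ~2.7 billion parameters
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision roughly halves weight memory
    device_map="auto",          # put weights on a GPU if one is available
)

prompt = "Explain the trade-off between small and large language models:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same script pointed at a 70-billion-plus-parameter model simply wouldn’t fit on a single consumer GPU, which is the cost gap Dr Patil is describing.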

Small Language Models (SLMs) have already started rocking the AI industry in ways that many didn’t anticipate, suggests Prem Naraindas, CEO and Founder of Katonic AI. “While the hype has largely centred around behemoth models like GPT-4, I’ve long maintained that the real revolution will come from more nimble, specialised models,” he says.

SLM vs LLM: Bigger isn’t always better

There’s no fixed definition of how small an SLM needs to be for it to be called an SLM. At the moment, “small and nimble” roughly means anything under 20 billion parameters, according to Intel’s researchers. That size threshold is a moving target that may well double in the coming year, but it gives you an idea of how much smaller SLMs are: compare 20 billion parameters against the 175 billion parameters of GPT-3.5 or the reported 1.76 trillion parameters of GPT-4, for instance.
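
Those parameter counts translate directly into hardware requirements. A back-of-the-envelope calculation of the memory needed just to hold the model weights at 16-bit precision (2 bytes per parameter; GPT-4’s figure is a reported estimate, not an official one):

```python
# Rough weight-memory arithmetic at fp16 (2 bytes per parameter).
# GPT-4's parameter count is a reported estimate, not an official figure.
def weight_memory_gb(params: float, bytes_per_param: float = 2) -> float:
    """Approximate memory needed just to store the weights, in GB."""
    return params * bytes_per_param / 1e9

for name, params in [("Phi-2 (SLM)", 2.7e9),
                     ("GPT-3.5", 175e9),
                     ("GPT-4 (reported)", 1.76e12)]:
    print(f"{name:>17}: ~{weight_memory_gb(params):,.0f} GB")

# Phi-2 (SLM):      ~5 GB     -> fits on one consumer GPU
# GPT-3.5:          ~350 GB   -> needs a multi-GPU server
# GPT-4 (reported): ~3,520 GB -> needs a cluster
```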

When it comes to Generative AI, developers are inundated with choices – perhaps too many, believes Moshe Berchansky, Senior AI Researcher at Emergent AI Research Labs, Intel. Broadly, a small number of giant models serve general, multi-purpose use cases, while a large number of small models offer added efficiency, accuracy, security, and traceability. If you decide to become an entrepreneur and build your own AI startup, there are plenty of decisions to make when choosing between an LLM and an SLM.

Giant models, while versatile, can be computationally expensive and may not be ideal for resource-constrained environments. Smaller, nimbler models, often 10x-100x smaller, offer improved efficiency and accuracy for targeted tasks. Whether a model is general-purpose or targeted depends on the specific needs of the application. Finally, the decision between cloud-based and local inference depends on factors like latency, privacy, and cost.

“While LLMs require enormous computational resources for training and inference, SLMs demonstrate that relatively less compute can yield more reliable and accurate responses in specialised contexts. This efficiency is particularly crucial for industries where precision and domain-specific knowledge are paramount,” says Dr Aakash Patil.

According to Intel’s research, Dolly, Stable Diffusion, StarCoder, DALL·E, and Phi are powerful examples of models at this scale. Microsoft Research’s Phi-2, at 2.7 billion parameters, recently showcased the striking progress of so-called “small language models” on benchmarks measuring common sense, language understanding, and logical reasoning. Such results argue for significant roles for small models, including in mixed implementations alongside larger ones, suggests Intel.

How SLMs are revolutionising AI applications

Microsoft’s Phi-2, with just 2.7 billion parameters, has shown remarkable performance in code-related tasks. IBM’s Granite model, at 13 billion parameters, outperformed the 70-billion-parameter Llama 2 in 9 out of 11 finance-related tasks, despite being more than five times smaller.

From a developer standpoint, SLMs are a game-changer, according to Prem Naraindas of Katonic AI. “They’re ushering in an era of rapid prototyping and iteration that was simply unfeasible with LLMs. At Katonic, we’ve seen teams slash development cycles by 60-70% when working with SLMs. The ability to fine-tune these models on domain-specific data without breaking the bank is democratising AI development in unprecedented ways,” he says.

Naraindas sees clients across industries clamouring for more efficient, specialised AI solutions: they want the power of advanced language models, but with the agility and precision that only SLMs can provide. The cost of running massive models like GPT-4 is simply unnecessary for many applications, as SLMs offer comparable performance in specific domains at a fraction of the cost. This isn’t just about saving money; it’s about making AI accessible to a broader range of businesses and use cases.
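
The fine-tuning Naraindas describes is typically done with parameter-efficient methods. Here is a minimal sketch using LoRA adapters from the Hugging Face peft library; the model ID, target modules, and hyperparameters are illustrative assumptions, not Katonic’s actual recipe:

```python
# Sketch of parameter-efficient fine-tuning (LoRA) on domain data.
# Assumes `transformers` and `peft`; the model ID, target modules and
# hyperparameters are illustrative, not any vendor's actual recipe.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

lora_config = LoraConfig(
    r=8,                                  # low-rank update dimension
    lora_alpha=16,                        # scaling factor for adapter weights
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# Typically well under 1% of the weights end up trainable, which is what
# makes domain-specific fine-tuning cheap enough to iterate on quickly.
```

Because only the small adapter matrices are trained, a domain fine-tune can run on a single GPU in hours rather than weeks, which is what makes the rapid iteration cycles he mentions feasible.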

This sentiment is echoed by Stanford University’s Dr Aakash Patil, who thinks SLMs allow for the development of specialised models, known as Domain-Aligned Models. “They take the idea of SLMs a step further by incorporating deep, industry-specific knowledge and regulatory understanding into their core architecture. Such domain-aligned models are particularly promising for critical, regulated, and geography-dependent sectors such as law, finance, insurance, and healthcare.”

Intel has observed a growing trend among enterprise customers discussing Generative AI: a preference for specialised models tailored to specific functions over generic, all-purpose models. For instance, a major healthcare provider recently questioned why a single model would be suitable for managing both supply chains and patient records. It’s a question increasingly being asked across companies and industries, as today’s fine-tuning methods applied to smaller open-source models offer a powerful alternative, according to Intel.

In terms of hardware requirements, SLMs can run both in the cloud and on end-user devices. “For cloud deployment, smaller GPUs like T4 or V100 GPUs can easily handle most SLM workloads. For edge deployment, we’re seeing promising results with hardware accelerators like Google’s Edge TPU or NVIDIA’s Jetson series,” says Prem Naraindas of Katonic AI. But where SLMs really shine is at the edge, he adds: “We’re now able to run sophisticated language models on devices like smartphones, tablets, and even IoT sensors, opening up a world of possibilities for real-time, low-latency AI applications.”
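
One common way to fit an SLM onto constrained hardware is weight quantisation. The sketch below uses 4-bit quantisation via the transformers bitsandbytes integration; this is one illustrative technique, not necessarily the stack behind the examples Naraindas cites (Jetson-class devices often run different runtimes such as ONNX or TensorRT):

```python
# Sketch: 4-bit quantisation to shrink an SLM's memory footprint for
# resource-constrained deployment. Assumes `transformers` with the
# `bitsandbytes` backend installed; illustrative, not a specific
# vendor's edge stack.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16, store in 4-bit
    bnb_4bit_quant_type="nf4",             # normalised-float 4-bit weights
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=quant_config,
    device_map="auto",
)
# A ~2.7B-parameter model drops from ~5.4 GB at fp16 to roughly 1.5 GB,
# which is the difference between "server only" and "runs on-device".
```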

Of course, SLMs aren’t without their own limitations. They may struggle with complex tasks that require extensive world knowledge or general reasoning capabilities, and they can be less adept at handling ambiguity or generating creative content compared to larger models. And if the training data is biased, the model is likely to perpetuate those biases in its outputs as well. Having said all of that, SLMs represent a promising avenue for developing specialised, efficient, and cost-effective AI solutions.

The rise of SLMs isn’t just satisfying current market needs; it’s shaping the future of AI, believes Prem Naraindas of Katonic AI. “It’s pushing us to rethink our approach to model architecture, training techniques, and deployment strategies. At Katonic, we are focussing on innovation in areas like knowledge distillation and sparse modelling to squeeze more capability into smaller parameter spaces,” he went on to say.
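
Knowledge distillation, one of the techniques Naraindas mentions, trains a small “student” model to mimic a large “teacher”. Below is a minimal PyTorch sketch of the textbook distillation loss (a softened KL-divergence term blended with the usual hard-label loss); this is the standard formulation, not Katonic’s proprietary method:

```python
# Minimal sketch of the standard knowledge-distillation loss: a small
# "student" is trained to match a large "teacher"'s softened output
# distribution. Textbook formulation, not any vendor's actual method.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend soft-target KL divergence with the usual hard-label loss."""
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # The KL term is scaled by T^2 so its gradients keep the same
    # magnitude as the cross-entropy term.
    kd = F.kl_div(soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```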

Stanford researcher Dr Patil agrees, explaining how the evolution from general-purpose LLMs to specialised, domain-aligned SLMs showcases the AI field’s move towards more targeted and efficient solutions. “By aligning closely with domain-specific requirements in addition to ethical and safety guidelines, these models are set to drive the next wave of AI adoption in sectors where precision and compliance are non-negotiable. By focusing on small models tailored to local needs and regulations, these countries can build valuable AI assets without the massive computational resources required for training large general models. This approach can help transform AI-deficient businesses into AI-assisted enterprises, allowing them to take charge of their own data and drive innovation to be leaders in their sectors. This trend promises to unlock new possibilities for large-scale generative AI adoption, potentially revolutionising the entire AI industry,” concludes Dr Aakash Patil.

Jayesh Shinde

Executive Editor at Digit. Technology journalist since Jan 2008, with stints at Indiatimes.com and PCWorld.in. Enthusiastic dad, reluctant traveler, weekend gamer, LOTR nerd, pseudo bon vivant.
