DeepSeek R1 on Raspberry Pi: Future of offline AI in 2025?

A tech guy would always prefer an open-source LLM over something closed source and hosted in the cloud, for obvious reasons like privacy and bias, but many of us still have to opt for something like ChatGPT for two main reasons.

Also read: Deepseek R1 vs Llama 3.2 vs ChatGPT o1: Which AI model wins?

First, running a model comparable to ChatGPT locally requires a high-end system that is not only expensive to buy but also costly to operate. Second, even with that hardware, there is no guarantee the output will be as good as ChatGPT's.

But things might change (pretty soon)!

Recently, Brian Roemmele announced a DeepSeek R1 setup that he claims beats OpenAI o1 in accuracy while generating 200 tokens per second on a Raspberry Pi. And let me just say that getting 200 tokens per second on a Raspberry Pi from an LLM supposedly better than OpenAI o1 is simply insane!

DeepSeek R1 + Raspberry Pi = Future of on-premise AI?

DeepSeek-R1 is a first-generation reasoning model that stands out for its unique training approach. Unlike traditional models that rely on supervised fine-tuning (SFT) as a preliminary step, DeepSeek-R1 was trained directly via large-scale reinforcement learning (RL), leading to the emergence of powerful reasoning behaviours.

This model, along with its predecessor DeepSeek-R1-Zero, has been open-sourced to support the research community. It offers insights into how AI can evolve without the need for extensive human-labeled data. 

Also read: DeepSeek-R1, BLOOM and Falcon AI: Exploring lesser-known open source LLMs

One of the most intriguing claims about this setup is its ability to generate around 200 tokens per second on a Raspberry Pi, a testament to DeepSeek-R1's efficiency and adaptability to resource-constrained environments.
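For a sense of what this might look like in practice, here is a minimal sketch of running a small DeepSeek-R1 distill locally through the Ollama Python client. To be clear, this is an assumption on my part: Roemmele has not published his exact stack, and the deepseek-r1:1.5b tag and prompt below are simply one common way to try this on a Pi.

```python
# Minimal sketch: stream a response from a local DeepSeek-R1 distill.
# ASSUMPTION: Ollama is installed and `ollama pull deepseek-r1:1.5b` has been
# run; this is not the author's confirmed setup, just one way to test locally.
import ollama

stream = ollama.chat(
    model="deepseek-r1:1.5b",
    messages=[{"role": "user", "content": "Explain a Raspberry Pi in one sentence."}],
    stream=True,  # yield the reply chunk by chunk, as it is generated
)

for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
```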

The below graph demonstrates how DeepSeek R1, despite being a smaller model than the likes of ChatGPT, still provides better accuracy:

There are a few problems (at least for now)

While the author claims 200 tokens per second, one of his recent replies on X clarifies that, as of now, the system is too stressed and overheats when pushed to 200 tokens per second. He has therefore dropped the rate to 90 tokens per second while figuring out ways to stabilise the setup, and he believes it could go up to 250 tokens per second in the future.
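If you want to sanity-check throughput claims like these yourself, here is a rough sketch along the same lines (again assuming a local Ollama install with the deepseek-r1:1.5b tag pulled; this is my own check, not the author's benchmark):

```python
# Rough throughput check: generate once, then derive tokens per second from
# the eval_count and eval_duration fields Ollama reports with each response.
# ASSUMPTION: local Ollama install with deepseek-r1:1.5b pulled; results vary
# heavily with hardware, quantisation, and prompt length.
import ollama

response = ollama.generate(
    model="deepseek-r1:1.5b",
    prompt="Write a short paragraph about offline AI.",
)

tokens = response["eval_count"]            # output tokens generated
seconds = response["eval_duration"] / 1e9  # Ollama reports nanoseconds
print(f"{tokens} tokens in {seconds:.1f} s -> {tokens / seconds:.1f} tokens/s")
```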

Also read: OpenAI o3 model: How good is ChatGPT’s next AI version?

Also, the author has not disclosed which model is being used as the base for this DeepSeek R1 setup, only mentioning that he is experimenting with four different models from DeepSeek. We can assume that the finalised model will be on the smaller side, something like the 1.5B-parameter DeepSeek-R1 distill.

There is also talk that the whole thing has been advertised wrongly. Adam Pell mentioned on X that using DeepSeek R1 feels like using the ChatGPT of 2023 and is nowhere near o1-level performance.

But even if it feels like running the ChatGPT of 2023 (roughly 18 months behind), we cannot forget that these numbers were achieved on the Raspberry Pi, a credit card-sized computer, with an open-source model that can be fine-tuned on your specific datasets, all while keeping privacy the first priority.
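As for that fine-tuning, here is a minimal sketch of what adapting the 1.5B distill to your own data could look like, using Hugging Face transformers with LoRA adapters via peft. Everything here is illustrative: the dataset file is a placeholder, the hyperparameters are arbitrary, and training would realistically happen on a workstation rather than the Pi itself, with the small adapter shipped back for local inference.

```python
# Illustrative LoRA fine-tuning sketch for a small DeepSeek-R1 distill.
# ASSUMPTIONS: the placeholder text file and all hyperparameters below are
# examples, not the article author's recipe.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for batch padding
model = AutoModelForCausalLM.from_pretrained(model_id)

# Attach small trainable LoRA adapters instead of updating all 1.5B weights.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

# Your private data never leaves the machine: a local text file as the corpus.
dataset = load_dataset("text", data_files={"train": "my_private_notes.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="r1-lora",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("r1-lora")  # saves only the small adapter weights
```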

Also read: OpenAI launches Operator: How will this AI agent impact the industry?

Sagar Sharma

A software engineer who happens to love testing computers and sometimes they crash. While reviving his crashed system, you can find him reading literature, manga, or watering plants.
