Google LiteRT-LM – Run LLMs on Your Phone, Watch, or Raspberry Pi
Google open-sourced LiteRT-LM, an edge LLM inference framework supporting Android, iOS, web, desktop and IoT. It powers Gemini Nano in Chrome and Pixel Watch.

LLMs Are Running on a Raspberry Pi Now
"You need a data center to run AI" – that line just became officially outdated.
Google open-sourced LiteRT-LM, an inference framework that runs LLMs directly on smartphones, tablets, laptops, web browsers, and even Raspberry Pi devices. No cloud required, no API calls, no data leaving the device.
This isn't experimental tech. LiteRT-LM is the battle-tested engine already powering Gemini Nano inside Chrome, Chromebook Plus, and Pixel Watch. Google just made it available to every developer.
Why Edge AI Matters
Every time you ask ChatGPT a question, your prompt travels across the internet to a GPU cluster in a data center and the response travels back. Two problems arise: your data passes through external servers, and there's always network latency.
Edge AI eliminates both. The model runs on your device – your data never leaves, and responses are instant. The challenge has always been making LLMs small enough and efficient enough to actually work on constrained hardware.
Google has been chipping away at this problem for years.
| Year | Project | Role |
|---|---|---|
| 2017 | TensorFlow Lite | Mobile ML inference |
| 2023 | MediaPipe LLM | Early mobile LLM experiments |
| 2024 | Gemini Nano | On-device AI model (Pixel) |
| 2025 | LiteRT (rebrand) | TF Lite successor, general on-device AI |
| Apr 2026 | LiteRT-LM | LLM-specific inference engine (open source) |
LiteRT-LM is the latest piece. Released alongside Gemma 4, it completes a full-stack strategy: open model plus open runtime.
What LiteRT-LM Can Do
Cross-Platform by Default
LiteRT-LM runs on Android, iOS, web browsers, desktop (Windows/Mac/Linux), and IoT devices including Raspberry Pi. One framework, every platform. No need to juggle different inference engines for different targets.
Hardware Acceleration
Modern smartphones pack NPUs (Neural Processing Units) – dedicated chips for AI workloads. LiteRT-LM taps directly into these NPUs on Qualcomm Snapdragon, Samsung Exynos, and Google Tensor chips. GPU acceleration is also supported. The framework squeezes maximum performance from whatever hardware is available.
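The fallback behavior described above can be sketched as a simple priority search. This is an illustrative pattern, not the LiteRT-LM API — the function and backend names here are hypothetical:

```python
# Illustrative sketch (not the LiteRT-LM API): how a runtime might pick
# the best available backend with graceful fallback. Names are hypothetical.

PREFERRED_ORDER = ["npu", "gpu", "cpu"]  # fastest first; CPU always works

def select_backend(available: set[str]) -> str:
    """Return the highest-priority backend present on this device."""
    for backend in PREFERRED_ORDER:
        if backend in available:
            return backend
    raise RuntimeError("no usable backend found")

# A Snapdragon phone exposing an NPU uses it; an older device falls back.
print(select_backend({"npu", "gpu", "cpu"}))  # -> npu
print(select_backend({"cpu"}))                # -> cpu
```

The point of the pattern: application code stays identical across devices, and the runtime quietly degrades from NPU to GPU to CPU depending on what the hardware offers.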
Supported Models
| Model | Size | Notes |
|---|---|---|
| Gemma 4 E2B | 2.3B params | Best for smartphones |
| Gemma 4 E4B | 4.5B params | Best for tablets / high-end phones |
| Gemma 4 12B | 12B params | Best for desktop-class hardware |
| Llama | Various | Meta's open-weight family |
| Phi-4 | Various | Microsoft's small-model family |
| Qwen | Various | Alibaba's open-weight family |
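Why these size tiers map to these devices comes down to memory. A back-of-envelope estimate, assuming 4-bit quantized weights (a common on-device choice) and ignoring KV cache and runtime overhead:

```python
# Rough weight-memory footprint for a quantized model.
# Assumes 4-bit (0.5 byte) weights; real usage is somewhat higher
# once the KV cache and runtime buffers are counted.

def weight_gb(params_billions: float, bits_per_weight: int = 4) -> float:
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal GB

print(round(weight_gb(2.3), 2))   # 2.3B params -> 1.15 GB: fits a phone
print(round(weight_gb(4.5), 2))   # 4.5B params -> 2.25 GB: high-end phone
print(round(weight_gb(12.0), 2))  # 12B params  -> 6.0 GB: desktop territory
```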
Agent Capabilities Built In
Here's the deal: LiteRT-LM supports tool use and function calling. That means an AI agent running locally on your phone can call a weather API, check your calendar, or read files – all without sending data to the cloud. Multimodal input (images and audio) is also supported, enabling camera-based analysis and voice interaction.
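The local function-calling loop works the same way it does in the cloud, minus the network hop. A minimal sketch of the pattern — the tool registry and JSON format here are stand-ins, not the LiteRT-LM API; a real runtime returns a structured tool call, which the app dispatches locally and feeds back into the conversation:

```python
import json

# Hypothetical tool registry: functions the on-device agent may call.
TOOLS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21},
}

def run_tool_call(model_output: str) -> dict:
    """Parse a JSON tool call emitted by the model and execute it locally."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]        # look up the registered tool
    return fn(**call["arguments"])  # run it on-device; nothing leaves the phone

# Pretend the local model asked for the weather:
result = run_tool_call('{"name": "get_weather", "arguments": {"city": "Seoul"}}')
print(result)  # -> {'city': 'Seoul', 'temp_c': 21}
```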
Google's Bigger Play
Zoom out and LiteRT-LM is just one piece of a deliberate strategy. Gemma 4's E2B model was designed to run on phones. LiteRT-LM is the engine that makes it happen. Model plus runtime, bundled together, both open source.
Think of it like Apple's hardware-software integration, except Google is giving it all away for free. The goal: if you need on-device AI, you end up in Google's ecosystem.
What This Means for You
For app developers, LiteRT-LM changes the economics of AI integration. Until now, adding AI to an app meant paying for API calls, requiring internet connectivity, and routing user data through external servers.
With LiteRT-LM plus Gemma 4: cost is zero (free model, free runtime), it works offline, and data never leaves the device.
The tradeoff is obvious – a 2.3B model on a phone won't match a cloud model with hundreds of billions of parameters on complex tasks. But for grammar correction, summarization, simple Q&A, and translation, on-device is now more than good enough.
The 2026 trend is clear: not "cloud vs edge" but "cloud plus edge." Heavy lifting stays in the cloud; everyday AI moves to the device. LiteRT-LM is the infrastructure that makes the edge side work.