Google LiteRT-LM – Run LLMs on Your Phone, Watch, or Raspberry Pi
Google open-sourced LiteRT-LM, an edge LLM inference framework supporting Android, iOS, web, desktop and IoT. It powers Gemini Nano in Chrome and Pixel Watch.

LLMs Are Running on a Raspberry Pi Now
"You need a data center to run AI" – that line just became officially outdated.
Google open-sourced LiteRT-LM, an inference framework that runs LLMs directly on smartphones, tablets, laptops, web browsers, and even Raspberry Pi devices. No cloud required, no API calls, no data leaving the device.
This isn't experimental tech. LiteRT-LM is the battle-tested engine already powering Gemini Nano inside Chrome, Chromebook Plus, and Pixel Watch. Google just made it available to every developer.
Why Edge AI Matters
Every time you ask ChatGPT a question, your prompt travels across the internet to a GPU cluster in a data center and the response travels back. Two problems arise: your data passes through external servers, and there's always network latency.
Edge AI eliminates both. The model runs on your device – your data never leaves, and responses are instant. The challenge has always been making LLMs small enough and efficient enough to actually work on constrained hardware.
Google has been chipping away at this problem for years.
| Year | Project | Role |
|---|---|---|
| 2017 | TensorFlow Lite | Mobile ML inference |
| 2023 | MediaPipe LLM | Early mobile LLM experiments |
| 2024 | Gemini Nano | On-device AI model (Pixel) |
| 2025 | LiteRT (rebrand) | TF Lite successor, general on-device AI |
| Apr 2026 | LiteRT-LM | LLM-specific inference engine (open source) |
LiteRT-LM is the latest piece. Released alongside Gemma 4, it completes a full-stack strategy: open model plus open runtime.
What LiteRT-LM Can Do
Cross-Platform by Default
LiteRT-LM runs on Android, iOS, web browsers, desktop (Windows/Mac/Linux), and IoT devices including Raspberry Pi. One framework, every platform. No need to juggle different inference engines for different targets.
Hardware Acceleration
Modern smartphones pack NPUs (Neural Processing Units) – dedicated chips for AI workloads. LiteRT-LM taps directly into these NPUs on Qualcomm Snapdragon, Samsung Exynos, and Google Tensor chips. GPU acceleration is also supported. The framework squeezes maximum performance from whatever hardware is available.
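The fallback behavior described above can be sketched as a simple priority search. This is an illustrative pattern, not the LiteRT-LM API — the function and backend names here are hypothetical:

```python
# Illustrative sketch (not the LiteRT-LM API): how a runtime might pick
# the best available backend with graceful fallback. Names are hypothetical.

PREFERRED_ORDER = ["npu", "gpu", "cpu"]  # fastest first; CPU always works

def select_backend(available: set[str]) -> str:
    """Return the highest-priority backend present on this device."""
    for backend in PREFERRED_ORDER:
        if backend in available:
            return backend
    raise RuntimeError("no usable backend found")

# A Snapdragon phone exposing an NPU uses it; an older device falls back.
print(select_backend({"npu", "gpu", "cpu"}))  # -> npu
print(select_backend({"cpu"}))                # -> cpu
```

The point of the pattern: application code stays identical across devices, and the runtime quietly degrades from NPU to GPU to CPU depending on what the hardware offers.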
Supported Models
| Model | Size | Notes |
|---|---|---|
| Gemma 4 E2B | 2.3B params | Best for smartphones |
| Gemma 4 E4B | 4.5B params | Best for tablets / high-end phones |
| Gemma 4 12B | 12B params | Best for desktop-class hardware |
| Llama | Various | Meta's open-weight family |
| Phi-4 | Various | Microsoft's small-model family |
| Qwen | Various | Alibaba's open-weight family |
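Why these size tiers map to these devices comes down to memory. A back-of-envelope estimate, assuming 4-bit quantized weights (a common on-device choice) and ignoring KV cache and runtime overhead:

```python
# Rough weight-memory footprint for a quantized model.
# Assumes 4-bit (0.5 byte) weights; real usage is somewhat higher
# once the KV cache and runtime buffers are counted.

def weight_gb(params_billions: float, bits_per_weight: int = 4) -> float:
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal GB

print(round(weight_gb(2.3), 2))   # 2.3B params -> 1.15 GB: fits a phone
print(round(weight_gb(4.5), 2))   # 4.5B params -> 2.25 GB: high-end phone
print(round(weight_gb(12.0), 2))  # 12B params  -> 6.0 GB: desktop territory
```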
Agent Capabilities Built In
Here's the deal: LiteRT-LM supports tool use and function calling. That means an AI agent running locally on your phone can call a weather API, check your calendar, or read files – all without sending data to the cloud. Multimodal input (images and audio) is also supported, enabling camera-based analysis and voice interaction.
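The local function-calling loop works the same way it does in the cloud, minus the network hop. A minimal sketch of the pattern — the tool registry and JSON format here are stand-ins, not the LiteRT-LM API; a real runtime returns a structured tool call, which the app dispatches locally and feeds back into the conversation:

```python
import json

# Hypothetical tool registry: functions the on-device agent may call.
TOOLS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21},
}

def run_tool_call(model_output: str) -> dict:
    """Parse a JSON tool call emitted by the model and execute it locally."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]        # look up the registered tool
    return fn(**call["arguments"])  # run it on-device; nothing leaves the phone

# Pretend the local model asked for the weather:
result = run_tool_call('{"name": "get_weather", "arguments": {"city": "Seoul"}}')
print(result)  # -> {'city': 'Seoul', 'temp_c': 21}
```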
Google's Bigger Play
Zoom out and LiteRT-LM is just one piece of a deliberate strategy. Gemma 4's E2B model was designed to run on phones. LiteRT-LM is the engine that makes it happen. Model plus runtime, bundled together, both open source.
Think of it like Apple's hardware-software integration, except Google is giving it all away for free. The goal: if you need on-device AI, you end up in Google's ecosystem.
What This Means for You
For app developers, LiteRT-LM changes the economics of AI integration. Until now, adding AI to an app meant paying for API calls, requiring internet connectivity, and routing user data through external servers.
With LiteRT-LM plus Gemma 4: cost is zero (free model, free runtime), it works offline, and data never leaves the device.
The tradeoff is obvious – a 2.3B model on a phone won't match a cloud model with hundreds of billions of parameters on complex tasks. But for grammar correction, summarization, simple Q&A, and translation, on-device is now more than good enough.
The 2026 trend is clear: not "cloud vs edge" but "cloud plus edge." Heavy lifting stays in the cloud; everyday AI moves to the device. LiteRT-LM is the infrastructure that makes the edge side work.