
Google LiteRT-LM – Run LLMs on Your Phone, Watch, or Raspberry Pi

Google open-sourced LiteRT-LM, an edge LLM inference framework supporting Android, iOS, web, desktop and IoT. It powers Gemini Nano in Chrome and Pixel Watch.

4 min read · Source: LiteRT-LM Overview – Google AI Edge

LLMs Are Running on a Raspberry Pi Now

"You need a data center to run AI" – that line just became officially outdated.

Google open-sourced LiteRT-LM, an inference framework that runs LLMs directly on smartphones, tablets, laptops, web browsers, and even Raspberry Pi devices. No cloud required, no API calls, no data leaving the device.

This isn't experimental tech. LiteRT-LM is the battle-tested engine already powering Gemini Nano inside Chrome, Chromebook Plus, and Pixel Watch. Google just made it available to every developer.


Why Edge AI Matters

Every time you ask ChatGPT a question, your prompt travels across the internet to a GPU cluster in a data center and the response travels back. Two problems arise: your data passes through external servers, and there's always network latency.

Edge AI eliminates both. The model runs on your device – your data never leaves, and responses are instant. The challenge has always been making LLMs small enough and efficient enough to actually work on constrained hardware.

Google has been chipping away at this problem for years.

| Year | Project | Role |
| --- | --- | --- |
| 2017 | TensorFlow Lite | Mobile ML inference |
| 2023 | MediaPipe LLM | Early mobile LLM experiments |
| 2024 | Gemini Nano | On-device AI model (Pixel) |
| 2025 | LiteRT (rebrand) | TF Lite successor, general on-device AI |
| Apr 2026 | LiteRT-LM | LLM-specific inference engine (open source) |

LiteRT-LM is the latest piece. Released alongside Gemma 4, it completes a full-stack strategy: open model plus open runtime.


What LiteRT-LM Can Do

Cross-Platform by Default

LiteRT-LM runs on Android, iOS, web browsers, desktop (Windows/Mac/Linux), and IoT devices including Raspberry Pi. One framework, every platform. No need to juggle different inference engines for different targets.

Hardware Acceleration

Modern smartphones pack NPUs (Neural Processing Units) – dedicated chips for AI workloads. LiteRT-LM taps directly into these NPUs on Qualcomm Snapdragon, Samsung Exynos, and Google Tensor chips. GPU acceleration is also supported. The framework squeezes maximum performance from whatever hardware is available.
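The selection logic can be pictured as a simple priority fallback. The sketch below is illustrative only: `pick_backend` and the backend names are hypothetical stand-ins, not the LiteRT-LM API.

```python
# Illustrative accelerator fallback: prefer the NPU, then the GPU, then the CPU.
# This models the selection behavior described above; it is NOT the real API.

PRIORITY = ["npu", "gpu", "cpu"]  # fastest first, most universally available last

def pick_backend(available: set[str]) -> str:
    """Return the highest-priority backend present on this device."""
    for backend in PRIORITY:
        if backend in available:
            return backend
    raise RuntimeError("no supported backend found")

# A Snapdragon phone with an NPU gets the NPU; a bare Raspberry Pi falls back to CPU.
print(pick_backend({"cpu", "gpu", "npu"}))  # npu
print(pick_backend({"cpu"}))                # cpu
```

The point of this design is that app code stays identical across devices; only the runtime's backend choice differs.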

Supported Models

| Model | Size | Best For |
| --- | --- | --- |
| Gemma 4 E2B | 2.3B params | Smartphones |
| Gemma 4 E4B | 4.5B params | Tablets / high-end phones |
| Gemma 4 12B | 12B params | Desktop-class hardware |
| Llama | Various | Meta's open-source family |
| Phi-4 | Various | Microsoft's small models |
| Qwen | Various | Alibaba's open-source family |
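A quick back-of-the-envelope calculation shows why those sizes map to those device classes. Assuming 4-bit weight quantization (common for on-device deployment; this is an assumption, not a LiteRT-LM spec), the weights alone work out to roughly half a gigabyte per billion parameters:

```python
# Rough footprint of quantized model weights: params * bits_per_weight / 8 bytes.
# Assumes 4-bit quantization; real deployments also need KV cache and activations.

def weight_footprint_gb(params_billion: float, bits: int = 4) -> float:
    bytes_total = params_billion * 1e9 * bits / 8
    return bytes_total / 1e9  # gigabytes

for name, params in [("Gemma 4 E2B", 2.3), ("Gemma 4 E4B", 4.5), ("Gemma 4 12B", 12)]:
    print(f"{name}: ~{weight_footprint_gb(params):.2f} GB of weights at 4-bit")
```

A 2.3B model at around 1.2 GB fits comfortably in a modern phone's RAM; a 12B model at around 6 GB explains why it lands on desktop-class hardware.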

Agent Capabilities Built In

Here's the deal: LiteRT-LM supports tool use and function calling. That means an AI agent running locally on your phone can call a weather API, check your calendar, or read files – all without sending data to the cloud. Multimodal input (images and audio) is also supported, enabling camera-based analysis and voice interaction.
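The function-calling loop is conceptually simple: the model emits a structured tool call, the app executes it locally, and the result flows back into the model's context. The toy sketch below illustrates that loop only; `fake_model`, `run_agent`, and the JSON call format are hypothetical, not LiteRT-LM's actual tool-use API.

```python
# Toy local function-calling loop: the "model" emits a structured tool call,
# the app executes it on-device, and the result feeds back into the context.
# Everything here is illustrative; LiteRT-LM defines its own tool-use interface.
import json

def get_weather(city: str) -> str:
    # A local "tool" the agent may call -- no network, no cloud.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def fake_model(prompt: str) -> str:
    # Stand-in for an on-device LLM that decides to call the weather tool.
    return json.dumps({"tool": "get_weather", "args": {"city": "Seoul"}})

def run_agent(prompt: str) -> str:
    call = json.loads(fake_model(prompt))       # parse the model's tool call
    result = TOOLS[call["tool"]](**call["args"]) # dispatch to the local function
    return result  # would normally be appended to context for a final answer

print(run_agent("What's the weather in Seoul?"))  # Sunny in Seoul
```

Because dispatch and execution both happen on the device, the prompt, the tool call, and the tool's result never touch a server.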


Google's Bigger Play

Zoom out and LiteRT-LM is just one piece of a deliberate strategy. Gemma 4's E2B model was designed to run on phones. LiteRT-LM is the engine that makes it happen. Model plus runtime, bundled together, both open source.

Think of it like Apple's hardware-software integration, except Google is giving it all away for free. The goal: if you need on-device AI, you end up in Google's ecosystem.


What This Means for You

For app developers, LiteRT-LM changes the economics of AI integration. Until now, adding AI to an app meant paying for API calls, requiring internet connectivity, and routing user data through external servers.

With LiteRT-LM plus Gemma 4: cost is zero (free model, free runtime), it works offline, and data never leaves the device.

The tradeoff is obvious – a 2.3B-parameter model on a phone won't match a hundreds-of-billions-parameter model in the cloud on complex tasks. But for grammar correction, summarization, simple Q&A, and translation, on-device is now more than good enough.

The 2026 trend is clear: not "cloud vs edge" but "cloud plus edge." Heavy lifting stays in the cloud; everyday AI moves to the device. LiteRT-LM is the infrastructure that makes the edge side work.

