How to Run AI Offline on Your Android Phone

Get private, free AI chat on-device with TokForge. No internet, no subscription, no data sharing.

Why Run AI Offline?

Running AI directly on your phone changes everything. Here's what you get:

Complete privacy. Your conversations never leave your device. No servers, no logs, no third parties.
Zero cost forever. Download TokForge for free and use it forever. No subscriptions, no API costs.
Works without internet. Use AI on airplanes, in mountains, on long commutes—anywhere, anytime.
You own your data. Every message, every insight stays on your phone. You're in complete control.

This is the opposite of ChatGPT or Claude. Those services are powerful, but they require internet and they see everything you ask them. Offline AI means total independence.

What You Need

Before you start, make sure you have:

Android phone with at least 6GB RAM (8GB+ recommended for best experience)
2–6GB free storage per AI model you want to run
TokForge app (free from Google Play)
A few minutes to download your first model

That's it. No special hardware, no extra apps, no coding required.

Step-by-Step Setup

Step 1: Install TokForge

Head to the Google Play Store and search for TokForge. Tap "Install" and wait for it to download (around 50–100MB depending on your region).

Step 2: Pick Your Model

TokForge Model Browser showing recommended models with RAM indicator

The Model Browser shows which models fit your device's RAM

Open TokForge and go to Model Manager. You'll see a list of AI models. Here's which to pick based on your phone's RAM:

RAM Tier	Recommended Model	Size	Use Case
4GB	Qwen3.5 0.8B	0.6GB	Basic chat, lightweight
6GB	Qwen3.5 2B or Llama 3.2 3B	1.4–2.0GB	General chat, reasoning
8GB	Qwen3.5 4B	2.8GB	Best balance for most users
12GB+	Qwen3.5 9B or Qwen3 8B	5.5–5.6GB	Complex tasks, nuanced answers
16GB+	Qwen3 14B (with speculative decoding)	9.7GB	Maximum quality, professional work

Not sure about your RAM? Go to Settings > About Phone and look for "Memory" or "RAM." If you're between tiers, pick the smaller model—better to run smoothly than stutter.

Step 3: Download Your Model

Tap your chosen model. TokForge will begin downloading—the progress bar shows how much is left. For a 4B model on WiFi, expect 3–8 minutes. The app will auto-load the model once the download finishes.

Downloading a model — progress, speed, and time remaining

Tip: Download on WiFi. Using mobile data will be slow and use your plan. Once downloaded, you never need internet again for that model.

Step 4: Start Chatting

Once the model loads (you'll see "Ready" next to its name), pick a character preset or start a blank chat. Type your first message and hit send. TokForge runs inference right on your phone—you'll see tokens generated in real-time.

Choose a character or start a blank chat

Your first conversation — running entirely on-device

First message compiles GPU kernels — follow-ups are instant

What to Expect

Speed

Generation speed depends on your phone:

Budget phones (4–6GB RAM): ~5 tokens per second
Mid-range (8GB): ~15–20 tokens per second
Flagship (12GB+): ~40–57 tokens per second with TurboQuant

The first message takes a few seconds for the model to "warm up" (this is called prefill). Follow-up messages are faster thanks to delta prefill optimization.

Quality

Smaller models (0.8B–2B) are snappy but basic. They're great for quick factual questions and creative writing. Larger models (8B+) understand nuance, can handle complex reasoning, and produce more polished output. For serious work, use at least a 4B or larger model.

Tips for Best Performance

Enable GPU acceleration. TokForge auto-detects if your GPU can help. If it finds one, the app will use it automatically—way faster than CPU only.
Run ForgeLab auto-tune. This finds the fastest settings for your specific phone. It's in Settings > Optimization.
Close background apps. Close browsers, games, or other heavy apps before lengthy conversations. More free RAM = faster generation.
Use airplane mode to save battery. WiFi and Bluetooth drain power faster than you'd think. Flip airplane mode on when you're in a deep conversation.
Keep your phone cool. Sustained inference generates heat. If your phone gets warm, take a break—the CPU will throttle otherwise.

Frequently Asked Questions

Does TokForge use the internet?

No. Once you download a model, TokForge runs completely offline. It doesn't call any servers, doesn't send data anywhere. The only time you need internet is to download the initial model file.

How good is the AI compared to ChatGPT?

It depends on the model size. A 0.8B or 2B model is like a helpful assistant—great for brainstorming and quick answers. A 4B–8B model (Qwen or Llama) is competitive with older versions of ChatGPT and very capable for most tasks. The 14B models approach GPT-3.5 level reasoning. We don't claim they match GPT-4, but for offline use, they're impressive.

Will running AI drain my battery?

Running inference uses your CPU or GPU, so yes, it uses more power than scrolling social media. A typical conversation drains battery similar to playing a game for the same duration. However, you're not constantly generating—there's idle time between messages. Enable airplane mode when chatting to save battery.

Can I use my own models?

Yes. TokForge supports importing GGUF and MNN format models. If you have a custom fine-tuned model or a model from Hugging Face, you can convert it and load it into TokForge. Visit the docs for step-by-step conversion instructions.

Is there a limit to how many models I can download?

No hard limit, but your storage is the constraint. Each model takes 0.6–10GB depending on size. Most phones have 64–256GB total storage, so you can realistically run 5–10 models at once.

What if my phone doesn't have 6GB RAM?

You can use the 0.8B model on 4GB phones—it's tiny and functional. The experience will be slower, but it works. If you have less than 4GB, offline models will struggle because the OS and other apps eat RAM.

Ready to Get Started?

Download TokForge from the Google Play Store and run your first offline AI conversation in under 10 minutes.

Get TokForge on Google Play

Next Steps

Once you're comfortable with TokForge, check out these guides:

Model Size Comparison Guide — Detailed breakdown of every model: speed, quality, memory usage.
AutoForge: Advanced Optimization — Squeeze every ounce of performance from your phone.
Character Cards Guide — Import TavernAI characters, customize personalities, and build persistent AI companions.

Offline AI isn't just faster and cheaper—it's freedom. No subscriptions, no tracking, no waiting for a server to respond. Your AI lives on your phone. Use it however you want, whenever you want.