Frequently Asked Questions
General
What is TokForge?
TokForge is a private AI chat app for Android that runs large language models (LLMs) entirely on your device. It supports character cards, thinking/reasoning mode, dual inference backends, and built-in benchmarking — all without requiring an internet connection.
Does TokForge need internet access?
No. Once you've downloaded a model, TokForge works 100% offline. Internet is only needed to download models from Hugging Face and (optionally) to check for app updates. All AI inference runs locally on your phone's CPU or GPU.
Is my data sent anywhere?
Never. TokForge has zero telemetry, zero analytics, and zero cloud dependency. Your conversations, characters, and settings stay on your device. The optional benchmark sharing feature is entirely opt-in and only shares device specs + performance numbers — never conversation data.
Is TokForge free?
Yes, TokForge is completely free during the beta period. There are no in-app purchases, subscriptions, or ads. Pricing after beta has not been determined yet.
Device Requirements
What phones can run TokForge?
TokForge runs on any ARM64 Android device with Android 8.0 or later. That covers the vast majority of phones from the last 6+ years. For a good experience with larger models, you'll want a device with a modern SoC and at least 8GB RAM.
We've tested on Samsung Galaxy S20 through S26, RedMagic 11 Pro, OnePlus Ace 5, Xiaomi Pad 7 Pro, and Lenovo Tab Plus — with more devices being added during the beta.
How much RAM do I need?
It depends on the model size:
4GB RAM — Qwen3-0.6B runs great (~35+ tok/s)
8GB RAM — Qwen3-4B runs well (~13-20 tok/s depending on GPU)
12GB+ RAM — Qwen3-8B sweet spot (~10-14 tok/s)
16-24GB RAM — Qwen3-14B is possible (~8 tok/s on flagship devices)
TokForge auto-detects your RAM and recommends the right model size during setup.
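The mapping above can be sketched as a simple tier lookup. This is a hypothetical simplification for illustration only (the function name and thresholds mirror the table, not TokForge's actual setup code):

```python
def recommend_model(ram_gb: float) -> str:
    """Map device RAM to the largest comfortable model tier
    from the table above. Illustrative sketch only."""
    if ram_gb >= 16:
        return "Qwen3-14B"   # possible on flagship devices
    if ram_gb >= 12:
        return "Qwen3-8B"    # the sweet spot
    if ram_gb >= 8:
        return "Qwen3-4B"
    return "Qwen3-0.6B"      # runs great even on 4GB devices

print(recommend_model(8))    # prints "Qwen3-4B"
```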
How much storage do models need?
Model sizes vary by quantization:
Qwen3-0.6B — ~350MB
Qwen3-4B (Q4_K_M) — ~2.3GB
Qwen3-8B (Q4_K_M) — ~4.8GB
Qwen3-14B (Q4_K_M) — ~8.5GB
You can download and delete models freely. TokForge itself is under 50MB.
Does it work on tablets?
Yes! TokForge has a responsive layout that adapts to tablet screens. We've tested on the Xiaomi Pad 7 Pro and Lenovo Tab Plus. Tablets with Snapdragon SoCs get the same GPU acceleration benefits as phones.
Models & Inference
What's the difference between MNN and GGUF?
TokForge has two inference backends:
MNN (OpenCL GPU) — Uses your phone's GPU for inference. Up to 2.4x faster than CPU for models 1.7B+. Best for standard chat and roleplay.
GGUF (llama.cpp CPU) — Uses optimized CPU inference with ARM i8mm acceleration. Required for thinking/reasoning mode (<think> blocks). Wins at very small model sizes (0.6B).
TokForge auto-selects the best backend for your device and model combination.
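Based on the trade-offs described above, the auto-selection behaves roughly like this. This is an illustrative sketch, not TokForge's actual code; the real heuristic likely also considers device-specific benchmarks:

```python
def pick_backend(model_params_b: float, thinking_mode: bool, has_gpu: bool) -> str:
    """Choose an inference backend from the rules described above.
    Hypothetical simplification for illustration."""
    if thinking_mode:
        return "GGUF"   # <think> blocks currently require llama.cpp
    if has_gpu and model_params_b >= 1.7:
        return "MNN"    # GPU is up to 2.4x faster for 1.7B+ models
    return "GGUF"       # CPU i8mm wins at very small sizes (0.6B)

print(pick_backend(4.0, thinking_mode=False, has_gpu=True))   # prints "MNN"
```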
Can I use my own models?
Yes. TokForge includes a built-in Hugging Face model search, so you can find and download any compatible GGUF model directly from the app. There are also 17 curated models across General, Roleplay, Creative, and Thinking categories for easy discovery.
What is thinking/reasoning mode?
Some models (like Qwen3, DeepSeek-R1, and QwQ) can "think out loud" before answering, showing their chain-of-thought reasoning inside <think> blocks. TokForge renders these as collapsible sections so you can see how the model arrived at its answer. This currently requires the GGUF backend.
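Splitting the reasoning from the final answer is straightforward. Here's a minimal sketch of how a client might separate a <think> block for collapsible rendering (illustrative only, not TokForge's implementation):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Separate <think>...</think> reasoning from the visible answer.
    Minimal sketch; assumes at most one well-formed think block."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return reasoning, answer

reasoning, answer = split_thinking("<think>2 + 2 is 4.</think>The answer is 4.")
# reasoning == "2 + 2 is 4."   answer == "The answer is 4."
```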
Character Cards
What character card format does TokForge support?
TokForge supports the TavernAI V2 character card specification. You can import cards as PNG files (with embedded metadata) or JSON. Features include lorebooks, alternate greetings, world info, {{char}}/{{user}} placeholders, and system prompt assembly.
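Placeholder expansion works as simple text substitution. A minimal sketch (the V2 spec also defines lorebook entries and other macros, which this deliberately skips):

```python
def fill_placeholders(template: str, char_name: str, user_name: str) -> str:
    """Expand the {{char}} and {{user}} placeholders used by
    V2 character cards. Illustrative sketch only."""
    return template.replace("{{char}}", char_name).replace("{{user}}", user_name)

greeting = fill_placeholders("{{char}} waves at {{user}}.", "Luna", "Sam")
# greeting == "Luna waves at Sam."
```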
Does it come with built-in characters?
Yes. TokForge ships with four built-in personas: Rex (direct and analytical), Luna (creative and warm), Marcus (scholarly and thorough), and Aria (artistic and expressive). You can use these as-is or as templates for creating your own characters.
Benchmarking
What does "tok/s" mean?
Tokens per second (tok/s) measures how fast the model generates text. A token is roughly 3/4 of a word. For comfortable reading speed, you want at least 5-8 tok/s. Above 15 tok/s feels nearly instant. TokForge shows tok/s in real-time during conversations and in ForgeLab benchmarks.
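The arithmetic behind the numbers above is simple. Using the rough 3/4-word-per-token rule, 13 tok/s works out to about 585 words per minute, well above typical reading speed:

```python
def tokens_per_second(tokens_generated: int, elapsed_seconds: float) -> float:
    """Decode speed: tokens generated divided by wall-clock time."""
    return tokens_generated / elapsed_seconds

def approx_words_per_minute(tok_s: float) -> float:
    """Rough reading-speed equivalent, using ~0.75 words per token."""
    return tok_s * 0.75 * 60

print(round(approx_words_per_minute(13)))   # prints 585
```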
What is ForgeLab?
ForgeLab is TokForge's built-in benchmarking suite. It measures decode tok/s, prefill latency, and throughput across every model and backend on your device. The Auto-Matrix feature tests all combinations automatically and saves results in a persistent local database. You can export results and share them with the community.
Beta & Access
How do I get access?
TokForge is currently in closed beta testing on the Google Play Store. You can request access on the homepage by filling out the beta signup form. We prioritize testers with diverse devices to help build our cross-device benchmark database.
When will TokForge be publicly available?
We're working toward a public release on the Google Play Store. The timeline depends on feedback from the closed beta. Join the beta to help us get there faster and shape the final product.