Frequently Asked Questions
General
What is TokForge?
TokForge is a private AI chat app for Android that runs large language models (LLMs) entirely on your device. It supports character cards, thinking/reasoning mode, dual inference backends, and built-in benchmarking — all without requiring an internet connection.
Does TokForge need internet access?
No. Once you've downloaded a model, TokForge works 100% offline. Internet is only needed to download models from Hugging Face and (optionally) to check for app updates. All AI inference runs locally on your phone's CPU or GPU.
Is my data sent anywhere?
Never. TokForge has zero telemetry, zero analytics, and zero cloud dependency. Your conversations, characters, and settings stay on your device. The optional benchmark sharing feature is entirely opt-in and only shares device specs + performance numbers — never conversation data.
Is TokForge free?
Yes, TokForge is completely free during the beta period. There are no in-app purchases, subscriptions, or ads. Pricing after beta has not been determined yet.
Device Requirements
What phones can run TokForge?
TokForge runs on any ARM64 Android device with Android 8.0 or later. That covers the vast majority of phones from the last 6+ years. For a good experience with larger models, you'll want a device with a modern SoC and at least 8GB RAM.
We've tested on Samsung Galaxy S20 through S26, RedMagic 11 Pro, OnePlus Ace 5, Xiaomi Pad 7 Pro, and Lenovo Tab Plus — with more devices being added during the beta.
How much RAM do I need?
It depends on the model size:
4GB RAM — Qwen3-0.6B runs great (~35+ tok/s)
8GB RAM — Qwen3-4B runs well (~13-20 tok/s depending on GPU)
12GB+ RAM — Qwen3-8B sweet spot (~10-14 tok/s)
16-24GB RAM — Qwen3-14B is possible (~8 tok/s on flagship devices)
TokForge auto-detects your RAM and recommends the right model size during setup.
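The mapping above can be sketched as a simple tier lookup. This is a hypothetical simplification for illustration only (the function name and thresholds mirror the table, not TokForge's actual setup code):

```python
def recommend_model(ram_gb: float) -> str:
    """Map device RAM to the largest comfortable model tier
    from the table above. Illustrative sketch only."""
    if ram_gb >= 16:
        return "Qwen3-14B"   # possible on flagship devices
    if ram_gb >= 12:
        return "Qwen3-8B"    # the sweet spot
    if ram_gb >= 8:
        return "Qwen3-4B"
    return "Qwen3-0.6B"      # runs great even on 4GB devices

print(recommend_model(8))    # prints "Qwen3-4B"
```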
How much storage do models need?
Model sizes vary by quantization:
Qwen3-0.6B — ~350MB
Qwen3-4B (Q4_K_M) — ~2.3GB
Qwen3-8B (Q4_K_M) — ~4.8GB
Qwen3-14B (Q4_K_M) — ~8.5GB
You can download and delete models freely. TokForge itself is under 50MB.
Does it work on tablets?
Yes! TokForge has a responsive layout that adapts to tablet screens. We've tested on the Xiaomi Pad 7 Pro and Lenovo Tab Plus. Tablets with Snapdragon SoCs get the same GPU acceleration benefits as phones.
Models & Inference
What's the difference between MNN and GGUF?
TokForge has two inference backends:
MNN (OpenCL GPU) — Uses your phone's GPU for inference. Up to 2.4x faster than CPU for models 1.7B+. Best for standard chat and roleplay.
GGUF (llama.cpp CPU) — Uses optimized CPU inference with ARM i8mm acceleration. Required for thinking/reasoning mode (<think> blocks). Wins at very small model sizes (0.6B).
TokForge auto-selects the best backend for your device and model combination.
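Based on the trade-offs described above, the auto-selection behaves roughly like this. This is an illustrative sketch, not TokForge's actual code; the real heuristic likely also considers device-specific benchmarks:

```python
def pick_backend(model_params_b: float, thinking_mode: bool, has_gpu: bool) -> str:
    """Choose an inference backend from the rules described above.
    Hypothetical simplification for illustration."""
    if thinking_mode:
        return "GGUF"   # <think> blocks currently require llama.cpp
    if has_gpu and model_params_b >= 1.7:
        return "MNN"    # GPU is up to 2.4x faster for 1.7B+ models
    return "GGUF"       # CPU i8mm wins at very small sizes (0.6B)

print(pick_backend(4.0, thinking_mode=False, has_gpu=True))   # prints "MNN"
```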
Can I use my own models?
Yes. TokForge includes a built-in Hugging Face model search, so you can find and download any compatible GGUF model directly from the app. There are also 17 curated models across General, Roleplay, Creative, and Thinking categories for easy discovery.
What is thinking/reasoning mode?
Some models (like Qwen3, DeepSeek-R1, and QwQ) can "think out loud" before answering, showing their chain-of-thought reasoning inside <think> blocks. TokForge renders these as collapsible sections so you can see how the model arrived at its answer. This currently requires the GGUF backend.
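Splitting the reasoning from the final answer is straightforward. Here's a minimal sketch of how a client might separate a <think> block for collapsible rendering (illustrative only, not TokForge's implementation):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Separate <think>...</think> reasoning from the visible answer.
    Minimal sketch; assumes at most one well-formed think block."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return reasoning, answer

reasoning, answer = split_thinking("<think>2 + 2 is 4.</think>The answer is 4.")
# reasoning == "2 + 2 is 4."   answer == "The answer is 4."
```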
Character Cards
What character card format does TokForge support?
TokForge supports the TavernAI V2 character card specification. You can import cards as PNG files (with embedded metadata) or JSON. Features include lorebooks, alternate greetings, world info, {{char}}/{{user}} placeholders, and system prompt assembly.
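Placeholder expansion works as simple text substitution. A minimal sketch (the V2 spec also defines lorebook entries and other macros, which this deliberately skips):

```python
def fill_placeholders(template: str, char_name: str, user_name: str) -> str:
    """Expand the {{char}} and {{user}} placeholders used by
    V2 character cards. Illustrative sketch only."""
    return template.replace("{{char}}", char_name).replace("{{user}}", user_name)

greeting = fill_placeholders("{{char}} waves at {{user}}.", "Luna", "Sam")
# greeting == "Luna waves at Sam."
```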
Does it come with built-in characters?
Yes. TokForge ships with four built-in personas: Rex (direct and analytical), Luna (creative and warm), Marcus (scholarly and thorough), and Aria (artistic and expressive). You can use these as-is or as templates for creating your own characters.
Benchmarking
What does "tok/s" mean?
Tokens per second (tok/s) measures how fast the model generates text. A token is roughly 3/4 of a word. For comfortable reading speed, you want at least 5-8 tok/s. Above 15 tok/s feels nearly instant. TokForge shows tok/s in real-time during conversations and in ForgeLab benchmarks.
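The arithmetic behind the numbers above is simple. Using the rough 3/4-word-per-token rule, 13 tok/s works out to about 585 words per minute, well above typical reading speed:

```python
def tokens_per_second(tokens_generated: int, elapsed_seconds: float) -> float:
    """Decode speed: tokens generated divided by wall-clock time."""
    return tokens_generated / elapsed_seconds

def approx_words_per_minute(tok_s: float) -> float:
    """Rough reading-speed equivalent, using ~0.75 words per token."""
    return tok_s * 0.75 * 60

print(round(approx_words_per_minute(13)))   # prints 585
```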
What is ForgeLab?
ForgeLab is TokForge's built-in benchmarking suite. It measures decode tok/s, prefill latency, and throughput across every model and backend on your device. The Auto-Matrix feature tests all combinations automatically and saves results in a persistent local database. You can export results and share them with the community.
Beta & Access
How do I get access?
TokForge is currently in closed beta testing on the Google Play Store. You can request access on the homepage by filling out the beta signup form. We prioritize testers with diverse devices to help build our cross-device benchmark database.
When will TokForge be publicly available?
We're working toward a public release on the Google Play Store. The timeline depends on feedback from the closed beta. Join the beta to help us get there faster and shape the final product.