
Quickstart

Get TokForge v3.4.7 running on your Android device in 5 minutes.

Step 1: Welcome & Setup Options

[Screenshot: TokForge welcome screen]

Download TokForge from the Play Store (once available), or install the beta-program APK directly:

# If you have the APK from the beta program
# (requires USB debugging enabled on the device):
adb install tokforge-beta.apk

On first launch, you'll see the welcome screen. Choose your preferences:

  • Memory Mode: choose normal or extended memory; extended enables background memory extraction for continuity across conversations
  • Thinking Toggle: Enable or disable thinking/reasoning mode support

Tap Get Started to begin onboarding.

Step 2: Model Recommendation

[Screenshot: TokForge model recommendation]

TokForge automatically profiles your device — detecting your SoC, RAM, GPU capabilities, and backend support. Based on your device tier, we recommend a model optimized for your hardware:

  • 4GB RAM: Qwen3-0.6B-MNN (~0.35GB) — blazing fast
  • 6-8GB RAM: Qwen3-1.7B-MNN (~1.0GB) or Qwen3.5-4B-MNN (~2.3GB)
  • 12GB RAM: Qwen3-8B-MNN (~4.8GB) or Qwen3.5-9B-MNN (~5.4GB)
  • 16GB+ RAM: Qwen3-14B-MNN (~8.5GB) for maximum quality

The app automatically selects the best backend: MNN (GPU-accelerated on OpenCL/Vulkan) or GGUF (CPU fallback). No manual configuration needed.
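
Under the hood, the recommendation reduces to a RAM-tier lookup. Here's a minimal Kotlin sketch of how such a selection might work; the function name and thresholds are illustrative assumptions, not TokForge's actual code:

import android.app.ActivityManager
import android.content.Context

// Hypothetical sketch: pick a model by total device RAM.
// Real devices report slightly less than marketed RAM, so the
// thresholds below include some slack.
fun recommendModel(context: Context): String {
    val am = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
    val memInfo = ActivityManager.MemoryInfo().also { am.getMemoryInfo(it) }
    val totalGb = memInfo.totalMem / (1024.0 * 1024.0 * 1024.0)
    return when {
        totalGb >= 15 -> "Qwen3-14B-MNN"   // maximum quality
        totalGb >= 11 -> "Qwen3-8B-MNN"
        totalGb >= 5.5 -> "Qwen3-1.7B-MNN"
        else -> "Qwen3-0.6B-MNN"           // smallest, fastest
    }
}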

Step 3: Download Model

[Screenshot: TokForge model download with progress]

TokForge downloads models directly from Hugging Face with intelligent retry and real-time progress tracking.

You'll see:

  • Live download speed and ETA
  • Auto-retry with exponential backoff on network hiccups
  • Disk space checks before download to prevent incomplete or corrupted files

Once complete, the model is cached locally and ready to use. You can download additional models anytime from the Models tab.
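
The auto-retry behavior follows the standard exponential-backoff pattern: wait, retry, and double the wait after each failure up to a cap. A minimal Kotlin sketch of the general technique (not TokForge's actual downloader; retryWithBackoff and its parameters are illustrative):

import kotlinx.coroutines.delay
import java.io.IOException

// Illustrative sketch of retry with exponential backoff.
// attempt() stands in for one download attempt.
suspend fun <T> retryWithBackoff(
    maxRetries: Int = 5,
    initialDelayMs: Long = 1_000,
    maxDelayMs: Long = 30_000,
    attempt: suspend () -> T,
): T {
    var delayMs = initialDelayMs
    repeat(maxRetries - 1) {
        try {
            return attempt()
        } catch (e: IOException) {
            delay(delayMs)                                    // wait out the network hiccup
            delayMs = (delayMs * 2).coerceAtMost(maxDelayMs)  // double the wait, capped
        }
    }
    return attempt()  // final try: let any failure propagate to the caller
}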

Step 4: Acceleration Pack (Optional)

[Screenshot: speculative decoding settings with draft model pairing, predict length slider, and acceptance rates]

TokForge offers an optional speculative decoding acceleration pack — a lightweight draft model that runs alongside your main model to speed up generation.

On supported devices, speculative decoding can boost throughput by 20-40% with no quality loss. The draft model (typically 0.6B) is downloaded separately and toggled in Settings.
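
Conceptually, the draft model cheaply proposes a short run of tokens, and the main model verifies them all in a single forward pass, keeping the longest prefix it would have produced itself. A rough Kotlin sketch of that loop; both model interfaces are hypothetical stand-ins, not TokForge's API:

// Conceptual sketch of the draft-and-verify loop behind speculative decoding.
fun speculativeStep(
    context: MutableList<Int>,              // token ids generated so far
    draftTok: (List<Int>) -> Int,           // cheap draft model: next token
    verify: (List<Int>, List<Int>) -> Int,  // main model: count of accepted draft tokens
    predictLength: Int = 4,                 // the "predict length" slider
) {
    // 1. The small draft model proposes predictLength tokens autoregressively.
    val draft = mutableListOf<Int>()
    repeat(predictLength) { draft += draftTok(context + draft) }
    // 2. The main model scores all drafted tokens in one forward pass and
    //    accepts the longest prefix it agrees with.
    val accepted = verify(context, draft)
    context += draft.take(accepted)
    // 3. If accepted < predictLength, the main model's own next token is
    //    appended instead (omitted here), which is why quality is unchanged.
}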

You can skip this and enable it later in Settings → Advanced.

Step 5: Ready to Chat

[Screenshot: TokForge live chat]

You're all set! Open a chat and start conversing. TokForge provides:

  • Streaming tokens — watch the model generate in real time with live tok/s counter
  • Kokoro TTS — high-quality offline text-to-speech with 11 voices (Settings → Voice)
  • Per-conversation settings — adjust temperature, sampling, and other parameters per character (sketched after this list)
  • Markdown rendering and code syntax highlighting
  • Thinking mode — collapsible <think> blocks show reasoning in real time
  • Background memory — automatically extracts and stores facts across conversations for continuity

All conversations run 100% on-device. Nothing leaves your phone.
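
For a sense of what per-conversation settings typically cover, here is a hypothetical Kotlin shape; the field names and defaults are illustrative assumptions, not TokForge's actual schema:

// Hypothetical per-conversation generation settings.
data class ConversationSettings(
    val temperature: Float = 0.7f, // higher = more varied, lower = more focused
    val topP: Float = 0.9f,        // nucleus sampling cutoff
    val topK: Int = 40,            // sample only from the K most likely tokens
    val maxTokens: Int = 1024,     // cap on response length
    val systemPrompt: String = "", // per-character instructions
)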


Troubleshooting

Download timeouts? TokForge auto-retries with exponential backoff. Network hiccups are handled gracefully.

Slow generation? Check Settings → Advanced for backend selection. MNN uses GPU when available; GGUF falls back to CPU.

Custom characters not loading? Ensure your JSON or PNG card follows TavernAI V2 format. Invalid schemas are skipped silently.
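
If a card is being skipped, check that its top-level wrapper matches the V2 spec. A minimal example of the expected shape (values are placeholders; PNG cards typically embed the same JSON, base64-encoded, in a PNG text chunk):

{
  "spec": "chara_card_v2",
  "spec_version": "2.0",
  "data": {
    "name": "Example Character",
    "description": "A short persona description.",
    "personality": "curious, upbeat",
    "scenario": "A casual first meeting.",
    "first_mes": "Hi! What should we talk about?",
    "mes_example": ""
  }
}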


Tip: TokForge is in beta. Your benchmark data helps us build the most comprehensive mobile LLM performance database. Join our Discord to share results and feedback.