API Reference
Overview
TokForge includes a built-in MetricsService — an HTTP API for remote device control, monitoring, and benchmarking. It runs as an embedded NanoHTTPD server accessible from any app or computer on the same network (or via ADB port forwarding).
This API enables AI agents, developers, and external tools to:
- Discover device hardware capabilities
- List, load, and unload inference models
- Run inference tests and benchmarks
- Manage conversations and character cards
- Control app navigation and UI
- Export/import configuration and benchmark results
- Monitor performance metrics in real time
Setup
Enabling the API Server
The API server is disabled by default. To enable it:
- Open TokForge → Settings → scroll to Advanced
- Toggle "Metrics Server" → ON
- Choose Bind Host:
  - Localhost only (127.0.0.1) — accessible only from the device itself or via ADB port forwarding. Use this for development and testing.
  - All interfaces (0.0.0.0) — accessible from any device on the local network (e.g., http://192.168.1.100:8088/health). Use this for remote control from another computer, AI agents, or Home Assistant.
- Set Port (default: 8088)
- Tap "Regenerate" next to the API Key field to generate a new auth token, or enter your own
Security note: When using "All interfaces", the API is accessible to anyone on your network. Always use a strong API key and avoid exposing the port to the internet.
Network Binding Behavior
| Build Type | Localhost (127.0.0.1) | All Interfaces (0.0.0.0) |
|---|---|---|
| Debug | Auth bypassed (no token needed) | Auth required |
| Release | Auth required | Forced to 127.0.0.1 (security hardening) |
In release builds, the server always binds to 127.0.0.1 regardless of the setting. To access the API from another device with a release build, use ADB port forwarding (see below).
Architecture
| Property | Value |
|---|---|
| Server | NanoHTTPD (Java HTTP library) |
| Default port | 8088 (configurable in Settings → Advanced) |
| Transport | HTTP/1.1 |
| Auth | Bearer token (configured in Settings → Advanced → API Key) |
| Encoding | JSON (application/json) |
| Response format | JSON object with status, data, or error fields |
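Since every response follows this status/data/error shape, a client can unwrap it uniformly. A minimal Python sketch; the `unwrap` helper name is illustrative, and any structure inside the data field beyond what the table above states is an assumption:

```python
def unwrap(resp: dict):
    """Return the payload of a TokForge-style JSON response, or raise on error.

    Assumes the status/data/error fields described above; the contents of
    "data" vary per endpoint.
    """
    if "error" in resp:
        raise RuntimeError(resp["error"])
    return resp.get("data", resp)

print(unwrap({"status": "ok", "data": {"mnn_loaded": True}}))
```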
Authentication
Getting Your API Key
From the app (recommended):
- Open Settings → Advanced → Metrics Server section
- Your API key is displayed in the "Metrics Auth Token" field
- Tap "Regenerate" to create a new key at any time
- Copy the token and use it in your API requests
Programmatically (debug builds only):
# Token is logged at startup (debug builds only, truncated for security)
adb logcat -s MetricsServer | grep "Auth token"
# Or retrieve via DebugReceiver (debug builds only)
adb shell am broadcast -a dev.tokforge.DEBUG_ACTION \
-n dev.tokforge/.debug.DebugReceiver \
--es command get_auth_token
# Or read the token file directly from app storage (debug builds only)
adb shell run-as dev.tokforge cat files/.auth_token
Via the API itself:
# Rotate the token and receive the new one in the response
curl -X POST -H "Authorization: Bearer CURRENT_TOKEN" \
http://localhost:8088/control/rotate-auth-token
# Response: {"status":"ok", "auth_token":"NEW_TOKEN_HERE", "token_rotated":true}
Using the Token
All POST, DELETE, and most PUT endpoints require authentication. A few read-only endpoints (/health, /version, /metrics, /state/hardware) are accessible without auth.
Include the token in the Authorization header:
curl -H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-X POST http://DEVICE_IP:8088/control/load-model \
-d '{"model_id": 1}'
Missing or invalid tokens return 401 Unauthorized:
{
"error": "Unauthorized — provide Bearer token in Authorization header"
}
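In Python, the header handling can be wrapped once and reused for every call. A sketch using only the standard library; `BASE`, the placeholder token, and the helper names are illustrative, while the endpoint path mirrors the curl example above:

```python
import json
import urllib.request

BASE = "http://localhost:8088"  # device IP or forwarded port, per your setup

def auth_headers(token):
    """Headers required by authenticated endpoints."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

def build_request(path, token, payload=None):
    """Build a Request: POST with a JSON body when payload is given, else GET."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        BASE + path,
        data=data,
        headers=auth_headers(token),
        method="POST" if payload is not None else "GET",
    )

# Mirrors the curl example above; send with urllib.request.urlopen(req)
req = build_request("/control/load-model", "YOUR_API_KEY", {"model_id": 1})
print(req.get_method(), req.full_url)
```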
Unauthenticated Endpoints
These endpoints are accessible without a token (useful for health checks and monitoring):
| Endpoint | Description |
|---|---|
| GET /health | Server status, loaded models, uptime |
| GET /version | App version, build type, device info |
| GET /metrics | Model load state |
| GET /state/hardware | SoC, RAM, GPU, recommended config |
Connecting to the API
Same Network (All Interfaces mode)
If the server is bound to 0.0.0.0 (debug builds), connect directly using the device's IP:
# Find device IP in Settings → About Phone → Status → IP Address
curl http://192.168.1.100:8088/health
ADB Port Forwarding (Localhost mode or Release builds)
Forward the device port to your computer:
# Single device
adb forward tcp:8088 tcp:8088
curl http://localhost:8088/health
# Multiple devices — use unique local ports
adb -s DEVICE1_SERIAL forward tcp:8088 tcp:8088
adb -s DEVICE2_SERIAL forward tcp:8089 tcp:8088
adb -s DEVICE3_SERIAL forward tcp:8090 tcp:8088
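When driving a fleet, the per-device forwards can be scripted so each serial gets a unique local port automatically. A sketch that only builds the adb commands; the helper name is illustrative, and actually applying them requires adb on your PATH:

```python
import subprocess

def forward_commands(serials, device_port=8088, first_local_port=8088):
    """One `adb forward` command per device, each with a unique local port."""
    return [
        ["adb", "-s", serial, "forward",
         f"tcp:{first_local_port + i}", f"tcp:{device_port}"]
        for i, serial in enumerate(serials)
    ]

for cmd in forward_commands(["DEVICE1_SERIAL", "DEVICE2_SERIAL"]):
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually apply the forward
```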
Use a unique local port per active test script/process. Reusing the same local port (for example, two scripts both using tcp:8088) will overwrite the forwarding and cause intermittent control-plane disconnects.
On-Device Access
From the device itself (e.g., Termux, or another app):
curl http://127.0.0.1:8088/health
1. Health & Status
- Check server status and which inference backends are available.
- Minimal endpoint returning only loaded model status.
- App version, build info, package name, device, and Android version.
- Storage info including model directory size, available space, and list of model files.
2. State Endpoints
- Complete device and app state snapshot. Combines settings, inference, hardware, and performance data.
- User settings: theme, system prompt, remote API config, persona settings, saved endpoints.
- Current inference configuration: loaded models, backend availability, default model, system prompt.
- Device hardware profile: SoC, cores, GPU, RAM, and recommended optimal configuration.
- Real-time performance metrics: memory, battery, thermal status, last 10 generation stats.
3. Model Endpoints
- List all downloaded models with metadata: name, path, size, quantization, parameter count, type.
- Get detailed information about a specific model by database ID.
- Load a model from disk. Unloads any currently loaded model. Auto-loads draft model for speculative decoding.
- Unload all currently loaded models (both MNN and GGUF).
- Scan models directory and register any new models found.
- Download a model from a URL. Runs in background; monitor with /control/download-status.
- Get progress of ongoing model download: bytes, percentage, speed, estimated time remaining.
- Delete a model file from disk and remove from database.
- Rotate authentication token. Returns new token in response.
- Cancel an ongoing model download.
4. Configuration
- Get current inference configuration: threads, context, batch size, sampling params, KV cache, GPU layers.
- Update sampling parameters: temperature, top-p, top-k, max tokens, repeat penalty.
- Update app settings: theme, system prompt, metrics server config, remote API endpoints, persona settings.
- Set advanced inference config: threading, context, KV cache, flash attention, GPU layers, draft model.
- Switch inference backend: mnn, gguf, or remote.
- Reload the currently loaded model (apply config changes).
- Probe QNN capabilities synchronously. Returns supported data types, layer operations, and performance characteristics.
- Probe QNN capabilities asynchronously. Returns job ID; poll /control/qnn-probe-status to check progress.
- Get status and results of async QNN probe operation.
- Get hybrid inference capabilities: supported mixed-precision configs and fallback rules.
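The async probe follows the same start-then-poll pattern used by model downloads and auto-matrix benchmarks: kick off the job, then poll the status endpoint until it finishes. A generic polling sketch, where `fetch_status` is any callable returning the decoded status JSON; the "status": "running" convention is an assumption about the response shape, not confirmed above:

```python
import time

def poll_until_done(fetch_status, interval=1.0, timeout=120.0):
    """Call fetch_status() until it stops reporting "running", or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status.get("status") != "running":
            return status
        time.sleep(interval)
    raise TimeoutError("operation did not finish within the timeout")
```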
5. Inference & Generation
- Generate text from a single prompt without creating a conversation. Returns tokens, speed, and output.
- Get current generation progress: tokens generated, speed, elapsed time, estimated remaining.
- Stop an ongoing generation.
- Run auto-tuning: sweep thread counts and configs, recommend optimal settings for current hardware + model.
- Cancel an ongoing auto-tune operation.
- Run a pre-defined or custom inference scenario. Returns scenario results and metrics.
- Get status and progress of an ongoing scenario run.
6. Conversations
- List all conversations with titles, timestamps, backend types, and message counts.
- Get all messages in a conversation.
- Send a message to a conversation and receive the AI response.
- Create a new conversation with optional title, backend type, and character.
- Delete a conversation and all its messages.
7. Characters
- List all character cards with names, descriptions, personalities, and greetings.
- Export a character card as a downloadable file.
- Import a character card from an uploaded file (multipart).
- Delete a character card.
8. Benchmarks
- Run a single benchmark: load model, measure inference speed, save result. Configurable prompt, tokens, runs.
- Query benchmark results with filtering by SoC, model, backend. Supports pagination.
- Get results organized as a SoC × Model × Backend matrix for cross-device comparison.
- Run benchmarks across all model/backend combinations. Async — poll /benchmark/auto-matrix/status.
- Get progress of ongoing auto-matrix benchmark.
- Export all benchmark results as JSON for cross-device analysis.
- Import benchmark results from another device (multipart JSON upload).
- Get ML-powered optimal config recommendation for a given SoC + model + backend combination.
- Delete all benchmark results and profiles.
9. Debug
- Export full app state as a ZIP: models, conversations, settings, logs.
- Get crash logs and error history.
- Get internal app logs. Query params: lines (max 2000), filter tag, since (timestamp).
- Get logcat output filtered by tag. Query params: lines (max 2000), filter tag.
- Stream live performance telemetry: memory, thermal, power draw. Returns Server-Sent Events (SSE) stream.
- Stop active live-telemetry stream.
- Get diagnostic data for hybrid inference: layer assignments, fallback events, performance attribution.
- Get MNN execution trace: kernel timing, memory allocation, layer profiling.
- Clear collected MNN trace data.
- Intentionally trigger a crash for testing error reporting and recovery.
- Get full debug configuration and runtime settings.
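The live-telemetry stream uses standard SSE framing (field: value lines, events separated by blank lines), so a small client-side parser suffices. A simplified sketch that assumes nothing about TokForge's event fields beyond SSE itself (note: repeated fields overwrite here, whereas the SSE spec concatenates repeated data lines):

```python
def parse_sse(lines):
    """Yield one dict per SSE event from an iterable of text lines."""
    event = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:                # blank line terminates an event
            if event:
                yield event
                event = {}
        elif line.startswith(":"):  # comment / keep-alive, ignore
            continue
        else:
            field, _, value = line.partition(":")
            event[field] = value.lstrip(" ")
    if event:                       # flush a trailing, unterminated event
        yield event
```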
10. UI Control
- Navigate to a specific screen: home, chat, models, settings, characters, benchmarks.
- Show a toast notification on the device screen.
Workflow Examples
7-Step AI Agent Workflow
A complete example of using the TokForge API to build an autonomous AI agent:
Step 1: Verify Device Readiness
curl -s http://localhost:8088/health | jq '.mnn_loaded'
# Output: true (device ready)
Step 2: List Available Models
curl -s -H "Authorization: Bearer TOKEN" \
http://localhost:8088/models | jq '.data[] | .name'
Step 3: Load a Model
curl -X POST \
-H "Authorization: Bearer TOKEN" \
-H "Content-Type: application/json" \
-d '{"model_id": 3}' \
http://localhost:8088/control/load-model
Step 4: Create a Conversation
curl -X POST \
-H "Authorization: Bearer TOKEN" \
-H "Content-Type: application/json" \
-d '{"title": "Agent Session", "backend": "mnn"}' \
http://localhost:8088/control/create-conversation
# Returns: {"status":"ok", "id": "conv_12345"}
Step 5: Send Message and Get Response
curl -X POST \
-H "Authorization: Bearer TOKEN" \
-H "Content-Type: application/json" \
-d '{"message": "What is 2+2?"}' \
http://localhost:8088/conversations/conv_12345/send
# Returns: {"status":"ok", "response": "2+2 equals 4."}
Step 6: Monitor Performance
curl -s http://localhost:8088/performance | jq '.last_generation_stats'
# Shows: tokens/sec, memory usage, thermal status
Step 7: Run Benchmark and Export
curl -X POST \
-H "Authorization: Bearer TOKEN" \
-H "Content-Type: application/json" \
-d '{"prompt": "Benchmark prompt", "num_tokens": 100}' \
http://localhost:8088/benchmark/run
# Export results
curl -s http://localhost:8088/benchmark/export > results.json
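Exports collected from several devices this way can be pooled on the host for cross-device comparison. A sketch; the "results" key is an assumption about the export schema, which is not documented above:

```python
import json

def merge_exports(paths):
    """Concatenate benchmark results from multiple exported JSON files."""
    merged = []
    for path in paths:
        with open(path) as f:
            data = json.load(f)
        if isinstance(data, list):
            merged.extend(data)
        else:
            merged.extend(data.get("results", []))  # assumed export key
    return merged
```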
Error Handling
| Code | Meaning |
|---|---|
| 200 | Success |
| 400 | Bad Request (invalid parameters) |
| 401 | Unauthorized (missing or invalid auth token) |
| 404 | Not Found (resource doesn't exist) |
| 408 | Timeout (operation took too long) |
| 409 | Conflict (another operation already running) |
| 500 | Internal Server Error |
Rate Limits
The API applies per-token rate limits to prevent abuse:
| Endpoint Type | Limit | Window |
|---|---|---|
| Read (GET, unauthenticated) | 100 requests | 1 minute |
| Read (GET, authenticated) | 200 requests | 1 minute |
| Write (POST, DELETE) | 50 requests | 1 minute |
| Benchmark operations | 5 concurrent | Continuous |
| Model downloads | 1 concurrent | Continuous |
Hitting the limit returns 429 Too Many Requests; the Retry-After header indicates how many seconds to wait before retrying.
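A well-behaved client honors Retry-After when present and otherwise backs off exponentially. A pure-function sketch (the helper name is illustrative):

```python
def retry_delay(attempt, retry_after=None, cap=60.0):
    """Seconds to wait after a 429: the Retry-After value if given, else 2^attempt."""
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # Retry-After can also be an HTTP-date; fall back to backoff
    return min(2.0 ** attempt, cap)

print(retry_delay(0), retry_delay(3, "5"), retry_delay(10))
```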
Security
- Localhost only by default: Server binds to 127.0.0.1 in release builds, not accessible over network without ADB
- Auth tokens: Generated per-session, configured in Settings. Treat as secrets. Rotate regularly with /control/rotate-auth-token
- No HTTPS: Uses plain HTTP over ADB tunnel. ADB provides the security boundary.
- Debug mode only: API is only available in debug builds (BuildConfig.DEBUG) or when explicitly enabled
- Network binding: Release builds force localhost binding regardless of settings. Use ADB port forwarding for remote access.
- CORS: Not applicable (native Android app, no browser context)
- Input validation: All parameters validated server-side. Untrusted input sanitized before use.