API Reference
Overview
TokForge includes a built-in MetricsService — an HTTP API for remote device control, monitoring, and benchmarking. It runs as an embedded NanoHTTPD server accessible from any app or computer on the same network (or via ADB port forwarding).
This API enables AI agents, developers, and external tools to:
- Discover device hardware capabilities
- List, load, and unload inference models
- Run inference tests and benchmarks
- Manage conversations and character cards
- Control app navigation and UI
- Export/import configuration and benchmark results
- Monitor performance metrics in real time
Setup
Enabling the API Server
The API server is disabled by default. To enable it:
- Open TokForge → Settings → scroll to Advanced
- Toggle "Metrics Server" → ON
- Choose Bind Host:
  - Localhost only (127.0.0.1) — accessible only from the device itself or via ADB port forwarding. Use this for development and testing.
  - All interfaces (0.0.0.0) — accessible from any device on the local network (e.g., http://192.168.1.100:8088/health). Use this for remote control from another computer, AI agents, or Home Assistant.
- Set Port (default: 8088)
- Tap "Regenerate" next to the API Key field to generate a new auth token, or enter your own
Security note: When using "All interfaces", the API is accessible to anyone on your network. Always use a strong API key and avoid exposing the port to the internet.
Network Binding Behavior
| Build Type | Localhost (127.0.0.1) | All Interfaces (0.0.0.0) |
|---|---|---|
| Debug | Auth bypassed (no token needed) | Auth required |
| Release | Auth required | Forced to 127.0.0.1 (security hardening) |
In release builds, the server always binds to 127.0.0.1 regardless of the setting. To access the API from another device with a release build, use ADB port forwarding (see below).
Architecture
| Property | Value |
|---|---|
| Server | NanoHTTPD (Java HTTP library) |
| Default port | 8088 (configurable in Settings → Advanced) |
| Transport | HTTP/1.1 |
| Auth | Bearer token (configured in Settings → Advanced → API Key) |
| Encoding | JSON (application/json) |
| Response format | JSON object with status, data, or error fields |
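Since every response follows this status/data/error shape, a client can unwrap it uniformly. A minimal Python sketch; the `unwrap` helper name is illustrative, and any structure inside the data field beyond what the table above states is an assumption:

```python
def unwrap(resp: dict):
    """Return the payload of a TokForge-style JSON response, or raise on error.

    Assumes the status/data/error fields described above; the contents of
    "data" vary per endpoint.
    """
    if "error" in resp:
        raise RuntimeError(resp["error"])
    return resp.get("data", resp)

print(unwrap({"status": "ok", "data": {"mnn_loaded": True}}))
```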
Authentication
Getting Your API Key
From the app (recommended):
- Open Settings → Advanced → Metrics Server section
- Your API key is displayed in the "Metrics Auth Token" field
- Tap "Regenerate" to create a new key at any time
- Copy the token and use it in your API requests
Programmatically (debug builds only):
# Token is logged at startup (debug builds only, truncated for security)
adb logcat -s MetricsServer | grep "Auth token"
# Or retrieve via DebugReceiver (debug builds only)
adb shell am broadcast -a dev.tokforge.DEBUG_ACTION \
-n dev.tokforge/.debug.DebugReceiver \
--es command get_auth_token
# Or read the token file directly from app storage (debug builds only)
adb shell run-as dev.tokforge cat files/.auth_token
Via the API itself:
# Rotate the token and receive the new one in the response
curl -X POST -H "Authorization: Bearer CURRENT_TOKEN" \
http://localhost:8088/control/rotate-auth-token
# Response: {"status":"ok", "auth_token":"NEW_TOKEN_HERE", "token_rotated":true}
Using the Token
All POST, DELETE, and most PUT endpoints require authentication. A few read-only endpoints (/health, /version, /metrics, /state/hardware) are accessible without auth.
Include the token in the Authorization header:
curl -H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-X POST http://DEVICE_IP:8088/control/load-model \
-d '{"model_id": 1}'
Missing or invalid tokens return 401 Unauthorized:
{
"error": "Unauthorized — provide Bearer token in Authorization header"
}
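In Python, the header handling can be wrapped once and reused for every call. A sketch using only the standard library; `BASE`, the placeholder token, and the helper names are illustrative, while the endpoint path mirrors the curl example above:

```python
import json
import urllib.request

BASE = "http://localhost:8088"  # device IP or forwarded port, per your setup

def auth_headers(token):
    """Headers required by authenticated endpoints."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

def build_request(path, token, payload=None):
    """Build a Request: POST with a JSON body when payload is given, else GET."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        BASE + path,
        data=data,
        headers=auth_headers(token),
        method="POST" if payload is not None else "GET",
    )

# Mirrors the curl example above; send with urllib.request.urlopen(req)
req = build_request("/control/load-model", "YOUR_API_KEY", {"model_id": 1})
print(req.get_method(), req.full_url)
```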
Unauthenticated Endpoints
These endpoints are accessible without a token (useful for health checks and monitoring):
| Endpoint | Description |
|---|---|
| GET /health | Server status, loaded models, uptime |
| GET /version | App version, build type, device info |
| GET /metrics | Model load state |
| GET /state/hardware | SoC, RAM, GPU, recommended config |
Connecting to the API
Same Network (All Interfaces mode)
If the server is bound to 0.0.0.0 (debug builds), connect directly using the device's IP:
# Find device IP in Settings → About Phone → Status → IP Address
curl http://192.168.1.100:8088/health
ADB Port Forwarding (Localhost mode or Release builds)
Forward the device port to your computer:
# Single device
adb forward tcp:8088 tcp:8088
curl http://localhost:8088/health
# Multiple devices — use unique local ports
adb -s DEVICE1_SERIAL forward tcp:8088 tcp:8088
adb -s DEVICE2_SERIAL forward tcp:8089 tcp:8088
adb -s DEVICE3_SERIAL forward tcp:8090 tcp:8088
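When driving a fleet, the per-device forwards can be scripted so each serial gets a unique local port automatically. A sketch that only builds the adb commands; the helper name is illustrative, and actually applying them requires adb on your PATH:

```python
import subprocess

def forward_commands(serials, device_port=8088, first_local_port=8088):
    """One `adb forward` command per device, each with a unique local port."""
    return [
        ["adb", "-s", serial, "forward",
         f"tcp:{first_local_port + i}", f"tcp:{device_port}"]
        for i, serial in enumerate(serials)
    ]

for cmd in forward_commands(["DEVICE1_SERIAL", "DEVICE2_SERIAL"]):
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually apply the forward
```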
Use a unique local port per active test script/process. Reusing the same local port (for example, two scripts both using tcp:8088) will overwrite the forwarding and cause intermittent control-plane disconnects.
On-Device Access
From the device itself (e.g., Termux, or another app):
curl http://127.0.0.1:8088/health
1. Health & Status
- Check server status and which inference backends are available.
- Minimal endpoint returning only loaded model status.
- App version, build info, package name, device, and Android version.
- Storage info including model directory size, available space, and list of model files.
2. State Endpoints
- Complete device and app state snapshot. Combines settings, inference, hardware, and performance data.
- User settings: theme, system prompt, remote API config, persona settings, saved endpoints.
- Current inference configuration: loaded models, backend availability, default model, system prompt.
- Device hardware profile: SoC, cores, GPU, RAM, and recommended optimal configuration.
- Real-time performance metrics: memory, battery, thermal status, last 10 generation stats.
3. Model Endpoints
- List all downloaded models with metadata: name, path, size, quantization, parameter count, type.
- Get detailed information about a specific model by database ID.
- Load a model from disk. Unloads any currently loaded model. Auto-loads draft model for speculative decoding.
- Unload all currently loaded models (both MNN and GGUF).
- Scan models directory and register any new models found.
- Download a model from a URL. Runs in background; monitor with /control/download-status.
- Get progress of ongoing model download: bytes, percentage, speed, estimated time remaining.
- Delete a model file from disk and remove from database.
- Rotate authentication token. Returns new token in response.
- Cancel an ongoing model download.
4. Configuration
- Get current inference configuration: threads, context, batch size, sampling params, KV cache, GPU layers.
- Update sampling parameters: temperature, top-p, top-k, max tokens, repeat penalty.
- Update app settings: theme, system prompt, metrics server config, remote API endpoints, persona settings.
- Set advanced inference config: threading, context, KV cache, flash attention, GPU layers, draft model.
- Switch inference backend: mnn, gguf, or remote.
- Reload the currently loaded model (apply config changes).
- Probe QNN capabilities synchronously. Returns supported data types, layer operations, and performance characteristics.
- Probe QNN capabilities asynchronously. Returns job ID; poll /control/qnn-probe-status to check progress.
- Get status and results of async QNN probe operation.
- Get hybrid inference capabilities: supported mixed-precision configs and fallback rules.
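The async probe follows the same start-then-poll pattern used by model downloads and auto-matrix benchmarks: kick off the job, then poll the status endpoint until it finishes. A generic polling sketch, where `fetch_status` is any callable returning the decoded status JSON; the "status": "running" convention is an assumption about the response shape, not confirmed above:

```python
import time

def poll_until_done(fetch_status, interval=1.0, timeout=120.0):
    """Call fetch_status() until it stops reporting "running", or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status.get("status") != "running":
            return status
        time.sleep(interval)
    raise TimeoutError("operation did not finish within the timeout")
```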
5. Inference & Generation
- Generate text from a single prompt without creating a conversation. Returns tokens, speed, and output.
- Get current generation progress: tokens generated, speed, elapsed time, estimated remaining.
- Stop an ongoing generation.
- Run auto-tuning: sweep thread counts and configs, recommend optimal settings for current hardware + model.
- Cancel an ongoing auto-tune operation.
- Run a pre-defined or custom inference scenario. Returns scenario results and metrics.
- Get status and progress of an ongoing scenario run.
6. Conversations
- List all conversations with titles, timestamps, backend types, and message counts.
- Get all messages in a conversation.
- Send a message to a conversation and receive the AI response.
- Create a new conversation with optional title, backend type, and character.
- Delete a conversation and all its messages.
7. Characters
- List all character cards with names, descriptions, personalities, and greetings.
- Export a character card as a downloadable file.
- Import a character card from an uploaded file (multipart).
- Delete a character card.
8. Benchmarks
- Run a single benchmark: load model, measure inference speed, save result. Configurable prompt, tokens, runs.
- Query benchmark results with filtering by SoC, model, backend. Supports pagination.
- Get results organized as a SoC × Model × Backend matrix for cross-device comparison.
- Run benchmarks across all model/backend combinations. Async — poll /benchmark/auto-matrix/status.
- Get progress of ongoing auto-matrix benchmark.
- Export all benchmark results as JSON for cross-device analysis.
- Import benchmark results from another device (multipart JSON upload).
- Get ML-powered optimal config recommendation for a given SoC + model + backend combination.
- Delete all benchmark results and profiles.
9. Debug
- Export full app state as a ZIP: models, conversations, settings, logs.
- Get crash logs and error history.
- Get internal app logs. Query params: lines (max 2000), filter tag, since (timestamp).
- Get logcat output filtered by tag. Query params: lines (max 2000), filter tag.
- Stream live performance telemetry: memory, thermal, power draw. Returns Server-Sent Events (SSE) stream.
- Stop active live-telemetry stream.
- Get diagnostic data for hybrid inference: layer assignments, fallback events, performance attribution.
- Get MNN execution trace: kernel timing, memory allocation, layer profiling.
- Clear collected MNN trace data.
- Intentionally trigger a crash for testing error reporting and recovery.
- Get full debug configuration and runtime settings.
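The live-telemetry stream uses standard SSE framing (field: value lines, events separated by blank lines), so a small client-side parser suffices. A simplified sketch that assumes nothing about TokForge's event fields beyond SSE itself (note: repeated fields overwrite here, whereas the SSE spec concatenates repeated data lines):

```python
def parse_sse(lines):
    """Yield one dict per SSE event from an iterable of text lines."""
    event = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:                # blank line terminates an event
            if event:
                yield event
                event = {}
        elif line.startswith(":"):  # comment / keep-alive, ignore
            continue
        else:
            field, _, value = line.partition(":")
            event[field] = value.lstrip(" ")
    if event:                       # flush a trailing, unterminated event
        yield event
```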
10. UI Control
- Navigate to a specific screen: home, chat, models, settings, characters, benchmarks.
- Show a toast notification on the device screen.
Workflow Examples
7-Step AI Agent Workflow
A complete example of using the TokForge API to build an autonomous AI agent:
Step 1: Verify Device Readiness
curl -s http://localhost:8088/health | jq '.mnn_loaded'
# Output: true (device ready)
Step 2: List Available Models
curl -s -H "Authorization: Bearer TOKEN" \
http://localhost:8088/models | jq '.data[] | .name'
Step 3: Load a Model
curl -X POST \
-H "Authorization: Bearer TOKEN" \
-H "Content-Type: application/json" \
-d '{"model_id": 3}' \
http://localhost:8088/control/load-model
Step 4: Create a Conversation
curl -X POST \
-H "Authorization: Bearer TOKEN" \
-H "Content-Type: application/json" \
-d '{"title": "Agent Session", "backend": "mnn"}' \
http://localhost:8088/control/create-conversation
# Returns: {"status":"ok", "id": "conv_12345"}
Step 5: Send Message and Get Response
curl -X POST \
-H "Authorization: Bearer TOKEN" \
-H "Content-Type: application/json" \
-d '{"message": "What is 2+2?"}' \
http://localhost:8088/conversations/conv_12345/send
# Returns: {"status":"ok", "response": "2+2 equals 4."}
Step 6: Monitor Performance
curl -s http://localhost:8088/performance | jq '.last_generation_stats'
# Shows: tokens/sec, memory usage, thermal status
Step 7: Run Benchmark and Export
curl -X POST \
-H "Authorization: Bearer TOKEN" \
-H "Content-Type: application/json" \
-d '{"prompt": "Benchmark prompt", "num_tokens": 100}' \
http://localhost:8088/benchmark/run
# Export results
curl -s http://localhost:8088/benchmark/export > results.json
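Exports collected from several devices this way can be pooled on the host for cross-device comparison. A sketch; the "results" key is an assumption about the export schema, which is not documented above:

```python
import json

def merge_exports(paths):
    """Concatenate benchmark results from multiple exported JSON files."""
    merged = []
    for path in paths:
        with open(path) as f:
            data = json.load(f)
        if isinstance(data, list):
            merged.extend(data)
        else:
            merged.extend(data.get("results", []))  # assumed export key
    return merged
```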
Error Handling
| Code | Meaning |
|---|---|
| 200 | Success |
| 400 | Bad Request (invalid parameters) |
| 401 | Unauthorized (missing or invalid auth token) |
| 404 | Not Found (resource doesn't exist) |
| 408 | Timeout (operation took too long) |
| 409 | Conflict (another operation already running) |
| 500 | Internal Server Error |
Rate Limits
The API applies per-token rate limits to prevent abuse:
| Endpoint Type | Limit | Window |
|---|---|---|
| Read (GET, unauthenticated) | 100 requests | 1 minute |
| Read (GET, authenticated) | 200 requests | 1 minute |
| Write (POST, DELETE) | 50 requests | 1 minute |
| Benchmark operations | 5 concurrent | Continuous |
| Model downloads | 1 concurrent | Continuous |
Hitting the limit returns 429 Too Many Requests; the Retry-After header indicates how many seconds to wait before retrying.
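A well-behaved client honors Retry-After when present and otherwise backs off exponentially. A pure-function sketch (the helper name is illustrative):

```python
def retry_delay(attempt, retry_after=None, cap=60.0):
    """Seconds to wait after a 429: the Retry-After value if given, else 2^attempt."""
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # Retry-After can also be an HTTP-date; fall back to backoff
    return min(2.0 ** attempt, cap)

print(retry_delay(0), retry_delay(3, "5"), retry_delay(10))
```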
Security
- Localhost only by default: Server binds to 127.0.0.1 in release builds, not accessible over network without ADB
- Auth tokens: Generated per-session, configured in Settings. Treat as secrets. Rotate regularly with /control/rotate-auth-token
- No HTTPS: Uses plain HTTP over ADB tunnel. ADB provides the security boundary.
- Debug mode only: API is only available in debug builds (BuildConfig.DEBUG) or when explicitly enabled
- Network binding: Release builds force localhost binding regardless of settings. Use ADB port forwarding for remote access.
- CORS: Not applicable (native Android app, no browser context)
- Input validation: All parameters validated server-side. Untrusted input sanitized before use.