
API Reference

MetricsService API v3.4.7 · 120+ endpoints across 14 handler groups · NanoHTTPD on configurable port

Overview

TokForge includes a built-in MetricsService — an HTTP API for remote device control, monitoring, and benchmarking. It runs as an embedded NanoHTTPD server accessible from any app or computer on the same network (or via ADB port forwarding).

This API enables AI agents, developers, and external tools to:

  • Discover device hardware capabilities
  • List, load, and unload inference models
  • Run inference tests and benchmarks
  • Manage conversations and character cards
  • Control app navigation and UI
  • Export/import configuration and benchmark results
  • Monitor performance metrics in real time

Setup

Enabling the API Server

The API server is disabled by default. To enable it:

  1. Open TokForge → Settings → scroll to Advanced
  2. Toggle "Metrics Server" → ON
  3. Choose Bind Host:
    • Localhost only (127.0.0.1) — accessible only from the device itself or via ADB port forwarding. Use this for development and testing.
    • All interfaces (0.0.0.0) — accessible from any device on the local network (e.g., http://192.168.1.100:8088/health). Use this for remote control from another computer, AI agents, or Home Assistant.
  4. Set Port (default: 8088)
  5. Tap "Regenerate" next to the API Key field to generate a new auth token, or enter your own
Security note: When using "All interfaces", the API is accessible to anyone on your network. Always use a strong API key and avoid exposing the port to the internet.

Network Binding Behavior

Build Type | Localhost (127.0.0.1)           | All Interfaces (0.0.0.0)
-----------|---------------------------------|------------------------------------------
Debug      | Auth bypassed (no token needed) | Auth required
Release    | Auth required                   | Forced to 127.0.0.1 (security hardening)

In release builds, the server always binds to 127.0.0.1 regardless of the setting. To access the API from another device with a release build, use ADB port forwarding (see below).

Architecture

Server:          NanoHTTPD (Java HTTP library)
Default port:    8088 (configurable in Settings → Advanced)
Transport:       HTTP/1.1
Auth:            Bearer token (configured in Settings → Advanced → API Key)
Encoding:        JSON (application/json)
Response format: JSON object with status, data, or error fields

Authentication

Getting Your API Key

From the app (recommended):

  1. Open Settings → Advanced → Metrics Server section
  2. Your API key is displayed in the "Metrics Auth Token" field
  3. Tap "Regenerate" to create a new key at any time
  4. Copy the token and use it in your API requests

Programmatically (debug builds only):

# Token is logged at startup (debug builds only, truncated for security)
adb logcat -s MetricsServer | grep "Auth token"

# Or retrieve via DebugReceiver (debug builds only)
adb shell am broadcast -a dev.tokforge.DEBUG_ACTION \
    -n dev.tokforge/.debug.DebugReceiver \
    --es command get_auth_token

# Or read the stored token file directly (debug builds only)
adb shell run-as dev.tokforge cat files/.auth_token

Via the API itself:

# Rotate the token and receive the new one in the response
curl -X POST -H "Authorization: Bearer CURRENT_TOKEN" \
     http://localhost:8088/control/rotate-auth-token
# Response: {"status":"ok", "auth_token":"NEW_TOKEN_HERE", "token_rotated":true}

Using the Token

All POST, DELETE, and most PUT endpoints require authentication. A few read-only endpoints (/health, /version, /metrics, /state/hardware) are accessible without auth.

Include the token in the Authorization header:

curl -H "Authorization: Bearer YOUR_API_KEY" \
     -H "Content-Type: application/json" \
     -X POST http://DEVICE_IP:8088/control/load-model \
     -d '{"model_id": 1}'

Missing or invalid tokens return 401 Unauthorized:

{
  "error": "Unauthorized — provide Bearer token in Authorization header"
}
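The header boilerplate can be factored into a small wrapper. This is a sketch: `tf_api`, `tf_url`, `TOKFORGE_URL`, and `TOKFORGE_TOKEN` are names invented for this example, not part of the app.

```shell
#!/bin/sh
# Hypothetical helper: prepend the base URL and auth header to every call.
TOKFORGE_URL="${TOKFORGE_URL:-http://localhost:8088}"

# Build the full URL for an API path (kept separate so it is easy to test).
tf_url() {
    printf '%s%s' "$TOKFORGE_URL" "$1"
}

# Call an endpoint; extra arguments are passed straight to curl.
tf_api() {
    path=$1; shift
    curl -s \
        -H "Authorization: Bearer $TOKFORGE_TOKEN" \
        -H "Content-Type: application/json" \
        "$(tf_url "$path")" "$@"
}

# Usage:
#   tf_api /health
#   tf_api /control/load-model -X POST -d '{"model_id": 1}'
```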

Unauthenticated Endpoints

These endpoints are accessible without a token (useful for health checks and monitoring):

EndpointDescription
GET /healthServer status, loaded models, uptime
GET /versionApp version, build type, device info
GET /metricsModel load state
GET /state/hardwareSoC, RAM, GPU, recommended config

Connecting to the API

Same Network (All Interfaces mode)

If the server is bound to 0.0.0.0 (debug builds), connect directly using the device's IP:

# Find device IP in Settings → About Phone → Status → IP Address
curl http://192.168.1.100:8088/health

ADB Port Forwarding (Localhost mode or Release builds)

Forward the device port to your computer:

# Single device
adb forward tcp:8088 tcp:8088
curl http://localhost:8088/health

# Multiple devices — use unique local ports
adb -s DEVICE1_SERIAL forward tcp:8088 tcp:8088
adb -s DEVICE2_SERIAL forward tcp:8089 tcp:8088
adb -s DEVICE3_SERIAL forward tcp:8090 tcp:8088

On-Device Access

From the device itself (e.g., Termux, or another app):

curl http://127.0.0.1:8088/health

Use a unique local port per active test script or process. Reusing the same local port (for example, two scripts both forwarding tcp:8088) silently overwrites the earlier forward and causes intermittent control-plane disconnects.
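The unique-port rule can be automated for a fleet of devices. A sketch: the 8088+i numbering scheme is this example's choice, and it assumes the standard `adb devices` output format.

```shell
#!/bin/sh
# Forward local port 8088+i to device-side port 8088 on the i-th device.
# Pure helper so the port mapping itself is easy to test.
local_port_for() {
    # $1 = zero-based device index
    echo $((8088 + $1))
}

if command -v adb >/dev/null 2>&1; then
    i=0
    adb devices | awk 'NR>1 && $2=="device" {print $1}' | while read -r serial; do
        port=$(local_port_for "$i")
        adb -s "$serial" forward "tcp:$port" tcp:8088
        echo "$serial -> http://localhost:$port"
        i=$((i + 1))
    done
fi
```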

1. Health & Status

GET /health No Auth

Check server status and which inference backends are available.

GET /metrics No Auth

Minimal endpoint returning only loaded model status.

GET /version No Auth

App version, build info, package name, device, and Android version.

GET /storage Auth Required

Storage usage information: available space, model directory size, database usage.

2. State Endpoints

GET /state Auth Required

Complete device and app state snapshot: model loading status, settings, memory, battery, thermal.

GET /state/settings Auth Required

User settings: theme, system prompt, remote API config, remote model name, API endpoints.

GET /state/inference Auth Required

Inference configuration state: loaded models, backend availability, context length, quantization settings.

GET /state/hardware No Auth

Device hardware profile: SoC, cores, GPU, RAM, and recommended optimal configuration.

GET /performance Auth Required

Real-time performance metrics: memory usage, battery, thermal status, last generation statistics.

3. Model Endpoints

GET /models Auth Required

List all downloaded models with metadata: name, path, size, quantization, parameter count, type.

GET /models/{id} Auth Required

Get detailed information about a specific model by database ID.

POST /control/load-model Auth Required

Load a model from disk. Unloads any currently loaded model. Auto-loads draft model for speculative decoding.

POST /control/unload-all Auth Required

Unload all currently loaded models (both MNN and GGUF).

POST /control/scan-models Auth Required

Scan models directory and register any new models found.

POST /control/download-model Auth Required

Download a model from a URL. Runs in background; monitor with /control/download-status.

GET /control/download-status Auth Required

Get progress of ongoing model download: bytes, percentage, speed, estimated time remaining.
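A background download can be supervised with a small polling loop. This is a sketch: the `percentage` field name is an assumption based on the description above, so check an actual response before relying on it.

```shell
#!/bin/sh
# Decide whether a download-status response reports completion.
# Field name "percentage" is assumed, not confirmed.
download_done() {
    # $1 = JSON response body; prints "yes" when percentage >= 100
    pct=$(printf '%s' "$1" | sed -n 's/.*"percentage":[[:space:]]*\([0-9.]*\).*/\1/p')
    [ -n "$pct" ] || pct=0
    awk -v p="$pct" 'BEGIN { if (p >= 100) print "yes"; else print "no" }'
}

# Poll every 2 seconds until the download finishes.
poll_download() {
    while :; do
        resp=$(curl -s -H "Authorization: Bearer $TOKEN" \
            http://localhost:8088/control/download-status)
        [ "$(download_done "$resp")" = "yes" ] && break
        sleep 2
    done
}
```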

POST /control/delete-model Auth Required

Delete a model file from disk and remove from database.

POST /control/rotate-auth-token Auth Required

Rotate authentication token. Returns new token in response.

POST /control/cancel-download Auth Required

Cancel an ongoing model download.

4. Configuration

GET /config Auth Required

Get current system configuration details: threads, context, quantization, KV cache settings, GPU layers.

POST /control/update-config Auth Required

Update sampling parameters: temperature, top-p, top-k, max tokens, repeat penalty.

POST /control/set-settings Auth Required

Update app settings: theme, system prompt, metrics server config, remote API endpoints, persona settings.

POST /control/set-inference-config Auth Required

Set advanced inference config: threading, context, KV cache, flash attention, GPU layers, draft model.

POST /control/switch-backend Auth Required

Switch inference backend: mnn, gguf, or remote.

POST /control/reload-model Auth Required

Reload the currently loaded model (apply config changes).

5. Inference & Generation

POST /test-prompt Auth Required

Generate text from a single prompt without creating a conversation. Returns tokens, speed, and output.

GET /control/generation-status Auth Required

Get current generation progress: tokens generated, speed, elapsed time, estimated remaining.

POST /control/stop-generation Auth Required

Stop an ongoing generation.

POST /control/auto-tune Auth Required

Run auto-tuning: sweep thread counts and configs, recommend optimal settings for current hardware + model.

POST /control/auto-tune/cancel Auth Required

Cancel an ongoing auto-tune operation.

POST /control/scenario/run Auth Required

Run a pre-defined or custom inference scenario. Returns scenario results and metrics.

GET /control/scenario/status Auth Required

Get status and progress of an ongoing scenario execution.

6. Conversations

GET /conversations Auth Required

List all conversations with titles, timestamps, backend types, and metadata.

GET /conversations/{id}/messages Auth Required

Get all messages in a conversation with role, content, timestamps, and token counts.

POST /conversations/{id}/send Auth Required

Send a message to a conversation and receive the AI response.

POST /control/create-conversation Auth Required

Create a new conversation with optional title, backend type, and character.

POST /control/reflect Auth Required

Trigger background memory extraction on current conversations for knowledge synthesis.

GET /control/reflect/tasks Auth Required

List active and completed reflection tasks with status, progress, and results.

DELETE /conversations/{id} Auth Required

Delete a conversation and all its messages.

7. Characters

GET /characters Auth Required

List all character cards with names, descriptions, personalities, and greetings.

GET /characters/{id}/export Auth Required

Export a character card in card format (JSON or PNG).

POST /characters/import Auth Required

Import a character card from an uploaded file (multipart).

DELETE /characters/{id} Auth Required

Delete a character card.

8. Benchmarks

POST /benchmark/run Auth Required

Run a single benchmark: load model, measure inference speed, save result. Configurable prompt, tokens, runs.

GET /benchmark/results Auth Required

Query benchmark results with filtering by SoC, model, backend. Supports pagination.

GET /benchmark/matrix Auth Required

Get results organized as a multi-dimensional matrix for cross-device comparison.

POST /benchmark/auto-matrix Auth Required

Run benchmarks across all model/backend combinations. Async — poll /benchmark/auto-matrix/status.

GET /benchmark/auto-matrix/status Auth Required

Get progress of ongoing auto-matrix benchmark generation.

GET /benchmark/export Auth Required

Export all benchmark results as JSON or CSV for analysis.

POST /benchmark/import Auth Required

Import benchmark results from another device (multipart JSON upload).

GET /benchmark/optimal-config Auth Required

Get recommended optimal configuration from benchmark data for current hardware/model.

DELETE /benchmark/clear Auth Required

Delete all benchmark results and profiles.

9. Debug

GET /debug/export Auth Required

Export full app state as ZIP: models, conversations, settings, logs, database.

GET /debug/errors Auth Required

Get crash logs and error history from the app.

GET /debug/log Auth Required

Get internal app logs. Query params: lines (max 2000), filter tag, since (timestamp).

GET /debug/logcat Auth Required

Get logcat output filtered by tag. Query params: lines (max 2000), filter tag.

GET /debug/live-telemetry Auth Required

Stream live performance telemetry: memory, thermal, power draw (Server-Sent Events).

DELETE /debug/live-telemetry Auth Required

Clear telemetry buffer and stop active streams.

GET /debug/live-telemetry/config Auth Required

Get current telemetry configuration and collection settings.

POST /debug/live-telemetry/config Auth Required

Configure telemetry collection: sampling interval, metrics to track, retention policy.

GET /debug/hybrid-diagnostics Auth Required

Get diagnostic data for hybrid inference: layer assignments, fallback events, performance attribution.

GET /debug/mnn-trace Auth Required

Get MNN backend execution trace: kernel timing, memory allocation, layer profiling.

DELETE /debug/mnn-trace Auth Required

Clear collected MNN trace buffer.

POST /debug/crash-test Auth Required

Intentionally trigger a crash for testing error reporting and recovery.

GET /debug/config Auth Required

Get full system configuration and debug runtime settings.

10. UI Control

POST /control/navigate Auth Required

Navigate to a specific screen: home, chat, models, settings, characters, benchmarks.

POST /control/toast Auth Required

Show a toast notification on the device screen.

11. QNN Hardware Acceleration

POST /control/qnn-probe Auth Required

Probe Qualcomm NPU capabilities synchronously. Experimental feature for hardware acceleration detection.

POST /control/qnn-probe-async Auth Required

Probe QNN capabilities asynchronously. Returns job ID; poll /control/qnn-probe-status for progress.

GET /control/qnn-probe-status Auth Required

Get status and results of async QNN hardware acceleration probe.

GET /control/qnn-hybrid-capabilities Auth Required

Get available Qualcomm NPU hybrid routing capabilities and performance profiles.

GET /control/qnn-m1-readiness Auth Required

Get M1/ARM readiness metrics and compatibility status for QNN backend.

12. MNN Memory Policy

GET /control/mnn-memory-policy Auth Required

Get current MNN backend memory allocation policy settings.

POST /control/mnn-memory-policy Auth Required

Configure MNN backend memory allocation policy and optimization strategy.

13. Memory Management (Facts, Edges, Documents)

GET /memory/facts Auth Required

List memory facts with optional filtering and search. Supports pagination and query parameters.

POST /memory/facts Auth Required

Create a new memory fact with content, tags, and optional metadata.

POST /memory/facts/{id}/pin Auth Required

Toggle pin status of a memory fact (important/frequently referenced).

POST /memory/facts/{id} Auth Required

Update an existing memory fact's content, tags, or metadata.

DELETE /memory/facts/{id} Auth Required

Archive a memory fact (soft delete; recoverable).

DELETE /memory/facts/{id}/hard Auth Required

Permanently delete a memory fact (hard delete; unrecoverable).

GET /memory/edges Auth Required

List relationship edges connecting memory facts. Supports filtering and search.

DELETE /memory/edges/{id} Auth Required

Delete a relationship edge between two memory facts.

GET /memory/documents Auth Required

List reference documents used in memory system for context and knowledge grounding.

DELETE /memory/documents/{id} Auth Required

Delete a reference document from the memory system.

GET /memory/search Auth Required

Full-text search across memory facts and documents. Supports query parameters.

GET /memory/stats Auth Required

Get memory system statistics: fact counts, edge counts, document counts, storage usage.

14. Auto-Tuning & Optimization

GET /control/auto-tune/status Auth Required

Get current auto-tune progress: phase, results, thermal status, ETA.

POST /control/auto-tune/cancel Auth Required

Cancel an ongoing auto-tune optimization process.

POST /control/forge-instant Auth Required

Run instant inference generation with optimized settings for rapid response.

Workflow Examples

7-Step AI Agent Workflow

A complete example of using the TokForge API to build an autonomous AI agent:

Step 1: Verify Device Readiness

curl -s http://localhost:8088/health | jq '.mnn_loaded'
# Output: true (device ready)

Step 2: List Available Models

curl -s -H "Authorization: Bearer TOKEN" \
     http://localhost:8088/models | jq '.data[] | .name'

Step 3: Load a Model

curl -X POST \
     -H "Authorization: Bearer TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"model_id": 3}' \
     http://localhost:8088/control/load-model

Step 4: Create a Conversation

curl -X POST \
     -H "Authorization: Bearer TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"title": "Agent Session", "backend": "mnn"}' \
     http://localhost:8088/control/create-conversation
# Returns: {"status":"ok", "id": "conv_12345"}

Step 5: Send Message and Get Response

curl -X POST \
     -H "Authorization: Bearer TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"message": "What is 2+2?"}' \
     http://localhost:8088/conversations/conv_12345/send
# Returns: {"status":"ok", "response": "2+2 equals 4."}

Step 6: Monitor Performance

curl -s http://localhost:8088/performance | jq '.last_generation_stats'
# Shows: tokens/sec, memory usage, thermal status

Step 7: Run Benchmark and Export

curl -X POST \
     -H "Authorization: Bearer TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"prompt": "Benchmark prompt", "num_tokens": 100}' \
     http://localhost:8088/benchmark/run

# Export results
curl -s http://localhost:8088/benchmark/export > results.json
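The seven steps above can be strung together into one script. A sketch only: the response field names (`id`, `response`) follow the sample payloads shown in the steps, and `TOKEN`/`BASE` are this example's variable names.

```shell
#!/bin/sh
# End-to-end sketch of the 7-step workflow, wrapped in a function.
# Assumes TOKEN is set and the server is reachable (ADB forward or on-device).
BASE="${BASE:-http://localhost:8088}"

# Pull the conversation id out of the create-conversation response
# (matches the sample payload {"status":"ok", "id": "conv_12345"}).
extract_id() {
    printf '%s' "$1" | sed -n 's/.*"id":[[:space:]]*"\([^"]*\)".*/\1/p'
}

run_agent_session() {
    auth="Authorization: Bearer $TOKEN"
    json="Content-Type: application/json"

    # Step 1: verify the server is up
    curl -sf "$BASE/health" >/dev/null || return 1

    # Step 3: load model id 3 (as in the example above)
    curl -sf -X POST -H "$auth" -H "$json" \
         -d '{"model_id": 3}' "$BASE/control/load-model" >/dev/null || return 1

    # Step 4: create a conversation and capture its id
    resp=$(curl -sf -X POST -H "$auth" -H "$json" \
         -d '{"title": "Agent Session", "backend": "mnn"}' \
         "$BASE/control/create-conversation") || return 1
    conv=$(extract_id "$resp")

    # Step 5: send a message and print the reply
    curl -sf -X POST -H "$auth" -H "$json" \
         -d '{"message": "What is 2+2?"}' \
         "$BASE/conversations/$conv/send"
}
```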

Error Handling

Code | Meaning
-----|----------------------------------------------
200  | Success
400  | Bad Request (invalid parameters)
401  | Unauthorized (missing or invalid auth token)
404  | Not Found (resource doesn't exist)
408  | Request Timeout (operation took too long)
409  | Conflict (another operation already running)
500  | Internal Server Error
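Scripts can branch on these codes by asking curl for the status alone. A sketch; the hint strings are this example's wording, not API output.

```shell
#!/bin/sh
# Map an HTTP status code to a short action hint, mirroring the table above.
explain_status() {
    case "$1" in
        200) echo "ok" ;;
        400) echo "bad request: check parameters" ;;
        401) echo "unauthorized: check Bearer token" ;;
        404) echo "not found" ;;
        408) echo "timeout: retry" ;;
        409) echo "conflict: another operation is running" ;;
        429) echo "rate limited: honor Retry-After" ;;
        500) echo "server error" ;;
        *)   echo "unexpected status $1" ;;
    esac
}

# Usage: capture only the status code, then branch on it
# code=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8088/health)
# explain_status "$code"
```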

Rate Limits

The API applies per-token rate limits to prevent abuse:

Endpoint Type               | Limit        | Window
----------------------------|--------------|-----------
Read (GET, unauthenticated) | 100 requests | 1 minute
Read (GET, authenticated)   | 200 requests | 1 minute
Write (POST, DELETE)        | 50 requests  | 1 minute
Benchmark operations        | 5 concurrent | Continuous
Model downloads             | 1 concurrent | Continuous

Exceeding a limit returns 429 Too Many Requests; the Retry-After header indicates how many seconds to wait before retrying.
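One way to honor Retry-After is a small delay helper plus a header read. A sketch; the fallback default is this example's choice.

```shell
#!/bin/sh
# Compute how long to sleep after a 429: use the Retry-After header value
# when it is a plain number of seconds, otherwise fall back to a default.
retry_delay() {
    # $1 = Retry-After header value (may be empty), $2 = default seconds
    case "$1" in
        ''|*[!0-9]*) echo "$2" ;;   # missing or non-numeric: use default
        *)           echo "$1" ;;
    esac
}

# Usage sketch: read the header from the response, then sleep
# hdr=$(curl -s -D - -o /dev/null -H "Authorization: Bearer $TOKEN" \
#         http://localhost:8088/models | \
#       awk 'tolower($1)=="retry-after:" {print $2}' | tr -d '\r')
# sleep "$(retry_delay "$hdr" 5)"
```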

Security

  • Localhost only by default: Server binds to 127.0.0.1 in release builds, so it is not reachable over the network without ADB port forwarding
  • Auth tokens: Generated per-session, configured in Settings. Treat as secrets. Rotate regularly with /control/rotate-auth-token
  • No HTTPS: Uses plain HTTP over ADB tunnel. ADB provides the security boundary.
  • Debug mode only: API is only available in debug builds (BuildConfig.DEBUG) or when explicitly enabled.
  • Network binding: Release builds force localhost binding regardless of settings. Use ADB port forwarding for remote access.
  • CORS: Not applicable (native Android app, no browser context)
  • Input validation: All parameters validated server-side. Untrusted input sanitized before use.