A complete guide to the TokForge MetricsService API. Control inference, manage models, read conversations, and monitor performance—all from your computer or another application.
TokForge includes a built-in HTTP API server called MetricsService that exposes over 120 endpoints for remote control and monitoring. Whether you want to programmatically start conversations, load models, read inference metrics, or automate tasks on your phone from a desktop script, the TokForge API makes it possible.
The MetricsService API runs on port 8088 by default and must be explicitly enabled in the app settings.
Before you can connect from your computer, you'll need to enable the API server in TokForge settings.
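Set an API token (for example, mytoken123 for testing). This is required for all API requests.
If the API is bound to localhost only, run adb forward tcp:8088 tcp:8088 on your computer.
If the API is bound to all interfaces, look up the phone's IP address, for example with ifconfig on the device.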
Enable Metrics Server in Settings → Advanced
Default port is 8088. You can customize this in settings if needed.
Once the API is enabled, you have two options to connect: direct LAN access or ADB tunneling.
If you bound the API to "All Interfaces" and your phone and computer are on the same network:
curl -H "Authorization: Bearer mytoken123" http://192.168.1.100:8088/health
Replace 192.168.1.100 with your phone's actual IP address.
If you bound the API to localhost only, use ADB to forward the port:
adb forward tcp:8088 tcp:8088
Then connect via localhost:
curl -H "Authorization: Bearer mytoken123" http://localhost:8088/health
Hit the health endpoint to verify your connection is working:
curl -H "Authorization: Bearer mytoken123" http://192.168.1.100:8088/health
Expected response:
{
  "status": "ok",
  "mnn_loaded": true,
  "any_model_loaded": true,
  "runtime_mb": 35,
  "free_mb": 12,
  "uptime_seconds": 3847,
  "api_version": "1.0.0"
}
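The same check can be scripted. The sketch below uses only Python's standard library. The helper names `build_request` and `check_health` are illustrative and not part of TokForge, and the address and token are the placeholder values used throughout this guide.

```python
import json
import urllib.request

API_BASE = "http://192.168.1.100:8088"  # placeholder address from this guide
TOKEN = "mytoken123"                    # placeholder test token from this guide


def build_request(path, base=API_BASE, token=TOKEN):
    """Build a GET request carrying the bearer token (hypothetical helper)."""
    return urllib.request.Request(
        f"{base}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )


def check_health(timeout=5.0):
    """Return the parsed /health response; raises on network or HTTP errors."""
    with urllib.request.urlopen(build_request("/health"), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `check_health()["status"]` should give "ok" when the connection works. A timeout or connection error usually points at the port binding or the ADB forward.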
Here's a detailed tour of the most commonly used endpoints. All examples assume your token is mytoken123 and your phone is at 192.168.1.100:8088.
Request:
curl -H "Authorization: Bearer mytoken123" http://192.168.1.100:8088/health
Response:
{
  "status": "ok",
  "mnn_loaded": true,
  "any_model_loaded": true,
  "runtime_mb": 35,
  "free_mb": 12,
  "uptime_seconds": 3847,
  "api_version": "1.0.0",
  "device_model": "Pixel 7 Pro",
  "android_version": "13"
}
Request:
curl -H "Authorization: Bearer mytoken123" http://192.168.1.100:8088/models/downloaded
Response:
{
  "models": [
    {
      "id": "mistral-7b",
      "name": "Mistral 7B",
      "size_mb": 4096,
      "parameter_count": 7000000000,
      "type": "llm",
      "quantization": "Q4_K_M",
      "loaded": true
    },
    {
      "id": "phi-2",
      "name": "Phi-2",
      "size_mb": 2700,
      "parameter_count": 2700000000,
      "type": "llm",
      "quantization": "Q4_K_M",
      "loaded": false
    }
  ]
}
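Once parsed, this payload is a plain dictionary. As a minimal sketch, the helpers below (illustrative names, not part of the API) pick out the loaded models and total the storage used. They run against a trimmed copy of the sample response above.

```python
import json

# Trimmed copy of the /models/downloaded sample response shown above.
sample = json.loads("""
{"models": [
  {"id": "mistral-7b", "name": "Mistral 7B", "size_mb": 4096, "loaded": true},
  {"id": "phi-2", "name": "Phi-2", "size_mb": 2700, "loaded": false}
]}
""")


def loaded_model_ids(payload):
    """Return the ids of models whose `loaded` flag is true."""
    return [m["id"] for m in payload.get("models", []) if m.get("loaded")]


def total_size_mb(payload):
    """Sum the reported size of all downloaded models, in MB."""
    return sum(m.get("size_mb", 0) for m in payload.get("models", []))


print(loaded_model_ids(sample))
print(total_size_mb(sample))
```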
Request:
curl -X POST -H "Authorization: Bearer mytoken123" \
http://192.168.1.100:8088/models/mistral-7b/load
Response:
{
  "status": "loading",
  "model_id": "mistral-7b",
  "model_name": "Mistral 7B",
  "estimated_load_time_seconds": 5
}
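Because the load call returns while the model is still "loading", a script usually has to wait before sending messages. This guide does not say how completion is signalled. The sketch below assumes the `loaded` flag in the /models/downloaded response flips to true once loading finishes. `wait_until_loaded` and its parameters are hypothetical names.

```python
import time


def wait_until_loaded(model_id, fetch_models, timeout_s=60.0, interval_s=1.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until `model_id` reports loaded=True or the timeout expires.

    `fetch_models` is any callable returning the parsed /models/downloaded
    payload. It is injected so the loop can be exercised without a device.
    Returns True if the model became loaded, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        for model in fetch_models().get("models", []):
            if model.get("id") == model_id and model.get("loaded"):
                return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In a real script, `fetch_models` would wrap a GET to /models/downloaded. Passing it in as a callable keeps the polling logic independent of the HTTP client.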
Request:
curl -H "Authorization: Bearer mytoken123" http://192.168.1.100:8088/models/recommended
Response (truncated):
{
  "models": [
    {
      "id": "mistral-7b",
      "name": "Mistral 7B",
      "description": "Fast, capable 7B model. Great for chat.",
      "category": "chat",
      "size_mb": 4096,
      "required_ram_mb": 6000,
      "quantization": "Q4_K_M",
      "downloads": 15042
    },
    {
      "id": "phi-2",
      "name": "Phi-2",
      "description": "Microsoft's 2.7B powerhouse. Fast inference.",
      "category": "chat",
      "size_mb": 2700,
      "required_ram_mb": 4500,
      "quantization": "Q4_K_M",
      "downloads": 8231
    }
  ]
}
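The `required_ram_mb` field makes it easy to shortlist models for a given device. The following is a small sketch using a trimmed copy of the sample response. `models_that_fit` is an illustrative helper, and the RAM budgets are made-up inputs.

```python
# Trimmed copy of the /models/recommended sample response shown above.
recommended = {
    "models": [
        {"id": "mistral-7b", "size_mb": 4096, "required_ram_mb": 6000},
        {"id": "phi-2", "size_mb": 2700, "required_ram_mb": 4500},
    ]
}


def models_that_fit(payload, available_ram_mb):
    """Return models whose required_ram_mb fits the budget, smallest first.

    Entries without a required_ram_mb field are skipped rather than guessed.
    """
    fitting = [
        m for m in payload.get("models", [])
        if m.get("required_ram_mb", float("inf")) <= available_ram_mb
    ]
    return sorted(fitting, key=lambda m: m["required_ram_mb"])


print([m["id"] for m in models_that_fit(recommended, 5000)])
```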
Request:
curl -X POST -H "Authorization: Bearer mytoken123" \
-H "Content-Type: application/json" \
-d '{"character_id": 4}' \
http://192.168.1.100:8088/conversations/new
Response:
{
  "conversation_id": "conv_78f3a2d9",
  "character_id": 4,
  "character_name": "Claude",
  "created_at": "2026-04-07T14:23:45Z",
  "status": "active"
}
The app navigates to chat when you create a conversation via API
Request:
curl -X POST -H "Authorization: Bearer mytoken123" \
-H "Content-Type: application/json" \
-d '{"message": "What is machine learning?"}' \
http://192.168.1.100:8088/conversations/conv_78f3a2d9/send
Response:
{
  "message_id": "msg_5f8c9a21",
  "conversation_id": "conv_78f3a2d9",
  "user_message": "What is machine learning?",
  "ai_response": "Machine learning is a subset of artificial intelligence where systems learn and improve from experience without being explicitly programmed...",
  "telemetry": {
    "total_tokens": 187,
    "prompt_tokens": 34,
    "completion_tokens": 153,
    "tokens_per_second": 18.5,
    "time_to_first_token_ms": 142,
    "prefill_time_ms": 156,
    "decode_time_ms": 8243,
    "backend": "MNN",
    "backend_version": "0.4.1"
  },
  "memory": {
    "used_mb": 4200,
    "free_mb": 8,
    "runtime_mb": 42
  },
  "timestamp": "2026-04-07T14:24:12Z"
}
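The telemetry fields can be cross-checked against each other when logging benchmark runs. How TokForge derives `tokens_per_second` is not stated here. The sketch below assumes it is completion tokens divided by decode time, which matches the sample numbers to within rounding. Both helper names are illustrative.

```python
# Telemetry values copied from the sample /send response above.
telemetry = {
    "total_tokens": 187,
    "prompt_tokens": 34,
    "completion_tokens": 153,
    "tokens_per_second": 18.5,
    "decode_time_ms": 8243,
}


def decode_tokens_per_second(t):
    """Completion tokens divided by decode time, in tokens per second."""
    decode_ms = t["decode_time_ms"]
    if decode_ms <= 0:
        return 0.0
    return t["completion_tokens"] / (decode_ms / 1000.0)


def token_counts_consistent(t):
    """Check that prompt and completion tokens add up to the reported total."""
    return t["prompt_tokens"] + t["completion_tokens"] == t["total_tokens"]


print(round(decode_tokens_per_second(telemetry), 2))
print(token_counts_consistent(telemetry))
```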
Response with code rendering and 13.2 tok/s telemetry badge
Request:
curl -H "Authorization: Bearer mytoken123" http://192.168.1.100:8088/characters
Response (truncated):
{
  "characters": [
    {
      "id": 1,
      "name": "Assistant",
      "personality": "Helpful, harmless, and honest AI assistant.",
      "tags": ["helpful", "neutral", "general"],
      "description": "A general-purpose AI assistant.",
      "avatar_url": "https://..."
    },
    {
      "id": 4,
      "name": "Claude",
      "personality": "Thoughtful, analytical, and creative.",
      "tags": ["analytical", "thoughtful", "creative"],
      "description": "An AI assistant focused on reasoning and clarity.",
      "avatar_url": "https://..."
    }
  ]
}
Request:
curl -H "Authorization: Bearer mytoken123" \
"http://192.168.1.100:8088/memory/facts?character_id=4"
Response:
{
  "character_id": 4,
  "facts": [
    {"fact": "User's name is Alex", "confidence": 0.95},
    {"fact": "User prefers dark mode", "confidence": 0.88}
  ]
}
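Each fact carries a confidence score, so a client can choose to act only on the stronger ones. The sketch below filters the sample response. The 0.9 threshold and the `confident_facts` helper are illustrative choices, not something the API prescribes.

```python
# Copy of the /memory/facts sample response shown above.
facts_payload = {
    "character_id": 4,
    "facts": [
        {"fact": "User's name is Alex", "confidence": 0.95},
        {"fact": "User prefers dark mode", "confidence": 0.88},
    ],
}


def confident_facts(payload, min_confidence=0.9):
    """Return fact strings whose confidence is at or above the threshold."""
    return [
        f["fact"] for f in payload.get("facts", [])
        if f.get("confidence", 0.0) >= min_confidence
    ]


print(confident_facts(facts_payload))
```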
Request:
curl -H "Authorization: Bearer mytoken123" \
"http://192.168.1.100:8088/memory/stats?character_id=4"
Response:
{
  "character_id": 4,
  "total_conversations": 47,
  "total_messages": 1203,
  "memory_facts": 12,
  "last_interaction": "2026-04-07T13:45:22Z"
}
Request:
curl -X POST -H "Authorization: Bearer mytoken123" \
http://192.168.1.100:8088/control/open-model-browser
Response:
{"status": "success", "screen": "model_browser"}
Request:
curl -X POST -H "Authorization: Bearer mytoken123" \
http://192.168.1.100:8088/control/open-forge-lab
Response:
{"status": "success", "screen": "forge_lab"}
Request:
curl -H "Authorization: Bearer mytoken123" \
http://192.168.1.100:8088/control/generation-status
Response:
{
  "is_generating": true,
  "conversation_id": "conv_78f3a2d9",
  "tokens_generated": 42,
  "elapsed_seconds": 2.3,
  "tokens_per_second": 18.26,
  "model_id": "mistral-7b"
}
Request:
curl -H "Authorization: Bearer mytoken123" \
http://192.168.1.100:8088/settings
Response (truncated):
{
  "theme": "dark",
  "notification_enabled": true,
  "auto_backup": true,
  "inference_timeout_seconds": 300,
  "max_context_length": 4096
}
Request:
curl -X POST -H "Authorization: Bearer mytoken123" \
-H "Content-Type: application/json" \
-d '{"value": "light"}' \
http://192.168.1.100:8088/settings/theme
Response:
{"key": "theme", "value": "light", "status": "updated"}
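The theme update above can be built from Python's standard library in the same way. Only the `theme` key is shown in this guide. Whether other settings accept the same `/settings/<key>` pattern is an assumption, and `build_setting_update` is a hypothetical helper name.

```python
import json
import urllib.request


def build_setting_update(key, value,
                         base="http://192.168.1.100:8088",  # placeholder address
                         token="mytoken123"):               # placeholder token
    """Build a POST request that sends {"value": ...} to /settings/<key>."""
    body = json.dumps({"value": value}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/settings/{key}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_setting_update("theme", "light")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` sends it. Building it separately makes the URL and body easy to inspect first.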
Request:
curl -H "Authorization: Bearer mytoken123" \
http://192.168.1.100:8088/debug/logcat
Response (truncated):
{
  "logs": [
    "[14:23:45] Model loaded successfully: mistral-7b",
    "[14:24:12] Inference started for conversation conv_78f3a2d9",
    "[14:24:14] Generated 153 tokens at 18.5 tok/s"
  ],
  "timestamp": "2026-04-07T14:25:00Z"
}
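For filtering or alerting, the log lines can be split into a time and a message. The pattern below is inferred from the three sample lines above, so treat the `[HH:MM:SS] message` format as an assumption. Lines that do not match are returned as `None` rather than guessed at.

```python
import re

# Assumed line format, inferred from the sample: "[HH:MM:SS] message"
LOG_LINE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s+(.*)$")


def parse_log_line(line):
    """Split a log line into (time, message); return None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_log_line("[14:24:14] Generated 153 tokens at 18.5 tok/s"))
```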
Let's create a practical Python script that connects to TokForge, lists models, starts a conversation, and sends messages.
Requirements:
requests library: pip install requests
Complete working example:
import requests

# Configuration
API_BASE = "http://192.168.1.100:8088"
TOKEN = "mytoken123"


def get_headers():
    """Return headers with bearer token"""
    return {
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json"
    }


def check_health():
    """Check API health"""
    response = requests.get(f"{API_BASE}/health", headers=get_headers())
    return response.json()


def list_models():
    """List downloaded models"""
    response = requests.get(f"{API_BASE}/models/downloaded", headers=get_headers())
    return response.json()["models"]


def start_conversation(character_id=4):
    """Start a new conversation"""
    data = {"character_id": character_id}
    response = requests.post(
        f"{API_BASE}/conversations/new",
        headers=get_headers(),
        json=data
    )
    return response.json()


def send_message(conversation_id, message):
    """Send a message in a conversation"""
    data = {"message": message}
    response = requests.post(
        f"{API_BASE}/conversations/{conversation_id}/send",
        headers=get_headers(),
        json=data
    )
    return response.json()


# Main execution
if __name__ == "__main__":
    print("Checking API health...")
    health = check_health()
    print(f"Status: {health.get('status')}")
    print(f"Model loaded: {health.get('any_model_loaded')}")
    print()

    print("Listing downloaded models...")
    models = list_models()
    for model in models:
        print(f"  - {model['name']} ({model['size_mb']}MB)")
    print()

    print("Starting conversation with Claude...")
    conv = start_conversation(character_id=4)
    conv_id = conv["conversation_id"]
    print(f"Conversation started: {conv_id}")
    print()

    print("Sending message...")
    result = send_message(conv_id, "What is Python?")
    print(f"User: {result['user_message']}")
    print(f"AI: {result['ai_response'][:100]}...")
    print(f"Speed: {result['telemetry']['tokens_per_second']:.1f} tok/s")
Save this as tokforge_client.py and run:
python tokforge_client.py
Expected output:
Checking API health...
Status: ok
Model loaded: True

Listing downloaded models...
  - Mistral 7B (4096MB)
  - Phi-2 (2700MB)

Starting conversation with Claude...
Conversation started: conv_78f3a2d9

Sending message...
User: What is Python?
AI: Python is a high-level, interpreted programming language known for its simplicity...
Speed: 18.5 tok/s
The TokForge API is designed for local network use. Follow these best practices:
Always use a strong, randomly generated token. Avoid simple tokens like 123456 or password. Example strong token:
akfj9s8df-jklsd9-fjsdklf-sdfj9sdkfj
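Rather than typing a token by hand, you can generate one with Python's standard `secrets` module. This is a minimal sketch: `generate_token` is an illustrative name, and 32 random bytes is a common choice rather than a TokForge requirement.

```python
import secrets


def generate_token(num_bytes=32):
    """Return a URL-safe random token built from `num_bytes` of OS randomness."""
    return secrets.token_urlsafe(num_bytes)


print(generate_token())
```

The output differs on every run. Paste the generated value into the token field in the app settings and into your script's environment.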
The API runs over HTTP without encryption, so the token and all traffic travel in plain text. Use it only on a trusted local network, and never expose the API to the public internet.
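A script can guard against accidentally pointing at a public address before it sends the token. The sketch below uses the standard `ipaddress` module. `is_local_target` is a hypothetical helper that accepts only localhost or a literal private or loopback IP, and it deliberately does not resolve other hostnames.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_target(api_base):
    """Return True if the URL's host is localhost or a private/loopback IP literal."""
    host = urlparse(api_base).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Other hostnames are not resolved here, so they are treated as non-local.
        return False
    return addr.is_private or addr.is_loopback


print(is_local_target("http://192.168.1.100:8088"))
```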
If using the API in scripts, store the token in environment variables, not hardcoded:
import os
TOKEN = os.getenv("TOKFORGE_TOKEN")
Use .gitignore to keep files containing secrets out of version control.
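One caveat with reading the token from the environment is that `os.getenv` returns `None` when the variable is unset, and the script would then send "Bearer None". A small sketch that fails fast instead (`load_token` is an illustrative helper name):

```python
import os


def load_token(env=os.environ, var="TOKFORGE_TOKEN"):
    """Return the token from the environment, or raise if it is missing or blank."""
    token = env.get(var, "").strip()
    if not token:
        raise RuntimeError(f"{var} is not set; export it before running the script")
    return token
```

Passing the environment in as a mapping keeps the function easy to test. In normal use, call `load_token()` with no arguments.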
You now have all the tools to build powerful automations and integrations with TokForge.
Have a cool use case? Share your API integration with the TokForge community and help others build amazing things.