Guide 9: Advanced Development

Building an AI Agent With TokForge's 120+ API Endpoints

Learn to orchestrate TokForge from Python. Build agents that manage models, run conversations, monitor performance, and automate AI workflows on your phone.

What You'll Build

By the end of this guide, you'll have a working Python AI agent that:

Connects to TokForge running on your phone via HTTP API
Manages models: list, load, and switch between them
Creates and orchestrates multi-turn conversations with characters
Sends prompts and reads responses with full telemetry (tokens/second, TTFT, token count)
Monitors memory facts accumulated during conversations
Polls generation status and handles concurrent requests
Automates complex multi-character dialogues and workflows

Essentially, you'll treat TokForge on your phone as a remote AI backend, controlled entirely from your laptop via Python.

TokForge responses include detailed telemetry, perfect for agent automation

Prerequisites

Python 3.7+ with pip installed
requests library: pip install requests
TokForge app installed on your phone with API enabled (see the API Walkthrough Guide)
Network access: Phone and laptop on the same WiFi network, or ADB forwarding configured
API token (optional but recommended for production)

Enable the API in Settings → Advanced → Metrics Server

The TokForge API Architecture

TokForge exposes 120+ endpoints organized into 11 logical handler groups:

Handler Group	Purpose	Key Endpoints
Health	Service status	`/health`
ModelBrowser	List available/downloaded models	`/models/downloaded`, `/models/available`
Model	Load, unload, query models	`/models/{id}/load`, `/models/{id}/info`
Inference	Run text generation, embeddings	`/inference/generate`, `/inference/embed`
Conversation	Multi-turn dialogue management	`/conversations/new`, `/conversations/{id}/send`
Memory	Character memory & facts	`/memory/facts`, `/memory/store`
Character	Character profiles & settings	`/characters`, `/characters/{id}`
Benchmark	Performance metrics & latency	`/benchmark/stats`
Download	Manage model downloads	`/download/progress`, `/download/cancel`
Diagnostic	Logs, debugging, status	`/diagnostic/logs`
Settings	App configuration	`/settings`, `/settings/update`

Authentication: All endpoints accept an optional Bearer token via the Authorization header:

Authorization: Bearer YOUR_API_TOKEN

Request/Response Format: All payloads are JSON. Responses include metadata like telemetry (tokens/sec, latency) for inference endpoints.

Step 1: Build Your API Client Class

Create a reusable TokForgeClient class to abstract HTTP calls. This makes your agent code clean and maintainable.

import requests
import json
from typing import Optional, Dict, Any

class TokForgeClient:
    """HTTP client for TokForge API with automatic Bearer token handling."""

    def __init__(self, host: str, port: int = 8088, token: str = ""):
        """
        Args:
            host: IP or hostname of phone (e.g., "192.168.1.100")
            port: TokForge API port (default 8088)
            token: Optional Bearer token for authentication
        """
        self.base_url = f"http://{host}:{port}"
        self.token = token
        self.headers = {"Content-Type": "application/json"}
        if token:
            self.headers["Authorization"] = f"Bearer {token}"

    def _request(self, method: str, path: str, data: Optional[Dict] = None) -> Dict:
        """Make HTTP request with error handling."""
        url = f"{self.base_url}{path}"
        try:
            if method == "GET":
                resp = requests.get(url, headers=self.headers, timeout=30)
            elif method == "POST":
                resp = requests.post(url, headers=self.headers, json=data or {}, timeout=30)
            else:
                raise ValueError(f"Unsupported method: {method}")

            resp.raise_for_status()
            return resp.json()
        except requests.exceptions.RequestException as e:
            print(f"API Error [{method} {path}]: {e}")
            return {"error": str(e)}

    def get(self, path: str) -> Dict:
        """GET request."""
        return self._request("GET", path)

    def post(self, path: str, data: Optional[Dict] = None) -> Dict:
        """POST request."""
        return self._request("POST", path, data)

    # High-level convenience methods

    def health(self) -> Dict:
        """Check service health."""
        return self.get("/health")

    def models_downloaded(self) -> Dict:
        """List all downloaded models."""
        return self.get("/models/downloaded")

    def models_available(self) -> Dict:
        """List available models for download."""
        return self.get("/models/available")

    def load_model(self, model_id: str) -> Dict:
        """Load a model into memory."""
        return self.post(f"/models/{model_id}/load", {})

    def unload_model(self, model_id: str) -> Dict:
        """Unload a model."""
        return self.post(f"/models/{model_id}/unload", {})

    def characters(self) -> Dict:
        """Get all character profiles."""
        return self.get("/characters")

    def new_conversation(self, character_id: str) -> Dict:
        """Start a new conversation with a character."""
        return self.post("/conversations/new", {"character_id": character_id})

    def send_message(self, conv_id: str, message: str) -> Dict:
        """Send a message in a conversation. Returns response with telemetry."""
        return self.post(f"/conversations/{conv_id}/send", {"message": message})

    def memory_facts(self, character_id: str) -> Dict:
        """Get accumulated memory facts for a character."""
        return self.get(f"/memory/facts?character_id={character_id}")

    def generation_status(self) -> Dict:
        """Check if generation is in progress."""
        return self.get("/control/generation-status")

    def settings(self) -> Dict:
        """Get app settings."""
        return self.get("/settings")

    def benchmark_stats(self) -> Dict:
        """Get performance benchmark stats."""
        return self.get("/benchmark/stats")

Step 2: Build Your Agent Loop

Now create the core agent logic that orchestrates conversations and handles telemetry.

class TokForgeAgent:
    """AI Agent that orchestrates TokForge for multi-turn dialogue."""

    def __init__(self, client: TokForgeClient, model_id: str, character_id: str):
        self.client = client
        self.model_id = model_id
        self.character_id = character_id
        self.conversation_id = None
        self.turn_count = 0
        self.total_tokens = 0
        self.total_time = 0.0

    def setup(self) -> bool:
        """Initialize agent: check health, load model, start conversation."""
        # 1. Check health
        health = self.client.health()
        if "error" in health:
            print(f"❌ Health check failed: {health['error']}")
            return False
        print(f"✓ Service is healthy")

        # 2. Load model
        print(f"Loading model {self.model_id}...")
        load_result = self.client.load_model(self.model_id)
        if "error" in load_result:
            print(f"❌ Failed to load model: {load_result['error']}")
            return False
        print(f"✓ Model loaded")

        # 3. Start conversation
        print(f"Starting conversation with character {self.character_id}...")
        conv = self.client.new_conversation(self.character_id)
        if "error" in conv or "id" not in conv:
            print(f"❌ Failed to create conversation: {conv}")
            return False
        self.conversation_id = conv["id"]
        print(f"✓ Conversation started: {self.conversation_id}")

        return True

    def send_prompt(self, prompt: str) -> Optional[str]:
        """Send a prompt and return the response."""
        if not self.conversation_id:
            print("❌ No active conversation")
            return None

        self.turn_count += 1
        print(f"\n[Turn {self.turn_count}] You: {prompt}")

        response_data = self.client.send_message(self.conversation_id, prompt)

        if "error" in response_data:
            print(f"❌ Error: {response_data['error']}")
            return None

        # Extract response and telemetry
        response_text = response_data.get("response", "")
        telemetry = response_data.get("telemetry", {})

        tokens_per_sec = telemetry.get("tokens_per_second", 0)
        ttft = telemetry.get("ttft_ms", 0)
        token_count = telemetry.get("tokens_generated", 0)

        print(f"Agent: {response_text}")
        print(f"  📊 {token_count} tokens @ {tokens_per_sec:.1f} tok/s (TTFT: {ttft}ms)")

        self.total_tokens += token_count
        self.total_time += ttft / 1000.0

        return response_text

    def get_memory(self) -> Dict:
        """Retrieve accumulated memory facts."""
        memory = self.client.memory_facts(self.character_id)
        return memory.get("facts", [])

    def print_stats(self):
        """Print conversation statistics."""
        avg_time = self.total_time / max(self.turn_count, 1)
        print(f"\n" + "="*50)
        print(f"Conversation Summary")
        print(f"="*50)
        print(f"Turns: {self.turn_count}")
        print(f"Total tokens: {self.total_tokens}")
        print(f"Avg TTFT: {avg_time*1000:.1f}ms")
        memory = self.get_memory()
        print(f"Memory facts: {len(memory)}")
        print("="*50)

Step 3: Advanced Patterns

Multi-Character Orchestration

Switch between characters mid-session or run parallel conversations:

class MultiCharacterAgent:
    """Manage multiple character conversations simultaneously."""

    def __init__(self, client: TokForgeClient, model_id: str):
        self.client = client
        self.model_id = model_id
        self.conversations = {}  # {character_id: conv_id}

    def start_with(self, character_id: str) -> str:
        """Start conversation with a character."""
        if character_id in self.conversations:
            return self.conversations[character_id]

        conv = self.client.new_conversation(character_id)
        conv_id = conv.get("id")
        if conv_id:
            self.conversations[character_id] = conv_id
        return conv_id

    def send_to(self, character_id: str, message: str) -> Dict:
        """Send message to specific character."""
        conv_id = self.conversations.get(character_id)
        if not conv_id:
            conv_id = self.start_with(character_id)
        return self.client.send_message(conv_id, message)

Performance Monitoring Dashboard

Track metrics over time to optimize agent behavior:

import time

class PerformanceMonitor:
    """Track and analyze agent performance metrics."""

    def __init__(self):
        self.metrics = []

    def record(self, tokens: int, ttft_ms: float, tokens_per_sec: float):
        """Record a generation event."""
        self.metrics.append({
            "timestamp": time.time(),
            "tokens": tokens,
            "ttft_ms": ttft_ms,
            "tokens_per_sec": tokens_per_sec
        })

    def report(self):
        """Print performance summary."""
        if not self.metrics:
            print("No metrics recorded")
            return

        tokens = [m["tokens"] for m in self.metrics]
        ttfts = [m["ttft_ms"] for m in self.metrics]
        rates = [m["tokens_per_sec"] for m in self.metrics]

        print(f"\nPerformance Report ({len(self.metrics)} samples)")
        print(f"  Avg tokens/generation: {sum(tokens)/len(tokens):.1f}")
        print(f"  Avg TTFT: {sum(ttfts)/len(ttfts):.1f}ms")
        print(f"  Avg throughput: {sum(rates)/len(rates):.1f} tok/s")
        print(f"  Peak throughput: {max(rates):.1f} tok/s")

Auto Model Switching

Switch models based on task complexity:

class AdaptiveAgent:
    """Switch models based on task requirements."""

    def __init__(self, client: TokForgeClient, fast_model: str, powerful_model: str):
        self.client = client
        self.fast = fast_model
        self.powerful = powerful_model
        self.current = None

    def choose_model(self, prompt: str) -> str:
        """Select model based on prompt length/complexity."""
        # Simple heuristic: complex prompts → powerful model
        if len(prompt) > 200 or any(x in prompt.lower() for x in ["analyze", "complex", "detailed"]):
            return self.powerful
        return self.fast

    def send(self, prompt: str) -> Dict:
        """Send with automatic model selection."""
        model = self.choose_model(prompt)
        if model != self.current:
            print(f"Switching to {model}")
            self.client.load_model(model)
            self.current = model
        # ... send prompt ...

Step 4: Full Working Example

Put it all together in a complete, runnable script:

#!/usr/bin/env python3
"""
Complete example: AI Agent controlling TokForge on your phone.
Run: python agent.py --host 192.168.1.100 --token YOUR_TOKEN
"""

import argparse
import sys

def main():
    parser = argparse.ArgumentParser(description="TokForge AI Agent")
    parser.add_argument("--host", required=True, help="Phone IP (e.g., 192.168.1.100)")
    parser.add_argument("--port", type=int, default=8088, help="API port")
    parser.add_argument("--token", default="", help="Bearer token")
    parser.add_argument("--model", default="mistral-7b", help="Model ID to load")
    parser.add_argument("--character", default="assistant", help="Character ID")

    args = parser.parse_args()

    # Initialize client and agent
    client = TokForgeClient(args.host, args.port, args.token)
    agent = TokForgeAgent(client, args.model, args.character)

    # Setup
    if not agent.setup():
        print("Setup failed")
        sys.exit(1)

    # Interactive loop
    monitor = PerformanceMonitor()

    print("\nAgent ready. Type 'quit' to exit, 'stats' for summary.\n")

    while True:
        try:
            user_input = input("You: ").strip()

            if user_input.lower() == "quit":
                break
            elif user_input.lower() == "stats":
                agent.print_stats()
                monitor.report()
                continue
            elif not user_input:
                continue

            # Send prompt and collect telemetry
            response = client.send_message(agent.conversation_id, user_input)
            if "telemetry" in response:
                tel = response["telemetry"]
                monitor.record(
                    tel.get("tokens_generated", 0),
                    tel.get("ttft_ms", 0),
                    tel.get("tokens_per_second", 0)
                )

        except KeyboardInterrupt:
            print("\nExiting...")
            break
        except Exception as e:
            print(f"Error: {e}")

    # Final summary
    agent.print_stats()
    monitor.report()

if __name__ == "__main__":
    main()

Real API Responses

Health Endpoint

GET /health

Check service status and version.

{
  "status": "ok",
  "version": "2.1.0",
  "uptime_seconds": 3847,
  "models_loaded": 1,
  "api_endpoints": 127
}

Models Downloaded

GET /models/downloaded

List all downloaded models with metadata.

{
  "models": [
    {
      "id": "mistral-7b",
      "name": "Mistral 7B",
      "size_gb": 4.2,
      "quantization": "q4_0",
      "downloaded": true,
      "last_loaded": "2025-01-15T10:32:00Z"
    },
    {
      "id": "llama2-13b",
      "name": "Llama 2 13B",
      "size_gb": 7.8,
      "quantization": "q4_0",
      "downloaded": true,
      "last_loaded": "2025-01-14T18:20:00Z"
    }
  ]
}

Send Message (Conversation)

POST /conversations/{id}/send

Send a message and get response with full telemetry.

Request:
{
  "message": "What is the capital of France?"
}

Response:
{
  "id": "conv_abc123",
  "response": "The capital of France is Paris. It's located in the north-central part of the country and is known for its iconic landmarks like the Eiffel Tower and Notre-Dame Cathedral.",
  "telemetry": {
    "tokens_generated": 47,
    "tokens_per_second": 18.3,
    "ttft_ms": 245,
    "latency_ms": 2567,
    "model_id": "mistral-7b",
    "timestamp": "2025-01-15T10:35:22Z"
  },
  "memory": {
    "facts_added": 1,
    "total_facts": 12
  }
}

Memory Facts

GET /memory/facts?character_id={id}

Retrieve accumulated memory for a character.

{
  "character_id": "assistant",
  "total_facts": 12,
  "facts": [
    {
      "id": "fact_001",
      "content": "User is interested in AI and machine learning",
      "confidence": 0.95,
      "created_at": "2025-01-15T09:20:00Z"
    },
    {
      "id": "fact_002",
      "content": "User prefers detailed technical explanations",
      "confidence": 0.87,
      "created_at": "2025-01-15T09:25:00Z"
    }
  ]
}

Deployment & Best Practices

Network Security: Use VPN or ADB forwarding if running over untrusted networks. Always enable API authentication tokens in production.
Error Handling: The agent examples above include basic error checking. Wrap API calls in try-except and implement exponential backoff for retries.
Token Management: Monitor total tokens generated; implement rate limiting if needed to avoid overwhelming your phone's resources.
Model Lifecycle: Unload models when switching to free memory on the phone. Use /models/{id}/unload for cleanup.
Concurrency: TokForge queues generation requests. Poll /control/generation-status to know when requests complete.
Monitoring: Collect telemetry (tokens/sec, TTFT, memory usage) from every response. Use this to optimize model selection and prompt design.

Next Steps

You now have the foundation for building sophisticated AI agents with TokForge. Here are natural extensions:

API Reference: Explore all 120+ endpoints in the API documentation. Each endpoint is fully documented with request/response schemas.
API Walkthrough: Follow the API Walkthrough Guide for hands-on examples with curl and Postman.
Advanced Automation: Build specialized agents for specific tasks (summarization, Q&A, content generation, code review) using multi-character orchestration and model switching.
Integration: Connect your agent to other services (Slack bots, webhooks, scheduled tasks) to create end-to-end automation pipelines.

Pro Tip: Start with a simple single-conversation agent (like the Step 4 example), test it end-to-end, then layer in multi-character support, monitoring, and adaptive model selection. The API is stable and forgiving, so don't be afraid to experiment!