LLM System API Reference

The LLM (Large Language Model) system in SpoonOS provides a unified, provider-agnostic interface for working with multiple AI services including OpenAI, Anthropic, Google, and DeepSeek.

Overview

SpoonOS's LLM system offers:

  • Provider Agnosticism: Unified API across all providers
  • Automatic Fallback: Intelligent provider switching on failures
  • Load Balancing: Distribute requests across multiple providers
  • Comprehensive Monitoring: Usage tracking and performance metrics
  • Flexible Configuration: Multiple configuration sources and validation
  • Advanced Features: Streaming, function calling, and tool integration

Core Components

LLMManager

Central orchestrator for LLM operations with provider management, fallback, and load balancing.

Key Features:

  • Unified chat and generation API
  • Automatic provider fallback
  • Load balancing and health monitoring
  • Streaming and batch operations
  • Comprehensive error handling

from spoon_ai.llm import LLMManager

llm_manager = LLMManager()
response = await llm_manager.chat(messages)

Provider Interface

Abstract interface that all LLM providers implement for consistent behavior.

Key Features:

  • Standardized provider contract
  • Comprehensive capability system
  • Unified response format
  • Built-in error handling patterns

from spoon_ai.llm import LLMProviderInterface, LLMResponse

class CustomProvider(LLMProviderInterface):
    async def chat(self, messages, **kwargs) -> LLMResponse:
        # Implementation
        ...

Configuration Manager

Handles configuration loading, validation, and management from multiple sources.

Key Features:

  • Multi-source configuration (files, env vars, runtime)
  • Provider-specific validation
  • Secure credential management
  • Configuration templates and merging

from spoon_ai.llm import ConfigurationManager

config_manager = ConfigurationManager("config.json")
config_manager.set_provider_config("openai", {...})

Quick Start

Basic Usage

from spoon_ai.llm import LLMManager
from spoon_ai.schema import Message

# Initialize manager
llm_manager = LLMManager()

# Simple chat
messages = [Message(role="user", content="Hello!")]
response = await llm_manager.chat(messages)
print(response.content)

With Configuration

from spoon_ai.llm import ConfigurationManager, LLMManager

# Load configuration
config_manager = ConfigurationManager("config.json")
llm_manager = LLMManager(config_manager=config_manager)

# Chat with specific provider
response = await llm_manager.chat(messages, provider="openai")

Streaming Responses

# Stream responses for real-time output
async for chunk in llm_manager.chat_stream(messages):
    print(chunk, end="", flush=True)

Supported Providers

OpenAI

  • Models: GPT-4.1, GPT-4o, GPT-4o-mini, o1-preview, o1-mini
  • Features: Function calling, streaming, embeddings
  • Best for: General-purpose tasks, reasoning, code generation

Anthropic (Claude)

  • Models: Claude-Sonnet-4-20250514, Claude-3.5 Sonnet, Claude-3.5 Haiku
  • Features: Large context windows, prompt caching, safety
  • Best for: Long documents, analysis, safety-critical applications

Google (Gemini)

  • Models: Gemini-2.5-Pro, Gemini-2.0-Flash, Gemini-1.5-Pro
  • Features: Multimodal, fast inference, large context
  • Best for: Multimodal tasks, cost-effectiveness, long context

DeepSeek

  • Models: DeepSeek-Reasoner, DeepSeek-V3, DeepSeek-Chat
  • Features: Advanced reasoning, code-specialized, cost-effective
  • Best for: Complex reasoning, code generation, technical tasks

OpenRouter

  • Models: Access to multiple providers through single API
  • Features: Model routing, cost optimization
  • Best for: Experimentation, cost optimization

Advanced Patterns

Provider Fallback

# Configure automatic fallback
llm_manager = LLMManager()
llm_manager.set_primary_provider("openai")
llm_manager.add_fallback_provider("anthropic")
llm_manager.add_fallback_provider("deepseek")

# Automatic fallback on failures
response = await llm_manager.chat(messages)
print(f"Used provider: {response.provider}")

Load Balancing

# Weighted load balancing
llm_manager.set_load_balancing_strategy("weighted")
llm_manager.set_provider_weight("openai", 0.6)
llm_manager.set_provider_weight("anthropic", 0.4)

# Requests distributed by weights
response = await llm_manager.chat(messages)

Tool Integration

# Function calling with tools
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}}
        }
    }
]

response = await llm_manager.chat_with_tools(messages, tools)
for tool_call in response.tool_calls:
    # Execute tool calls
    result = await execute_tool(tool_call)

Batch Operations

# Process multiple requests efficiently
batch_messages = [
    [Message(role="user", content="Summarize text A")],
    [Message(role="user", content="Translate text B")],
    [Message(role="user", content="Analyze text C")]
]

responses = await llm_manager.batch_chat(batch_messages)
for i, response in enumerate(responses):
    print(f"Response {i+1}: {response.content[:50]}...")

Configuration

Environment Variables

# API Keys (required)
OPENAI_API_KEY=sk-your_key
ANTHROPIC_API_KEY=sk-ant-your_key
GOOGLE_API_KEY=your_key
DEEPSEEK_API_KEY=your_key

# Global Settings (optional)
DEFAULT_LLM_PROVIDER=openai
DEFAULT_MODEL=gpt-4.1
DEFAULT_TEMPERATURE=0.3
LLM_TIMEOUT=30
LLM_RETRY_ATTEMPTS=3
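
A quick startup check in plain Python (no SpoonOS API involved) fails fast when a required key is missing; adjust the list to the providers you actually use:

import os

# Fail fast if a required API key is missing from the environment
required = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing API keys: {', '.join(missing)}")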

JSON Configuration

{
  "llm": {
    "default_provider": "openai",
    "timeout": 30,
    "providers": {
      "openai": {
        "api_key": "sk-...",
        "model": "gpt-4.1",
        "temperature": 0.7
      },
      "anthropic": {
        "api_key": "sk-ant-...",
        "model": "claude-sonnet-4-20250514"
      }
    }
  }
}

Runtime Configuration

from spoon_ai.llm import ConfigurationManager

config_manager = ConfigurationManager()

# Configure providers
config_manager.set_provider_config("openai", {
    "api_key": "sk-...",
    "model": "gpt-4.1",
    "temperature": 0.7
})

# Set global settings
config_manager.set_global_config({
    "default_provider": "openai",
    "timeout": 30
})

Response Format

All LLM operations return a standardized LLMResponse:

@dataclass
class LLMResponse:
    content: str                   # Generated text
    provider: str                  # Provider used
    model: str                     # Model used
    finish_reason: str             # Why generation stopped
    native_finish_reason: str      # Provider-specific reason
    tool_calls: List[ToolCall]     # Function calls (if any)
    usage: Dict[str, int]          # Token usage statistics
    metadata: Dict[str, Any]       # Additional metadata
    request_id: str                # Unique request ID
    duration: float                # Request duration
    timestamp: datetime            # Request time
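
For example, after a call you can check which provider and model actually served the request, how long it took, and what it cost in tokens (a minimal sketch; the exact keys inside usage vary by provider):

response = await llm_manager.chat(messages)

print(f"Provider: {response.provider} ({response.model})")
print(f"Finish reason: {response.finish_reason}")
print(f"Duration: {response.duration:.2f}s")
print(f"Token usage: {response.usage}")  # exact keys vary by provider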

Error Handling

Structured Error Types

import asyncio

from spoon_ai.llm.errors import (
    LLMError,                  # Base LLM error
    ProviderError,             # Provider-specific errors
    ConfigurationError,        # Configuration issues
    RateLimitError,            # Rate limiting
    AuthenticationError,       # Auth failures
    ModelNotFoundError,        # Invalid model
    TokenLimitError,           # Token limit exceeded
    NetworkError,              # Network issues
    ProviderUnavailableError   # Provider down
)

try:
    response = await llm_manager.chat(messages)
except RateLimitError:
    # Handle rate limiting
    await asyncio.sleep(60)
    response = await llm_manager.chat(messages)
except AuthenticationError:
    # Handle auth issues
    print("API key invalid")
except ProviderError as e:
    print(f"Provider {e.provider} failed: {e.message}")

Automatic Recovery

# Framework handles most errors automatically
llm_manager = LLMManager()

# Automatic retry with backoff
# Automatic fallback to other providers
# Automatic rate limit handling

response = await llm_manager.chat(messages) # Robust by default

Monitoring and Metrics

Usage Tracking

# Get comprehensive metrics
metrics = llm_manager.get_metrics()
print(f"Total requests: {metrics['total_requests']}")
print(f"Success rate: {metrics['success_rate']}%")
print(f"Total tokens: {metrics['total_tokens']}")
print(f"Total cost: ${metrics['total_cost']}")

Provider Statistics

# Per-provider metrics
stats = llm_manager.get_provider_stats()
for provider, data in stats.items():
    print(f"{provider}: {data['requests']} requests, {data['errors']} errors")

Health Monitoring

# Check provider health
health = await llm_manager.health_check()
print(f"Healthy providers: {health['healthy_providers']}")

# Individual provider health
health = await llm_manager.health_check("openai")
print(f"OpenAI healthy: {health['healthy']}")

Custom Provider Implementation

Basic Custom Provider

from typing import Any, Dict

from spoon_ai.llm import LLMProviderInterface, LLMResponse, ProviderCapability, ProviderMetadata

class MyCustomProvider(LLMProviderInterface):
    async def initialize(self, config: Dict[str, Any]) -> None:
        self.api_key = config["api_key"]

    async def chat(self, messages, **kwargs) -> LLMResponse:
        # Your implementation
        response = await self._call_api(messages, **kwargs)
        return LLMResponse(
            content=response["content"],
            provider="custom",
            model="my-model",
            finish_reason="stop",
            native_finish_reason="stop",
            tool_calls=[],
            usage=response.get("usage"),
            metadata={},
            request_id="custom-123",
            duration=response.get("duration", 0.0)
        )

    def get_metadata(self) -> ProviderMetadata:
        return ProviderMetadata(
            name="custom",
            version="1.0",
            capabilities=[ProviderCapability.CHAT],
            max_tokens=4096,
            supports_system_messages=True
        )

    async def health_check(self) -> bool:
        try:
            # Test API connectivity
            return True
        except Exception:
            return False

    async def cleanup(self) -> None:
        pass

    # Implement other required methods...

Registering Custom Providers

from spoon_ai.llm import register_provider

# Register your custom provider
register_provider("my_provider", MyCustomProvider)

# Now you can use it
llm_manager.set_primary_provider("my_provider")

Best Practices

Configuration Management

  • Store API keys securely in environment variables (see the sketch after this list)
  • Use configuration files for complex setups
  • Validate configurations before deployment
  • Use different configs for dev/staging/production
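
For the first point, API keys can be pulled from the environment at startup instead of being written into config files. A minimal sketch using the runtime configuration API shown above (OPENAI_API_KEY is the variable listed under Environment Variables):

import os

from spoon_ai.llm import ConfigurationManager, LLMManager

config_manager = ConfigurationManager()
config_manager.set_provider_config("openai", {
    "api_key": os.environ["OPENAI_API_KEY"],  # keep keys out of source and config files
    "model": "gpt-4.1",
})

llm_manager = LLMManager(config_manager=config_manager)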

Error Handling

  • Let the framework handle common errors automatically
  • Use specific error types for custom logic
  • Implement proper fallback chains
  • Monitor error rates and patterns (see the sketch below)
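
The per-provider statistics from the Monitoring section can be turned into a simple error-rate check (a sketch assuming the requests/errors keys shown there; the 5% threshold is arbitrary):

# Flag providers whose error rate exceeds a chosen threshold
stats = llm_manager.get_provider_stats()
for provider, data in stats.items():
    error_rate = data["errors"] / max(data["requests"], 1)
    if error_rate > 0.05:
        print(f"High error rate on {provider}: {error_rate:.1%}")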

Performance Optimization

  • Use streaming for real-time applications
  • Batch requests when possible
  • Monitor token usage and costs
  • Cache responses when appropriate (see the sketch below)
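
For response caching, a small in-memory cache around llm_manager.chat avoids paying twice for identical prompts (a sketch with a hypothetical cached_chat helper; only deterministic, low-temperature calls are good cache candidates):

# Hypothetical helper: cache responses keyed by the serialized conversation
_response_cache = {}

async def cached_chat(llm_manager, messages):
    key = "|".join(f"{m.role}:{m.content}" for m in messages)
    if key not in _response_cache:
        _response_cache[key] = await llm_manager.chat(messages)
    return _response_cache[key]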

Provider Selection

  • Test multiple providers for your use case
  • Consider cost vs. quality trade-offs
  • Use fallbacks for production reliability
  • Monitor provider performance regularly

Migration Guide

From Direct Provider APIs

# Before: Direct OpenAI API
import openai
client = openai.OpenAI(api_key="sk-...")
response = client.chat.completions.create(...)

# After: SpoonOS LLM Manager
from spoon_ai.llm import LLMManager
llm_manager = LLMManager()
response = await llm_manager.chat(messages) # Automatic provider selection

From Other LLM Libraries

# Before: LangChain
from langchain.llms import OpenAI
llm = OpenAI(model="gpt-4", temperature=0.7)

# After: SpoonOS
from spoon_ai.llm import LLMManager
llm_manager = LLMManager()
llm_manager.configure_provider("openai", {
    "model": "gpt-4.1",
    "temperature": 0.7
})

Troubleshooting

Common Issues

Provider Connection Failed

# Check API keys
health = await llm_manager.health_check("openai")
if not health["healthy"]:
    print(f"Error: {health.get('error')}")

# Verify configuration
config = llm_manager.get_provider_config("openai")
print(f"API Key configured: {bool(config.api_key)}")

Rate Limiting

# Increase timeout and retry settings
llm_manager.set_retry_policy(max_attempts=5, backoff_factor=2.0)
llm_manager.set_timeout(60)

# Use multiple providers to distribute load
llm_manager.add_fallback_provider("anthropic")

High Latency

# Enable monitoring to identify bottlenecks
llm_manager.enable_monitoring(["execution_time", "success_rate"])

# Check metrics
metrics = llm_manager.get_metrics()
print(f"Average latency: {metrics['avg_latency']}s")

# Consider faster providers or models
llm_manager.set_primary_provider("gemini") # Generally faster

Configuration Errors

from spoon_ai.llm import ConfigurationManager

config_manager = ConfigurationManager()
errors = config_manager.validate_config(your_config)
for error in errors:
    print(f"Config error: {error}")

See Also