LLM System API Reference

The LLM (Large Language Model) system in SpoonOS provides a unified, provider-agnostic interface for working with multiple AI services including OpenAI, Anthropic, Google, and DeepSeek.

Overview

SpoonOS's LLM system offers:

  • Provider Agnosticism: Unified API across all providers
  • Automatic Fallback: Intelligent provider switching on failures
  • Load Balancing: Distribute requests across multiple providers
  • Comprehensive Monitoring: Usage tracking and performance metrics
  • Flexible Configuration: Multiple configuration sources and validation
  • Advanced Features: Streaming, function calling, and tool integration

Core Components

LLMManager

Central orchestrator for LLM operations with provider management, fallback, and load balancing.

Key Features:

  • Unified chat and generation API
  • Automatic provider fallback
  • Load balancing and health monitoring
  • Streaming and batch operations
  • Comprehensive error handling

from spoon_ai.llm import LLMManager

llm_manager = LLMManager()
response = await llm_manager.chat(messages)

Provider Interface

Abstract interface that all LLM providers implement for consistent behavior.

Key Features:

  • Standardized provider contract
  • Comprehensive capability system
  • Unified response format
  • Built-in error handling patterns

from spoon_ai.llm import LLMProviderInterface, LLMResponse

class CustomProvider(LLMProviderInterface):
    async def chat(self, messages, **kwargs) -> LLMResponse:
        # Implementation
        ...

Configuration Manager

Handles configuration loading, validation, and management from multiple sources.

Key Features:

  • Multi-source configuration (files, env vars, runtime)
  • Provider-specific validation
  • Secure credential management
  • Configuration templates and merging

from spoon_ai.llm import ConfigurationManager

config_manager = ConfigurationManager("config.json")
config_manager.set_provider_config("openai", {...})

Quick Start

Basic Usage

from spoon_ai.llm import LLMManager
from spoon_ai.schema import Message

# Initialize manager
llm_manager = LLMManager()

# Simple chat
messages = [Message(role="user", content="Hello!")]
response = await llm_manager.chat(messages)
print(response.content)

With Configuration

from spoon_ai.llm import ConfigurationManager, LLMManager

# Load configuration
config_manager = ConfigurationManager("config.json")
llm_manager = LLMManager(config_manager=config_manager)

# Chat with specific provider
response = await llm_manager.chat(messages, provider="openai")

Streaming Responses

# Stream responses for real-time output
async for chunk in llm_manager.chat_stream(messages):
    print(chunk, end="", flush=True)

Supported Providers

OpenAI

  • Models: GPT-4.1, GPT-4o, GPT-4o-mini, o1-preview, o1-mini
  • Features: Function calling, streaming, embeddings
  • Best for: General-purpose tasks, reasoning, code generation

Anthropic (Claude)

  • Models: Claude-Sonnet-4-20250514, Claude-3.5 Sonnet, Claude-3.5 Haiku
  • Features: Large context windows, prompt caching, safety
  • Best for: Long documents, analysis, safety-critical applications

Google (Gemini)

  • Models: Gemini-2.5-Pro, Gemini-2.0-Flash, Gemini-1.5-Pro
  • Features: Multimodal, fast inference, large context
  • Best for: Multimodal tasks, cost-effectiveness, long context

DeepSeek

  • Models: DeepSeek-Reasoner, DeepSeek-V3, DeepSeek-Chat
  • Features: Advanced reasoning, code-specialized, cost-effective
  • Best for: Complex reasoning, code generation, technical tasks

OpenRouter

  • Models: Access to multiple providers through single API
  • Features: Model routing, cost optimization
  • Best for: Experimentation, cost optimization

Advanced Patterns

Provider Fallback

# Configure automatic fallback
llm_manager = LLMManager()
llm_manager.set_primary_provider("openai")
llm_manager.add_fallback_provider("anthropic")
llm_manager.add_fallback_provider("deepseek")

# Automatic fallback on failures
response = await llm_manager.chat(messages)
print(f"Used provider: {response.provider}")

Load Balancing

# Weighted load balancing
llm_manager.set_load_balancing_strategy("weighted")
llm_manager.set_provider_weight("openai", 0.6)
llm_manager.set_provider_weight("anthropic", 0.4)

# Requests distributed by weights
response = await llm_manager.chat(messages)

Tool Integration

# Function calling with tools
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}}
        }
    }
]

response = await llm_manager.chat_with_tools(messages, tools)
for tool_call in response.tool_calls:
    # Execute tool calls
    result = await execute_tool(tool_call)

Batch Operations

# Process multiple requests efficiently
batch_messages = [
    [Message(role="user", content="Summarize text A")],
    [Message(role="user", content="Translate text B")],
    [Message(role="user", content="Analyze text C")]
]

responses = await llm_manager.batch_chat(batch_messages)
for i, response in enumerate(responses):
    print(f"Response {i+1}: {response.content[:50]}...")

Configuration

Environment Variables

# API Keys (required)
OPENAI_API_KEY=sk-your_key
ANTHROPIC_API_KEY=sk-ant-your_key
GOOGLE_API_KEY=your_key
DEEPSEEK_API_KEY=your_key

# Global Settings (optional)
DEFAULT_LLM_PROVIDER=openai
DEFAULT_MODEL=gpt-4.1
DEFAULT_TEMPERATURE=0.3
LLM_TIMEOUT=30
LLM_RETRY_ATTEMPTS=3
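
A quick startup check in plain Python (no SpoonOS API involved) fails fast when a required key is missing; adjust the list to the providers you actually use:

import os

# Fail fast if a required API key is missing from the environment
required = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing API keys: {', '.join(missing)}")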

JSON Configuration

{
  "llm": {
    "default_provider": "openai",
    "timeout": 30,
    "providers": {
      "openai": {
        "api_key": "sk-...",
        "model": "gpt-4.1",
        "temperature": 0.7
      },
      "anthropic": {
        "api_key": "sk-ant-...",
        "model": "claude-sonnet-4-20250514"
      }
    }
  }
}

Runtime Configuration

from spoon_ai.llm import ConfigurationManager

config_manager = ConfigurationManager()

# Configure providers
config_manager.set_provider_config("openai", {
    "api_key": "sk-...",
    "model": "gpt-4.1",
    "temperature": 0.7
})

# Set global settings
config_manager.set_global_config({
    "default_provider": "openai",
    "timeout": 30
})

Response Format

All LLM operations return a standardized LLMResponse:

@dataclass
class LLMResponse:
    content: str                   # Generated text
    provider: str                  # Provider used
    model: str                     # Model used
    finish_reason: str             # Why generation stopped
    native_finish_reason: str      # Provider-specific reason
    tool_calls: List[ToolCall]     # Function calls (if any)
    usage: Dict[str, int]          # Token usage statistics
    metadata: Dict[str, Any]       # Additional metadata
    request_id: str                # Unique request ID
    duration: float                # Request duration
    timestamp: datetime            # Request time
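
For example, after a call you can check which provider and model actually served the request, how long it took, and what it cost in tokens (a minimal sketch; the exact keys inside usage vary by provider):

response = await llm_manager.chat(messages)

print(f"Provider: {response.provider} ({response.model})")
print(f"Finish reason: {response.finish_reason}")
print(f"Duration: {response.duration:.2f}s")
print(f"Token usage: {response.usage}")  # exact keys vary by provider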

Error Handling

Structured Error Types

import asyncio

from spoon_ai.llm.errors import (
    LLMError,                  # Base LLM error
    ProviderError,             # Provider-specific errors
    ConfigurationError,        # Configuration issues
    RateLimitError,            # Rate limiting
    AuthenticationError,       # Auth failures
    ModelNotFoundError,        # Invalid model
    TokenLimitError,           # Token limit exceeded
    NetworkError,              # Network issues
    ProviderUnavailableError   # Provider down
)

try:
    response = await llm_manager.chat(messages)
except RateLimitError:
    # Handle rate limiting
    await asyncio.sleep(60)
    response = await llm_manager.chat(messages)
except AuthenticationError:
    # Handle auth issues
    print("API key invalid")
except ProviderError as e:
    print(f"Provider {e.provider} failed: {e.message}")

Automatic Recovery

# Framework handles most errors automatically
llm_manager = LLMManager()

# Automatic retry with backoff
# Automatic fallback to other providers
# Automatic rate limit handling

response = await llm_manager.chat(messages) # Robust by default

Monitoring and Metrics

Usage Tracking

# Get comprehensive metrics
metrics = llm_manager.get_metrics()
print(f"Total requests: {metrics['total_requests']}")
print(f"Success rate: {metrics['success_rate']}%")
print(f"Total tokens: {metrics['total_tokens']}")
print(f"Total cost: ${metrics['total_cost']}")

Provider Statistics

# Per-provider metrics
stats = llm_manager.get_provider_stats()
for provider, data in stats.items():
    print(f"{provider}: {data['requests']} requests, {data['errors']} errors")

Health Monitoring

# Check provider health
health = await llm_manager.health_check()
print(f"Healthy providers: {health['healthy_providers']}")

# Individual provider health
health = await llm_manager.health_check("openai")
print(f"OpenAI healthy: {health['healthy']}")

Custom Provider Implementation

Basic Custom Provider

from typing import Any, Dict

from spoon_ai.llm import LLMProviderInterface, LLMResponse, ProviderCapability, ProviderMetadata

class MyCustomProvider(LLMProviderInterface):
    async def initialize(self, config: Dict[str, Any]) -> None:
        self.api_key = config["api_key"]

    async def chat(self, messages, **kwargs) -> LLMResponse:
        # Your implementation
        response = await self._call_api(messages, **kwargs)
        return LLMResponse(
            content=response["content"],
            provider="custom",
            model="my-model",
            finish_reason="stop",
            native_finish_reason="stop",
            tool_calls=[],
            usage=response.get("usage"),
            metadata={},
            request_id="custom-123",
            duration=response.get("duration", 0.0)
        )

    def get_metadata(self) -> ProviderMetadata:
        return ProviderMetadata(
            name="custom",
            version="1.0",
            capabilities=[ProviderCapability.CHAT],
            max_tokens=4096,
            supports_system_messages=True
        )

    async def health_check(self) -> bool:
        try:
            # Test API connectivity
            return True
        except Exception:
            return False

    async def cleanup(self) -> None:
        pass

    # Implement other required methods...

Registering Custom Providers

from spoon_ai.llm import register_provider

# Register your custom provider
register_provider("my_provider", MyCustomProvider)

# Now you can use it
llm_manager.set_primary_provider("my_provider")

Best Practices

Configuration Management

  • Store API keys securely in environment variables (see the sketch after this list)
  • Use configuration files for complex setups
  • Validate configurations before deployment
  • Use different configs for dev/staging/production
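
For the first point, API keys can be pulled from the environment at startup instead of being written into config files. A minimal sketch using the runtime configuration API shown above (OPENAI_API_KEY is the variable listed under Environment Variables):

import os

from spoon_ai.llm import ConfigurationManager, LLMManager

config_manager = ConfigurationManager()
config_manager.set_provider_config("openai", {
    "api_key": os.environ["OPENAI_API_KEY"],  # keep keys out of source and config files
    "model": "gpt-4.1",
})

llm_manager = LLMManager(config_manager=config_manager)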

Error Handling

  • Let the framework handle common errors automatically
  • Use specific error types for custom logic
  • Implement proper fallback chains
  • Monitor error rates and patterns (see the sketch below)
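
The per-provider statistics from the Monitoring section can be turned into a simple error-rate check (a sketch assuming the requests/errors keys shown there; the 5% threshold is arbitrary):

# Flag providers whose error rate exceeds a chosen threshold
stats = llm_manager.get_provider_stats()
for provider, data in stats.items():
    error_rate = data["errors"] / max(data["requests"], 1)
    if error_rate > 0.05:
        print(f"High error rate on {provider}: {error_rate:.1%}")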

Performance Optimization

  • Use streaming for real-time applications
  • Batch requests when possible
  • Monitor token usage and costs
  • Cache responses when appropriate (see the sketch below)
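
For response caching, a small in-memory cache around llm_manager.chat avoids paying twice for identical prompts (a sketch with a hypothetical cached_chat helper; only deterministic, low-temperature calls are good cache candidates):

# Hypothetical helper: cache responses keyed by the serialized conversation
_response_cache = {}

async def cached_chat(llm_manager, messages):
    key = "|".join(f"{m.role}:{m.content}" for m in messages)
    if key not in _response_cache:
        _response_cache[key] = await llm_manager.chat(messages)
    return _response_cache[key]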

Provider Selection

  • Test multiple providers for your use case
  • Consider cost vs. quality trade-offs
  • Use fallbacks for production reliability
  • Monitor provider performance regularly

Migration Guide

From Direct Provider APIs

# Before: Direct OpenAI API
import openai
client = openai.OpenAI(api_key="sk-...")
response = client.chat.completions.create(...)

# After: SpoonOS LLM Manager
from spoon_ai.llm import LLMManager
llm_manager = LLMManager()
response = await llm_manager.chat(messages) # Automatic provider selection

From Other LLM Libraries

# Before: LangChain
from langchain.llms import OpenAI
llm = OpenAI(model="gpt-4", temperature=0.7)

# After: SpoonOS
from spoon_ai.llm import LLMManager
llm_manager = LLMManager()
llm_manager.configure_provider("openai", {
    "model": "gpt-4.1",
    "temperature": 0.7
})

Troubleshooting

Common Issues

Provider Connection Failed

# Check API keys
health = await llm_manager.health_check("openai")
if not health["healthy"]:
    print(f"Error: {health.get('error')}")

# Verify configuration
config = llm_manager.get_provider_config("openai")
print(f"API Key configured: {bool(config.api_key)}")

Rate Limiting

# Increase timeout and retry settings
llm_manager.set_retry_policy(max_attempts=5, backoff_factor=2.0)
llm_manager.set_timeout(60)

# Use multiple providers to distribute load
llm_manager.add_fallback_provider("anthropic")

High Latency

# Enable monitoring to identify bottlenecks
llm_manager.enable_monitoring(["execution_time", "success_rate"])

# Check metrics
metrics = llm_manager.get_metrics()
print(f"Average latency: {metrics['avg_latency']}s")

# Consider faster providers or models
llm_manager.set_primary_provider("gemini") # Generally faster

Configuration Errors

from spoon_ai.llm import ConfigurationManager

config_manager = ConfigurationManager()
errors = config_manager.validate_config(your_config)
for error in errors:
    print(f"Config error: {error}")

See Also