Model Aliases
Model aliases provide semantic, version-controlled names for your models, enabling cleaner client code, easier model management, and advanced routing capabilities. Instead of using provider-specific model names like `gpt-4o-mini` or `claude-3-5-sonnet-20241022`, you can create meaningful aliases like `fast-model` or `arch.summarize.v1`.
Benefits of Model Aliases:

- Semantic Naming: Use descriptive names that reflect the model's purpose
- Version Control: Implement versioning schemes (e.g., `v1`, `v2`) for model upgrades
- Environment Management: Different aliases can point to different models across environments
- Client Simplification: Clients use consistent, meaningful names regardless of underlying provider
- Advanced Routing (Coming Soon): Enable guardrails, fallbacks, and traffic splitting at the alias level
Basic Configuration
Simple Alias Mapping
```yaml
llm_providers:
  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
  - model: anthropic/claude-3-5-sonnet-20241022
    access_key: $ANTHROPIC_API_KEY
  - model: ollama/llama3.1
    base_url: http://host.docker.internal:11434

# Define aliases that map to the models above
model_aliases:
  # Semantic versioning approach
  arch.summarize.v1:
    target: gpt-4o-mini
  arch.reasoning.v1:
    target: gpt-4o
  arch.creative.v1:
    target: claude-3-5-sonnet-20241022

  # Functional aliases
  fast-model:
    target: gpt-4o-mini
  smart-model:
    target: gpt-4o
  creative-model:
    target: claude-3-5-sonnet-20241022

  # Local model alias
  local-chat:
    target: llama3.1
```
Using Aliases
Client Code Examples
Once aliases are configured, clients can use semantic names instead of provider-specific model names:
```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:12000/")

# Use semantic alias instead of provider model name
response = client.chat.completions.create(
    model="arch.summarize.v1",  # Points to gpt-4o-mini
    messages=[{"role": "user", "content": "Summarize this document..."}],
)

# Switch to a different capability
response = client.chat.completions.create(
    model="arch.reasoning.v1",  # Points to gpt-4o
    messages=[{"role": "user", "content": "Solve this complex problem..."}],
)
```
Naming Best Practices
Semantic Versioning
Use version numbers for backward compatibility and gradual model upgrades:
```yaml
model_aliases:
  # Current production version
  arch.summarize.v1:
    target: gpt-4o-mini

  # Beta version for testing
  arch.summarize.v2:
    target: gpt-4o

  # Stable alias that always points to latest
  arch.summarize.latest:
    target: gpt-4o-mini
```
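In client code, pinning versus tracking the latest version becomes a one-string decision. A minimal sketch, reusing the aliases above and the gateway client from the earlier examples:

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:12000/")

# Pinned: behavior stays fixed until you deliberately move to v2.
pinned = client.chat.completions.create(
    model="arch.summarize.v1",
    messages=[{"role": "user", "content": "Summarize this document..."}],
)

# Rolling: automatically follows whatever .latest points to.
rolling = client.chat.completions.create(
    model="arch.summarize.latest",
    messages=[{"role": "user", "content": "Summarize this document..."}],
)
```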
Purpose-Based Naming
Create aliases that reflect the intended use case:
```yaml
model_aliases:
  # Task-specific
  code-reviewer:
    target: gpt-4o
  document-summarizer:
    target: gpt-4o-mini
  creative-writer:
    target: claude-3-5-sonnet-20241022
  data-analyst:
    target: gpt-4o
```
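Purpose-based aliases pair naturally with a small dispatch table in client code. A minimal sketch; the task names and the `run_task` helper are hypothetical, not part of the gateway's API:

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:12000/")

# Hypothetical task names mapped to the purpose-based aliases above.
TASK_TO_ALIAS = {
    "review": "code-reviewer",
    "summarize": "document-summarizer",
    "write": "creative-writer",
    "analyze": "data-analyst",
}

def run_task(task: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model=TASK_TO_ALIAS[task],
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```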
Environment-Specific Aliases
Different environments can use different underlying models:
```yaml
model_aliases:
  # Development environment - use faster/cheaper models
  dev.chat.v1:
    target: gpt-4o-mini

  # Production environment - use more capable models
  prod.chat.v1:
    target: gpt-4o

  # Staging environment - test new models
  staging.chat.v1:
    target: claude-3-5-sonnet-20241022
```
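Clients can then derive the alias from their deployment environment instead of hard-coding a model name. A minimal sketch, assuming a hypothetical `APP_ENV` variable set to `dev`, `staging`, or `prod`:

```python
import os

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:12000/")

# APP_ENV is a hypothetical deployment variable, not a gateway setting.
env = os.environ.get("APP_ENV", "dev")
alias = f"{env}.chat.v1"  # e.g. "prod.chat.v1" in production

response = client.chat.completions.create(
    model=alias,
    messages=[{"role": "user", "content": "Hello!"}],
)
```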
Advanced Features (Coming Soon)
The following features are planned for future releases of model aliases:
Guardrails Integration
Apply safety, cost, or latency rules at the alias level:
```yaml
model_aliases:
  arch.reasoning.v1:
    target: gpt-oss-120b
    guardrails:
      max_latency: 5s
      max_cost_per_request: 0.10
      block_categories: ["jailbreak", "PII"]
      content_filters:
        - type: "profanity"
        - type: "sensitive_data"
```
Fallback Chains
Provide a chain of models if the primary target fails or hits quota limits:
```yaml
model_aliases:
  arch.summarize.v1:
    target: gpt-4o-mini
    fallbacks:
      - target: llama3.1
        conditions: ["quota_exceeded", "timeout"]
      - target: claude-3-haiku-20240307
        conditions: ["primary_and_first_fallback_failed"]
```
Traffic Splitting & Canary Deployments
Distribute traffic across multiple models for A/B testing or gradual rollouts:
```yaml
model_aliases:
  arch.v1:
    targets:
      - model: llama3.1
        weight: 80
      - model: gpt-4o-mini
        weight: 20

  # Canary deployment
  arch.experimental.v1:
    targets:
      - model: gpt-4o       # Current stable
        weight: 95
      - model: o1-preview   # New model being tested
        weight: 5
```
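Conceptually, weighted splitting is an independent weighted random draw per request. A minimal illustration of the idea; this is purely explanatory, as the gateway will perform the selection server-side:

```python
import random

# Mirrors the arch.v1 example: 80% llama3.1, 20% gpt-4o-mini.
targets = ["llama3.1", "gpt-4o-mini"]
weights = [80, 20]

# Each request independently picks a target in proportion to its weight.
chosen = random.choices(targets, weights=weights, k=1)[0]
```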
Load Balancing
Distribute requests across multiple instances of the same model:
```yaml
model_aliases:
  high-throughput-chat:
    load_balance:
      algorithm: "round_robin"  # or "least_connections", "weighted"
      targets:
        - model: gpt-4o-mini
          endpoint: "https://api-1.example.com"
        - model: gpt-4o-mini
          endpoint: "https://api-2.example.com"
        - model: gpt-4o-mini
          endpoint: "https://api-3.example.com"
```
Validation Rules

- Alias names must be valid identifiers (alphanumeric, dots, hyphens, underscores)
- Target models must be defined in the `llm_providers` section
- Circular references between aliases are not allowed
- Weights in traffic splitting must sum to 100
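A minimal sketch of how these rules could be checked before deployment; the `validate_aliases` function and its input shape are hypothetical, not part of the gateway's API:

```python
import re

ALIAS_NAME = re.compile(r"^[A-Za-z0-9._-]+$")

def validate_aliases(aliases: dict, provider_models: set) -> list[str]:
    """Return a list of rule violations for a model_aliases mapping."""
    errors = []
    for name, spec in aliases.items():
        if not ALIAS_NAME.match(name):
            errors.append(f"{name}: not a valid identifier")
        target = spec.get("target")
        if target is not None:
            # Aliases may not point at other aliases, which also rules
            # out circular reference chains.
            if target in aliases:
                errors.append(f"{name}: aliases may not reference aliases")
            elif target not in provider_models:
                errors.append(f"{name}: target {target} not in llm_providers")
        weights = [t["weight"] for t in spec.get("targets", [])]
        if weights and sum(weights) != 100:
            errors.append(f"{name}: weights sum to {sum(weights)}, not 100")
    return errors
```

An empty return value means the mapping passes all four checks.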
See Also
- LLM Providers - Learn about configuring LLM providers
- LLM Routing - Understand how aliases work with intelligent routing