Prediction Service
Technical specification for the unified Prediction Service including the inference pipeline, caching strategy, explainability features, and performance monitoring.
The Prediction Service (`prediction-service.ts`) provides a unified interface for making predictions across all registered AI models. It handles model resolution, provider invocation, caching, monitoring, and error handling.
1. Prediction Pipeline
1.1 Request Flow
```mermaid
sequenceDiagram
    participant Client as Business Module
    participant PS as Prediction Service
    participant MR as Model Registry
    participant CM as Cache Manager
    participant PF as Provider Factory
    participant ML as ML Provider
    participant PM as Performance Monitor
    participant ES as Explainability Service
    Client->>PS: predict(request)
    PS->>MR: getModel(modelId)
    MR-->>PS: ModelConfig
    PS->>PS: selectModelForABTest()
    PS->>CM: get(cacheKey)
    alt Cache Hit
        CM-->>PS: cached result
        PS->>PM: recordPrediction(cached=true)
        PS-->>Client: PredictionResponse
    else Cache Miss
        PS->>PF: getProvider(config)
        PF-->>PS: provider instance
        PS->>ML: predict(modelId, features)
        ML-->>PS: raw prediction
        PS->>CM: set(cacheKey, result)
        PS->>PM: recordPrediction(cached=false)
        PS-->>Client: PredictionResponse
    end
    opt Explainability Requested
        Client->>ES: explainPrediction()
        ES-->>Client: ExplainabilityResult
    end
```

1.2 Request Interface

```typescript
interface PredictionRequest {
modelId: string; // Model to invoke
features: Record<string, any>; // Input feature map
context?: { // Optional metadata
userId?: string;
objectType?: string;
objectId?: string;
};
useCache?: boolean; // Default: true
forceProvider?: BaseMLProvider; // Override default provider
}
```

1.3 Response Interface

```typescript
interface PredictionResponse<T = any> {
prediction: T; // Model output
confidence: number; // Confidence score (0–100)
modelId: string; // Model used (may differ due to A/B test)
modelVersion: string; // Model version
processingTime: number; // Latency in milliseconds
cached: boolean; // Whether result was from cache
metadata?: Record<string, any>;
}
```
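For illustration, a typical call might look like the following; the `predictionService` instance name, model ID, and feature names are hypothetical, and the generic `predict<T>` signature is assumed to mirror `PredictionResponse<T>`:

```typescript
// Hypothetical usage; instance name, model ID, and features are illustrative.
const response = await predictionService.predict<number>({
  modelId: 'lead-scoring-v2',
  features: { engagement_score: 85, company_size: 500 },
  context: { userId: 'user-123', objectType: 'lead', objectId: 'lead-456' },
});

console.log(response.prediction);  // model output
console.log(response.confidence); // 0-100
console.log(response.cached);     // true if served from cache
```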
1.4 Pipeline Steps
- Model Resolution: Fetch the model config from the registry and verify `status === 'active'`.
- A/B Test Selection: If A/B testing is enabled, randomly select the champion or challenger model based on the traffic percentage.
- Cache Check: Generate a deterministic cache key from `modelId` + serialized `features`; return the cached result if available.
- Provider Invocation: Route to the configured ML provider via `ProviderFactory`, or fall back to mock predictions.
- Response Assembly: Package the prediction with confidence, timing, and metadata.
- Cache Store: Cache the response with a configurable TTL (default: 5 minutes).
- Metrics Recording: Record the prediction to the Performance Monitor (latency, confidence, success/failure, cache status).
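A condensed, self-contained sketch of these seven steps, with the collaborating components injected as plain functions; every name and shape below is an illustrative assumption, not the actual `prediction-service.ts` internals:

```typescript
// Condensed sketch of the pipeline; all names and shapes are illustrative.
type Features = Record<string, any>;

interface SketchResponse<T = any> {
  prediction: T;
  confidence: number;
  modelId: string;
  modelVersion: string;
  processingTime: number;
  cached: boolean;
}

// Collaborators injected as plain functions for the sake of the sketch.
interface Deps<T> {
  getModel: (id: string) => Promise<{ id: string; version: string; status: string }>;
  selectForABTest: (id: string) => string;
  cacheGet: (key: string) => Promise<SketchResponse<T> | null>;
  cacheSet: (key: string, value: SketchResponse<T>, ttlSeconds: number) => Promise<void>;
  invoke: (id: string, features: Features) => Promise<{ prediction: T; confidence: number }>;
  record: (m: { modelId: string; latency: number; cached: boolean; success: boolean }) => void;
}

async function predictSketch<T>(
  modelId: string,
  features: Features,
  deps: Deps<T>,
  useCache = true,
): Promise<SketchResponse<T>> {
  const start = Date.now();

  // 1. Model Resolution: fail fast on inactive models.
  const config = await deps.getModel(modelId);
  if (config.status !== 'active') throw new Error(`Model ${modelId} is not active`);

  // 2. A/B Test Selection: may swap in a challenger model.
  const selectedId = deps.selectForABTest(config.id);

  // 3. Cache Check: deterministic key from model ID + serialized features.
  const cacheKey = `pred:${selectedId}:${JSON.stringify(features)}`;
  if (useCache) {
    const hit = await deps.cacheGet(cacheKey);
    if (hit) {
      deps.record({ modelId: selectedId, latency: Date.now() - start, cached: true, success: true });
      return hit;
    }
  }

  // 4. Provider Invocation and 5. Response Assembly.
  const raw = await deps.invoke(selectedId, features);
  const response: SketchResponse<T> = {
    prediction: raw.prediction,
    confidence: raw.confidence,
    modelId: selectedId,
    modelVersion: config.version,
    processingTime: Date.now() - start,
    cached: false,
  };

  // 6. Cache Store (default TTL: 300 s) and 7. Metrics Recording.
  if (useCache) await deps.cacheSet(cacheKey, response, 300);
  deps.record({ modelId: selectedId, latency: response.processingTime, cached: false, success: true });

  return response;
}
```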
1.5 Batch Predictions
```typescript
PredictionService.batchPredict<T>(
  modelId: string,
  features: Array<Record<string, any>>,
  useCache?: boolean
): Promise<Array<PredictionResponse<T>>>
```

Batch predictions first attempt the provider's native `batchPredict()` method. If that method is unavailable or fails, the service falls back to parallel individual predictions via `Promise.all`.
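A sketch of that fallback logic, assuming a provider interface with an optional `batchPredict()` method (the interface shape is an assumption):

```typescript
// Sketch of batch fallback; the provider shape is an assumption.
interface Provider<T> {
  predict(modelId: string, features: Record<string, any>): Promise<T>;
  batchPredict?(modelId: string, features: Array<Record<string, any>>): Promise<T[]>;
}

async function batchPredictSketch<T>(
  provider: Provider<T>,
  modelId: string,
  featureSets: Array<Record<string, any>>,
): Promise<T[]> {
  // Prefer the provider's native batch endpoint when it exists.
  if (provider.batchPredict) {
    try {
      return await provider.batchPredict(modelId, featureSets);
    } catch {
      // Fall through to parallel individual predictions on failure.
    }
  }
  return Promise.all(featureSets.map((f) => provider.predict(modelId, f)));
}
```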
2. Caching Strategy
2.1 Architecture
The Cache Manager (`cache-manager.ts`) implements a dual-layer caching strategy:
| Layer | Backend | Use Case |
|---|---|---|
| Primary | Redis | Production environments with Redis configured |
| Fallback | In-Memory Map | Development, testing, or when Redis is unavailable |
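A minimal sketch of a dual-layer `get()`, assuming a `RedisLike` interface in place of a real client; this is illustrative, not the actual `cache-manager.ts` implementation:

```typescript
// Sketch of dual-layer get(); RedisLike stands in for a real Redis client.
interface RedisLike {
  get(key: string): Promise<string | null>;
}

class DualLayerCacheSketch {
  private memory = new Map<string, { value: string; expiresAt: number }>();

  constructor(private redis?: RedisLike, private useMemoryFallback = true) {}

  async get(key: string): Promise<string | null> {
    // Primary layer: Redis, when configured.
    if (this.redis) {
      try {
        return await this.redis.get(key);
      } catch {
        if (!this.useMemoryFallback) throw new Error('Redis unavailable');
      }
    }
    // Fallback layer: in-memory map with a TTL check.
    const entry = this.memory.get(key);
    if (!entry) return null;
    if (entry.expiresAt < Date.now()) {
      this.memory.delete(key);
      return null;
    }
    return entry.value;
  }
}
```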
2.2 Configuration
```typescript
interface CacheConfig {
  redisUrl?: string;           // Redis connection URL
  defaultTtl?: number;         // Default TTL in seconds (default: 300)
  enabled?: boolean;           // Enable/disable caching (default: true)
  useMemoryFallback?: boolean; // Fall back to memory if Redis fails (default: true)
}
```

2.3 Cache Key Generation
Cache keys are deterministic, combining model ID and input features:
```
pred:{modelId}:{JSON.stringify(features)}
```

This ensures that identical inputs to the same model always produce a cache hit within the TTL window.
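One caveat: `JSON.stringify` preserves property insertion order, so callers passing the same features in a different key order would generate different keys. A sketch that normalizes key order before serializing (the sorting step is an illustrative assumption, not necessarily what `cache-manager.ts` does):

```typescript
// Sketch of deterministic key generation; sorting the feature keys is an
// assumption added so that property order cannot affect cache hits.
function cacheKey(modelId: string, features: Record<string, any>): string {
  const sorted = Object.keys(features)
    .sort()
    .reduce<Record<string, any>>((acc, k) => {
      acc[k] = features[k];
      return acc;
    }, {});
  return `pred:${modelId}:${JSON.stringify(sorted)}`;
}

// cacheKey('lead-scoring', { b: 2, a: 1 }) === cacheKey('lead-scoring', { a: 1, b: 2 })
```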
2.4 TTL Management
- Default TTL: 300 seconds (5 minutes).
- A per-prediction TTL can be specified when calling `set()`.
- Expired entries are cleaned up probabilistically (10% chance on each `set()` call) to avoid performance overhead.
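A sketch of that probabilistic cleanup for the in-memory layer (the class and storage shape are assumptions):

```typescript
// Sketch: ~10% of set() calls sweep expired entries, amortizing cleanup cost.
class MemoryCacheSketch {
  private store = new Map<string, { value: unknown; expiresAt: number }>();

  set(key: string, value: unknown, ttlSeconds = 300): void {
    this.store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
    if (Math.random() < 0.1) this.sweep(); // probabilistic cleanup
  }

  private sweep(): void {
    const now = Date.now();
    for (const [key, entry] of this.store) {
      if (entry.expiresAt < now) this.store.delete(key);
    }
  }
}
```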
2.5 Cache Statistics
```typescript
cacheManager.getStats(): {
  size: number; // Number of cached entries
  hits: number; // Total cache hits
  backend: 'redis' | 'memory';
}
```

3. Explainability Features
3.1 Explainability Service
The Explainability Service (`explainability-service.ts`) provides SHAP-like feature attribution for any prediction:
```typescript
ExplainabilityService.explainPrediction(
  modelId: string,
  features: Record<string, any>,
  prediction: any,
  confidence: number
): Promise<ExplainabilityResult>
```

3.2 Feature Contributions
Each input feature receives a contribution score:
| Field | Type | Description |
|---|---|---|
| `feature` | string | Feature name |
| `value` | any | Input value |
| `contribution` | number | Impact on prediction (–1 to +1) |
| `importance` | number | Absolute importance (0–100%) |
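The `ExplainabilityResult` returned by `explainPrediction()` is not spelled out in this section; a plausible shape, inferred from the fields above (the exact layout is an assumption):

```typescript
// Assumed shape of ExplainabilityResult, inferred from this section.
interface FeatureContribution {
  feature: string;      // Feature name
  value: any;           // Input value
  contribution: number; // Impact on prediction (-1 to +1)
  importance: number;   // Absolute importance (0-100%)
}

interface ExplainabilityResult {
  modelId: string;
  prediction: any;
  confidence: number;
  contributions: FeatureContribution[];
  explanation: string;  // Human-readable summary (see 3.3)
}
```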
3.3 Explanation Generation
The service generates human-readable explanations that list positive and negative contributing factors:
```
The prediction was influenced by:

Positive factors:
• Engagement Score: 85 (impact: +35%)
• Company Size: 500 (impact: +25%)

Negative factors:
• Budget: 0 (impact: -10%)
```
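A sketch of how such text could be assembled from the contribution list in 3.2 (function name and formatting details are assumptions):

```typescript
// Sketch: render contributions as the positive/negative factor list above.
function renderExplanation(
  contributions: Array<{ feature: string; value: any; contribution: number }>,
): string {
  const fmt = (c: { feature: string; value: any; contribution: number }) =>
    `• ${c.feature}: ${c.value} (impact: ${c.contribution >= 0 ? '+' : ''}${Math.round(c.contribution * 100)}%)`;

  const positive = contributions.filter((c) => c.contribution > 0).map(fmt);
  const negative = contributions.filter((c) => c.contribution < 0).map(fmt);

  return [
    'The prediction was influenced by:',
    positive.length ? `Positive factors:\n${positive.join('\n')}` : '',
    negative.length ? `Negative factors:\n${negative.join('\n')}` : '',
  ].filter(Boolean).join('\n\n');
}
```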
3.4 Prediction Comparison
Compare two sets of features to understand what drives different outcomes:
```typescript
ExplainabilityService.comparePredictions(
  modelId: string,
  features1: Record<string, any>,
  features2: Record<string, any>
): Promise<{ differences: Array<...>; explanation: string }>
```

3.5 Model-Specific Weights
The service maintains feature importance weights per model:
- Lead Scoring: `engagement_score` (35%), `company_size` (25%), `industry` (15%), `job_title` (15%), `budget` (10%)
- Churn Prediction: `support_tickets` (30%), `usage_frequency` (25%), `account_age` (20%), `nps_score` (15%), `payment_delays` (10%)
In production, these weights are sourced from actual model SHAP values.
4. Performance Monitoring
4.1 Metric Collection
Every prediction (successful, failed, or cached) generates a `PredictionMetric`:
```typescript
interface PredictionMetric {
modelId: string;
timestamp: number;
latency: number; // Processing time (ms)
confidence: number; // Prediction confidence
cached: boolean; // Cache hit/miss
success: boolean; // Success/failure
error?: string; // Error message if failed
provider?: string; // Provider used
}
```

4.2 Performance Statistics
The Performance Monitor aggregates metrics per model:
| Metric | Description |
|---|---|
| `totalPredictions` | Total prediction count |
| `successfulPredictions` | Successful prediction count |
| `failedPredictions` | Failed prediction count |
| `averageLatency` | Mean latency (ms) |
| `medianLatency` | Median latency (ms) |
| `p95Latency` | 95th percentile latency (ms) |
| `p99Latency` | 99th percentile latency (ms) |
| `averageConfidence` | Mean confidence score |
| `cacheHitRate` | Percentage of cached responses |
| `errorRate` | Percentage of failed predictions |
Statistics can be filtered by time window (e.g., last 5 minutes, last hour).
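A sketch of the percentile computation over a latency window; the nearest-rank method used here is an assumption, as implementations vary:

```typescript
// Sketch: nearest-rank percentile over a latency sample.
function percentile(latencies: number[], p: number): number {
  if (latencies.length === 0) return 0;
  const sorted = [...latencies].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

// percentile(windowLatencies, 95) -> p95Latency
// percentile(windowLatencies, 99) -> p99Latency
```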
4.3 Health Status
The monitor provides a three-tier health assessment per model:
| Status | Criteria |
|---|---|
| Healthy | Error rate < 5% and P95 latency < 500ms |
| Degraded | Error rate 5–10% OR P95 latency > 500ms |
| Unhealthy | Error rate > 10% |
```typescript
PerformanceMonitor.getHealthStatus(modelId: string): {
status: 'healthy' | 'degraded' | 'unhealthy';
reason?: string;
}
```
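A sketch of how the thresholds in the table above might map to code; the exact boundary handling between tiers is an assumption:

```typescript
// Sketch: classify model health from aggregated stats (see table above).
function getHealthStatusSketch(stats: { errorRate: number; p95Latency: number }): {
  status: 'healthy' | 'degraded' | 'unhealthy';
  reason?: string;
} {
  if (stats.errorRate > 10) {
    return { status: 'unhealthy', reason: `Error rate ${stats.errorRate}% exceeds 10%` };
  }
  if (stats.errorRate >= 5 || stats.p95Latency > 500) {
    return { status: 'degraded', reason: 'Elevated error rate or P95 latency' };
  }
  return { status: 'healthy' };
}
```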
4.4 Data Retention
The monitor retains up to 10,000 metrics per model using a FIFO strategy. Older metrics are automatically evicted when the limit is reached.
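A sketch of that FIFO eviction, using `PredictionMetric` from 4.1; the per-model buffer layout is an assumption:

```typescript
// Sketch: bounded per-model metric buffer with FIFO eviction.
// PredictionMetric as defined in 4.1.
const MAX_METRICS_PER_MODEL = 10_000;
const metricsByModel = new Map<string, PredictionMetric[]>();

function recordMetric(metric: PredictionMetric): void {
  const buffer = metricsByModel.get(metric.modelId) ?? [];
  buffer.push(metric);
  // Evict the oldest entries once the cap is exceeded.
  if (buffer.length > MAX_METRICS_PER_MODEL) {
    buffer.splice(0, buffer.length - MAX_METRICS_PER_MODEL);
  }
  metricsByModel.set(metric.modelId, buffer);
}
```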