Contextual Routing and Auto-Correction
Cascading Contextual Routing
l3mcore evaluates the conversation context to maintain thematic continuity in multi-turn chats. The system is fully stateless: it requires no database, no sessions, and no per-user state held in RAM.
It works with any number of simultaneous users because each request carries its own context in the standard OpenAI messages array. OpenWebUI, Continue.dev, and any other compatible client already send the complete conversation history with each request.
Decision flow
Request with messages[]
|
v
Step 1: Evaluate user's last message
| confidence >= threshold? --> Use that expert
| confidence < threshold
v
Step 2: Concatenate user's last 2-3 messages
| confidence >= threshold? --> Use that expert
| confidence < threshold
v
Step 3: Fallback model
Step 1: Last message
The router evaluates only the user's last message. If the confidence score meets or exceeds the confidence_threshold configured in config.json, the expert is selected immediately.
This lets the user change the subject in the middle of a chat. If they say "Write me a rental contract" after 10 messages about Python, the router detects the change and routes to the legal expert.
Step 2: Expanded context
If the last message has low confidence (for example, "Make it shorter" or "Explain the second point to me"), the router concatenates the user's last N messages (configurable with context_messages, default 3) and re-evaluates.
By including previous messages like "Make me a Python script that reads a CSV", the router obtains enough context to deduce that the conversation is still about programming.
The concatenated text is truncated to context_max_chars (default 1600 characters) to stay within the 512-token window of the embeddings model.
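The concatenation and truncation described above can be sketched as follows. This is a minimal illustration, not the l3mcore implementation: the function name `build_expanded_context`, the single-space separator, and the choice to keep the tail of the text when truncating (so the newest message always survives) are all assumptions.

```python
from typing import Dict, List


def build_expanded_context(
    messages: List[Dict[str, str]],
    context_messages: int = 3,
    context_max_chars: int = 1600,
) -> str:
    """Join the last `context_messages` user messages and cap the length.

    Hypothetical helper: assumes every message "content" is a plain string.
    """
    if context_messages <= 0 or context_max_chars <= 0:
        return ""
    # Only user turns are considered; assistant and system turns are skipped.
    user_texts = [m["content"] for m in messages if m.get("role") == "user"]
    joined = " ".join(user_texts[-context_messages:])
    # Assumption: truncate from the front so the most recent text is kept.
    return joined[-context_max_chars:]
```

With a history of "Make me a Python script that reads a CSV" followed by "Make it shorter", the helper returns both user messages joined into one string, which gives the router the programming context the short follow-up lacks.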
Step 3: Fallback
If no step produces a match with sufficient confidence, the request is sent to the fallback model (ID 0).
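Taken together, the three steps might look like the sketch below. It is written under stated assumptions: the `route` function, the `Scorer` callable (text in, expert name and confidence out), and the `FALLBACK_EXPERT` label are hypothetical names introduced for illustration, and the real router may structure this differently.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical label standing in for the fallback model (ID 0).
FALLBACK_EXPERT = "fallback"

# Hypothetical scorer signature: text -> (expert name, confidence score).
Scorer = Callable[[str], Tuple[str, float]]


def route(
    messages: List[Dict[str, str]],
    score: Scorer,
    confidence_threshold: float,
    context_messages: int = 3,
    context_max_chars: int = 1600,
) -> Tuple[str, int]:
    """Return (expert, step) where step is the cascade step that resolved it.

    Assumes context_messages and context_max_chars are positive.
    """
    user_texts = [m["content"] for m in messages if m.get("role") == "user"]
    if not user_texts:
        return FALLBACK_EXPERT, 3

    # Step 1: evaluate only the last user message.
    expert, confidence = score(user_texts[-1])
    if confidence >= confidence_threshold:
        return expert, 1

    # Step 2: re-evaluate with the last N user messages, capped in length.
    expanded = " ".join(user_texts[-context_messages:])[-context_max_chars:]
    expert, confidence = score(expanded)
    if confidence >= confidence_threshold:
        return expert, 2

    # Step 3: nothing was confident enough, so use the fallback model.
    return FALLBACK_EXPERT, 3
```

Because the function reads everything it needs from `messages`, it keeps no state between calls, which is what lets the same router serve any number of users.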
Configuration
These parameters are configured in the router section of config/config.json:
| Parameter | Type | Default | Description |
|---|---|---|---|
| context_messages | int | 3 | Number of recent user messages to concatenate in Step 2. |
| context_max_chars | int | 1600 | Character limit of the concatenated text. |
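As an illustration of how these parameters and their defaults could be read, here is a small sketch. The exact JSON layout is an assumption (a top-level `router` object holding all three keys, including `confidence_threshold`), the `RouterConfig` and `load_router_config` names are hypothetical, and the value 0.4 is borrowed from the sample log output rather than being a documented default.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterConfig:
    confidence_threshold: float
    context_messages: int = 3       # default from the table above
    context_max_chars: int = 1600   # default from the table above


def load_router_config(raw: str) -> RouterConfig:
    """Parse the (assumed) "router" section of a config.json document."""
    router = json.loads(raw).get("router", {})
    return RouterConfig(
        confidence_threshold=float(router["confidence_threshold"]),
        context_messages=int(router.get("context_messages", 3)),
        context_max_chars=int(router.get("context_max_chars", 1600)),
    )


# Assumed file contents: context_max_chars is omitted, so its default applies.
EXAMPLE = '{"router": {"confidence_threshold": 0.4, "context_messages": 2}}'
config = load_router_config(EXAMPLE)
```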
Administrator logs
The console shows which step of the cascade resolved each request:
INFO - [Cascade] Step 1: 'Make me a Python script' -> programador_python (0.92)
INFO - [Cascade] Step 1: 'Make it shorter' -> score 0.15 below threshold (0.4). Escalating to Step 2.
INFO - [Cascade] Step 2: 'Make me a Python script...Make it shorter' -> programador_python (0.87)
Silent Auto-Correction
If an expert fails during inference for any reason (timeout, connection refused, API down, internal model error), the system intercepts the error and automatically redirects the request to the fallback model.
Behavior for the end user
The user receives a normal response generated by the fallback model. No error message or other sign of the internal failure is shown, so the failover is invisible to the user.
Behavior for the administrator
Auto-correction events are logged in the console and in the server logs with the prefix [Auto-Correction]:
ERROR - [Auto-Correction] Routed expert 'programador_python' failed: Connection refused. Redirecting to fallback.
ERROR - [Auto-Correction] Explicit expert 'asesor_legal' failed: Timeout after 30s. Redirecting to fallback.
Coverage
Auto-correction covers all routing phases:
| Phase | Description |
|---|---|
| Forced plugin | If a plugin forces a route and the expert fails |
| Explicit expert | If the user selects a specific model and it fails |
| Semantic router | If the router chooses an expert and it fails |
| Fallback | If the fallback itself fails, a generic error message is returned |
If the fallback model also fails (an extremely rare case, such as a catastrophic disk or VRAM failure), the user receives a generic error message. Error details are never exposed to the end user; they are recorded only in the server logs.
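The failover behavior described in this section can be approximated with the sketch below. It assumes experts are exposed as plain callables; `infer_with_autocorrection`, `GENERIC_ERROR`, and the log line written when the fallback itself fails are hypothetical and only mirror the documented `[Auto-Correction]` prefix.

```python
import logging
from typing import Callable

logger = logging.getLogger("l3mcore")

# Hypothetical wording for the generic error shown to the end user.
GENERIC_ERROR = "The request could not be completed. Please try again later."


def infer_with_autocorrection(
    prompt: str,
    expert_name: str,
    call_expert: Callable[[str], str],
    call_fallback: Callable[[str], str],
    phase: str = "Routed",
) -> str:
    """Try the chosen expert, then the fallback, then a generic error."""
    try:
        return call_expert(prompt)
    except Exception as exc:  # timeout, connection refused, model error, ...
        logger.error(
            "[Auto-Correction] %s expert '%s' failed: %s. Redirecting to fallback.",
            phase, expert_name, exc,
        )
    try:
        return call_fallback(prompt)
    except Exception as exc:
        # Details go to the server log only; the user sees a generic message.
        logger.error("[Auto-Correction] Fallback failed: %s", exc)
        return GENERIC_ERROR
```

Catching the broad `Exception` is deliberate in this sketch, since the section states that any failure during inference triggers the redirect.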