Skip to main content

Contextual Routing and Auto-Correction

Cascading Contextual Routing

l3mcore evaluates the context of the conversation to maintain thematic continuity in multi-turn chats. This system is 100% stateless: it does not require a database, nor sessions, nor RAM usage per user.

It works with any number of simultaneous users because each request carries its own context through the standard messages array of OpenAI. OpenWebUI, Continue.dev and any other compatible client already send the complete history of the conversation in each request.

Decision flow

Request with messages[]
|
v
Step 1: Evaluate user's last message
| confidence >= threshold? --> Use that expert
| confidence < threshold
v
Step 2: Concatenate user's last 2-3 messages
| confidence >= threshold? --> Use that expert
| confidence < threshold
v
Step 3: Fallback model

Step 1: Last message

The router evaluates only the user's last message. If the confidence score exceeds the confidence_threshold configured in config.json, the expert is selected immediately.

This allows the user to change the subject in the middle of a chat. If they say "Write me a rental contract" after 10 messages talking about Python, the router will detect the change and route to the legal expert.

Step 2: Expanded context

If the last message has low confidence (for example, "Make it shorter" or "Explain the second point to me"), the router concatenates the user's last N messages (configurable with context_messages, default 3) and re-evaluates.

By including previous messages like "Make me a Python script that reads a CSV", the router obtains enough context to deduce that the conversation is still about programming.

The concatenated text is truncated to context_max_chars (default 1600 characters) to stay within the 512-token window of the embeddings model.

Step 3: Fallback

If no step produces a match with sufficient confidence, the request is sent to the fallback model (ID 0).

Configuration

These parameters are configured in the router section of config/config.json:

ParameterTypeDefaultDescription
context_messagesint3Number of recent user messages to concatenate in Step 2.
context_max_charsint1600Character limit of the concatenated text.

Administrator logs

The console shows which step of the cascade resolved each request:

INFO - [Cascade] Step 1: 'Make me a Python script' -> programador_python (0.92)
INFO - [Cascade] Step 1: 'Make it shorter' -> score 0.15 below threshold (0.4). Escalating to Step 2.
INFO - [Cascade] Step 2: 'Make me a Python script...Make it shorter' -> programador_python (0.87)

Silent Auto-Correction

If an expert fails during inference for any reason (timeout, connection refused, API down, internal model error), the system intercepts the error and automatically redirects the request to the fallback model.

Behavior for the end user

The user receives a normal response generated by the fallback model. No error message or indication that an internal failure occurred is displayed. The experience is completely transparent.

Behavior for the administrator

Auto-correction events are logged in the console and in the server logs with the prefix [Auto-Correction]:

ERROR - [Auto-Correction] Routed expert 'programador_python' failed: Connection refused. Redirecting to fallback.
ERROR - [Auto-Correction] Explicit expert 'asesor_legal' failed: Timeout after 30s. Redirecting to fallback.

Coverage

Auto-correction covers all routing phases:

PhaseDescription
Forced pluginIf a plugin forces a route and the expert fails
Explicit expertIf the user selects a specific model and it fails
Semantic routerIf the router chooses an expert and it fails
FallbackIf the fallback itself fails, a generic error message is returned
warning

If the fallback model also fails (extremely rare case, for example a catastrophic disk or VRAM failure), the user will receive a generic error message. The detail of the error is never exposed to the end user; it is only recorded in the server logs.