API Endpoints

l3mcore listens on http://0.0.0.0:11435 by default and exposes endpoints compatible with the OpenAI and Ollama APIs.


OpenAI Endpoints

POST /v1/chat/completions

Generates a chat response. Compatible with any OpenAI client.

Request:

{
  "model": "gpt-4",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a bash script to list processes."}
  ],
  "stream": true
}

Optional model field
  • Omitted or generic (e.g. "gpt-4"): l3mcore analyzes the prompt and automatically routes to the best expert.
  • Specific label (e.g. "programmer"): skips the router and goes straight to the indicated expert.
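The two routing modes above can be sketched from the client side. This is a minimal illustration that only builds the request and does not send it. The `build_chat_request` helper and the `BASE_URL` value are assumptions made for this example, not part of l3mcore.

```python
import json
import urllib.request

# Assumption: the server runs locally on the default port from this page.
BASE_URL = "http://localhost:11435"


def build_chat_request(prompt, expert=None, stream=False):
    """Build a POST /v1/chat/completions request without sending it.

    expert=None omits the "model" field, so the router picks the expert.
    A specific label such as "programmer" goes straight to that expert.
    """
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    if expert is not None:
        payload["model"] = expert
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


prompt = "Write a bash script to list processes."
routed = build_chat_request(prompt)                       # router decides
direct = build_chat_request(prompt, expert="programmer")  # router skipped
# With the server running, send with: urllib.request.urlopen(direct)
```

Leaving `model` out and passing a generic name such as "gpt-4" should behave the same way according to the description above, so existing OpenAI clients can stay unchanged.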

Response (without streaming):

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "programmer",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "#!/bin/bash\nps aux | sort -rk 3,3 | head -20"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 12,
    "total_tokens": 27
  }
}
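A short sketch of reading these responses on the client. `extract_reply` handles the non-streaming object shown above. `iter_stream_text` assumes that streamed responses follow the OpenAI chunk format (`data: {...}` lines carrying `choices[0].delta.content`, ended by `data: [DONE]`). This page does not show a streamed response, so treat that part as an assumption. Both function names are hypothetical.

```python
import json


def extract_reply(response):
    """Return (expert, text) from a non-streaming chat.completion object."""
    choice = response["choices"][0]
    return response["model"], choice["message"]["content"]


def iter_stream_text(lines):
    """Yield text fragments from SSE lines in the OpenAI chunk format (assumed)."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker used by the OpenAI API
        delta = json.loads(data)["choices"][0].get("delta", {})
        if delta.get("content"):
            yield delta["content"]


sample = {
    "model": "programmer",
    "choices": [{
        "message": {"role": "assistant", "content": "#!/bin/bash\nps aux"},
        "finish_reason": "stop",
    }],
}
expert, text = extract_reply(sample)
```

The `model` field in the response is useful when the request left routing to l3mcore, since it reports which expert actually answered.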

GET /v1/models

Lists all available experts in OpenAI format.

Response:

{
  "object": "list",
  "data": [
    {"id": "programmer", "object": "model", "created": 1700000000, "owned_by": "lemoe"},
    {"id": "creative_writer", "object": "model", "created": 1700000000, "owned_by": "lemoe"}
  ]
}
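The `id` values in this list are the labels that can be passed as `model` to skip the router. A minimal sketch of collecting them follows. The helper names and `BASE_URL` are assumptions for this example, and `fetch_expert_ids` is defined but not called because it needs a running server.

```python
import json
import urllib.request

# Assumption: the server runs locally on the default port from this page.
BASE_URL = "http://localhost:11435"


def expert_ids(models_response):
    """Return the expert labels from a GET /v1/models response body."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_expert_ids(base_url=BASE_URL):
    """Query a running server for its expert labels."""
    with urllib.request.urlopen(base_url + "/v1/models") as resp:
        return expert_ids(json.load(resp))


sample = {
    "object": "list",
    "data": [
        {"id": "programmer", "object": "model", "created": 1700000000, "owned_by": "lemoe"},
        {"id": "creative_writer", "object": "model", "created": 1700000000, "owned_by": "lemoe"},
    ],
}
print(expert_ids(sample))  # ['programmer', 'creative_writer']
```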

Ollama Endpoints

POST /api/chat

Generates a chat response. Compatible with Ollama clients.

Request:

{
  "model": "lemoe",
  "messages": [
    {"role": "user", "content": "Explain quantum computing."}
  ],
  "stream": false
}
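A hedged sketch of the same call from Python. This page does not show the response body, so `reply_text` assumes l3mcore mirrors Ollama's documented non-streaming shape, where the answer sits in `message.content`. The helper names and `BASE_URL` are assumptions for this example.

```python
import json
import urllib.request

# Assumption: the server runs locally on the default port from this page.
BASE_URL = "http://localhost:11435"


def build_ollama_chat(prompt, model="lemoe"):
    """Build a non-streaming POST /api/chat request without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        BASE_URL + "/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reply_text(response):
    """Read the answer, assuming Ollama's {"message": {...}, "done": true} shape."""
    return response["message"]["content"]


request = build_ollama_chat("Explain quantum computing.")
# With the server running:
#   with urllib.request.urlopen(request) as resp:
#       print(reply_text(json.load(resp)))
```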

GET /api/tags

Lists experts in Ollama format.


GET /api/version

Returns the server version:

{"version": "1.0.0"}

Server Limits

Limit             Value
Rate limit        60 requests/minute per IP
Max payload size  1 MB
Streaming         SSE compatible

Behind a reverse proxy

The rate limiter reads the client address from the X-Forwarded-For header, so the per-IP limit applies to the original client rather than to the proxy when l3mcore runs behind Nginx, Traefik, or Caddy.
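A client can stay inside the limits listed above by checking them before sending. The sketch below is a client-side guard only, not the server's limiter. It assumes "1 MB" means 10**6 bytes (the conservative reading) and that the server counts requests over a rolling 60-second window. Neither detail is stated on this page. `payload_fits` and `Throttle` are hypothetical names.

```python
import json
import time
from collections import deque

MAX_PAYLOAD_BYTES = 1_000_000  # assumption: "1 MB" read as 10**6 bytes
MAX_REQUESTS = 60              # from the limits above
WINDOW_SECONDS = 60.0          # assumption: rolling one-minute window


def payload_fits(payload):
    """True if the JSON-encoded body is within the payload size limit."""
    return len(json.dumps(payload).encode("utf-8")) <= MAX_PAYLOAD_BYTES


class Throttle:
    """Sliding-window counter: at most `limit` sends per `window` seconds."""

    def __init__(self, limit=MAX_REQUESTS, window=WINDOW_SECONDS, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of recent sends, oldest first

    def wait_time(self):
        """Seconds to wait before the next send is allowed (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self):
        """Note that a request was just sent."""
        self.sent.append(self.clock())


throttle = Throttle()
delay = throttle.wait_time()
if delay > 0:
    time.sleep(delay)
throttle.record()  # call right after each request is sent
```

Because the limit is per IP, several clients behind one address share the same budget, so a lower `limit` per client may be the safer choice in that setup.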