API documentation
lincorelink speaks the OpenAI wire protocol. Point any OpenAI SDK at our base URL, swap in your key, and you are done — no new client, no rewrites.
Introduction
lincorelink is an OpenAI-compatible API gateway. It authenticates your request, meters your usage, and relays the call to a third-party large-language-model provider — currently DeepSeek V4 — then streams the response back to you. Because the wire format matches the OpenAI Chat Completions API, existing OpenAI SDKs and tools work without changes.
Base URL
https://api.lincorelink.ai/v1
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /v1/chat/completions | Create a model response for a chat conversation (streaming or non-streaming). |
| GET | /v1/models | List the model IDs available to your account. |
Authentication
Authenticate every request with your API key in the Authorization header as a Bearer token. Create and manage keys in your dashboard. Keys look like sk-lcl-… and are shown in full only once, at creation — store them securely, because we keep only a hash and cannot recover a key later.
Authorization: Bearer sk-lcl-...
A missing or invalid key returns 401 with code invalid_api_key. Keep keys server-side; never embed them in browser, mobile, or other client-side code. If a key is exposed, revoke it in the dashboard; revocation takes effect immediately.
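As a minimal sketch of keeping the key out of source code, the helper below reads it from an environment variable and builds the request headers. The variable name LINCORELINK_API_KEY matches the cURL and Node.js examples in this guide; the helper name `auth_headers` and the prefix check are illustrative assumptions, not part of the API.

```python
import os


def auth_headers(env_var: str = "LINCORELINK_API_KEY") -> dict[str, str]:
    """Build request headers from a key stored in the environment.

    Hypothetical helper: the name and the prefix check are illustrative only.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    # Keys are documented as looking like sk-lcl-...; fail early on anything else.
    if not key.startswith("sk-lcl-"):
        raise ValueError("key does not look like a lincorelink key (expected sk-lcl- prefix)")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Failing before the request is sent avoids a round trip that would only come back as 401 invalid_api_key.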
Quickstart
Send your first request with cURL or any OpenAI SDK.
cURL
curl https://api.lincorelink.ai/v1/chat/completions \
  -H "Authorization: Bearer $LINCORELINK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-pro",
    "messages": [{ "role": "user", "content": "Hello!" }]
  }'
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
    base_url="https://api.lincorelink.ai/v1",
    api_key="sk-lcl-...",  # your lincorelink key
)
resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(resp.choices[0].message.content)
Node.js (OpenAI SDK)
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.lincorelink.ai/v1",
  apiKey: process.env.LINCORELINK_API_KEY,
});
const resp = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(resp.choices[0].message.content);
Chat completions
POST /v1/chat/completions accepts the standard OpenAI Chat Completions body. The most common parameters:
| Parameter | Type | Description |
|---|---|---|
| model | string | Required. One of deepseek-v4-flash or deepseek-v4-pro. |
| messages | array | Required. A non-empty list of { role, content } messages (system, user, assistant). |
| max_tokens | integer | Maximum tokens to generate. Clamped to your plan's output cap (see Rate limits & tiers). |
| stream | boolean | If true, tokens are sent as Server-Sent Events. See Streaming. |
| temperature, top_p | number | Sampling controls, passed through to the provider. |
| stop, frequency_penalty, presence_penalty | various | Standard OpenAI parameters, passed through to the provider. |
Response
A non-streaming call returns a chat.completion object. Its usage object splits input tokens into cached (prompt_cache_hit_tokens) and non-cached (prompt_cache_miss_tokens) counts, which, together with completion_tokens, are what we bill on (see Pricing & metering).
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1734220000,
  "model": "deepseek-v4-pro",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help?" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 7,
    "total_tokens": 19,
    "prompt_cache_hit_tokens": 0,
    "prompt_cache_miss_tokens": 12
  }
}
Streaming
Set stream: true to receive tokens incrementally as Server-Sent Events, exactly like the OpenAI API. Each event is a chat.completion.chunk with a delta; the stream ends with a data: [DONE] line.
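To make the wire format concrete, here is a minimal sketch that extracts text from raw SSE lines without an SDK. It assumes each event arrives as a single `data:` line carrying one JSON chunk, as in an OpenAI stream; the function name `iter_deltas` and the sample lines are hypothetical. The OpenAI SDKs do this parsing for you.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from raw SSE lines of a chat completion stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                yield content


# Illustrative stream: a role-only first chunk, two content chunks, then [DONE].
sample = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))  # prints "Hello"
```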
To receive token usage at the end of a stream, set stream_options.include_usage = true; a final frame then carries the usage object. If you do not request it, we strip our internal usage frame so the stream stays byte-for-byte like a native OpenAI stream.
stream = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Write a haiku about the edge"}],
    stream=True,
    stream_options={"include_usage": True},  # optional: get a final usage frame
)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
Models
| Model ID | Best for | Notes |
|---|---|---|
| deepseek-v4-flash | High-volume, latency-sensitive, cheap | Available on every plan, including Basic. |
| deepseek-v4-pro | Harder reasoning, agents, code | Available on Pro and Max plans. |
Fetch the live list any time with GET /v1/models. More models and providers join the same key and balance over time.
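One way to use the live list is to pick a model at startup instead of hard-coding an ID your plan may not include. The sketch below is illustrative: `pick_model` and the preference order are assumptions, and the list of available IDs is passed in so the logic can be tried without a network call.

```python
from typing import Iterable, Optional


def pick_model(available: Iterable[str], preferred: Iterable[str]) -> Optional[str]:
    """Return the first preferred model ID that the account can use, else None."""
    available_ids = set(available)
    for model_id in preferred:
        if model_id in available_ids:
            return model_id
    return None


# With the OpenAI SDK client from the Quickstart, the available IDs would come from:
#   available = [m.id for m in client.models.list()]
available = ["deepseek-v4-flash"]  # illustrative: what a Basic account might see
print(pick_model(available, ["deepseek-v4-pro", "deepseek-v4-flash"]))  # prints deepseek-v4-flash
```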
Pricing & metering
Billing is prepaid and metered per token. You are charged on the usage the provider reports, with cached input billed far cheaper than fresh (non-cached) input. Rates per 1M tokens (USD):
| Model | Input | Cached input | Output |
|---|---|---|---|
| deepseek-v4-flash | $0.16 | $0.004 | $0.30 |
| deepseek-v4-pro | $0.60 | $0.005 | $1.20 |
Each request reserves an estimated maximum from your balance, then settles to the actual metered cost when it completes. A request is rejected up front with 402 if your balance cannot cover the estimate. See the pricing page for plans and the latest rates, which may change with notice.
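The arithmetic can be sketched as follows, using the rates in the table and the usage fields from the Response example. `metered_cost` applies the listed rates to the reported token counts. `worst_case_cost` is only an assumed upper bound (all input uncached, output running to max_tokens); the gateway's actual reservation formula is not documented here and may differ.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the table above; rates may change.
RATES = {
    "deepseek-v4-flash": {"input": Decimal("0.16"), "cached": Decimal("0.004"), "output": Decimal("0.30")},
    "deepseek-v4-pro": {"input": Decimal("0.60"), "cached": Decimal("0.005"), "output": Decimal("1.20")},
}
MILLION = Decimal(1_000_000)


def metered_cost(model: str, usage: dict) -> Decimal:
    """Estimate the settled cost in USD from a response's usage object."""
    rate = RATES[model]
    return (
        usage["prompt_cache_miss_tokens"] * rate["input"]
        + usage["prompt_cache_hit_tokens"] * rate["cached"]
        + usage["completion_tokens"] * rate["output"]
    ) / MILLION


def worst_case_cost(model: str, prompt_tokens: int, max_tokens: int) -> Decimal:
    """Assumed upper bound: all input uncached, output runs to max_tokens."""
    rate = RATES[model]
    return (prompt_tokens * rate["input"] + max_tokens * rate["output"]) / MILLION


# Usage values taken from the Response example: 12 uncached input tokens, 7 output tokens.
usage = {"prompt_cache_hit_tokens": 0, "prompt_cache_miss_tokens": 12, "completion_tokens": 7}
print(f"{metered_cost('deepseek-v4-pro', usage):.7f}")  # prints 0.0000156
```

Decimal is used rather than float so that small per-request amounts add up without rounding drift.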
Rate limits & tiers
Per-request context and output caps, concurrency, and any daily token cap depend on your plan. A request whose input exceeds the context cap is rejected before it reaches the provider (413 context_length_exceeded), and max_tokens is clamped to your output cap.
| Plan | Context | Max output | Concurrency | Daily tokens |
|---|---|---|---|---|
| Basic (free) | 128K | 32K | 10 | 100M |
| Pro | 256K | 32K | 20 | Unlimited |
| Max | 1M | 64K | 20 | Unlimited |
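The context and output caps can be mirrored client-side to catch oversized requests before they are sent. This is a rough sketch: the function name is hypothetical, "K" is assumed to mean 1,000 tokens and "M" 1,000,000, and the input token count must come from your own tokenizer estimate, which may not match the provider's count exactly.

```python
# Caps copied from the table above. Assumption: "K" = 1,000 and "M" = 1,000,000 tokens;
# the gateway may count differently, so treat these values as approximate.
PLAN_CAPS = {
    "basic": {"context": 128_000, "max_output": 32_000},
    "pro": {"context": 256_000, "max_output": 32_000},
    "max": {"context": 1_000_000, "max_output": 64_000},
}


def prepare_max_tokens(plan: str, input_tokens: int, requested: int) -> int:
    """Mirror the documented checks locally: reject oversized input, clamp output."""
    caps = PLAN_CAPS[plan]
    if input_tokens > caps["context"]:
        raise ValueError(
            f"input of {input_tokens} tokens exceeds the {plan} context cap of {caps['context']}"
        )
    return min(requested, caps["max_output"])


print(prepare_max_tokens("basic", input_tokens=5_000, requested=50_000))  # prints 32000
```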
When you exceed a limit you receive 429 with a Retry-After header — wait that many seconds, then retry with backoff.
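A minimal retry loop along those lines might look like the sketch below. It is transport-agnostic: `send` is a hypothetical callable you supply that performs one request and returns the status code, the parsed Retry-After value in seconds (or None), and the body. The attempt count, base delay, and jitter are illustrative defaults, not values prescribed by the API.

```python
import random
import time
from typing import Callable, Optional, Tuple

# A response is modelled as (status_code, retry_after_seconds_or_None, body).
Response = Tuple[int, Optional[float], object]


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it succeeds, retrying 429 and 5xx with backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, retry_after, body = send()
        retryable = status == 429 or 500 <= status < 600
        if not retryable or attempt == max_attempts - 1:
            return status, retry_after, body
        # Exponential backoff with jitter; never wait less than Retry-After.
        backoff = base_delay * (2 ** attempt) * (1 + random.random())
        sleep(max(backoff, retry_after or 0.0))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the loop can be tested without real delays. The OpenAI Python SDK also retries some failures on its own (see its `max_retries` client option), so a wrapper like this is mainly useful with a plain HTTP client.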
Errors
Errors use the OpenAI shape, so existing error handling works unchanged: { "error": { "message", "type", "param", "code" } }.
{
  "error": {
    "message": "Insufficient balance. Top up to continue.",
    "type": "insufficient_quota",
    "param": null,
    "code": "insufficient_funds"
  }
}
| Status | code | Meaning |
|---|---|---|
| 400 | model_not_found | Malformed request, or an unknown model ID. |
| 401 | invalid_api_key | Missing or invalid API key. |
| 402 | insufficient_funds | Balance cannot cover the request. Top up to continue. |
| 403 | model_not_available | The model is not included in your plan (e.g. Pro model on Basic). |
| 413 | context_length_exceeded | Input exceeds your plan's context cap. |
| 429 | rate_limit_error | Too many requests / concurrency or daily cap reached. Honour Retry-After. |
| 5xx | api_error | Upstream/provider error. Safe to retry with backoff. |
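For clients that do not use an OpenAI SDK, the error shape and the table above can be handled with a small parser like this sketch. The `ApiError` class and `parse_error` function are hypothetical names; the retry rule simply encodes the table (429 and 5xx are retryable, everything else needs a change on the caller's side first).

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    status: int
    code: Optional[str]
    type: Optional[str]
    message: str

    @property
    def retryable(self) -> bool:
        # Per the table above: 429 and 5xx are worth retrying; the rest need a fix first.
        return self.status == 429 or 500 <= self.status < 600


def parse_error(status: int, body: str) -> ApiError:
    """Parse an OpenAI-shaped error body into an ApiError."""
    try:
        err = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        err = {}  # non-JSON or unexpected body, e.g. from an intermediary proxy
    if not isinstance(err, dict):
        err = {}
    return ApiError(
        status=status,
        code=err.get("code"),
        type=err.get("type"),
        message=err.get("message") or body,
    )


body = '{"error": {"message": "Insufficient balance. Top up to continue.", "type": "insufficient_quota", "param": null, "code": "insufficient_funds"}}'
err = parse_error(402, body)
print(err.code, err.retryable)  # prints: insufficient_funds False
```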
Best practices
- Keep keys server-side and rotate them periodically. Revoke immediately if one leaks.
- Retry on 429 and 5xx with exponential backoff, and honour the Retry-After header when present.
- Reuse system prompts and prefixes to benefit from cached-input pricing, which at the listed rates is 40× (deepseek-v4-flash) to 120× (deepseek-v4-pro) cheaper than fresh input.
- Set max_tokens deliberately so reservations against your balance stay tight and predictable.
- Stream long responses for better perceived latency, and request include_usage if you need token counts.
Need a key? Create an account, top up a few dollars, and make your first call in minutes.
