Run translations locally with LM Studio

Most words you click in Lector never touch the network. They resolve straight from an on-device dictionary. But that dictionary is small, so plenty of lookups still fall through to a cloud LLM, and in a thinly-resourced language like Afrikaans, plenty is an understatement. Each one sends a slice of what you're reading to someone else's server, and adds to a per-token bill.

Lector's click-to-translate, phrase translation, journal correction, and LLM tutor all run through one pluggable translation agent, and LM Studio is one of the providers you can point it at: a desktop app that downloads open models and serves them over an OpenAI-compatible API. This guide wires that agent up to a model on your own hardware, so the lookups that would have gone to the cloud are answered locally instead — no API key, no cloud round-trip, no per-token bill.

This post documents the LM Studio provider in the current Lector release. Under the hood Lector talks to LM Studio over the standard OpenAI chat-completions API (/v1/chat/completions and /v1/models), so everything here applies to any server that speaks that protocol. LM Studio is just the easiest one to get running on a laptop.

Why an OpenAI-compatible local server

Lector doesn't hard-code a translation API. It selects a provider and calls it through a small interface (complete() for translations and explanations). For local models, that interface is satisfied by anything that implements the OpenAI shape:

  • POST /v1/chat/completions — the actual translation/definition call
  • GET /v1/models — so Lector can list the models you have available
  • an optional Authorization: Bearer <key> header for protected servers

LM Studio exposes exactly these endpoints. That means your reading history, vocabulary, and the text you're translating never leave your hardware, and you can swap the underlying model whenever you like without touching Lector.

A note on quality. You might expect a local model to fall behind a frontier one on a low-resource language. I benchmarked it, and for Afrikaans that mostly isn't true. Translating Afrikaans into English (Tatoeba, chrF++), Gemma 4 12B (QAT) running in LM Studio scores 82.7, ahead of Claude Opus and GPT-4o and within half a point of GPT-5 and Gemini 2.5 Pro, the only two that beat it. Even the tiny Gemma 4 E4B clears 80. Pick a capable instruction-tuned model, a general one rather than a 4B coder model, and local quality lands closer to the cloud than the cost gap suggests. Treat these as relative gaps on a Tatoeba set, not absolute scores. The one model that struggled was, of all things, Apple Intelligence's on-device model, which errored on about a quarter of the Afrikaans sentences. The docs' LLM Providers section has the trade-offs.

What you'll need

  • A running Lector instance. See the installation guide if you don't have one yet.
  • A machine to run LM Studio on (macOS on Apple Silicon, Windows, or Linux). This can be the same machine as Lector or a different one on your network.
  • Enough memory for your chosen model. A 7-8B model quantised to 4-bit needs roughly 5-6 GB of RAM or VRAM.

Step 1 · Install LM Studio and download a model

  1. Download LM Studio from lmstudio.ai and install it.
  2. Open the Discover (search) tab and download an instruction-tuned model. Good starting points are a compact model like Google's Gemma (the efficient E4B variant runs comfortably on a laptop), Llama 3.1 8B Instruct, or Qwen2.5 7B Instruct. Pick a 4-bit (Q4) quant to keep memory use modest.
  3. Wait for the download to finish. You don't need to chat with it in the LM Studio UI; we only need the server.

Step 2 · Start LM Studio's local server

LM Studio runs the OpenAI-compatible API on port 1234 by default.

From the app

Open the Developer tab (called the Local Server tab in older versions) and start the server. It will begin listening on http://localhost:1234.

Or from the CLI

LM Studio ships a headless CLI, which is handy if the app runs on a server:

# Start the OpenAI-compatible server on :1234
lms server start

# (optional) load a model so the first request isn't slow
lms load <model-name>

Verify it's up

From the same machine, confirm the API answers and reports your models:

curl http://localhost:1234/v1/models

You should get back a JSON list with at least one model id. That's what you'll select in Lector.

Serving to another machine? By default LM Studio binds to localhost only, so nothing outside that machine can reach it, including a Docker container. If Lector runs anywhere other than the same OS as LM Studio, enable "Serve on Local Network" in LM Studio's server settings (this binds it to 0.0.0.0). That's the most common reason the connection fails. See Running Lector in Docker below.

Step 3 · Point Lector at LM Studio

In Lector, open Settings → AI Provider, set the provider to Local / self-hosted (OpenAI-compatible), and choose the LM Studio preset (it autofills the endpoint to localhost:1234). Then:

Lector's AI Provider settings configured for LM Studio: provider set to LM Studio, endpoint pointing at a Tailscale address, the google/gemma-4-e4b model selected, and a green Connected status.
Lector's Settings → AI Provider panel configured for LM Studio. Here the server runs on another machine reached over Tailscale (a 100.x address), with google/gemma-4-e4b selected and a live Connected check.

Endpoint

The base URL of your LM Studio server, without the /v1 suffix. Lector adds that itself.

  • LM Studio on the same OS as Lector: http://localhost:1234
  • LM Studio on a different machine: use that machine's address, e.g. http://192.168.1.50:1234
  • Lector in Docker: see the Docker section. localhost will not work here.

API Key (optional)

Leave this empty for a normal local setup. You only need it if your LM Studio is behind a reverse proxy or you're using LM Studio Cloud. When set, Lector sends it as a Bearer token from the server side, never exposed to the browser after you save it.

Model

Click Fetch models. Lector queries /v1/models and populates the dropdown with what LM Studio has available; pick the one you downloaded. If the fetch fails (e.g. the server isn't reachable yet), you can type the model id in manually.

That's it. Open a book or article, click a word, and the lookup is answered by your local model. The first request after the server starts can be slow while LM Studio loads the model into memory; enable just-in-time / auto-load, or run lms load (from step 2) to pre-load it. If LM Studio is ever unreachable, Lector quietly falls back to its built-in dictionary of the most common words, so reading never breaks.

Running Lector in Docker

This is the part that trips people up, so it gets its own section. Lector's translation call happens server-side, inside the container. So when you type http://localhost:1234, "localhost" means the container itself, not your host machine where LM Studio is running. The container has no LM Studio, so the connection is refused.

There are two things to get right:

1. Make LM Studio reachable off-localhost

As noted above, enable "Serve on Local Network" in LM Studio so it binds to 0.0.0.0:1234 instead of 127.0.0.1:1234. A container can't reach a host's loopback-only service.

2. Use a host-reachable address in the Endpoint

Your setup Endpoint to use
Docker Desktop (macOS / Windows) http://host.docker.internal:1234
Docker on Linux http://host.docker.internal:1234, but you must add the host-gateway mapping (below), or just use the host's LAN IP, e.g. http://192.168.1.50:1234
LM Studio on a separate machine That machine's LAN IP, e.g. http://192.168.1.50:1234

On Linux, host.docker.internal isn't defined by default. Add it to your compose service:

services:
  lector:
    image: ghcr.io/heuwels/lector:latest
    container_name: lector
    restart: unless-stopped
    ports:
      - "3400:3000"
    volumes:
      - ./data:/app/data
    environment:
      - NODE_ENV=production
    # Lets the container resolve host.docker.internal on Linux
    extra_hosts:
      - "host.docker.internal:host-gateway"

Then set the Endpoint to http://host.docker.internal:1234 in Lector's settings.

Configuring it without the UI

If you'd rather not click through settings (handy for reproducible deployments), you can configure the provider entirely with environment variables. The settings you save in the UI take precedence; these env vars are the fallback when a setting isn't set.

Variable Default Description
LLM_PROVIDER anthropic Set to lmstudio to use LM Studio as the translation agent
LMSTUDIO_URL http://localhost:1234 Base URL of the LM Studio server (no /v1 suffix)
LMSTUDIO_MODEL (none) The model id to use (as it appears in /v1/models)
LMSTUDIO_API_KEY (none) Optional. Bearer token, only for auth-protected servers

A complete Docker Compose example, pointing at LM Studio running on the Docker host:

services:
  lector:
    image: ghcr.io/heuwels/lector:latest
    container_name: lector
    restart: unless-stopped
    ports:
      - "3400:3000"
    volumes:
      - ./data:/app/data
    environment:
      - NODE_ENV=production
      - LLM_PROVIDER=lmstudio
      - LMSTUDIO_URL=http://host.docker.internal:1234
      - LMSTUDIO_MODEL=google/gemma-4-e4b
    extra_hosts:
      - "host.docker.internal:host-gateway"

Troubleshooting

Symptom Likely cause & fix
Connection refused / can't reach server LM Studio is bound to localhost, or you're using localhost from inside Docker. Enable "Serve on Local Network" and use host.docker.internal or the host LAN IP.
"Fetch models" returns nothing The server is up but no model is downloaded, or the endpoint is wrong. Confirm with curl <endpoint>/v1/models.
First translation is very slow, then fast The model is loading just-in-time. Click Load in settings to pre-load it, or enable auto-load in LM Studio.
Translations are weak or literal The model is too small for the language. Try a larger / better instruction-tuned model, or use Anthropic or Apfel for difficult languages.

You're not locked in

Because Lector speaks the OpenAI-compatible API, LM Studio is one option among several behind the same provider: Ollama, vLLM, or a remote OpenAI-compatible endpoint all use the same endpoint / model / optional-key flow (Ollama and LM Studio even get presets that autofill the endpoint). The other provider is Anthropic (Claude), for when you want cloud quality. Start local with LM Studio, switch to Claude for a tricky language, switch back — your books and progress don't care which agent is doing the translating.

← Back to the blog