
Run Private AI Locally with LM Studio

Running AI on your own computer — rather than sending your prompts to OpenAI's or Anthropic's servers — gives you complete data privacy, eliminates per-message API costs, and lets you work completely offline. LM Studio is the easiest tool for running large language models locally: it has a graphical interface, a built-in model library, and requires no coding. This tutorial walks you through the full setup.

Who this is for: Intermediate AI users who are privacy-conscious, cost-sensitive, or frequently working with sensitive documents — and want a reliable local AI setup without touching the command line.

What you'll build: A fully functional local AI setup capable of answering questions, helping with writing, and optionally serving other apps through a local API endpoint.

Prerequisites

  • LM Studio downloaded and installed (lmstudio.ai — available for macOS, Windows, Linux)
  • At least 8GB of RAM (16GB recommended for best performance; see model recommendations below)
  • Patience for the initial model download (1–5 GB per model, depending on size)

Understanding What "Local AI" Means

When you use ChatGPT or Claude, your prompt (the text you type) is sent over the internet to that company's servers, processed by their model, and the response is sent back. Everything you type is potentially stored and used under their terms of service.

With a local model, the entire process happens on your machine. Your prompts go to a model file sitting on your hard drive, the response is generated by your CPU/GPU, and nothing is sent anywhere. The model has no awareness of previous sessions and no connection to the internet — it's completely self-contained.

What you gain: Privacy, no API costs, offline capability, and no usage limits. What you give up: The most cutting-edge models (GPT-4, Claude 3 Opus) require massive infrastructure that can't run on a personal computer. Local models are capable but generally lag behind the frontier models for complex reasoning tasks.

Step 1: Download and Install LM Studio

Go to lmstudio.ai and download the installer for your operating system:

  • macOS: Download the .dmg file, open it, drag to Applications
  • Windows: Download the .exe installer, run it, follow the prompts
  • Linux: Download the .AppImage, make it executable, run it

First launch may take a moment as LM Studio initializes. You'll see a clean interface with a left sidebar showing tabs for Chat, Discover, Local Server, and Settings.

✅ You're done with Step 1 when: LM Studio is open and you can see the main interface.

Step 2: Choose and Download a Model

Go to the Discover tab (magnifying glass icon in the left sidebar). This is LM Studio's model browser — it shows models from Hugging Face that are compatible and commonly used.

Choosing the right model for your hardware:

Available RAM | Recommended Model | Why
--- | --- | ---
8 GB | Phi-3 Mini (3.8B) or Llama 3.2 3B | Runs comfortably, fast responses
16 GB | Mistral 7B or Llama 3.1 8B | Good balance of quality and speed
32 GB+ | A 13B-class model (e.g., Llama 2 13B) or Mistral NeMo 12B | Near-cloud quality for most tasks
Mac with Apple Silicon | Any of the above | Unified memory and Metal acceleration make these chips very efficient

Tip: For most everyday tasks (writing, summarizing, answering questions), a 7B model is excellent, and the quality gap versus frontier models is often hard to notice. The difference becomes apparent mainly on complex multi-step reasoning or coding tasks.

In the Discover tab, search for your chosen model. Click on it to see the available versions — you'll see options like "Q4_K_M" or "Q5_K_M." These are quantization levels (how much the model has been compressed to reduce file size):

  • Q4_K_M — Good quality, smaller file, faster loading (recommended for most users)
  • Q5_K_M — Slightly better quality, larger file
  • Q8_0 — Near full quality, largest file (use only if you have plenty of RAM)
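A rough rule of thumb for picking a quantization level: file size (and roughly the RAM footprint) is about parameters × bits-per-weight ÷ 8, plus some overhead. The bits-per-weight figures below are approximations for these GGUF formats, not exact values — a quick sketch:

```python
# Approximate effective bits per weight for common GGUF quantization levels.
# These numbers are rough estimates; actual files vary slightly.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5}

def estimate_size_gb(params_billions: float, quant: str) -> float:
    """Estimate the download/RAM footprint of a quantized model in GB."""
    bits = BITS_PER_WEIGHT[quant]
    return round(params_billions * 1e9 * bits / 8 / 1e9, 1)

for quant in BITS_PER_WEIGHT:
    print(f"7B model at {quant}: ~{estimate_size_gb(7, quant)} GB")
```

This is why a 7B model at Q4_K_M fits comfortably in 8 GB of RAM while the same model at Q8_0 does not leave much headroom.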

Click Download on your chosen model and version. The download may take 5–20 minutes depending on your internet speed and the model size. You can see progress in the Downloads tab.

✅ You're done with Step 2 when: The download completes and the model appears in your library.

Step 3: Load the Model and Start a Chat

Go to the Chat tab (speech bubble icon). At the top, you'll see a dropdown labeled "Select a model to load." Click it and choose the model you just downloaded.

LM Studio will load the model into memory — this takes 10–60 seconds depending on your hardware. You'll see a progress indicator. Once loaded, you'll see the model name at the top and a chat input box at the bottom.

Type your first message — just like you would in ChatGPT. Example: "Explain the key differences between supervised and unsupervised machine learning in plain English." Press Enter.

The model generates a response using your computer's processor. First-token latency (time to first word) is usually 1–3 seconds; generation speed varies from 5 to 50+ tokens per second depending on your hardware and model size.

✅ You're done with Step 3 when: You've received your first response from the local model.

Step 4: Enable the Local Server (For Connecting Other Apps)

One of LM Studio's most powerful features is the Local Server — it creates an API endpoint on your computer that other apps can connect to, using the same format as OpenAI's API. This means any app that supports OpenAI can be pointed at your local model instead.

Go to the Local Server tab (plug icon). Click Start Server. By default, the server runs at http://localhost:1234.

Apps you can now connect to your local model:

  • Open WebUI (self-hosted ChatGPT interface) — set the API base URL to http://localhost:1234
  • Continue (VS Code AI coding assistant) — add a provider with type "openai" and base URL http://localhost:1234
  • n8n workflows — use the OpenAI node with a custom base URL
  • Any application with an "OpenAI API base URL" setting

No API key is needed for local connections — the server accepts requests from localhost without authentication by default. If you expose it on your network (not recommended without auth), add authentication in LM Studio settings.
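To see what the OpenAI-compatible endpoint looks like from code, here is a minimal sketch using only the Python standard library. It assumes the Local Server is running at the default address; the "local-model" value is a placeholder (LM Studio answers with whichever model is loaded):

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default server address

# An OpenAI-style chat completion request. "model" is a placeholder name;
# the server uses the model currently loaded in LM Studio.
payload = {
    "model": "local-model",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize why local AI protects privacy."},
    ],
    "temperature": 0.7,
}

def chat(request_body: dict) -> str:
    """POST the request to the local server and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(request_body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# chat(payload)  # uncomment with the Local Server running
```

Because the request shape matches OpenAI's API, any OpenAI client library can be pointed at BASE_URL instead of writing HTTP calls by hand.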

Step 5: Understanding When to Use Local vs. Cloud

Local AI is the right choice in specific situations — not all of them.

Use local when:

  • Working with sensitive content — drafts containing personal information, client data, internal business documents, or any content that shouldn't leave your organization
  • High-volume tasks — if you run hundreds of summarizations or classifications per day, local eliminates API costs entirely
  • Offline needs — traveling, unreliable internet, or air-gapped environments
  • Privacy-first workflows — legal documents, medical notes, financial records, personal journals

Use cloud (ChatGPT, Claude) when:

  • You need the best quality — frontier models significantly outperform local models on complex reasoning, code generation, and nuanced analysis
  • Multimodal tasks — image understanding, voice, and code execution generally require cloud infrastructure
  • Zero ops preference — no setup, no hardware requirements, instant access from any device
  • Bursty usage patterns — occasional heavy use followed by no use makes cloud more cost-effective than maintaining hardware

The hybrid approach (recommended): Use local for private, routine, or high-volume tasks. Use cloud APIs for high-stakes or complex tasks. LM Studio makes it easy to switch — just keep both available.
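One way to make the hybrid approach concrete is a tiny routing function: sensitive work stays local, hard non-sensitive work goes to the cloud. The endpoints and the two-flag classification here are illustrative assumptions, not part of LM Studio:

```python
# Hypothetical hybrid router: choose an API endpoint per task.
LOCAL = "http://localhost:1234/v1"   # LM Studio's local server
CLOUD = "https://api.openai.com/v1"  # a cloud provider's endpoint (example)

def pick_endpoint(sensitive: bool, complex_task: bool) -> str:
    """Route sensitive work locally; send hard, non-sensitive work to the cloud."""
    if sensitive:
        return LOCAL   # private data never leaves the machine
    if complex_task:
        return CLOUD   # frontier models for multi-step reasoning
    return LOCAL       # routine tasks: free, offline-capable

print(pick_endpoint(sensitive=True, complex_task=True))  # privacy outranks quality
```

Note that sensitivity wins over complexity: if a task is both private and hard, it still stays local, because the privacy constraint is absolute while quality is a trade-off.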

Troubleshooting

"The model is very slow" → Try a smaller model (lower B count) or a more aggressively quantized version (Q4 instead of Q5). Also close other memory-intensive apps.

"LM Studio says insufficient memory" → Your RAM is too limited for the model. Reduce the context window in LM Studio settings (Chat > Context length) or switch to a smaller model.

"Responses are low quality" → Try a larger model if your RAM allows, or try a different model family (Mistral often performs differently than Llama for the same task).

"The local server isn't working with another app" → Check the API base URL — it should be http://localhost:1234/v1 (note the /v1 at the end, which some apps require). Also verify the server is running (green indicator in the Local Server tab).
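The missing /v1 suffix is a frequent enough mistake that it is worth handling in code. A small (hypothetical) helper like this can normalize whatever URL a user pastes into an app's settings:

```python
def normalize_base_url(url: str) -> str:
    """Ensure an OpenAI-compatible base URL ends with /v1, as some apps require."""
    url = url.rstrip("/")
    if not url.endswith("/v1"):
        url += "/v1"
    return url

print(normalize_base_url("http://localhost:1234"))     # http://localhost:1234/v1
print(normalize_base_url("http://localhost:1234/v1/")) # http://localhost:1234/v1
```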
