Running Local LLMs with Cursor: A Complete Setup Guide
If you are working on proprietary code, handling sensitive data, or just do not want your snippets leaving your machine, running a local LLM with Cursor is a solid option. This guide walks through the practical setup for Ollama and LM Studio, plus the trade-offs you should know before switching.
Why Bother with Local Models?
Three reasons come up again and again in the community:
- Privacy: Your code never leaves your local network. No third-party API, no data retention policy to parse.
- Cost: After hardware costs, inference is free. No per-token billing, no usage spikes.
- Offline access: Works on planes, in locked-down corporate networks, or anywhere without internet.
Local models shine for boilerplate generation, simple refactors, and quick questions about your own codebase. You do not need GPT-4 for everything.
Supported Local Model Backends
Cursor does not ship with built-in local model support in the same way it supports OpenAI or Anthropic APIs. Instead, you point Cursor at a local server that exposes an OpenAI-compatible API. The three most common options:
| Backend | Best For | Setup Complexity |
|---|---|---|
| Ollama | Quick start, model management | Low |
| LM Studio | GUI lovers, Windows/Mac users | Low |
| llama.cpp | Maximum control, minimal overhead | Medium |
This guide focuses on Ollama and LM Studio because they are what most developers actually use day-to-day.
Ollama + Cursor: Step-by-Step
1. Install Ollama
Download from ollama.com and install. It runs as a background service on macOS, Linux, and Windows.
Verify it works:
ollama --version
2. Pull a Model
Start with a code-capable model. The community favorites are:
codellama:7b-codeorcodellama:13b-code— fast, decent for simple tasksdeepseek-coder:6.7b— strong on code completionqwen2.5-coder:7bor14b— good balance of speed and quality
ollama pull deepseek-coder:6.7b
3. Start the OpenAI-Compatible Server
Ollama exposes an OpenAI-compatible API on localhost:11434. Keep it running:
ollama serve
Or let the background service handle it.
4. Configure Cursor
Open Cursor Settings (Ctrl/Cmd + ,) and navigate to:
Settings > Models > OpenAI API Key
Set the base URL to:
http://localhost:11434/v1
Leave the API key field blank or enter any dummy string (some versions require a non-empty value).
Select the model name that matches what you pulled. For example:
deepseek-coder:6.7b
Cursor sends requests in the OpenAI chat completions format. Ollama's /v1 endpoint translates these automatically. You do not need a proxy.
5. Test It
Open a file and hit Ctrl/Cmd + L to open the chat panel. Ask something simple:
Write a Python function that reverses a string without using slicing.
If you get a response, you are connected. If it hangs, check that ollama serve is running and the model name matches exactly.
LM Studio + Cursor: Step-by-Step
LM Studio is the better choice if you want a GUI for downloading and switching models.
1. Install LM Studio
Download from lmstudio.ai. Available for macOS, Windows, and Linux.
2. Download a Model
Open LM Studio, go to the Discover tab, and search for a code model. Good picks:
TheBloke/CodeLlama-7B-Instruct-GGUFTheBloke/DeepSeek-Coder-6.7B-Instruct-GGUFQwen/Qwen2.5-Coder-7B-Instruct-GGUF
Download the Q4_K_M or Q5_K_M quantization for a balance of size and quality.
3. Start the Local Server
In LM Studio, go to the Local Server tab on the left. Load your model, then click Start Server.
By default, it runs on:
http://localhost:1234/v1
4. Configure Cursor
Same process as Ollama. In Cursor Settings > Models > OpenAI API Key, set:
http://localhost:1234/v1
The model name field can be left as local-model or whatever placeholder LM Studio expects. LM Studio ignores the model name and uses whatever is currently loaded.
5. Verify
Run the same test prompt. LM Studio's server logs show incoming requests, which is useful for debugging.
What Works and What Does Not
Local models are not a drop-in replacement for Claude 3.5 Sonnet or GPT-4o. Here is the honest breakdown:
| Task | Local 7B-13B | Cloud (Claude/GPT-4) |
|---|---|---|
| Simple refactors | Good | Excellent |
| Boilerplate generation | Good | Excellent |
| Complex architecture decisions | Weak | Excellent |
| Understanding large codebases | Weak | Excellent |
| Multi-file edits | Weak | Good |
| Speed (with GPU) | Fast | Network-dependent |
| Speed (CPU only) | Slow | Network-dependent |
Running a 13B model on CPU can take 10-30 seconds per response. A modern GPU (RTX 3060 or better) brings this down to 1-3 seconds. Manage your expectations.
Hybrid Strategy: The Practical Approach
Most developers who stick with local models use a hybrid workflow rather than going all-in:
- Local model for quick, safe tasks: lint fixes, renaming, simple regex, explaining a function.
- Cloud model for heavy lifting: designing new features, debugging tricky issues, cross-file refactoring.
- Switch based on project: open source or non-sensitive code → cloud; proprietary or regulated code → local.
Cursor makes this easy because you can change the model in settings without restarting the IDE. Some users keep two Cursor windows open — one pointed at local, one at cloud — though that is more of a workaround than a feature.
If you have a Mac with Apple Silicon, Ollama leverages the Neural Engine well. A MacBook Pro M3 Pro can run a 13B model at usable speeds without draining the battery like a discrete GPU would.
Troubleshooting
"Connection refused" errors
- Check that the server is running (
ollama serveor LM Studio server tab). - Verify the port: Ollama uses 11434, LM Studio uses 1234.
- Check your firewall or corporate proxy.
Slow responses
- Use a smaller model or a higher quantization (Q4 instead of Q5).
- Ensure your GPU is being used. Ollama logs show
GPUorCPUon load. - Close other GPU-heavy apps.
Nonsensical outputs
- The model name might not match. Ollama is picky about exact names.
- Some models need a specific prompt format. Instruct models work better than base models for chat.
Cursor ignores the local setting
- Make sure you are overriding the OpenAI base URL, not just adding a custom model.
- Restart Cursor after changing the base URL.
Final Thoughts
Local LLMs with Cursor are viable today for a subset of tasks. They are not as capable as cloud models, but for privacy-conscious developers or those working in restricted environments, they are often good enough. Start with Ollama if you want speed of setup, or LM Studio if you prefer a GUI. Expect to iterate on model choice and workflow before you find what works for your projects.