Skip to main content

Running Local LLMs with Cursor: A Complete Setup Guide

If you are working on proprietary code, handling sensitive data, or just do not want your snippets leaving your machine, running a local LLM with Cursor is a solid option. This guide walks through the practical setup for Ollama and LM Studio, plus the trade-offs you should know before switching.

Why Bother with Local Models?

Three reasons come up again and again in the community:

  • Privacy: Your code never leaves your local network. No third-party API, no data retention policy to parse.
  • Cost: After hardware costs, inference is free. No per-token billing, no usage spikes.
  • Offline access: Works on planes, in locked-down corporate networks, or anywhere without internet.
tip

Local models shine for boilerplate generation, simple refactors, and quick questions about your own codebase. You do not need GPT-4 for everything.

Supported Local Model Backends

Cursor does not ship with built-in local model support in the same way it supports OpenAI or Anthropic APIs. Instead, you point Cursor at a local server that exposes an OpenAI-compatible API. The three most common options:

BackendBest ForSetup Complexity
OllamaQuick start, model managementLow
LM StudioGUI lovers, Windows/Mac usersLow
llama.cppMaximum control, minimal overheadMedium

This guide focuses on Ollama and LM Studio because they are what most developers actually use day-to-day.

Ollama + Cursor: Step-by-Step

1. Install Ollama

Download from ollama.com and install. It runs as a background service on macOS, Linux, and Windows.

Verify it works:

ollama --version

2. Pull a Model

Start with a code-capable model. The community favorites are:

  • codellama:7b-code or codellama:13b-code — fast, decent for simple tasks
  • deepseek-coder:6.7b — strong on code completion
  • qwen2.5-coder:7b or 14b — good balance of speed and quality
ollama pull deepseek-coder:6.7b

3. Start the OpenAI-Compatible Server

Ollama exposes an OpenAI-compatible API on localhost:11434. Keep it running:

ollama serve

Or let the background service handle it.

4. Configure Cursor

Open Cursor Settings (Ctrl/Cmd + ,) and navigate to:

Settings > Models > OpenAI API Key

Set the base URL to:

http://localhost:11434/v1

Leave the API key field blank or enter any dummy string (some versions require a non-empty value).

Select the model name that matches what you pulled. For example:

deepseek-coder:6.7b
info

Cursor sends requests in the OpenAI chat completions format. Ollama's /v1 endpoint translates these automatically. You do not need a proxy.

5. Test It

Open a file and hit Ctrl/Cmd + L to open the chat panel. Ask something simple:

Write a Python function that reverses a string without using slicing.

If you get a response, you are connected. If it hangs, check that ollama serve is running and the model name matches exactly.

LM Studio + Cursor: Step-by-Step

LM Studio is the better choice if you want a GUI for downloading and switching models.

1. Install LM Studio

Download from lmstudio.ai. Available for macOS, Windows, and Linux.

2. Download a Model

Open LM Studio, go to the Discover tab, and search for a code model. Good picks:

  • TheBloke/CodeLlama-7B-Instruct-GGUF
  • TheBloke/DeepSeek-Coder-6.7B-Instruct-GGUF
  • Qwen/Qwen2.5-Coder-7B-Instruct-GGUF

Download the Q4_K_M or Q5_K_M quantization for a balance of size and quality.

3. Start the Local Server

In LM Studio, go to the Local Server tab on the left. Load your model, then click Start Server.

By default, it runs on:

http://localhost:1234/v1

4. Configure Cursor

Same process as Ollama. In Cursor Settings > Models > OpenAI API Key, set:

http://localhost:1234/v1

The model name field can be left as local-model or whatever placeholder LM Studio expects. LM Studio ignores the model name and uses whatever is currently loaded.

5. Verify

Run the same test prompt. LM Studio's server logs show incoming requests, which is useful for debugging.

What Works and What Does Not

Local models are not a drop-in replacement for Claude 3.5 Sonnet or GPT-4o. Here is the honest breakdown:

TaskLocal 7B-13BCloud (Claude/GPT-4)
Simple refactorsGoodExcellent
Boilerplate generationGoodExcellent
Complex architecture decisionsWeakExcellent
Understanding large codebasesWeakExcellent
Multi-file editsWeakGood
Speed (with GPU)FastNetwork-dependent
Speed (CPU only)SlowNetwork-dependent
warning

Running a 13B model on CPU can take 10-30 seconds per response. A modern GPU (RTX 3060 or better) brings this down to 1-3 seconds. Manage your expectations.

Hybrid Strategy: The Practical Approach

Most developers who stick with local models use a hybrid workflow rather than going all-in:

  1. Local model for quick, safe tasks: lint fixes, renaming, simple regex, explaining a function.
  2. Cloud model for heavy lifting: designing new features, debugging tricky issues, cross-file refactoring.
  3. Switch based on project: open source or non-sensitive code → cloud; proprietary or regulated code → local.

Cursor makes this easy because you can change the model in settings without restarting the IDE. Some users keep two Cursor windows open — one pointed at local, one at cloud — though that is more of a workaround than a feature.

tip

If you have a Mac with Apple Silicon, Ollama leverages the Neural Engine well. A MacBook Pro M3 Pro can run a 13B model at usable speeds without draining the battery like a discrete GPU would.

Troubleshooting

"Connection refused" errors

  • Check that the server is running (ollama serve or LM Studio server tab).
  • Verify the port: Ollama uses 11434, LM Studio uses 1234.
  • Check your firewall or corporate proxy.

Slow responses

  • Use a smaller model or a higher quantization (Q4 instead of Q5).
  • Ensure your GPU is being used. Ollama logs show GPU or CPU on load.
  • Close other GPU-heavy apps.

Nonsensical outputs

  • The model name might not match. Ollama is picky about exact names.
  • Some models need a specific prompt format. Instruct models work better than base models for chat.

Cursor ignores the local setting

  • Make sure you are overriding the OpenAI base URL, not just adding a custom model.
  • Restart Cursor after changing the base URL.

Final Thoughts

Local LLMs with Cursor are viable today for a subset of tasks. They are not as capable as cloud models, but for privacy-conscious developers or those working in restricted environments, they are often good enough. Start with Ollama if you want speed of setup, or LM Studio if you prefer a GUI. Expect to iterate on model choice and workflow before you find what works for your projects.