What if you could run your own AI model locally: no subscription, no data leaving your computer, and no internet connection needed once the model is downloaded? With LM Studio, that's exactly what you get. In this post I'll walk you through installing it on a Mac, loading a model, and making your first API call.
What is LM Studio?
LM Studio is a desktop application for macOS, Windows, and Linux that lets you download and run large language models (LLMs) locally. It comes with a built-in chat interface and, importantly for developers, a local API server that exposes OpenAI-compatible endpoints. That means many apps and scripts already written against the OpenAI API can be pointed at your own machine instead, usually by changing only the base URL.
Installation
On a Mac with Apple Silicon, installation is straightforward. Download the latest .dmg from the LM Studio website, open it, and drag the app to your Applications folder. The whole process takes about a minute.
Your First Model: Gemma 4
When you open LM Studio for the first time, it asks you to download a model. The default suggestion is Google's Gemma-4-e4b — a solid choice for a first model. It's the effective 4B parameter version of Gemma 4 and supports text, image input, reasoning, and tool calling. The download is about 6.86 GB.
Starting the API Server
Once your model is loaded, head to the Developer tab (the </> icon in the left sidebar) and click Start Server. LM Studio will spin up a local API server at http://localhost:1234.
You can verify it's running by opening a browser and visiting:
http://localhost:1234/v1/models
Or from the terminal:
curl -s http://localhost:1234/v1/models | python3 -m json.tool
You'll get back a JSON list of the models currently loaded — in our case Gemma-4-e4b and a bundled embedding model.
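If you'd rather do the same check from a script, here is a minimal Python sketch using only the standard library. It assumes the response follows the OpenAI-style shape, with the models under a top-level "data" array and each entry carrying an "id". The function names are my own, not part of LM Studio.

```python
import json
import urllib.request

# Server address from the step above; adjust it if you changed the port.
BASE_URL = "http://localhost:1234/v1"


def model_ids(payload):
    """Extract model identifiers from an OpenAI-style model list.

    Assumes the shape {"data": [{"id": "..."}, ...]}; a missing "data"
    key yields an empty list.
    """
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_models(base_url=BASE_URL, timeout=5):
    """GET <base_url>/models and return the parsed JSON body.

    Raises urllib.error.URLError if the server is not running.
    """
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as response:
        return json.load(response)
```

With the server started, print(model_ids(fetch_models())) should list the same identifiers you saw in the browser.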
Sending Your First Message via the API
The API is OpenAI-compatible, so the request format will look familiar:
curl -s http://localhost:1234/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"google/gemma-4-e4b","messages":[{"role":"user","content":"Say hello and introduce yourself briefly."}]}'
The model responds almost instantly and — a nice touch — the response includes a reasoning_content field showing the internal thinking process before the final answer was produced. Gemma 4 reasons before it responds, similar to OpenAI's o1 series.
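The same request can be sent from Python with nothing but the standard library. This is a sketch, not the only way to do it: the helper names are hypothetical, and I'm assuming reasoning_content sits on the message object next to content, so the code falls back to an empty string if it is absent.

```python
import json
import urllib.request

CHAT_URL = "http://localhost:1234/v1/chat/completions"


def build_request(prompt, model="google/gemma-4-e4b"):
    """Build the same POST request as the curl command above."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def split_reply(response_json):
    """Return (reasoning, answer) from a chat completion response.

    Assumption: the reasoning text is stored as "reasoning_content" on
    choices[0].message. Missing fields come back as empty strings.
    """
    message = response_json["choices"][0]["message"]
    return message.get("reasoning_content", ""), message.get("content", "")


def ask(prompt, timeout=120):
    """Send the prompt to the local server and split the reply."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as response:
        return split_reply(json.load(response))
```

Calling reasoning, answer = ask("Say hello and introduce yourself briefly.") lets you log the thinking separately from the text you show to users.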
What Can You Build With This?
Because the local server speaks the OpenAI protocol, you can:
- Point Python or Node.js scripts at http://localhost:1234/v1 using the standard openai SDK
- Connect tools like AnythingLLM to build a private RAG setup over your own documents
- Use editor plugins like Continue.dev for local AI-assisted coding
- Experiment with different open-source models (Llama, Mistral, Phi, Qwen, and more) without any per-token costs
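As a rough sketch of the first idea, here is how a Python script might target the local server through the openai package (installed separately with pip install openai). The API key is an arbitrary placeholder, since the SDK insists on a non-empty value; I'm assuming the local server does not validate it. The helper names and the environment-variable fallback are my own choices.

```python
import os

LOCAL_BASE_URL = "http://localhost:1234/v1"


def client_settings(env=None):
    """Return keyword arguments for openai.OpenAI(...).

    Environment variables win, so the same script can switch between the
    local server and a hosted endpoint without code changes.
    """
    env = os.environ if env is None else env
    return {
        "base_url": env.get("OPENAI_BASE_URL", LOCAL_BASE_URL),
        # Placeholder key: assumed to be ignored by the local server.
        "api_key": env.get("OPENAI_API_KEY", "lm-studio"),
    }


def ask_local(prompt, model="google/gemma-4-e4b"):
    """Send one user message through the openai SDK and return the reply text."""
    # Imported lazily so the settings helper works without the SDK installed.
    from openai import OpenAI

    client = OpenAI(**client_settings())
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Existing code that already uses the SDK typically needs only the base_url change; the rest of the call stays as it was.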
Why This Matters
Running AI locally gives you full control: no data leaves your machine, there are no API costs, and you're not dependent on external services. For development, testing, or privacy-sensitive workflows, a local model is a powerful addition to your toolkit.
LM Studio makes the barrier to entry remarkably low. If you have an Apple Silicon Mac with enough RAM, you can go from zero to a running local AI in under 15 minutes.

