Eric Sloof

Sunday, June 21. 2026

Running a Local AI on Your Mac with LM Studio

What if you could run your own AI model locally — no subscription, no data leaving your computer, no internet required? With LM Studio, that's exactly what you get. In this post I'll walk you through installing it on a Mac, loading a model, and making your first API call.

What is LM Studio?

LM Studio is a desktop application for macOS, Windows, and Linux that lets you download and run large language models (LLMs) locally. It comes with a built-in chat interface and — importantly for developers — a local API server that is fully compatible with the OpenAI API. That means any app or script already talking to ChatGPT can be pointed at your own machine instead.

Installation

On a Mac with Apple Silicon, installation is straightforward. Download the latest .dmg from the LM Studio website, open it, and drag the app to your Applications folder. The whole process takes about a minute.

Your First Model: Gemma 4

When you open LM Studio for the first time, it asks you to download a model. The default suggestion is Google's Gemma-4-e4b — a solid choice for a first model. It's the effective 4B parameter version of Gemma 4 and supports text, image input, reasoning, and tool calling. The download is about 6.86 GB.

Starting the API Server

Once your model is loaded, head to the Developer tab (the </> icon in the left sidebar) and click Start Server. LM Studio will spin up a local API server at http://localhost:1234.

You can verify it's running by opening a browser and visiting:

http://localhost:1234/v1/models

Or from the terminal:

curl -s http://localhost:1234/v1/models | python3 -m json.tool

You'll get back a JSON list of the models currently loaded — in our case Gemma-4-e4b and a bundled embedding model.
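
The same check can be scripted. The following is a minimal Python sketch, assuming the server returns the OpenAI-style list format (a top-level "data" array whose entries carry an "id"). The helper names model_ids and fetch_models are illustrative and not part of LM Studio.

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's local server address


def model_ids(payload: dict) -> list[str]:
    """Extract model identifiers from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_models(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Ask the running server which models it currently exposes."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        print(fetch_models())
    except OSError as exc:  # URLError subclasses OSError, e.g. server not started
        print(f"Could not reach LM Studio: {exc}")
```

If the server is not running, the script prints the connection error instead of raising, which makes it usable as a quick health check before starting other tooling.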

Sending Your First Message via the API

The API is OpenAI-compatible, so the request format will look familiar:

curl -s http://localhost:1234/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"google/gemma-4-e4b","messages":[{"role":"user","content":"Say hello and introduce yourself briefly."}]}'

The model responds almost instantly and — a nice touch — the response includes a reasoning_content field showing the internal thinking process before the final answer was produced. Gemma 4 reasons before it responds, similar to OpenAI's o1 series.
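
To read the reasoning and the answer separately from a script, something like the sketch below could work. It uses only the Python standard library and sends the same request as the curl command. One assumption to flag: the code expects reasoning_content to sit next to content on the message object and treats it as optional. The function names are illustrative.

```python
import json
import urllib.request

URL = "http://localhost:1234/v1/chat/completions"
MODEL = "google/gemma-4-e4b"


def build_request(prompt: str, model: str = MODEL) -> bytes:
    """Encode a single-turn chat request as the JSON body the API expects."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return json.dumps(body).encode("utf-8")


def split_reply(response: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from a chat completion response.

    Assumption: reasoning_content lives on the message object next to
    content. A missing field is returned as an empty string.
    """
    message = response["choices"][0]["message"]
    return message.get("reasoning_content") or "", message.get("content") or ""


def ask(prompt: str, timeout: float = 120.0) -> tuple[str, str]:
    """Send the prompt to the local server and split the reply."""
    req = urllib.request.Request(
        URL,
        data=build_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return split_reply(json.load(resp))


if __name__ == "__main__":
    try:
        reasoning, answer = ask("Say hello and introduce yourself briefly.")
        print("Reasoning:", reasoning)
        print("Answer:", answer)
    except OSError as exc:  # server not running or not reachable
        print(f"Could not reach LM Studio: {exc}")
```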

What Can You Build With This?

Because the local server speaks the OpenAI protocol, you can:

Point Python or Node.js scripts at http://localhost:1234/v1 using the standard openai SDK
Connect tools like AnythingLLM to build a private RAG setup over your own documents
Use editor plugins like Continue.dev for local AI-assisted coding
Experiment with different open-source models (Llama, Mistral, Phi, Qwen, and more) without any per-token costs

Python example: streaming chat with LM Studio and OpenAI client

from openai import OpenAI

# Placeholder key: the OpenAI client needs some value here.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

MODEL = "google/gemma-4-e4b"
history = []  # full conversation, resent on every turn so the model keeps context

print("Chat started. Type 'exit' or 'quit' to stop.\n")

while True:
    user_input = input("You: ").strip()
    if not user_input:
        continue
    if user_input.lower() in ("exit", "quit"):
        print("Goodbye!")
        break

    history.append({"role": "user", "content": user_input})

    print("Model: ", end="", flush=True)
    response_text = ""

    # stream=True yields the answer in small chunks as it is generated
    stream = client.chat.completions.create(
        model=MODEL,
        messages=history,
        stream=True,
    )

    for chunk in stream:
        # Some chunks carry no choices (for example a trailing usage chunk)
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            response_text += delta

    print()
    history.append({"role": "assistant", "content": response_text})

Why This Matters

Running AI locally gives you full control: no data leaves your machine, there are no API costs, and you're not dependent on external services. For development, testing, or privacy-sensitive workflows, a local model is a powerful addition to your toolkit.

LM Studio makes the barrier to entry remarkably low. If you have an Apple Silicon Mac with enough RAM, you can go from zero to a running local AI in under 15 minutes.

Saturday, June 13. 2026

The Day the Model Went Dark: Why Your AI Strategy Needs a Sovereignty Layer

Last week, the most capable publicly available AI model on the planet went from "the hard part is finding a task it can't do" to a 404 in three days. Not because of a bug. Not because of a billing dispute. Because a government decided it should.

For anyone running infrastructure in a regulated, sovereign, or critical-infrastructure context, this is the most important thing to happen in enterprise AI this year - and it has almost nothing to do with the model itself.

What actually happened

On June 9, 2026, Anthropic launched Claude Fable 5, the first publicly available model in its new Mythos class. It was, by the benchmarks, the most capable model anyone could put their hands on. Three days later, on June 12, the US government issued an export-control directive citing national security, and Anthropic disabled both Fable 5 and its restricted twin, Mythos 5, for every customer worldwide within hours.

The stated trigger was a claimed jailbreak. The reported substance of that jailbreak is almost comically modest: prompting the model to read a codebase and identify software flaws — something cybersecurity engineers do every day, and a capability Anthropic itself points out is already present in other publicly available models. Anthropic complied with the order while publicly disagreeing with it. The directive was framed as restricting access by foreign nationals, and because that was impossible to enforce surgically, the only compliant option was to pull the plug globally.

Read that last part again. A model used by hundreds of millions of people was switched off worldwide, with hours of notice, over a narrow concern, by a single government acting unilaterally. Anthropic's other models - Opus 4.8, Sonnet, Haiku - kept running. But the flagship was gone overnight.

The lesson is not "jailbreaks are bad"

It's tempting to file this under AI safety drama and move on. That would be a mistake. The real takeaway for anyone responsible for an architecture is much colder:

A core dependency in your stack can be revoked by a third party who is not your vendor, under a legal regime you are not subject to, for reasons you cannot predict, contest, or plan around.

This is not a hypothetical risk register entry anymore. It is a documented event with a timestamp. And it stacks neatly on top of concerns that were already visible at launch:

Data residency and retention. Fable 5 shipped with mandatory 30-day data retention, which effectively locked out GDPR-bound European organisations. Microsoft had already restricted internal employee access over data-retention concerns before the government even acted.
Jurisdictional reach. The directive didn't care where the customer was. A model hosted in the US, governed by US law, is subject to US policy decisions - full stop - no matter whose data is flowing through it.
Opaque self-limiting behaviour. The system card disclosed that the model could silently reduce its own effectiveness in certain contexts. Whatever you think of the intent, "the tool quietly decides to perform worse and doesn't tell you" is not a property you want in production infrastructure.

None of these are unique to one vendor. They are structural properties of consuming frontier AI as a foreign-controlled cloud service.

What this means if you sit where I sit

If you run infrastructure for defence, law enforcement, government, healthcare, finance, or any critical-infrastructure operator, you already think in terms of availability, jurisdiction, and supply-chain risk. Apply those exact lenses to AI and the conclusion writes itself.

You would never accept a storage platform that a foreign government could remotely disable within hours. You would never sign off on a network function whose vendor admits it might silently degrade itself. You would never put data that can't leave the jurisdiction into a service that mandates 30-day offshore retention. Yet a lot of organisations have quietly let exactly these properties into their stack, because the capability was irresistible and the abstraction was convenient.

The Fable 5 episode is the moment to stop treating cloud-hosted frontier AI as plumbing and start treating it as what it is: a strategically important dependency with a foreign-controlled off switch.

The case for on-premise, private AI

The mitigation is not new and it is not exotic. It's the same answer the sovereignty conversation has been circling for years: run the model where you control it.

On-premise or sovereign-cloud private AI — ideally built on open-weight models — changes the risk profile in concrete ways:

No remote kill switch. Once the weights are on your hardware, no vendor and no government can reach in and disable them. Availability becomes your operational responsibility, not someone else's policy decision.
Data never leaves. For darksite, air-gapped, and classified environments, this isn't a nice-to-have; it's the only acceptable model. The data residency question disappears because there is no residency to argue about.
Predictability. The model you validated and certified is the model you keep running. It doesn't get pulled, silently re-routed to a weaker fallback, or quietly told to limit itself.
Auditability. Open weights mean you can inspect, fine-tune, and reason about behaviour rather than trusting a black box governed by a 120,000-character system prompt you'll never see in full.

Open-source and open-weight models (the Llama, Mistral, Qwen, and DeepSeek lineages, among others) have closed enough of the capability gap that "good enough for the task, and fully under our control" is now a defensible position for a large share of real enterprise work. For code assistance, document analysis, retrieval-augmented knowledge work, and most of what actually drives value in an organisation, you do not need the single most capable frontier model. You need a capable model that is still there tomorrow morning.

Let's be honest about the trade-offs

I'm arguing for sovereignty, not for fantasy. Bringing AI on-premise moves the burden onto you, and that burden is real:

You own the capability gap. The best open-weight models trail the frontier, especially on the hardest long-horizon tasks. For most enterprise workloads that's acceptable; for a few it isn't. Know which bucket you're in.
You own the safety and security layer. The vendor's classifiers, guardrails, and abuse monitoring are now your problem. Self-hosting a powerful model without serious thought about misuse, prompt injection, and access control is its own risk.
You own the operations. GPUs, capacity planning, model lifecycle, patching, evaluation pipelines — this is infrastructure work, and it isn't free. The cloud convenience you're giving up was paying for something.
Hybrid is usually the honest answer. For most organisations the realistic posture is a sovereign core for anything sensitive or availability-critical, with optional, clearly-bounded use of frontier cloud models for the narrow cases that genuinely need them — and a contingency plan for the day one of them goes dark.
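
The hybrid posture can be expressed as a small routing rule. The sketch below is illustrative only: HybridRouter, the sensitive flag, and the needs_frontier flag are hypothetical names, and each backend is simply any callable that turns a prompt into a reply. Sensitive work always stays on the local model, and a failing cloud call falls back to the local model.

```python
from dataclasses import dataclass
from typing import Callable

# A backend is any callable that turns a prompt into a reply.
Backend = Callable[[str], str]


@dataclass
class HybridRouter:
    local: Backend     # sovereign core: a model on hardware you control
    frontier: Backend  # optional cloud model for the narrow cases that need it

    def complete(self, prompt: str, sensitive: bool, needs_frontier: bool = False) -> str:
        # Sensitive work, and anything that does not need the frontier
        # model, never leaves the sovereign core.
        if sensitive or not needs_frontier:
            return self.local(prompt)
        try:
            return self.frontier(prompt)
        except Exception:
            # Contingency: the cloud model is unreachable or withdrawn.
            return self.local(prompt)
```

In practice both backends could wrap OpenAI-compatible clients with different base URLs. The point of the design is that the fallback path is exercised in code and tests, not discovered during an outage.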

The bottom line

Fable 5 didn't fail. It was withdrawn — competently, legally, and almost instantly — by an actor outside the customer relationship. That is the part worth sitting with.

If your AI strategy assumes continuous, uninterrupted access to a specific foreign-hosted model, you don't have a strategy; you have a single point of failure with excellent marketing. The fix isn't to abandon frontier AI. It's to make sure the part of your stack you actually depend on is the part you control.

Build the sovereignty layer now, while it's an architecture decision - not later, when it's an incident report.

VCF 9.1 Feature Comparison and Upgrade Paths

Broadcom has released the updated Feature Comparison & Upgrade Paths white paper for VMware Cloud Foundation 9.1 and VMware vSphere Foundation 9.1. This document is the authoritative reference for understanding the differences between VVF, VCF Edge, and full VCF — and it reveals some significant additions worth paying attention to.

Three product tiers, one portfolio

The VMware by Broadcom portfolio is now structured around three primary offerings:

VMware vSphere Foundation (VVF) serves as the entry-level enterprise virtualization platform, combining vSphere, VKS, VCF Operations, and vSAN in a single SKU. It is the logical replacement for vSphere Enterprise Plus, vSphere Enterprise, vSphere for Desktop, and vSphere Scale-Out.

VMware Cloud Foundation Edge is an optimized VCF configuration tailored specifically for edge deployments, supporting single-host all the way up to multi-host clusters. This tier targets environments where full VCF management overhead is impractical, such as remote sites, operational technology environments, and disconnected locations.

VMware Cloud Foundation (full) is the complete private cloud platform, combining all VVF capabilities with NSX, VCF Automation, fleet management, and Private AI Services. All previous VCF tiers (Starter, Standard, Advanced, Enterprise) and vCloud Suite customers are directed here.

What's notably new in 9.1

Confidential Computing is now listed as a feature exclusive to VCF Edge and full VCF, not available in VVF. This is significant for organisations with strict data classification requirements — hardware-enforced memory isolation for sensitive workloads is increasingly relevant in regulated sectors.

DPU and Dual DPU Support also appears exclusively in VCF Edge and VCF, reflecting Broadcom's push toward infrastructure offload using Data Processing Units such as NVIDIA BlueField. This enables compute, networking, and security functions to be offloaded from the host CPU, directly relevant for high-density AI inference environments.

Private AI Services in 9.1 is now a proper platform capability within VCF Automation, including a Catalog Setup Wizard, GPU-capable Deep Learning VM provisioning, Data Services Manager integration for RAG workloads, Vector Databases, AI Agent Builder, and Model Context Protocol (MCP) support. This is exclusively available in VCF Edge and full VCF.

vSAN Cyber Recovery Clusters appear for the first time as a distinct feature in the comparison matrix, available in VCF Edge and VCF (requires Advanced Cyber Compliance add-on). Combined with the existing VMware Live Cyber Recovery integration, this positions VCF 9.1 as a platform with integrated ransomware recovery capabilities.

Security posture: what requires add-ons

One aspect worth calling out explicitly: Compliance Management with remediation — including regulatory compliance baselines such as PCI, custom compliance templates, and vSphere hardening — now requires the Advanced Cyber Compliance (ACC) add-on for both VCF Edge and full VCF. This is a change from earlier versions where some of these capabilities were bundled. Organisations in regulated industries should factor this into their licensing discussions.

Similarly, Configuration Drift Management is listed as deprecated and moved to the ACC add-on. For security-conscious environments, this is a procurement consideration that deserves attention during contract renewal.

Networking highlights

The networking feature delta between VVF and VCF remains substantial. VVF gets vSphere Distributed Switch with L2 capabilities, but everything above that — dynamic routing (OSPFv2/BGP/BFD), VRF, EVPN, NAT, L2/L3 VPN, Virtual Private Clouds, NSX Federation, live traffic analysis, and Traceflow — requires at minimum VCF Edge. For organisations running NSX for micro-segmentation and zero-trust networking, this reinforces VCF as the only viable path.

VCF Operations for Networks (formerly vRNI) is exclusively available in VCF Edge and full VCF, providing end-to-end network visibility across virtual and physical underlay, including FIPS 140-2 compliance for the network operations platform. Physical device integration covers Cisco, Arista, Juniper, and Infoblox.

Upgrade paths clarified

The white paper includes clear upgrade path diagrams that resolve a question many customers have been asking since the Broadcom acquisition. The short version:

vSphere Enterprise Plus, Enterprise, Desktop, and Scale-Out → vSphere Foundation
Any combination of vSphere + vSAN + NSX + Aria → VMware Cloud Foundation
vCloud Suite Enterprise/Advanced → VMware Cloud Foundation
vCloud Suite Standard → vSphere Foundation
All previous VCF tiers → VMware Cloud Foundation (with Firewall add-on)

Notably, customers currently using vSphere + vSAN without NSX or Aria are given a choice: full VCF or vSphere Foundation with a vSAN add-on. The latter is the lower-cost option for environments that do not require automation, NSX, or AI capabilities.
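
For anyone scripting a licence inventory, the paths above can be captured as a simple lookup. This is a sketch built only from the summary in this post: the keys are informal labels rather than official SKU names, and the white paper remains the reference.

```python
VVF = "vSphere Foundation"
VCF = "VMware Cloud Foundation"

# Current product -> possible targets, as summarised in this post.
UPGRADE_PATHS = {
    "vSphere Enterprise Plus": [VVF],
    "vSphere Enterprise": [VVF],
    "vSphere for Desktop": [VVF],
    "vSphere Scale-Out": [VVF],
    "vCloud Suite Enterprise": [VCF],
    "vCloud Suite Advanced": [VCF],
    "vCloud Suite Standard": [VVF],
    # Without NSX or Aria, customers are given a choice.
    "vSphere + vSAN": [VCF, VVF + " with vSAN add-on"],
    "Previous VCF tiers": [VCF + " (with Firewall add-on)"],
}


def upgrade_targets(current: str) -> list[str]:
    """Return the listed targets, or an empty list for unknown products."""
    return UPGRADE_PATHS.get(current, [])


if __name__ == "__main__":
    for product in ("vCloud Suite Standard", "vSphere + vSAN"):
        print(product, "->", " or ".join(upgrade_targets(product)))
```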

Practical takeaway for infrastructure architects

VCF 9.1 continues the trajectory Broadcom set after the acquisition: consolidate the product line, make VCF the premium destination, and bundle AI infrastructure capabilities deeply into the platform. For organisations evaluating their 2026-2027 licensing strategy, the key decision point remains whether NSX and VCF Automation justify the cost delta over VVF. For most enterprise environments — especially those with multi-site deployments, security requirements, or AI ambitions — the answer is yes.

For edge and disconnected environments, VCF Edge in 9.1 deserves a serious look. The combination of single-host cluster support, GitOps-driven desired state management, air-gapped operation, and Private AI inference support makes it a compelling platform for operational technology and mission-critical remote deployments.

The full white paper is available here.

Sunday, June 7. 2026

Kong on vSphere Kubernetes Service – What does this white paper cover?

Broadcom published a technical white paper in February 2026 covering the integration of Kong API Gateway with vSphere Kubernetes Service (VKS) within VMware Cloud Foundation. The paper describes a reference architecture for organizations looking to combine Kubernetes workloads with enterprise-grade API governance.

Why Kong on VKS?

As Kubernetes environments scale, the challenge shifts from raw throughput to governance: how do you ensure consistent security, predictable latency, and auditable traffic management across all your microservices? Kong acts as the intelligent "front door" of the VKS cluster, filling exactly the gap that standard ingress controllers leave behind.

Two deployment models

The white paper fully details two architectures, complete with all accompanying YAML and Helm commands:

On-premises – Both the Control Plane and Data Plane run locally within the VKS cluster. This model is particularly suited for high-compliance environments where the management layer must remain on-premises — a familiar requirement in public sector environments.
Hybrid (Kong Konnect) – The Control Plane is a SaaS service, while the Data Plane remains local. Operational management is centralized in the cloud, but all API traffic is processed within your own infrastructure boundary.

Technical environment

The validation was performed on VKS version 3.5, Kubernetes v1.34.2, Ubuntu 24.04, and vSAN ESA with RAID-5 as the storage policy. The Kong Operator (v1.0.2) was installed via Helm, supplemented by cert-manager for automated mTLS certificate rotation between the Control Plane and Data Plane.

Relevance for VCF infrastructure design

For infrastructure architects working on VCF designs, this paper is particularly interesting because it demonstrates how Kubernetes-native tooling — Gateway API, Cluster API, GitOps via Argo CD — integrates seamlessly with existing vSphere workflows. NSX handles network micro-segmentation, the vSphere CSI driver transparently exposes vSAN storage policies as Kubernetes storage classes, and the entire stack is declaratively manageable — including certificate lifecycle and routing policies as code.
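
To make "routing policies as code" concrete, here is a small Python sketch that generates a Gateway API HTTPRoute manifest. It is not taken from the white paper. The route, gateway, and service names are hypothetical, and the output is JSON, which kubectl accepts alongside YAML.

```python
import json


def http_route(name: str, gateway: str, path_prefix: str, service: str, port: int) -> dict:
    """Build a Gateway API HTTPRoute that sends one path prefix to one Service."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name},
        "spec": {
            # The Gateway this route attaches to (hypothetical name below).
            "parentRefs": [{"name": gateway}],
            "rules": [
                {
                    "matches": [{"path": {"type": "PathPrefix", "value": path_prefix}}],
                    "backendRefs": [{"name": service, "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical names; in a GitOps setup the generated file would be
    # committed to the repository that the sync tool watches.
    route = http_route("orders-route", "kong-gateway", "/orders", "orders-svc", 8080)
    print(json.dumps(route, indent=2))
```

Generating manifests from a function like this keeps route definitions reviewable and repeatable, which is the property the declarative approach is after.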

Source: VMware by Broadcom Technical White Paper – Kong on vSphere Kubernetes Service

Entries from June 2026