Imagine having the entire VMware Cloud Foundation 9.1 documentation right at your fingertips. We're talking about over 9,000 pages of dense technical content. Finding a specific answer usually means endless searching through PDFs. But what if you could just chat with it?
That’s exactly what I’ve set up. And the best part? It runs entirely locally on a MacBook Pro. No subscription costs, no cloud processing, and absolutely zero data leaving your machine.
What Did We Build?
We built a local RAG (Retrieval-Augmented Generation) pipeline. In simple terms, it's an AI system that understands your questions, instantly retrieves the most relevant sections from the VCF documentation, and delivers a clear, accurate answer. It's like having a VMware expert sitting right next to you.
Here’s the stack:
- Ollama: The engine running AI models locally on your Mac.
- Mistral 7B: The language model that answers your questions.
- AnythingLLM: The user interface for chatting with your documents.
- MacBook Pro M2 Max (64 GB): More than enough power to run this setup smoothly.
Why Not Train a Custom Model?
When people think about combining AI with their own documents, they often jump straight to training a custom model. Let me be direct: that’s unnecessary and, for this use case, completely the wrong approach.
| Approach | Cost | Time | Results |
|---|---|---|---|
| Training a Model | €5,000 – €50,000 | Weeks | Mediocre for specific docs |
| RAG with AnythingLLM | €0 | One afternoon | Excellent and highly accurate |
RAG works differently. Instead of baking knowledge into a model's weights, the system retrieves the relevant document sections at query time and passes them to the language model. It's faster, cheaper, and far more accurate for domain-specific documentation like VCF.
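To make the retrieve-then-generate idea concrete, here is a toy sketch in plain shell. It is not how AnythingLLM works internally: keyword matching with grep stands in for vector similarity search, the three "chunks" are made-up placeholder sentences rather than quotes from the VCF documentation, and the function names retrieve and build_prompt are hypothetical.

```shell
# Toy RAG flow: keyword retrieval stands in for vector similarity search.

# Hypothetical mini knowledge base: one placeholder chunk per line.
chunks_file="$(mktemp)"
cat > "$chunks_file" <<'EOF'
NSX provides the networking layer for workload domains.
vSAN aggregates local disks into shared storage.
SDDC Manager automates lifecycle management.
EOF

# retrieve KEYWORD: print up to 2 chunks that mention KEYWORD (case-insensitive).
retrieve() {
  grep -i -- "$1" "$chunks_file" | head -n 2
}

# build_prompt QUESTION KEYWORD: put the retrieved chunks in front of the question.
build_prompt() {
  printf 'Answer using only this context:\n%s\n\nQuestion: %s\n' \
    "$(retrieve "$2")" "$1"
}

prompt="$(build_prompt 'What does NSX provide?' 'nsx')"
printf '%s\n' "$prompt"

rm -f "$chunks_file"
```

The model only ever sees the prompt that build_prompt assembles, which is why no training is needed: changing the documents changes what gets retrieved, not the model.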
Step 1: Install Ollama
Ollama is the engine that runs AI models locally on your Mac. You can install it easily via Homebrew:
brew install ollama
Next, download the Mistral 7B model. It’s a powerful open-source model that performs exceptionally well on technical documentation:
ollama pull mistral
Mistral is about 4.4 GB and runs flawlessly on an M2 Max. If you have less memory, llama3.2 (2 GB) is a solid, faster alternative, though slightly less accurate.
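Before moving on, it can help to confirm that the model is actually available, since AnythingLLM can only talk to a running Ollama server. The sketch below assumes that `ollama list` prints a header row followed by one model per line with the name in the first column; has_model is a hypothetical helper, not part of Ollama.

```shell
# If the server is not running yet, start it first with one of:
#   brew services start ollama    (background service)
#   ollama serve                  (foreground, in a separate terminal)

# has_model NAME: read `ollama list` output on stdin and succeed if any
# model name (first column, header row skipped) starts with NAME.
has_model() {
  awk -v want="$1" 'NR > 1 && index($1, want) == 1 { found = 1 } END { exit !found }'
}

# Only run the check when the ollama CLI is installed.
if command -v ollama >/dev/null 2>&1; then
  if ollama list 2>/dev/null | has_model mistral; then
    echo "mistral is installed"
  else
    echo "mistral not found: start the server, then run 'ollama pull mistral'"
  fi
fi
```

As a second check, `curl -s http://127.0.0.1:11434/api/tags` should return a JSON list of installed models when the server is reachable at its default address.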
Step 2: Download and Install AnythingLLM
AnythingLLM is a free desktop app that lets you upload documents and chat with them directly.
- Head over to anythingllm.com/download and download the Apple Silicon version.
- Open the DMG file and drag the app to your Applications folder.
Pro Tip: If macOS warns you about an "unidentified developer," simply remove the quarantine flag via Terminal:
xattr -cr /Applications/AnythingLLM.app
Step 3: Configure AnythingLLM
Launch AnythingLLM and follow the setup wizard. Use these settings:
- LLM Provider: Ollama
- Ollama Base URL: http://127.0.0.1:11434
- Chat Model: mistral:latest
- Embedding Provider: Native (built into AnythingLLM)
- Vector Database: LanceDB (default)
Troubleshooting: If you encounter the error "model 'qwen3-vl:4b-instruct' not found", AnythingLLM is still pointing at a model that isn't installed in Ollama, and you'll need to edit the configuration file directly. Open ~/Library/Application Support/anythingllm-desktop/storage/.env and replace the LLM settings with:
LLM_PROVIDER='ollama'
OLLAMA_BASE_PATH='http://127.0.0.1:11434'
OLLAMA_MODEL_PREF='mistral:latest'
OLLAMA_MODEL_TOKEN_LIMIT=4096
Restart the app, and you're good to go.
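If the error persists, a quick way to rule out a typo is to check that all four keys are present in the file. This is a minimal sketch: check_env is a hypothetical helper, and it only verifies that each key exists at the start of a line, not that its value is correct.

```shell
# check_env FILE: print every expected key that is missing from FILE.
# Returns 0 when all keys are present, 1 otherwise.
check_env() {
  missing=0
  for key in LLM_PROVIDER OLLAMA_BASE_PATH OLLAMA_MODEL_PREF OLLAMA_MODEL_TOKEN_LIMIT; do
    if ! grep -q "^${key}=" "$1"; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}

# Path taken from the troubleshooting note above; quoted because it contains a space.
env_file="$HOME/Library/Application Support/anythingllm-desktop/storage/.env"
if [ -f "$env_file" ]; then
  check_env "$env_file" && echo "all Ollama settings present"
fi
```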
Step 4: Upload the VMware VCF 9.1 PDF
- Create a new Workspace in AnythingLLM (e.g., "VCF 9.1 Docs").
- Click Upload Document.
- Drag your 9,000-page PDF into the upload window.
AnythingLLM will automatically process the PDF: extracting text, splitting it into chunks, calculating embeddings, and storing everything in the vector database. For a document of this size, it takes a few minutes. Once it's done, you're ready to chat.
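The "splitting it into chunks" step is easier to picture with a small example. The sketch below splits text into fixed-size word windows that overlap, so a sentence cut at a boundary still appears whole in one of the chunks. It is only an illustration: chunk_words is a hypothetical helper, and AnythingLLM's real splitter and its chunk size and overlap settings may work differently.

```shell
# chunk_words SIZE OVERLAP: read text on stdin and print one chunk per line.
# Each chunk holds up to SIZE words and repeats the last OVERLAP words
# of the previous chunk.
chunk_words() {
  awk -v size="$1" -v overlap="$2" '
    { for (i = 1; i <= NF; i++) words[++n] = $i }
    END {
      step = size - overlap
      if (step < 1) step = 1          # guard against a non-advancing window
      for (start = 1; start <= n; start += step) {
        line = ""
        for (j = start; j < start + size && j <= n; j++)
          line = line (line == "" ? "" : " ") words[j]
        print line
        if (start + size > n) break   # this window already reached the end
      }
    }'
}

echo 'w1 w2 w3 w4 w5 w6 w7' | chunk_words 4 1
# w1 w2 w3 w4
# w4 w5 w6 w7
```

Each chunk is then turned into an embedding vector and stored, which is what makes retrieval by meaning possible later on.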
Step 5: Chat with Your Documentation
Now, you can ask questions just like you're talking to a colleague:
- "What are the minimum hardware requirements for VCF 9.1?"
- "How do I configure NSX in a VCF 9.1 environment?"
- "What's new in VCF 9.1 compared to 9.0?"
The system retrieves the relevant sections and generates an answer, along with citations that show which source passages it drew on.
Why This Works So Well on a Mac
Apple Silicon (M1/M2/M3/M4) has a massive advantage: unified memory. The CPU and GPU share the same memory pool, so the model does not have to fit into a separate, smaller pool of video memory. A 7B-parameter model (about 4.4 GB for Mistral) fits entirely in memory with room to spare on a Mac with 32 GB or more, and an M2 Max with 64 GB can even run 13B models locally without breaking a sweat.
| Mac Model & Memory | Recommended Model |
|---|---|
| M1/M2 (16 GB) | llama3.2:3b (fast, compact) |
| M1/M2 (32 GB) | mistral:7b |
| M2 Max (64 GB) | mistral:7b or a 13B model (e.g., llama2:13b) |
| M2 Ultra / M3 Max | Larger models easily supported |
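For a rough sense of why these pairings work, the memory needed for the model weights alone is approximately the parameter count times the bits per weight, divided by 8. The sketch below assumes 4-bit quantization for the default model downloads, which is an assumption rather than something stated above; weights_gb is a hypothetical helper, and the estimate ignores the context cache and runtime overhead, so real usage is higher.

```shell
# weights_gb PARAMS_BILLIONS BITS_PER_WEIGHT: approximate size of the weights in GB.
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

weights_gb 7 4     # 3.5   (7B model, 4-bit)
weights_gb 13 4    # 6.5   (13B model, 4-bit)
weights_gb 7 16    # 14.0  (7B model, unquantized 16-bit)
```

The downloaded Mistral file is somewhat larger than the 3.5 GB estimate because quantized formats store extra scaling data alongside the weights. Either way, the numbers leave plenty of headroom on a 32 GB or 64 GB machine.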
Privacy & Security
Everything runs 100% locally. Your VMware documentation, your questions, and the answers never leave your MacBook. No cloud, no subscription, no data sharing. This makes it the perfect setup for handling confidential internal documentation.
Conclusion
With Ollama and AnythingLLM, you can build a powerful AI assistant in a single afternoon that effortlessly navigates 9,000 pages of VMware VCF 9.1 documentation. It’s local, free, and completely private.
The technology behind this—RAG—is exactly what large enterprises use for their AI applications. You just get to skip the enterprise price tag.
Next step? Try combining multiple documents—release notes, best practices, and architecture guides—all in one workspace. That's when you truly build a comprehensive VMware knowledge base right on your laptop.

