This post showcases real-world troubleshooting techniques and emphasizes identifying the initial root cause by using SDDC Manager APIs and reference tokens to get past the limited detail shown in the interface.
Tuesday, April 21. 2026
Bypassing UI Noise - API Driven Root Cause Analysis in VCF 9
All VCF 9 VCAP Disciplines Secured - VCDX Loading…
Last week I successfully passed the VCAP Operations and VCAP VKS exams. With that, my full set of VCF 9 discipline-aligned VCAP certifications is - for now - complete. All five core disciplines are covered: Operations, Automation, Networking, Storage, and VKS. This feels like the right moment to reflect on what this journey means, and to look ahead at where it leads.
The five disciplines
The 2025 VCAP expansion was the largest in VMware's history: eleven new exams across both role-based and domain-specific tracks. For VCF 9, the five discipline certifications each validate deep, hands-on expertise in a specific technical domain within the platform.
- Storage
- Networking
- Automation
- Operations
- VKS
The VKS certification deserves a special mention. Within the new VCF certification framework, the Kubernetes-focused VCAP is a mandatory component on the path to VCDX. This reflects the growing role of modern application platforms - the Kubernetes Supervisor - as an integral part of VMware Cloud Foundation, not an optional add-on.
A fundamentally changed certification landscape
Broadcom has completely overhauled the VMware certification program. Where it once revolved around product-specific knowledge, the new framework is explicitly role- and platform-oriented. Instead of validating isolated component expertise, it now asks professionals to demonstrate understanding of VMware Cloud Foundation as an integrated platform - compute, storage, networking, automation, and observability working as one cohesive system.
This shift is most visible at the VCAP level. The discipline-based VCAPs validate deep technical expertise per domain. The role-based VCAPs - Administrator, Architect, and Support - validate how that knowledge is applied in practice, depending on your day-to-day responsibilities. Together, they form the foundation for the highest level in the program: the VMware Certified Distinguished Expert, or VCDX.
The path from VCAP to VCDX
The formal path to VCDX starts with a role-based VCP as the foundation. From there, candidates complete a set of advanced VCAP certifications - including at least one role-based VCAP and the VKS track. The final step is a live defense, a practical session where architectural insight, problem-solving capability, and real operational experience take center stage. This is not about ticking boxes. It is about proving that you can operate at the highest technical level in real-world VCF environments.
Next milestone: the role-based VCAPs
With all five discipline tracks completed, the focus now shifts to the three role-based VCAPs: Administrator (3V0-11.26), Architect (3V0-12.26), and Support (3V0-13.26). Broadcom has recently announced these as coming very soon - each following the same format of 60 questions over 135 minutes, with a passing score of 300 on a scaled system.
What makes these exams interesting is that they evaluate the same technical foundations from entirely different angles. The Administrator exam focuses on operational execution - lifecycle management, automation workflows, day-2 operations. The Architect exam focuses on design decisions - workload domain strategies, resilience patterns, capacity planning. The Support exam focuses on diagnostic methodology - root cause analysis, log interpretation, structured troubleshooting across the full VCF stack.
For anyone working hands-on with VCF environments, these three tracks represent a logical and valuable next step. They validate not just what you know, but how you apply it.
Why this matters
VCF adoption is accelerating - in sovereign cloud initiatives, in regulated industries, and in large enterprise environments looking for the combination of public cloud automation with full infrastructure control. Organizations need verifiable, role-specific expertise. The new VCAP structure delivers exactly that assurance.
For me personally, this is about more than credentials. Each exam builds the technical foundation needed to contribute meaningfully to the community, to deliver sharp troubleshooting sessions at events like VMUG Connect Amsterdam and Minneapolis, and ultimately to approach the VCDX defense with genuine depth across every relevant domain of the platform.
Next step: VCAP Administrator · Architect · Support → VCDX. Can't wait.
Sunday, April 19. 2026
VMware Cloud Foundation Troubleshooting: Real-World Scenarios and Solutions
After delivering this session at both VMUG Connect Amsterdam and VMUG Connect Minneapolis, I can honestly say the response exceeded my expectations. The rooms were packed, the questions were sharp, and the feedback was overwhelmingly positive. It's clear that VCF troubleshooting is a topic that resonates deeply with the community right now.
What the session was all about
The goal was simple: move beyond theory and give attendees practical, real-world troubleshooting techniques they could take back to their environments the very next day. The session was built around four core areas.
Preventive Health Checks
Before you can troubleshoot, you need visibility. I walked through three essential tools for keeping your VCF environment healthy. The first is the VCF Diagnostic Tool, a Python-based script that runs locally on your vCenter or SDDC Manager appliance and generates reports in TXT, LOG and JSON format. The second is the VMware Health and Security Toolkit (HST), which gives you a comprehensive security assessment across your entire vSphere environment. The third is the VCF Build and Integrate Lifecycle Toolkit (BILT), which handles both greenfield validation and brownfield readiness checks before expansions or upgrades.
For post-deployment health monitoring, the SoS Utility remains one of the most powerful tools in the VCF admin's arsenal. Running individual checks like certificate health, connectivity health, NTP synchronization and services health gives you a clear picture of your environment's status at any point in time.
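As a rough sketch of how those individual checks could be scripted, the Python snippet below builds one SoS command line per check. The binary path and the flag names are assumptions based on the SoS documentation for earlier VCF releases, so confirm them with the utility's own help output on your SDDC Manager before relying on them. Nothing is executed unless you call run_check on the appliance itself.

```python
import subprocess

# Assumed location of the SoS utility on the SDDC Manager appliance.
SOS_BINARY = "/opt/vmware/sddc-support/sos"

# Assumed flag names for the four checks mentioned above; verify on your version.
HEALTH_CHECK_FLAGS = {
    "certificate": "--certificate-health",
    "connectivity": "--connectivity-health",
    "ntp": "--ntp-health",
    "services": "--services-health",
}


def build_sos_command(check, domain_name=None):
    """Return the argument list for a single SoS health check."""
    if check not in HEALTH_CHECK_FLAGS:
        raise ValueError(f"unknown check: {check!r}")
    command = [SOS_BINARY, HEALTH_CHECK_FLAGS[check]]
    if domain_name is not None:
        # Assumed flag that limits the check to one workload domain.
        command += ["--domain-name", domain_name]
    return command


def run_check(check, domain_name=None):
    """Run one check on the appliance and return (exit code, stdout)."""
    result = subprocess.run(
        build_sos_command(check, domain_name),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode, result.stdout
```

Keeping the command construction separate from the execution makes it easy to review exactly what would run before anything touches the appliance.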
A Real-World Troubleshooting Example
This was the part of the session that generated the most discussion. I walked through an actual customer case — a shoutout to Alexander Bituev from ING Frankfurt for this one — where a workload domain creation failed with a frustratingly vague error in the Fleet Management UI.
The key lesson here was what I call the Golden Rule of VCF troubleshooting: the first error in the log is the root cause. Everything that follows is just noise. Using a simple grep command against the domainmanager.log file, or alternatively drilling into the SDDC Manager API via the built-in Developer Center Swagger UI, reveals the full error chain. In this case the nested error told the complete story: the ESXi host already had vSAN enabled, which blocked the workload domain deployment. The referenceToken shown in the Fleet Management UI is your bridge between the UI and the API — always use it.
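The Golden Rule can be sketched in a few lines of Python. The snippet below returns the first ERROR line in a log, optionally restricted to lines that carry a given referenceToken, and separately flattens a nested API error into a readable chain. The sample log layout and the JSON field names (message, nestedErrors) are assumptions for illustration only; check them against your own domainmanager.log and the response you actually see in the Swagger UI.

```python
def first_error(lines, reference_token=None):
    """Return the first line marked ERROR, or None if there is none.

    If reference_token is given, only lines containing it are considered.
    """
    for line in lines:
        if "ERROR" not in line:
            continue
        if reference_token is not None and reference_token not in line:
            continue
        return line.rstrip("\n")
    return None


def error_chain(error):
    """Flatten a nested error dict into a list of messages, outermost first.

    Assumes each error has a "message" and an optional "nestedErrors" list.
    """
    messages = [error.get("message", "")]
    for nested in error.get("nestedErrors") or []:
        messages.extend(error_chain(nested))
    return messages


# Hypothetical log excerpt; real domainmanager.log lines will look different.
sample_log = [
    "2026-04-01T10:00:00 INFO  [ABC123] Starting workload domain creation\n",
    "2026-04-01T10:00:05 ERROR [ABC123] Host validation failed: vSAN already enabled\n",
    "2026-04-01T10:00:06 ERROR [ABC123] Workflow failed\n",
]

# Hypothetical API error body with one nested cause.
sample_error = {
    "message": "Failed to create workload domain",
    "referenceToken": "ABC123",
    "nestedErrors": [{"message": "vSAN is already enabled on the ESXi host"}],
}

root_cause_line = first_error(sample_log, reference_token="ABC123")
chain = error_chain(sample_error)
```

In this made-up sample, root_cause_line is the host validation line rather than the later "Workflow failed" message, which is the point of the rule: later errors are usually consequences of the first one.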
New Troubleshooting Features in VCF Operations
VCF Operations has matured significantly and now offers a genuinely powerful set of troubleshooting capabilities. The Diagnostic Findings feature collects data from all VCF platform components every four hours and surfaces both active and historical findings. The Findings Catalog currently contains over 637 checks covering components like vCenter, ESXi, vSAN, NSX and VCF Automation.
The Troubleshooting Workbench is a personal favorite. It correlates events, property changes and anomalous metrics across your entire VCF stack in a single view, making it much faster to identify what changed and when. Combined with the Log Compare feature in VCF Operations for Logs 9.0, which lets you run multiple log queries side by side across different time periods, you have an incredibly powerful toolkit for root cause analysis.
The Storage Operations dashboard provides vSAN cluster health scores from 0 to 100, performance KPIs including IOPS, throughput and latency, and links directly to Broadcom KB articles for remediation. The Network Operations view gives you a holistic topology of your entire NSX environment including the new NSX VPC Dashboard with dropped packet monitoring.
Training and Certification
I also covered the VCF troubleshooting training landscape, which has been significantly expanded. The path runs from the foundational VCF Fundamentals for Technical Support course through the professional-level VCF Troubleshooting course with live lab options, all the way up to the advanced domain-specific playlists that lead to VCAP certifications in Storage, Automation, Operations, VKS and Networking. For those with their sights set even higher, the VCDX certification remains the ultimate destination.
The Community Response
In both Amsterdam and Minneapolis, the session sparked great conversations. Whether it was the Golden Rule for log analysis, the referenceToken trick for bridging the Fleet Management UI with the Developer Center API, or the new capabilities in VCF Operations, attendees left with concrete techniques they could apply immediately. That's exactly what VMUG sessions should deliver.
If you missed the session, the slides are available for download and on SlideShare.