So you followed the instructions, stood up the vCenter Operations (vCOps) Manager vApp, logged in and you can see a pretty dashboard representing your environment. But what does it all mean, how will it help you and how can you quantify what it’s worth to your organisation?
Firstly, if you haven’t done so, you need to log in to the vSphere Dashboard. During installation a plug-in will have been added to your vSphere client but I much prefer using a browser, preferably Chrome or Firefox, which are MUCH quicker than Internet Explorer. Just browse to the IP address of your UI VM and log in using the admin credentials you created on installation.
Note: although you will see information immediately after installation it can take 2 or 3 weeks before vCOps baselines your system and starts to learn normal behaviour. You may want to wait a couple of weeks before you evaluate in anger.
Hopefully once you’ve logged in you will see a dashboard similar to the above. In your environment you will probably see a lot more ‘green’ in your dashboard. My screenshots are from a demo environment where we dropped a few bombs to simulate some system outages!
The Health Badge
The first place to look is your Health Badge. As I have done in the above picture, open up the hierarchy on the left hand side and navigate down to one of your important datacenters or clusters to bring up the dashboard for that object. In this post I am going to focus on our Palo Alto datacenter.
Firstly click on the ‘Why is Health xx‘ twistie to open up the sub-badges of Workload, Anomalies and Faults.
The Weather Map shows you the Health of all your individual objects. If you see any Yellow or Red you can double click on the box and it will take you straight to that object (VM, Host or Datastore). It will either tell you immediately what is wrong (is there a constrained resource or a particular fault?) or display the most significant metrics for that resource, pointing you at why, or in what way, the object is operating differently to how it normally operates.
Scroll down and look at your Workload Badge.
Workload is a measure of how much resource is being demanded vs. how much is available. In a large container like a Datacenter, which rolls up all the resources and demand beneath it, the above is a typical view and yours may look similar.
What it will show you is that your workloads are probably demanding a very small proportion of the resources you have bought and implemented. One of the benefits of vCOps is that it gives you the empirical evidence to help you drive up utilisation without increasing the risk of performance or availability degrading.
If your system has been running for a few weeks you will also see the blue brackets in this view. These show what vCOps is EXPECTING the demand to be at this point in time.
So – in the above example, our workload score for this object (the datacenter) is easily within the capabilities of the resources we have provided but is a little higher than we would be expecting at the moment.
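As a rough illustration of the idea (this is not vCOps's actual formula, which the product works out internally), you can think of workload as demand expressed as a percentage of available capacity, then compared with the expected range shown by the blue brackets. The function names, the resource figures and the expected range below are all hypothetical.

```python
def workload_percent(demand: float, capacity: float) -> float:
    """Demand as a percentage of available capacity (simplified view)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * demand / capacity


def compare_to_expected(workload: float, expected_low: float, expected_high: float) -> str:
    """Say where the current workload sits relative to the expected range."""
    if workload < expected_low:
        return "below expected"
    if workload > expected_high:
        return "above expected"
    return "as expected"


# Hypothetical datacenter: 120 GHz of CPU demanded out of 800 GHz available,
# with an assumed expected range of 8% to 12% for this time of day.
score = workload_percent(demand=120.0, capacity=800.0)
status = compare_to_expected(score, expected_low=8.0, expected_high=12.0)
print(f"Workload {score:.0f}% is {status}")
```

With these made-up numbers the workload is a comfortable 15% of capacity but still sits above the expected range, which is the same situation as in the screenshot: nothing to worry about, but worth a glance.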
The next badge to look at is Anomalies which is below the Workload badge.
vCenter Operations collects about 150 metrics for every VM and 1500 metrics for every host every 5 minutes. For a 1,000 VM estate that is probably in the region of 250,000 metrics every 5 minutes.
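To see where a figure like that comes from, here is the back-of-envelope arithmetic. The per-VM and per-host counts are the approximate ones quoted above; the host count of 67 (roughly 15 VMs per host) is purely my assumption for illustration.

```python
METRICS_PER_VM = 150      # approximate figure quoted above
METRICS_PER_HOST = 1500   # approximate figure quoted above


def metrics_per_cycle(vm_count: int, host_count: int) -> int:
    """Approximate number of metrics gathered in one 5-minute collection cycle."""
    return vm_count * METRICS_PER_VM + host_count * METRICS_PER_HOST


# Assumption: 1,000 VMs spread over 67 hosts (about 15 VMs per host).
total = metrics_per_cycle(vm_count=1000, host_count=67)
print(total)  # 250500
```

The VMs alone account for 150,000 of those metrics; the rest comes from the hosts, so your own total will depend on your consolidation ratio.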
The Anomalies badge measures how many of those metrics are currently abnormal. The badge, and as a result the Health badge, will only really degrade if the anomalies count exceeds the calculated Problem Threshold.
As you can see in the above, we have just breached the Problem Threshold, so the badge has degraded and an alert may have been generated or may be generated soon.
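The idea can be sketched in a few lines. This is a simplified illustration, not how vCOps actually calculates the badge: it assumes every metric has a learned normal range, counts the metrics currently outside that range, and compares the count with a threshold. The metric names, ranges and threshold are all made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Metric:
    name: str
    value: float
    normal_low: float    # lower bound of the learned normal range (hypothetical)
    normal_high: float   # upper bound of the learned normal range (hypothetical)

    def is_abnormal(self) -> bool:
        """A metric is abnormal when it falls outside its normal range."""
        return not (self.normal_low <= self.value <= self.normal_high)


def anomaly_count(metrics: List[Metric]) -> int:
    """Number of metrics that are currently abnormal."""
    return sum(1 for m in metrics if m.is_abnormal())


def breaches_threshold(metrics: List[Metric], problem_threshold: int) -> bool:
    """True when the anomaly count exceeds the problem threshold."""
    return anomaly_count(metrics) > problem_threshold


metrics = [
    Metric("cpu_usage_pct", 92.0, 10.0, 60.0),     # outside its range
    Metric("mem_usage_pct", 55.0, 40.0, 70.0),     # normal
    Metric("disk_latency_ms", 35.0, 1.0, 15.0),    # outside its range
    Metric("net_tx_kbps", 300.0, 100.0, 900.0),    # normal
]
count = anomaly_count(metrics)
print(count, breaches_threshold(metrics, problem_threshold=1))
```

The point of the threshold is that a handful of abnormal metrics is normal noise in any environment; the badge should only react when unusually many metrics misbehave at once.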
Finally the Faults badge.
The Faults badge lists faults with the object. These would be things like network links down, power supplies down, etc.
These three sub-badges all roll up to provide the Health badge.
The Environment (Skittle) View
Another way to view the badges is using the Environment or ‘Skittle’ view. This shows, in the above case, the Health badge of all my demo objects. It is often useful to deselect the grey objects and sort by 0-100. This is probably a great place to look in your evaluation as it should show you the least healthy objects in your environment. I often like to look at the Workload, Anomalies and Faults skittle views.
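The same "hide the grey ones and sort from 0 to 100" idea can be expressed in a few lines, for instance if you have jotted scores down during your evaluation. The object names and scores are invented, and None stands in for a grey object that has no score.

```python
from typing import Dict, List, Optional, Tuple


def least_healthy_first(scores: Dict[str, Optional[int]]) -> List[Tuple[str, int]]:
    """Drop objects with no score (grey) and sort the rest from 0 to 100."""
    scored = [(name, s) for name, s in scores.items() if s is not None]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical health scores; None means the object is grey (no data).
health = {
    "vm-web-01": 88,
    "vm-db-01": 23,
    "host-esx-03": None,
    "datastore-07": 51,
}
ranked = least_healthy_first(health)
for name, score in ranked:
    print(f"{score:3d}  {name}")
```

The object at the top of the list is the one to investigate first.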
The above is a very, very brief overview of just the Dashboard view and what the badges are and mean. If you want to learn more there is a free online course available at http://mylearn.vmware.com/mgrreg/courses.cfm?ui=www_edu&a=det&id_course=171758 – Module 3 covers the badges in more detail.
So what does it all mean and how does it help?
The Health badge and its sub-badges of Workload, Anomalies and Faults will fundamentally highlight problem areas more quickly and you will be able to troubleshoot them more quickly and effectively.
Every environment is different but this paper describes some of the benefits: http://www.vmware.com/files/pdf/vcenter/Management_Insight_Study_Shows_Businesses_Benefits.pdf
As you can see from the chart taken from that paper, there are a lot of benefits to vCOps, but in the Health part of the dashboard clients can expect to see a 30% improvement in the downtime of their most critical applications and a 16% reduction in the time spent on diagnostics and problem resolution.
All those benefits in the chart above can be of tremendous value to many organisations. Sometimes it is difficult to put a monetary value on them, but there are some tools that can help, including http://roitco.vmware.com/
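If you want a first rough figure before turning to a proper tool, the downtime improvement can be turned into money with simple arithmetic. This is only a sketch: the 30% comes from the paper cited above, while the annual downtime hours and the cost per hour are placeholders that you would replace with your own numbers.

```python
def downtime_saving(annual_downtime_hours: float,
                    cost_per_hour: float,
                    improvement: float = 0.30) -> float:
    """Estimated yearly saving from a fractional improvement in downtime."""
    if not 0.0 <= improvement <= 1.0:
        raise ValueError("improvement must be between 0 and 1")
    return annual_downtime_hours * cost_per_hour * improvement


# Placeholder inputs: 20 hours of critical-application downtime a year,
# each hour costing 10,000 in your currency of choice.
saving = downtime_saving(annual_downtime_hours=20.0, cost_per_hour=10_000.0)
print(f"Estimated annual saving: {saving:,.0f}")
```

The hard part is not the sum but agreeing on what an hour of downtime really costs your organisation, so treat the result as a conversation starter rather than a business case.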
One area where you will most easily be able to show the financial benefits will be with Capacity and that will be the subject of Part 2 in this series of understanding the value of vCOps…