by Robert Wenig, Founder and CTO —
What happens when VMware becomes the hardware platform?
VMware is a software virtualization platform, so why am I talking about hardware?
Allow me to prove my sanity and explain what’s going on.
We’re all familiar with the concept of "thin provisioning," otherwise known as "lying" or, more commonly, the "airline seat model." Basically, an airline will sell 120 tickets for 110 seats. It expects some percentage of people not to show up or to miss a connection, and it still wants to fully utilize the physical resources, i.e., the seats.
In the airline model, if 120 people show up, 10 people get bought off with free tickets and a later flight.
And so now we have thin provisioning in the computer world: Thin Memory (virtual memory), Thin Disk (SAN disk allocation), Thin CPU (compute resource allocation) and Thin Network (network resource allocation).
Last year, Cisco, EMC and VMware banded together to create a reference platform of Thin Cubed (CPU, Storage, Network) called Vblock.
Gone are the days of referring to the HP computer or Dell or IBM. Parts are parts — you’re just using Thin Cubed Resources, just like electricity. You don’t care (unless you’re a California Green nut like me) if your electricity comes from natural gas, coal, solar or wind, you care only that when you flick the switch, the light comes on.
So, what are some of the benefits of such an arrangement?
- All Hardware Looks the Same: Operating systems no longer have to search for drivers, etc. The exact same hardware is presented to the OS and the applications, which eases management, deployment and portability.
- VMware Enhancements: The ability to move a running virtual machine from one physical host to another without shutting it down (i.e., vMotion).
- Disaster Recovery: Since every environment is the same, you can recreate or re-establish your environment anywhere.
- Snapshots: The ability to snap a picture of a computer environment — and restore to that state. CPU, memory, network, disk — just take a snapshot. It’s great for debugging because you can capture the complete state, persist to disk and reload from disk.
- Environment Duplication: You can clone the production environment to QA.
- More Efficient Resource Allocation: Since all of the physical resources are "virtualized" (or thin), you can play games behind the scenes. If you have 20 physical computers running at 50% CPU, with each computer having allocated 1 TB of disk (but only using 300 GB), you may be able to set up 40 virtual computers that look like the 20 physical ones. In addition, since the physical resources are behind the curtain, you can change the virtual-to-physical mappings (i.e., give more or fewer resources) as needed. Obviously, if all 40 virtual computers try to run at 100% of CPU and consume a TB of disk each, you have a major problem.
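To make the arithmetic in that last bullet concrete, here is a minimal Python sketch. It only uses the numbers from the example (20 hosts, 50% CPU, 1 TB allocated, 300 GB used, 40 virtual machines). The function name `thin_plan` and its fields are hypothetical, invented for illustration, and CPU demand is expressed in "host-equivalents" on the simplifying assumption that every host is identical.

```python
def thin_plan(hosts, cpu_util, disk_alloc_gb, disk_used_gb, vms):
    """Compare what the VMs were promised with what physically exists.

    CPU figures are in host-equivalents (1.0 = one fully busy physical host).
    This is back-of-the-envelope arithmetic, not a capacity-planning tool.
    """
    return {
        "cpu_physical": float(hosts),
        "cpu_demand_typical": vms * cpu_util,      # every VM at its usual load
        "cpu_demand_worst": vms * 1.0,             # every VM at 100% CPU
        "disk_physical_gb": hosts * disk_alloc_gb,
        "disk_used_typical_gb": vms * disk_used_gb,
        "disk_promised_gb": vms * disk_alloc_gb,   # every VM fills its disk
    }


if __name__ == "__main__":
    plan = thin_plan(hosts=20, cpu_util=0.50,
                     disk_alloc_gb=1000, disk_used_gb=300, vms=40)
    for key, value in plan.items():
        print(f"{key:>22}: {value}")
```

At typical load the 40 virtual machines need 20 host-equivalents of CPU and 12,000 GB of disk, which fits on 20 hosts with 20,000 GB. In the worst case they ask for 40 host-equivalents and 40,000 GB, which is the "major problem" mentioned above.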
So, what are the cons?
The standard reaction to VMware is "OMG! There must be a huge impact in terms of performance." In reality the overhead of virtualization is quite low when it’s done properly. Is it 1%, 10%? Depends on the application and the underlying physical infrastructure.
Where it gets really strange — and thus the reason for this blog — is that some of our customers want to use VMware without the "virtual lie." If they have a virtual machine with 4 CPU cores and 16 GB of memory, they will instruct VMware to reserve (i.e., never overcommit) so that the virtual machine always has the underlying physical resources without sharing. They see benefits in having a common hardware platform, even if they aren’t going to share the resources.
So, if performance isn’t the problem, what is?
Things are different under VMware:
Imagine that you have an HP box with Quad CPU, Quad Core (i.e. 16 real cores) with 64 GB of memory.
In VMware, the biggest virtual machine is an 8-way machine. You can’t have a 16-way machine.
Now, if you take that physical 16 core box and you create virtual machines as follows:
- Three 4-way (quad) machines, 8 GB apiece
- One 8-way machine, 32 GB of memory
In this case, we’ve allocated 20 virtual CPU cores against 16 real cores. Suppose the three quad machines are all running and busy, so 12 of the 16 cores are active and only 4 are free. The 8-way virtual machine can’t run. Why? Because in order to run, it needs to claim 8 cores at the same time, which it can’t. So it waits.
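The scenario above can be sketched in a few lines of Python. This models only the all-or-nothing rule as described in this post (a virtual machine runs only if a free physical core exists for every one of its virtual CPUs at the same moment). The function `can_schedule` and the VM names are hypothetical, and the real VMware scheduler may well behave differently depending on version and configuration.

```python
PHYSICAL_CORES = 16  # quad CPU, quad core

# Virtual machines from the example: name -> number of virtual CPUs
vms = {"quad-1": 4, "quad-2": 4, "quad-3": 4, "eight-way": 8}


def can_schedule(vcpus, busy_cores, physical_cores=PHYSICAL_CORES):
    """All-or-nothing rule: every vCPU needs its own free core right now."""
    free_cores = physical_cores - busy_cores
    return free_cores >= vcpus


if __name__ == "__main__":
    allocated = sum(vms.values())
    print(f"allocated {allocated} vCPUs on {PHYSICAL_CORES} real cores")

    # The three quad machines are busy, so 12 cores are in use.
    busy = vms["quad-1"] + vms["quad-2"] + vms["quad-3"]
    print("8-way can run:", can_schedule(vms["eight-way"], busy))

    # If one quad machine goes idle, 8 cores are free and the 8-way fits.
    busy -= vms["quad-3"]
    print("8-way can run after one quad idles:",
          can_schedule(vms["eight-way"], busy))
```

Under this rule a smaller machine always has an easier time finding a slot than a bigger one, which is why the smaller virtual machines come out ahead.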
So, if VMware is inherently more efficient with smaller machines, what does this do to your world? What happens when you go from 5 physical machines to 20 virtual machines? How does this impact:
- SW License Costs (especially if the underlying OS is not free, like Windows)
- Management of more boxes (yes, even virtual boxes need to be managed)
- Configuration/Partitioning: Breaking up your data/problem onto a bigger swath of boxes
- External Components: What about something like SQL Server? If you can’t get 16 cores of CPU for performance, how do you partition this?
And then the more subtle issues come about. If you are hooked on perfmon, those counters can be useless. Why? Because perfmon has a view of CPU on the virtual machine, not the physical machine. So, you need to look at the physical allocation — accessible on VMware through vSphere.
If your application has an issue running under VMware, how do you recreate it? Since the VMware host could be running many other applications, or you are sharing (i.e., overtaxing) disk or network resources, how will you know what’s your fault?
Looking past these issues, the next set of problems becomes a set of interesting challenges and opportunities. For example, how would you design software for a vMark type of environment? In such a world, we should be able to spin up, spin down and allocate resources on demand.
Today, when we horizontally scale, we implement routers and health checking. We also set up predetermined configurations that establish the policy of the routers. Data may belong to specific machines (a holdover from the idea that storage belongs to a machine). A new world is coming in how we dynamically use CPU, disk and network. Since everything is virtualized, any compute resource ought to be able to access any other compute resource, regardless of location. Spinning up another compute node should be as easy as saying "clone," with no need to install software or modify configs.
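As a thought experiment, here is a minimal Python sketch of what "scale by clone" might look like from the router's point of view. Everything in it is hypothetical: the `Node` and `Pool` classes, the round-robin policy and the `healthy` flag stand in for real routers and health checks, and `clone` simply copies a config dictionary rather than calling any actual VMware API.

```python
import copy


class Node:
    """A compute node described entirely by its config, so it can be cloned."""

    def __init__(self, name, config):
        self.name = name
        self.config = config
        self.healthy = True  # stand-in for a real health check

    def clone(self, name):
        # No install step, no config edits: the clone is a copy of the template.
        return Node(name, copy.deepcopy(self.config))


class Pool:
    """Round-robin router over whichever nodes are currently healthy."""

    def __init__(self, template):
        self.nodes = [template]
        self._next = 0

    def scale_up(self):
        new_node = self.nodes[0].clone(f"node-{len(self.nodes) + 1}")
        self.nodes.append(new_node)
        return new_node

    def route(self):
        healthy = [n for n in self.nodes if n.healthy]
        if not healthy:
            raise RuntimeError("no healthy nodes available")
        node = healthy[self._next % len(healthy)]
        self._next += 1
        return node


if __name__ == "__main__":
    pool = Pool(Node("node-1", {"port": 8080}))
    pool.scale_up()
    print([pool.route().name for _ in range(4)])
```

The point of the sketch is that the router holds no predetermined list of machines. Capacity is added by cloning a template, and routing follows whatever is healthy at that moment.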

