Simple IT made complex

Dryerr's blog


vCPU performance degradation


This post will lightly touch on the subject of vCPU performance degradation when using multiple vCPUs in a VM. This is not a scientific test; it's just meant to help you decide how many vCPUs you should use 🙂

How is the test made?

It's pretty simple. I got the program Super Pi, and for the single-thread test I ran one occurrence of the program; for the multi-thread test, I ran 10. All tests were made with the 2M calculation. The tests ran on the same ESX host, which is in a production environment, during a low-load period, and were repeated multiple times over the course of 3 hours to adjust somewhat for inaccuracy in case of sudden loads.

The numbers, how should I read them?

Well, the number of vCPUs is pretty self-explanatory. The number at the end of the bar is the average finish time (let's call it AFT from now on) of the threads. In the single-thread test, that's simply the average of how long Super Pi took to finish. In the multi-thread test, it's all 10 finish times added together, then divided by 10.

Single thread

Let me warn you: this first test is slightly weird, and I'm still unable to explain it. But have a look at the numbers first, then we'll dive into it.

[Chart: single-thread average finish time by number of vCPUs]

1 vCPU takes 38.3 seconds to finish. Okay, noted, no problem.
2 vCPU takes 46.1 seconds to finish. Hmm, that's quite a performance hit, roughly 20%. 4 vCPU must be horrible then!
4 vCPU takes 39.3 seconds… Wait, what? That's only about 2.6% slower than 1 vCPU.

These numbers are consistent; I ran this test on the same ESX host every 15-20 minutes over 3 hours, and they kept ending up this way. If anyone reading this has the slightest idea why this happens, I would love to be enlightened.

Anyway, this shows a slight performance degradation from 1 vCPU to 4 vCPU, roughly 2.6%. This means that if your environment is able to reserve the cores needed to run a 4 vCPU VM, it won't really perform worse when it's not multi-threading.

Multi thread

This is mainly why I made the test: to clarify how much multi-threading performance you actually gain from more than one vCPU. Like last time, take a look at the numbers.

[Chart: multi-thread average finish time by number of vCPUs]

Notice how the 2 vCPU performs just as you would expect when multi-threading.

1 vCPU is 10 times slower calculating 10 threads than 1, which makes sense.
2 vCPU is ~49.4% faster than 1 vCPU, a pretty good deal.
4 vCPU is ~73.4% faster than 1 vCPU, and ~47.4% faster than 2 vCPU. Not as good a deal, but still pretty good.

Putting it together

I've put the two graphs in the same chart, simply because I could, which gives me the opportunity to post one more. Behold!

[Chart: single-thread and multi-thread average finish times combined]

A final round of numbers. This is the average finish time per thread, in seconds. For the multi-thread test the calculation is (AFT * number of vCPU) / number of threads; the single-thread numbers are simply the AFTs from above. A small sketch of the calculation follows the lists.

Single thread
1 vCPU: 38.3
2 vCPU: 46.1 (20.3% slower than 1 vCPU)
4 vCPU: 39.3 (2.6% slower than 1 vCPU)

Multi thread
1 vCPU: 38.6 (0.8% slower than single thread)
2 vCPU: 39.1 (17.9% faster than 2 vCPU single thread, 5.1% faster than 4 vCPU multi thread)
4 vCPU: 41.1 (4.6% slower than 4 vCPU single thread)
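To make the normalization concrete, here is a tiny sketch of that calculation. The raw multi-thread AFTs weren't listed above, so the inputs below are my back-computed approximations from the percentages (~386 s, ~195 s, ~103 s):

```python
def per_thread(aft, vcpus, threads):
    """Average finish time per thread: (AFT * number of vCPU) / number of threads."""
    return aft * vcpus / threads

# Multi-thread AFTs back-computed from the speedup percentages (approximate).
print(per_thread(386.0, 1, 10))  # ~38.6 s
print(per_thread(195.3, 2, 10))  # ~39.1 s
print(per_thread(102.7, 4, 10))  # ~41.1 s
```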

A note: I did also run the test with 8 vCPU, but because the ESX host only has 8 cores, it's pretty hard for the hypervisor to reserve the CPU cores, and the results were horrible 🙂

What does this tell us? Well, if you're not running a terminal server, examine whether you really need more than 1 vCPU. CPU resources aren't the most expensive in a virtual environment, but if you just pop 4 vCPUs into every VM you create, you will run into a CPU limit very fast. Imagine you have an ESX host that can handle 100 VMs with 1 vCPU (fictional numbers here); you might instead only be able to support 50-70 with 2 vCPU, or 10-20 with 4 vCPU. These numbers aren't real, but what I'm trying to say is that the cost grows faster than linearly, meaning you will lose more than you gain.

As said, the above is not a fact sheet, but it should help you make decisions. Sure, you might lose some extra CPU by creating one 4 vCPU VM instead of 2 VMs with 2 vCPU each. On the other hand, you might be able to save RAM and storage by creating the single 4 vCPU VM, which might be the better choice for you, because those two resources might cost you more than CPU. Usually we run out of RAM before anything else 🙂

Written by dryerr

April 2, 2010 at 22:45

Posted in VMWare ESX


Reading the CPU ready time


I was just messing around with the tags in the previous post and noticed that only one other person has a tag called CPU ready, which led me to a very interesting blog.

First of all, this is an excellent post about reading the CPU ready time in the performance tab. http://www.vmdamentals.com/?p=44

In general, the blog contains a lot of useful information.
http://www.vmdamentals.com

Written by dryerr

March 27, 2010 at 15:12

Posted in VMWare ESX


How vCPU works on the ESX host


Update: The vCPU benchmark can be found here: https://dryerr.wordpress.com/2010/04/02/vcpu-performance-degradation/

Having recently made a simple vCPU benchmark, I thought about posting the results here. But to read those numbers, you really need to understand how vCPUs work on the ESX host, or more importantly, how multiple vCPUs work on the ESX host.

First, let's look at how a normal CPU works in a physical environment.
The term CPU cycle is key here. A CPU cycle happens with every Hz, so a 2 GHz CPU can do up to 2 billion CPU cycles each second. If you then add another CPU to the mix, you can now do up to 4 billion CPU cycles per second.
So far, pretty straightforward. But with 2 CPUs, each cycle now happens twice, once on CPU0 and once on CPU1. This doesn't mean that both CPUs will do work in every cycle: a cycle might do something demanding on CPU0 but be empty on CPU1, leaving a free CPU for more work.

So what's the interesting part here? Both CPUs must run a cycle at the same time. This is probably done to avoid synchronization problems or something similar.

Now, in a virtual environment
Let's say you have a VM with 1 vCPU. With every cycle, the vCPU asks the hypervisor for some real CPU to do the work, and the hypervisor then tries to reserve a physical CPU to take care of the cycle.
Now give your VM 2 vCPUs. What happens is that the VM asks the hypervisor to do 2 CPU cycles, and the hypervisor then tries to reserve 2 physical CPUs to do them. This is all well and good, but it presents a problem: it's harder to reserve 2 physical CPUs than 1 (oh really?).

For fun, imagine you have an ESX server with 8 physical CPUs available, plus 5 VMs with 1 vCPU each. In this case the VMs shouldn't really fight for CPU, as there is always one available. Now add another 2 VMs with 2 vCPU each, for a total of 7 VMs and 9 vCPUs. The VMs with 2 vCPU will have a harder time reserving CPU time, because the 1 vCPU VMs can jump onto any free CPU. Imagine all the VMs are actively using a lot of CPU. That requires 9 physical CPUs, but we only have 8, so ESX does what it's best at: queuing the work and keeping our virtual environment running. The VMs with 2 vCPU, however, will have a hard time reserving CPU time, because every time a CPU frees up, a single vCPU jumps in and grabs it. I'm pretty sure ESX has a great way to reduce this kind of grabbing, but it does matter. A lot.
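To see why the 2 vCPU VMs lose out, here is a toy simulation of that exact scenario. It assumes strict co-scheduling (all of a VM's vCPUs must get cores at the same moment) and pure first-fit grabbing with no fairness policy, which is a deliberate simplification of what ESX actually does:

```python
import heapq
import random

PCPUS = 8
SIM_TIME = 10_000.0

# 5 single-vCPU VMs and 2 dual-vCPU VMs: 9 vCPUs competing for 8 cores.
vms = [{"id": f"vm{i}", "vcpus": n, "ready": 0.0, "since": 0.0}
       for i, n in enumerate([1, 1, 1, 1, 1, 2, 2])]

free = PCPUS
queue = list(vms)   # VMs waiting for CPU (everyone starts ready to run)
events = []         # (finish_time, tiebreaker, vm)
now = 0.0

def schedule():
    """Greedy grabbing: any queued VM that fits takes cores immediately.
    A 2 vCPU VM only starts when 2 cores are simultaneously free."""
    global free
    for vm in list(queue):
        if vm["vcpus"] <= free:
            queue.remove(vm)
            free -= vm["vcpus"]
            vm["ready"] += now - vm["since"]   # time spent waiting for cores
            heapq.heappush(events, (now + random.expovariate(1.0), id(vm), vm))

schedule()
while now < SIM_TIME:
    now, _, vm = heapq.heappop(events)   # a VM finishes its CPU burst
    free += vm["vcpus"]
    vm["since"] = now                    # it immediately wants CPU again
    queue.append(vm)
    schedule()

for vm in vms:
    pct = 100 * vm["ready"] / now
    print(f'{vm["id"]} ({vm["vcpus"]} vCPU): ready {pct:.1f}% of the time')
```

In this model a waiting dual-vCPU VM can even leave a core idle while it waits for a second one to free up, while the singles grab each core the moment it's released, so the 2 vCPU VMs accumulate the bulk of the ready time.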

If you look at the performance charts for a VM, there is a graph called "CPU Ready". I find the title a bit confusing, because what it means is the amount of time the vCPU was ready to do work but couldn't get time on a physical CPU. You can't avoid some CPU ready delay, not when you're running virtual environments.
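The chart reports CPU ready as milliseconds summed over the sampling interval (20 seconds for the real-time charts), which is awkward to reason about. Here is a sketch of the usual conversion to a percentage; the helper name is mine, and dividing by the vCPU count to get a per-vCPU figure is a common convention rather than something the chart does for you:

```python
def cpu_ready_percent(ready_ms, interval_s=20.0, vcpus=1):
    """Convert a summed CPU ready value (ms per sampling interval) to a percentage."""
    return ready_ms / (interval_s * 1000.0 * vcpus) * 100.0

print(cpu_ready_percent(1000))           # 1000 ms over 20 s -> 5.0 %
print(cpu_ready_percent(1600, vcpus=2))  # 1600 ms across 2 vCPUs -> 4.0 % per vCPU
```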

Written by dryerr

March 27, 2010 at 11:14

Posted in VMWare ESX
