2011-05-23

Storage vMotion - A Deep-Dive

Of all the features available with vSphere, one of the features I like most is Storage vMotion, which is described by VMware as follows:

image

In simple terms, vMotion allows you to move your VM from one host to another, while Storage vMotion allows you to move your VM's files between different storage arrays / LUNs that are presented to your ESX host. All without downtime (OK, maybe one or two pings...).
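As a quick illustration of how you would kick one off programmatically, here is a minimal sketch using pyVmomi (the vSphere API Python bindings): a RelocateVM_Task whose RelocateSpec only changes the datastore is a Storage vMotion. The vCenter address, credentials, and the lookup helper are placeholders for the example; the VM and datastore names are the ones from the walkthrough below.

```python
# Minimal sketch: start a Storage vMotion via pyVmomi by issuing a
# RelocateVM_Task that only changes the target datastore.
# vCenter address and credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim


def find_by_name(content, vimtype, name):
    """Return the first managed object of the given type with the given name."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        for obj in view.view:
            if obj.name == name:
                return obj
        return None
    finally:
        view.Destroy()


ctx = ssl._create_unverified_context()        # lab use only: skip cert validation
si = SmartConnect(host="vcenter.example.com", # placeholder vCenter / credentials
                  user="administrator",
                  pwd="password",
                  sslContext=ctx)
try:
    content = si.RetrieveContent()
    vm = find_by_name(content, vim.VirtualMachine, "deb1")
    dest_ds = find_by_name(content, vim.Datastore, "vsa1_vol_2")

    # A RelocateSpec with only the datastore set moves the VM's files to the
    # new datastore while the VM keeps running, i.e. a Storage vMotion.
    spec = vim.vm.RelocateSpec(datastore=dest_ds)
    task = vm.RelocateVM_Task(spec=spec)
    print("Storage vMotion started:", task.info.key)
finally:
    Disconnect(si)
```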

Updated info for vSphere 4.x here.

I was looking to understand more about how exactly this works, so I looked up Kit Colbert's session from VMworld 2009 (ancient, I know, but still a great source of information).

Borrowing some slides from Kit's presentation, let's try to understand a bit more.

Slide_1

And how does this work?

Slide_2

Slide_3

Slide_4

That is all nice and fine, but now for a look under the covers to see exactly what is happening.

So we start a Storage vMotion of a VM named deb1 from vsa1_vol_1 to vsa1_vol_2.

Before Migration

The task starts running as you can see in vCenter tasks.

Start Task

But the real "magic" is happening on the Host itself.

Opening a vSphere client session to the host itself, we can see that a new VM is created.

New Machine_1

New Machine_2

Just before the SvMotion is completed, you will see that both machines co-exist for a short amount of time and both are powered on.

New Machine_3

The switch is made and the old one is powered off and removed.

New Machine_4

And from an esxtop perspective: here you can see that there is one VM with an ID of 175523.

esxtop_1

Start the SvMotion and there are two VMs.

esxtop_2

SvMotion completes and only the new VM with its new ID (175722) remains.

esxtop_3

And the machine is now running from vsa1_vol_2.

Migration complete

And that is how Storage vMotion works.

**Update**

After receiving a message on Twitter from Emré Celebi with the following text,

image

and also a comment from Duncan Epping, I realised that the information I posted pertained to ESX 3.5 and not 4.x.

So here is the correct technical document for 4.x.

So what changed? CBT (Changed Block Tracking) is now used to track the changes between the start of the process and the last stage just before the switch-over. A good explanation of CBT here by Eric Siebert.
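Storage vMotion consumes CBT inside the VMkernel, but the same mechanism is exposed through the vSphere API, which is how backup tools use it. As a rough, hedged sketch (it assumes CBT is already enabled, and the snapshot, disk key, and capacity values are obtained elsewhere), this is roughly how you would enable tracking and ask which blocks have changed using pyVmomi:

```python
# Rough sketch of working with Changed Block Tracking via the vSphere API.
# Storage vMotion uses CBT internally in the VMkernel; this is the public
# face of the same mechanism. 'vm', 'snapshot', 'disk_key' and
# 'disk_capacity' are assumed to have been obtained elsewhere.
from pyVmomi import vim


def enable_cbt(vm):
    """Ask vSphere to turn on Changed Block Tracking for a VM."""
    spec = vim.vm.ConfigSpec(changeTrackingEnabled=True)
    return vm.ReconfigVM_Task(spec=spec)


def changed_areas(vm, snapshot, disk_key, disk_capacity, change_id="*"):
    """Collect (offset, length) extents changed since change_id for one disk.

    change_id "*" means "all allocated blocks"; an incremental pass would pass
    the changeId recorded at the previous checkpoint instead.
    """
    extents = []
    offset = 0
    while offset < disk_capacity:
        info = vm.QueryChangedDiskAreas(snapshot=snapshot,
                                        deviceKey=disk_key,
                                        startOffset=offset,
                                        changeId=change_id)
        extents.extend((area.start, area.length) for area in info.changedArea)
        if not info.length:
            break
        # DiskChangeInfo covers [startOffset, startOffset + length); resume after it.
        offset = info.startOffset + info.length
    return extents
```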

In this great session from Ali Mashtizadeh and Emre Celebi, I learnt more about the process and how it now works in 4.x.

VMworld Session

What are the differences?

Slide_5

Here is the process.

Slide_6

So how does Changed Block Tracking come into play?

Slide_7

In 4.x VMware introduced the Data Mover, which can also offload the storage operations to the storage array with VAAI.

Slide_8

Slide_9

Slide_10

Comparing the Old and the New

Old - 3.5

New - 4.x

In 4.x, this is the process:

  1. Start the Storage vMotion.
  2. Flag the disk and start a CBT checkpoint.
  3. Start pre-copying the disk to the destination in multiple iterations.
  4. Check which blocks have changed since the checkpoint, copy only those remaining blocks, and use Fast Suspend/Resume for the switch-over (sketched below).
  5. Delete the original.
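To tie these steps together, here is an illustrative sketch (plain Python, not VMware code) of the iterative pre-copy loop: copy everything once, then keep copying only the blocks the guest dirtied since the previous pass, and finish the last few inside the Fast Suspend/Resume window. The tracker and disk interfaces and the convergence threshold are invented purely for illustration.

```python
# Illustrative sketch of the 4.x iterative pre-copy algorithm described above.
# The tracker/disk interfaces and the convergence threshold are invented for
# the example; they are not VMware code or APIs.

def storage_vmotion_copy(src_disk, dst_disk, tracker, convergence_blocks=64):
    """Copy src_disk to dst_disk while the VM keeps running.

    tracker is a hypothetical changed-block tracker: tracker.start() begins
    recording guest writes, tracker.drain() returns and clears the set of
    blocks written since the last drain (or since start()).
    """
    # Step 2: flag the disk and start tracking changes.
    tracker.start()

    # Step 3: the first pass copies every allocated block while the guest runs.
    dirty = set(src_disk.allocated_blocks())
    while True:
        for block in dirty:
            dst_disk.write_block(block, src_disk.read_block(block))

        # Step 4: whatever the guest dirtied during that pass is the next pass.
        dirty = tracker.drain()
        if len(dirty) <= convergence_blocks:
            break

    # The final pass runs inside the Fast Suspend/Resume window, so the guest
    # cannot dirty any more blocks while the last few are copied.
    with src_disk.vm_suspended():
        for block in dirty | tracker.drain():
            dst_disk.write_block(block, src_disk.read_block(block))
    # Step 5: the caller then switches the VM's home to dst_disk and deletes
    # the original files.
```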

So how does this change performance-wise compared to 3.5? As you can see below, the performance gain is substantial, both in the ESX CPU cycles used during the process and in the time needed to complete it.

Slide_12

Just one last thing regarding troubleshooting.

Slide_13

The VMworld session goes into more detail than I have here, so I highly advise anyone who would like to understand the process in depth to listen to / watch the full session. It is free and an hour well spent.

The screenshots posted above showing the process are from a 4.x environment, so they reflect the updated method.

Thanks again to Emré Celebi and Duncan Epping.