- Provisioning is a breeze
- You get the advantage of VMDK thin provisioning, since it's the default setting over NFS
- You can expand/decrease the NFS volume on the fly and realize the effect of the operation on the ESX server with the click of the datastore "refresh" button.
- You don't have to deal with VMFS or RDMs so you have no dilemma here
- No single disk I/O queue, so your performance is strictly dependent upon the size of the pipe and the disk array.
- You don't have to deal with FC switches, zones, HBAs, and identical LUN IDs across ESX servers
- You can restore (at least with NetApp you can) multiple VMs, individual VMs, or files within VMs.
- You can instantaneously clone (NetApp FlexClone) a single VM or multiple VMs
- You can also backup whole VMs, or files within VMs
People may find this hard to believe, but the performance over NFS is actually better than FC or iSCSI, not only in terms of throughput but also in terms of latency. How can this be, people ask? FC is 4Gb and Ethernet is 1Gb. I would say that this is a rather simplistic approach to performance. What folks don't realize is that:
- ESX server I/O is small block and extremely random which means that bandwidth matters little. IOs and response time matter a lot.
- You are not dealing with VMFS and a single managed disk I/O queue.
- You can have a single mount point across multiple IP addresses
- You can use link aggregation IEEE 802.3ad (NetApp multimode VIF with IP aliases)
Given that server virtualization has incredible ramifications on storage in terms of storage capacity requirements, storage utilization, and thus storage costs, I believe the time when folks will warm up to NFS is closer than we think. With NFS you are thin provisioning by default, and the VMDKs are thin as well. Plus, any modification to the size of the NFS volume in terms of capacity is easily and immediately realized on the host side. Additionally, if you consider that on average a VMFS volume is around 70-80% utilized (and that may be high) and a VMDK is around 70% utilized, you can easily conclude that your effective storage utilization is somewhere around 49-56%, excluding RAID overhead. At that point NFS starts to make a LOT of sense.
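To make that arithmetic explicit, here's a tiny back-of-the-envelope sketch; the 70-80% VMFS and 70% VMDK utilization figures are just the rough numbers quoted above, not measurements from any particular environment:

```python
# Effective utilization = datastore (VMFS) utilization x in-guest (VMDK) utilization.
vmfs_utilization_range = (0.70, 0.80)  # fraction of the datastore actually allocated to VMDKs
vmdk_utilization = 0.70                # fraction of each VMDK actually holding data

for vmfs_util in vmfs_utilization_range:
    effective = vmfs_util * vmdk_utilization
    print(f"VMFS {vmfs_util:.0%} x VMDK {vmdk_utilization:.0%} = {effective:.0%} effective utilization")

# Prints 49% and 56% -- the range quoted above, before RAID overhead.
```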
VMworld is next week and NetApp is a platinum sponsor. So, if you are attending, I would recommend you drop by Booth 701 and take a look at some incredibly exciting demos that have been put together showcasing the latest NetApp innovations with ESX server as well as VDI.
I'm hoping to upload the demo videos here next week, or at least have links to them.
48 comments:
Finally... People are catching on to the NFS secret... We have been using NetApp 3050s and 3070s for a year now to host over 800 VMs across 15 ESX hosts. We SnapMirror the VM volumes to an R200 for backup and use A-SIS with 21 days of snapshots. We also SnapMirror VMs to DR.
Everyone sees FC as the "cool" new technology, but they forget that NFS has been around for 25 years.
You hit the nail on the head with this blog... Latency is the key to fast VMs.
I see DANFS as the next big thing... Direct Attached NFS (over 10Gb). Dedicate a 10Gb switch for NFS and direct attach your NFS server to all your ESX hosts.
NFS is simple. Glad to see others finally see the light.
Dan
Dan,
Thanks for the informative post. Speaking of A-SIS, what kind of space savings do you see in your environment?
Given the amount of duplicate blocks among VMs in terms of the OS, binaries, libraries, drivers, etc., you should be getting some pretty substantial space savings. And what I mean by substantial is above 60%, and that may be a conservative number.
Thanks
Nick, thanks for the NFS plug.
I also have pointer to the VMware/NFS presentation on my blog.
http://nfsworld.blogspot.com/2007/09/vmware-over-nfs.html
Regards,
Mike Eisler.
We see well over 50% reduction of our VM volumes on the R200 using A-SIS.
We use the space savings to store 21+ days of snapshots, which gives us the ability to restore full VMs from any day in the past 3 weeks... Just like tapes... But the restore (time in minutes) is just an NFS copy of the VM files...
A-SIS takes some thought up front on volume type, sizing, snapshots and SnapMirrors, which limits the real gains; however, aggregate-level A-SIS (ONTAP 7.3??) should resolve all these issues.
Now if only ONTAP could do thin provisioning like some of the other vendors, such as Compellent...
BTW: I did several Benchmark Factory tests using Oracle 10g over NFS compared to Oracle 10g over FC within a Linux VM. The results were almost the same. What made a 2x difference was the ESX hosts. Summary: buy FAST ESX hosts with lots of memory.
This sounds intriguing, but certainly counter-intuitive.
Given that you work for NetApp, it is unlikely that you can share performance testing information.
Has anyone else out there compared VMDK over FC to SAN vs. VMDK over NFS to NAS?
Is port bonding / grouping required to put the two on equal footing?
Hi,
No, I can't publish internal tests; however, we do have a proof-of-concept lab where customers can come in, test their own configs, and produce results for their workloads. They basically tell us the setup they want and the servers with the configs they need. The lab folks set it up and give them the "keys" to test.
I have not seen a test comparing VMDKs over FC vs. VMDKs over NFS, but I have seen a whitepaper with Exchange on a physical server vs. a VM, measuring the virtualization overhead, performance, as well as scalability.
http://www.vmware.com/pdf/Virtualizing_Exchange2003.pdf
Beyond that, nothing else really. I suspect it won't be a simple thing to compare apples to apples, given that most vendors offer different platforms for pretty much every protocol, which also run different microcode, which may skew the results one way or another.
Yes, you will need NIC Teaming, a switch that supports 802.3ad, VIFs (Link aggr) on the array side.
I believe NFS will find acceptance in a lot of VMware environments (exception - those with MSCS requirements) but it will be a gradual uptake. I also believe that it'll get rooted first in VDI environments which have completely different requirements than the typical VI3 deployments we know today.
@dan: what do you mean if they get it? NetApp HAS thin provisioning and had it long before Compellent...
One question I have is: with an NFS datastore, can block-oriented applications such as SQL Server or Exchange Server be run inside Windows Server guests? If so, do you know of any of your customers that have tried doing so, and what are the gotchas?
Great question. Because I don't want to rehash something that has already been posted elsewhere, I will point you to the latest of Dave Hitz's blog entries, which addresses this specific question at the end of the article.
http://blogs.netapp.com/dave/
PS. The limiting factor is MSCS. So if you have a requirement for MSCS you will need to run over a block protocol.
Nick,
Is there anything that you lose by using NFS as opposed to VMFS? I'm thinking about things like VMware snapshots, expanding VMDK files, etc.
I know NetApp has snapshots (best in the industry, if you ask me), but the folks who use VirtualCenter aren't the ones who manage our filers, and we're not able to give them access. They'd need to be able to use snapshots from the VC console.
thanks!
Eric
What about CIFS? The NetApp NFS license is really expensive, but we already own CIFS.
thanks!
Hi Eric,
Yes, you can still use VMware snapshots if you use NFS, and you can still expand the VMDK using vmkfstools the same way you'd expand it if it were sitting on a VMFS datastore.
Sorry Eric, I didn't provide a complete answer...
Is there anything you lose by not using VMFS?
You'd lose the ability to do MSCS (V2V or P2V) and you can't use VCB, but for the latter there's a workaround described on my blog (VMware over NFS backup tricks) which gives you not only the ability to back up and recover VMDKs and files from array snapshots, but also the ability to provision from a snapshot if you want to.
Plus you get the added benefit of naming it whatever you want... like LCB :-) for Linux Consolidated Backup.
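To make the "restore is just a file copy" point concrete, here's a minimal sketch of what recovering a VM from an array snapshot on an NFS datastore boils down to. The paths, snapshot name, and VM name are made up, and the .snapshot directory layout assumes a NetApp-style export; make sure the VM is powered off and unregistered before copying anything back:

```python
import shutil
from pathlib import Path

# Hypothetical NFS datastore path as seen on the ESX host (or any host mounting the export).
datastore = Path("/vmfs/volumes/nfs_datastore1")
# NetApp-style read-only snapshot directory exposed inside the export.
snapshot = datastore / ".snapshot" / "nightly.0"
vm_name = "web01"

src = snapshot / vm_name    # frozen copy of the VM's files (vmx, vmdk, nvram, logs)
dst = datastore / vm_name   # live location in the datastore

# The whole "restore" is a file copy out of the snapshot (dirs_exist_ok needs Python 3.8+).
shutil.copytree(src, dst, dirs_exist_ok=True)
print(f"Restored {vm_name} from snapshot {snapshot.name}")
```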
Hi,
CIFS is not an ESX-supported protocol, which means that VMs can't run over CIFS. Having said that, you can still use CIFS for user data (i.e. home dirs). You will need to go outside the ESX hypervisor and mount the CIFS shares directly from the array. This is actually a great approach for VDI. You can use array snapshots of the home dirs, expose them to the users, and have them restore their own files (a drag & drop procedure).
Nick,
How big of an issue is non-support of MSCS, or, based on your experience, what class of VMware sites use or are likely to use MSCS?
Thanks.
How big of an issue is non-support of MSCS?
The answer is: it depends. It depends on your application requirements and any Service Level Agreements tied to those apps/servers.
Ironically, for smaller shops MSCS may be more of a requirement than for larger ones, simply because smaller shops can and do consolidate the mission-critical servers/apps their business depends on. I met with a customer this week who has consolidated all of his Windows servers (~105). All of the SQL DBs and Exchange are running inside VMs. So to these folks MSCS is important.
In larger environments, server consolidation has thus far been centered around moving a lot of non-mission-critical, non-business-critical DAS servers, so the requirements there are typically much more relaxed. For these folks, things like VMware HA, the ability to deploy a VM clone from a template within a few minutes, or a restore from a snapshot are good enough.
Nick- can you explain the single managed disk I/O queue and how it relates to the performance of VMFS?
Thanks!
Sorry for the late response but I've been out of the country.
To get the best performance on a shared VMFS volume, there are some things that need to be considered.
a) Adjusting the max queue depth size on the adapter (i.e. ql2xmaxqdepth for QLogic HBAs)
b) The maximum number of outstanding requests per VM (default 16).
c) The max number of VMDKs per Datastore
The greater the number of outstanding requests, the better the performance, up to a point. In theory you can queue up to 256 IOs, but in practice the benefit above 64 is pretty much non-existent.
So every time you create a LUN, lay VMFS on top of it, and create your VMDKs, those 64 IOs are shared among ALL the VMDKs (with a default of 16 IOs per VM).
If we assume that each VMDK does I/O with the same characteristics, at the same time, and with the same intensity, it is quite possible you will queue more I/Os than the adapter can handle, which will translate into high latencies.
Ideally, what you want is:
Number of VMs on LUN * 16 <= Adapter queue depth
Every time folks put more than 4 VMs on a datastore with an adapter queue depth of 64 and the default of 16 IOs per VM, what they are essentially doing is oversubscribing, or thin provisioning, the adapter's queue depth. Sometimes it works, sometimes it doesn't. It all depends on what the VMs are doing and with what intensity. This can also have backup ramifications: the backup generates additional I/O load to the LUN on top of the normal traffic, and it may become difficult to complete the backup while regular I/O is also flowing to the LUN from the other VMDKs.
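Here's a tiny sketch of that sizing rule. The 64 and 16 below are just the defaults discussed above (adapter queue depth and outstanding IOs per VM); both are tunable, so plug in whatever your HBA and hosts are actually configured with:

```python
# Check whether a shared VMFS LUN oversubscribes the adapter's queue depth.
ADAPTER_QUEUE_DEPTH = 64       # e.g. ql2xmaxqdepth on a QLogic HBA
OUTSTANDING_IOS_PER_VM = 16    # ESX default outstanding requests per VM

def queue_headroom(vms_on_lun: int) -> int:
    """Positive = spare queue slots, negative = oversubscribed."""
    return ADAPTER_QUEUE_DEPTH - vms_on_lun * OUTSTANDING_IOS_PER_VM

for vms in (2, 4, 8, 16):
    headroom = queue_headroom(vms)
    print(f"{vms:>2} VMs on the LUN -> headroom {headroom:>4} "
          f"({'ok' if headroom >= 0 else 'oversubscribed'})")
```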
Question about link aggregation with NFS. Will this improve performance or is it only for failover? My understanding is that the NFS client on ESX's side will always have the same IP and would therefore always use the same "route", so even a 4-channel LAG would only run at 1Gb.
Hi,
For a single VM, single datastore, single ESX host, NIC teaming does not buy you much besides failover.
VMware has 3 load balancing mechanisms:
a) Route based on the originating virtual port ID
b) Route based on IP hash
c) Route based on MAC
What you can do on a single ESX host is this:
1) Set IP Hash routing on the vswitch
2) Set src-dest-ip balancing on the physical network switch
3) Create a multimode VIF (make sure the ethernet switch supports Link Aggregation) on the array and use IP aliasing by placing multiple IP addresses on the VIF.
4) Create multiple datastores using the same NFS mount point but use a different IP address to mount from (IP aliasing).
You'll get some really nice numbers.
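Not official guidance, just a toy illustration of step 4: the same export is mounted as several datastores, each from a different alias IP, so the IP-hash policies on the vSwitch and physical switch have more than one source/destination pair to spread traffic across. The IPs, export path, and datastore names below are made up:

```python
from itertools import cycle

# Hypothetical alias IPs placed on the filer's multimode VIF (step 3).
alias_ips = ["192.168.10.11", "192.168.10.12", "192.168.10.13"]
export_path = "/vol/vmware_ds"                   # single NFS export on the array
datastores = [f"nfs_ds{i}" for i in range(1, 7)]

# Round-robin the datastores across the alias IPs (step 4): same export,
# different target IP for each datastore mount.
for name, ip in zip(datastores, cycle(alias_ips)):
    print(f"mount datastore '{name}' from {ip}:{export_path}")
```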
In a dumbed-down config with a single ESX server, FC will outperform NFS for I/O to a single VM in a single datastore. I highlight the word *single*...
However, as you start scaling out your ESX cluster you get a very different result, and NFS will outperform FC when doing I/O to multiple VMs residing in a datastore shared across multiple ESX servers.
NFS does scale better and that's why HPC environments tend to favor NFS over anything else.
All,
Thanks for all this info, Nick.
So can I consider the following:
- Create a two-level VIF on the NetApp side, with one VIF in single mode built on N VIFs in multi mode, so I get redundancy and N gigabits of throughput.
- Create N-1 IP aliases for the top-level VIF and N DNS entries for the NetApp.
- Mount each NFS datastore with one of the DNS entries.
On the other hand, can someone tell me the number of VMs a 1 Gb/s link can support? I know it depends on the load, but on average.
I ask because we use blade servers with 16GB of RAM. There are 8 servers in an enclosure, with 4 Gigabit ports for all of them from the internal switches.
Thanks
ML
Mike,
Question: how do you provision VMs in an NFS setup using FlexClones? How do you run sysprep? Normally, using VMFS and VirtualCenter, this is done automatically for you. Is there a way to automate this? If so, how?
We're looking at NFS now for our VDI project and have had NetApp in here talking it up. We have always used VMFS datastores and everything is straightforward: the storage admins don't have to get involved once the disk has been carved out. Wouldn't we lose that independence? Does this add an extra level of complexity when troubleshooting disk I/O issues, etc.?
thanks for all the answers
Virtuel,
Here's how you'd do it with FlexClone.
1) You'd create a Flexvol that will be used as your Golden Datastore.
2) You will begin by building a template of the VM(s) to deploy. The image(s) will already be sysprepped.
3) Use VMware's cloning to fill in the datastore with however many VMDKs you plan to deploy
4) Take a Snapshot of the Golden Datastore
5) Start creating FlexClones based on the snapshot on an as needed basis
As far as involvement from the storage folks goes, there are 2 options that I can think of:
1) Get them engaged for the FlexClone cloning process, or
2) Have a script running from a host (possibly VC) that does that for you, as well as creating snapshots on the volume(s).
As far as growing a volume goes, you'd have the storage guys enable vol autosize on the volume and specify the max size the volume can grow to and the increments by which it can grow. So every time you hit the 98% utilization threshold (the default), the volume will automatically expand by the specified increment. You can also shrink the NFS volume/datastore on the fly. Neither growing nor shrinking requires any action on the host side, contrary to an iSCSI or FCP implementation (in that scenario you actually can't shrink at all).
Although it may seem that you may be delegating some control when it comes to cloning, you gain it elsewhere (i.e. you don't have to do anything to grow an NFS datastore with vol autosize enabled).
Having said that, regardless of the protocol and the array deployed, data cloning techniques, snapshots, or recovery techniques (at the datastore or VMDK level) do require some interaction with the array, but that can be automated via scripting.
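As a rough illustration of the scripting mentioned above, here's a minimal sketch that shells out to the filer over SSH to snapshot the golden datastore volume and cut FlexClones from it. The hostname, volume names, and the Data ONTAP 7.x-style command syntax are assumptions from memory; treat it as a starting point, not a supported tool:

```python
import subprocess

FILER = "filer1.example.com"   # hypothetical filer hostname (key-based SSH assumed)
GOLDEN_VOL = "vdi_golden"      # FlexVol holding the sysprepped template VMDKs
SNAPSHOT = "golden_base"

def ontap(cmd: str) -> None:
    """Run a Data ONTAP CLI command on the filer via SSH."""
    subprocess.run(["ssh", f"root@{FILER}", cmd], check=True)

# Snapshot the golden datastore volume (step 4 above).
ontap(f"snap create {GOLDEN_VOL} {SNAPSHOT}")

# Cut FlexClones from that snapshot on demand (step 5 above).
for i in range(1, 4):
    clone = f"vdi_clone{i}"
    ontap(f"vol clone create {clone} -b {GOLDEN_VOL} {SNAPSHOT}")
    # Next: export the clone and mount it from the ESX hosts as a new NFS datastore.
```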
I don't know if you've played with NFS on ESX, but from my viewpoint, and with my background in block protocols (FC/iSCSI), it was an eye-opening experience. I truly believe it's a great fit, and what's more interesting is that people have started to take a hard look at it.
Hi. I just went to the VMware config class and the instructor advised against VMDK over NFS; he believes the VM swap activity will not be efficient over NFS, resulting in slow VM performance. However, my NetApp reps keep pushing for NFS. Even one of my application vendors (Perforce) advises against using NFS for the Perforce DB and journal files. I'm not sure which is the best route to go at this point. Thanks for your advice. My setup: 2970 (to be dual quad) with a NetApp 3070 backend. The question is: VMDK over NFS or iSCSI? If I go with iSCSI, are SNAP drivers needed on the individual VMs?
Hi,
I found this : "the voice of EMC".
http://france.emc.com/techlib/pdf/H2756_using_emc_celerra_ip_stor_vmware_infra_wp_ldv.pdf
In particular :
page 21: VMFS is an excellent choice along with NFS where various virtual machine LUNs don't require specific IO performance
page 33 : VMware has executed a test suite that achieved similar performance between the two protocols
And, if I could get an answer to my previous question, it would be great ...
Best regards
ML
Great comment ML. So here's my question.
Assuming VMFS over LUNs provides the same performance as NFS, but NFS provides more flexibility, easier management and provisioning vs. FC or iSCSI, and considerably less complexity to recover at the individual VMDK level, why should I implement FC or iSCSI?
The issue with LUNs, at least in my mind, from a performance perspective is SCSI reservations and VMFS metadata updates. There is a section in one of the VMware documents that describes what happens to a LUN when VMFS needs to update its metadata. You may want to do a search on VMTN for SCSI reservations and see what happens when the same LUN is shared across large clusters.
Sorry don't recall your previous question but if you can remind me I'll try to answer it, if I can.
Hi Eric,
Your instructor is correct. VMware does recommend (since VMworld 2006) placing the pagefiles/.vswp on a LUN in NFS implementations. That would mean that you could create an iSCSI LUN (if you're implementing NFS) and put that stuff in there.
Separating this stuff is also advantageous if you are taking snapshots, so you don't snap the pagefile junk or replicate it.
Frankly, if you're paging, you've got bigger problems to solve than the location of the pagefile. But in any case, his recommendation is similar to what we've heard as well.
Nick,
Exactly! If NFS fits, why try to use iSCSI?
Besides, in general, the IT guys know FC SAN and NFS. iSCSI is new for them, and new stuff is scary...
If you could answer the questions I asked on Tuesday, October 30, 2007 in this thread, it would be great (two-level VIF, number of DNS entries, number of VMs per Gigabit link on average...).
Eric,
I think the RAM of the ESX server should not be overcommitted. These swap files are used only if ESX is out of RAM.
Even if ESX doesn't use them, I don't know whether they change or not (reinitialized, for instance). If not, no disk space would be used because of snapshots.
As ESX can share RAM blocks between VMs, it could be useful to dedicate some ESX servers to Windows 2003 VMs and others to Linux VMs.
Some of our VMs use less than 100 MB with this feature.
Best regards
ML
For the question about the number of VMs per Gigabit link, I forgot to say that I plan to store my VMs' application data mostly on NFS or CIFS shares on the storage.
I mean, the share will be mounted directly from the VM, to store data more efficiently and avoid a virtualization layer for the data.
So I imagine the VMDK access will occur mostly at boot time. Once the OS and the app are loaded, there is not much disk access.
Thanks
Regards
"Create a two level VIF on the netapp side with one VIF in single mode based on N VIFs in multi mode. So I get redundancy and N Gigabits througput.
- Create N-1 IP aliases for the top level VIF and N DNS entries for the netapp
- Mount each NFS datastore with one of the DNS entries."
Do you really need my help on this one? You already have it down? :-)
Number of VMs on a 1Gb link... I'm gonna give the answer people hate the most, myself included... It depends on the workload, but from what I've seen the I/O profile is small-block random. Small block = 4K/8K block size. In this case what you care about is not bandwidth but rather IOPS and latency. So if all the VMs were to collectively push 8K IOPS (very doubtful), your bandwidth requirement would be 8K IOPS x 8KB block size = 64MB/s. 8K IOPS on a single physical server is a lot of IOPS, and you'd either have to be running some extremely heavy-duty stuff to get there or have a ton of VMs.
What you want to do is run perfmon, characterize the workload in terms of reads/writes, and log for at least 24 hours. Disk transfers/sec is the sum of reads and writes (IOPS). You also want Disk reads/sec and Disk writes/sec.
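A quick sanity check of that arithmetic: take the Disk transfers/sec number perfmon gives you and the block size you believe your VMs use (the 8K figures below are just the example values from above), and see what the bandwidth requirement actually comes out to:

```python
def required_bandwidth_mb_s(iops: float, block_size_kb: float) -> float:
    """Rough bandwidth (MB/s) needed to sustain a given IOPS rate at a given block size."""
    return iops * block_size_kb / 1024.0

iops = 8000     # e.g. the sum of Disk reads/sec + Disk writes/sec across all VMs
block_kb = 8    # small-block random I/O, per the profile described above

mb_s = required_bandwidth_mb_s(iops, block_kb)
print(f"{iops} IOPS @ {block_kb}KB ~= {mb_s:.1f} MB/s")
# ~62.5 MB/s -- comfortably inside a single GbE link (~110-120 MB/s usable),
# which is why IOPS and latency, not bandwidth, tend to be the limiting factors.
```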
Are there any guidelines for volume size, number of vms per volume when using NFS? I understand some of the answer will depend on my environment.
I asked our NetApp TAM about this and he was suggesting 1 VM per volume. With the maximum number of mounted exports per host being 32, this isn't practical.
Maybe he's still thinking in FC/iSCSI LUN terms?
Parker,
No. One VM per volume is not right at all. In fact, that number is not even right with FC or iSCSI.
What we've seen internally and from customer deployments is that the fan-in ratio of VMs to an NFS volume is larger than VMs to a LUN.
With NFS volumes we're recommending in the range of 35-50 VMs per volume, even though we believe the limit is much higher. In fact, someone told me they were using 80; however, I can not confirm the latter, so take that with a grain of salt.
Thanks for the quick response.
One other question. The network configuration on my filers for iSCSI and NFS is two single VIFs on each filer head. We're not using IP trunking. The hosts are connected via two HP switches that are using HP meshing. This consists of 2 mesh ports on each switch connected to the mesh ports on its partner switch. This provides failover and, I believe, load balancing if a switch is overloaded. The VIF ports are split between the two switches, and this works for failover in testing. If I pull one of the lines at the filer or the switch, it fails over to the other switch very quickly.
I have two 1-gig connections on each filer that can be targeted by an ESX host. Most of the VMware storage is located on one filer, however. Should I consider reconfiguring these switches for IP trunking?
Also, I am considering replacing the HP switches. We have two separate sets of switches, one for VMware and one for Oracle on NFS on another set of filers. I was thinking of replacing these with larger switches and separating the traffic with VLANs. Any recommendations on switches?
Thanks again for your assistance. We could use your input on the VMWare discussion boards!
Parker,
First of all, I don't recommend anyone change anything unless there's a problem looking for a solution.
Having said that, a couple of things to keep in mind if you decide to reconfigure:
1) ESX server supports IEEE 802.3ad static
2) ESX server supports link aggregation on a single switch or on stackable switches, but not across disparate trunked switches.
http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1001938&sliceId=1&docTypeID=DT_KB_1_1&dialogID=31483476&stateId=0%200%2031481367
With respect to (1), I've seen some folks have a hard time configuring 802.3ad static on some ProCurve switches. I believe the default trunk type on the ProCurve is LACP dynamic, which VMware does not support. I would imagine you'd be able to create a static trunk, but I don't know. I also know ProCurve supports FEC (Fast EtherChannel), which uses Cisco's proprietary PAgP implementation. ESX does not support that either.
But assuming you resolve the above, you still have to deal with ESX support of link aggregation across disparate switches, assuming that's what you have now.
So, if that's true, if it were me, I'd stay as I am right now, unless I was about to upgrade the switches. Then I'd look at stackable ones and turn on link aggregation.
As far as switch recommendations go, I'd look at the Cisco 3000 series. I'd also look at the Extreme Summit X series.
I know about the ProCurve dynamic LACP. I got bitten by it when our regular network switches were ProCurve. Apparently HP made dynamic the default setting on their switches at some point; they later realized it was a bad idea. They can be changed to static. You're right, I don't have a problem right now. I'll have to keep an eye on things as we grow. I do also realize that trunking across switches isn't supported by VMware. With the HP switches I am dealing with, you can actually define trunks between two switches. I don't know if they are then considered to be stacked, though. Thanks for the recommendations.
Hi Nick,
Is there any specific reason for not using MSCS on NFS datastores on ESX Server 3.x?
I don't see much of a difference between a VMFS filesystem and an NFS-mounted filesystem as long as the MSCS VMs use virtual disk files (not RDMs).
Phani
For cluster in a box that ought to work since you have 2 options:
a) RDM in virtual comp. mode
b) virtual disk.
No idea as to why (b) is not supported. There's no support for iSCSI either, in both cases.
For clustering across boxes there's only one option, and that's RDM in physical or virtual compatibility mode, so NFS is not an option here and neither is iSCSI. The latter is a little surprising; it may be that they haven't tested it thoroughly.
Hi Nick,
Thanks for your response. So far I was debating with my colleagues that MSCS should work on NFS datastores as well (within an ESX host and across ESX hosts). But now your reply gives me a hint as to why MSCS is not possible on NFS datastores when the MSCS VMs are across ESX Server 3.x hosts.
On ESX 2.5.x (or earlier), to configure MSCS across ESX boxes we needed to change the VMFS volume access mode to 'shared'. In ESX 3.x there is no concept of 'shared' mode at all, hence the only option is RDM. As RDM creation is not possible on NFS datastores, there is no support for MSCS on NFS datastores across ESX 3.x servers.
However, MSCS should work on an NFS datastore within a single ESX 3.x server.
Have a good day !
Thanks & Regards,
Phani
Hello Nick,
thanks for your good post. I'm moving 12 of our 24 active VMs from iSCSI to NFS now. We'll move the rest on Wednesday.
Our I/O requirements are not really high, so I chose flexibility.
Since our team is responsible for storage, ESX and everything else, we don't care where we make changes, as long as they are easy :-).
Most of our VMs are Linux systems, and since I'm using LVM we have no problem with thin or thick disk files in the first place, but I guess it could make my life a bit easier.
So Thanks again for your post, since it made my decision a bit easier, too. :-)
Markus
Markus,
Just replied to your email. Check it out and let me know if you need anything else.
Nick
I do have a question: Does the NFS recommendation apply more to NetApp's implementation of NFS with WAFL underneath, or is it a more generic recommendation?
I understand WAFL has a lot of enhancements to make NFS perform well and the snapshots are great, but if someone wants to deploy ESX "on the cheap" (say by using a Linux box configured as an NFS server) or another vendor's NAS, would that not work as well?
Thx
D
Hi,
I can't speak about any other NFS implementations regarding performance. However, some of the other benefits like provisioning and ease of use are ubiquitous regardless of the implementation.
The other thing you need to keep in mind, is that regardless of the protocol you choose to implement, the solution must be on VMware's support matrix...
Does anybody know of a white paper that shows this in detail, or comparisons of Fibre Channel vs. NFS?
Since I have a NetApp and I have access to both protocols, when should I use FC and when should I use NFS? Would a database work better over NFS?
Ding ding ding... the key to NFS on ESX is NetApp NFS. The WAFL file system is what makes the performance similar to FC or iSCSI, and makes it scale better.
Pity Storage vMotion and Site Recovery Manager are only supported on block and not NFS....
guess that blows a hole in the NFS theory
It is true that block protocols have gotten priority over NFS when it comes to feature support. The main reason has really been adoption rates: FC has the largest share of the market and was what VMware supported first.
Having said that, VMware says that things will change and there will be balance.
We'll see
Storage VMotion isn't supported on NFS? That's news to me, since I've done it quite a bit with NFS.
VMware has told me they are targeting support for NFS in Site Recovery Manager for Q1 2009. I have also heard that it can be done now, even though it isn't officially supported.
Hi Parker,
Officially, Storage VMotion is not supported on NFS, but that does not mean it doesn't work. Per VMware, the reason has been a lack of QA cycles.
I can not comment on VMware's roadmap.