tag:blogger.com,1999:blog-273486122024-03-13T11:14:30.895-05:00StorageNick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.comBlogger31125tag:blogger.com,1999:blog-27348612.post-27352449395534243102007-11-09T16:00:00.000-06:002007-11-09T16:20:21.996-06:00Blog RelocationFolks,<br /><br />This is my last post on this blog as NetApp has graciously offered several of us the opportunity to use Typepad as the hosting service.<br /><br />So, starting today, I will be blogging using the new hosting service. The new blog is titled <a href="http://blogs.netapp.com/storage_nuts_n_bolts/">Storage Nuts N' Bolts </a>and I hope to see you there.Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com0tag:blogger.com,1999:blog-27348612.post-32274882579923538762007-10-22T11:56:00.000-05:002007-10-27T19:51:50.831-05:00Solaris 10 iSCSI configured with Dynamic DiscoveryRecently we went through re-IPing all of our servers <strong>and </strong>storage arrays in our office. For the most part everything went fine, with the exception of a Solaris 10 U3 server I was running iSCSI on.<br /><br />After I got through the steps of changing the server's IP address, gateway and DNS entries, I rebooted the server. Upon reboot, I noticed a flurry of non-stop error messages at the server's console:<br /><br /><strong>Sep 30 18:37:37 longhorn iscsi: [ID 286457 kern.notice] NOTICE: iscsi connection(8) unable to connect to target SENDTARGETS_DISCOVERY (errno:128)<br />Sep 30 18:37:37 longhorn iscsi: [ID 114404 kern.notice] NOTICE: iscsi discovery failure - SendTargets (0xx.0xx.0xx.0xx)</strong><br /><br />As a result, I was never able to get a login prompt either at the console or via telnet, even though I could successfully ping the server's new IP address. What the messages above indicate is that the initiator issues a <em>SendTargets</em> request and waits for the <em>Target</em> to respond with its list of targets. 
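For reference, inspecting the active discovery methods on the Solaris host, and switching over to static discovery, looks roughly like this (a sketch; the target IQN and address below are placeholders, not the ones from this incident):

```shell
# Show which discovery methods are currently enabled
iscsiadm list discovery

# Add a static entry (target-name,target-address[:port]) and enable static discovery
iscsiadm add static-config iqn.1992-08.com.netapp:sn.12345678,192.168.1.10:3260
iscsiadm modify discovery --static enable

# Turn off SendTargets discovery so the initiator stops probing the old address
iscsiadm modify discovery --sendtargets disable
```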
To my surprise there's NO timeout, and the initiator will retry this process indefinitely. In fact, just for kicks, I left it trying for an hour and 45 minutes.<br /><br />That also means that you will be locked out of the server, as attempting to boot into single-user mode results in the exact same behavior.<br /><br />To get around this problem you have 2 options, even though option #2, for some, may not be an option.<br /><br />Option 1<br />--------<br />a) Boot from a Solaris cdrom<br />b) mount /dev/dsk/c#t#d#s0 /a<br />c) cd /a/etc/iscsi<br />d) Remove or rename the *.dbc and *.dbp files (iSCSI is no longer configured)<br />e) Reboot the server<br />f) Use <em>iscsiadm</em> and configure the Solaris server with Static discovery (static-config) so you don't get into this situation again<br /><br />Option 2<br />---------<br />a) Change back to the old Target IP address<br />b) That will enable you to reboot the server<br />c) Reconfigure the server to use static-config by specifying the target-name, new Target IP address and port number<br />d) Change the Target IP address to the new one<br /><br />I followed Option #1 because #2 was really not an option for us. So the moral of the story is that you may want to consider static discovery on Solaris with iSCSI.<br /><br />Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com4tag:blogger.com,1999:blog-27348612.post-13517462174380869272007-10-17T11:26:00.001-05:002008-12-09T04:17:35.643-06:00VMware over NFS: Backup tricks...continued<p>There have been a couple of questions on how to do file-level backups of a Linux vmdk over NFS. 
I described the process for a Windows vmdk in a previous article <a href="http://storagefoo.blogspot.com/2007_09_09_archive.html">here</a>.<br /><br />In order to do this for a Linux vmdk you need to do the following:</p><ol><li><em>Create a FlexClone of the NFS volume from a snapshot</em></li><li><em>Mount the FlexClone to your Linux server </em></li><li><em>Do not use the read-only mount option, as Linux requires read-write access</em></li><li><em>Specify -t ext3 as the mount option (you can get the FS type per partition with "df -T") </em></li><li><div align="left"><em>Remember to use fdisk -lu to get the starting sector for each partition</em><br /><em>Multiply the starting sector by 512 bytes and specify the result in the "offset" field of the mount command</em> </div></li></ol>Here's an example of mounting and exploring a copy of the /boot partition of a Red Hat 4 U4 vmdk using a FlexClone:<br /><br /><a href="http://2.bp.blogspot.com/_T-u3Gtv5okQ/RxZ1-MWRgvI/AAAAAAAAAB4/jPdMQwPyw7M/s1600-h/linux.jpg"><img id="BLOGGER_PHOTO_ID_5122411337507504882" style="DISPLAY: block; MARGIN: 0px auto 10px; CURSOR: hand; TEXT-ALIGN: center" alt="" src="http://2.bp.blogspot.com/_T-u3Gtv5okQ/RxZ1-MWRgvI/AAAAAAAAAB4/jPdMQwPyw7M/s400/linux.jpg" border="0" /></a><br /><p align="left"></p><br /><p align="left">One reader asked a good question regarding Windows: how do you do file-level backups of partitioned Windows vmdks? The answer lies in the offset parameter of the mount command. </p><br /><p align="left">What you need to do in a scenario like this is:</p><ol><li><div align="left"><em>Run msinfo32.exe in your Windows VM</em></div></li><li><div align="left"><em>Go to Components -> Storage -> Disks</em></div></li><li><div align="left"><em>Note the Partition Starting Offsets and specify them as part of the mount option</em>. 
</div></li></ol><br /><p></p>Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com6tag:blogger.com,1999:blog-27348612.post-79358689803834148192007-09-21T13:25:00.000-05:002007-10-22T11:56:03.610-05:00Demos from VMworldI promised last week to post links to some of the demos we ran after VMworld was over. So, for those who have not seen them, here they are. There's audio as well, so plug in your headsets.<br /><br />1) <a href="http://www.netapp.com/go/techontap/matl/downloads/VDI/Flash/VDI.html">VDI on Netapp over NFS</a><br /><br /><br />2) <a href="http://www.netapp.com/go/techontap/matl/downloads/asis/flash/ASIS.html">Eliminate duplicate data with A-SIS in a VMware environment</a><br /><br /><br /><br />There are also several presentations and technical whitepapers at the <a href="http://www.netapp.com/go/techontap/matl/VMworld07.html">TechONTAP</a> site which you may find very useful.Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com0tag:blogger.com,1999:blog-27348612.post-38092326993627967752007-09-11T17:02:00.001-05:002007-09-28T15:46:39.136-05:00VMware on NFS: Backup TricksOK, so you've decided to use VMware over NFS. There's always some guy who'll find something to nitpick about, and he'll say, "Well, you can't run VCB on NFS". He's right, but I don't see this as an issue. 
Sometimes it takes imagination to find a solution to a challenge.<br /><br />Using NFS as a protocol with VMware, you have similar choices and flexibility as with VCB, and you can mount the NFS volume or a snapshot of the volume on a server other than ESX...other = Linux in this case.<br /><br />So if you are deploying VMware on NFS, here's a way to back up whole VMDK images or files within VMDKs using NetApp Snapshots, given that the Snapshots are accessible to the NFS client.<br /><br />Mind you, with this approach you can do all kinds of cool things, not just backups, without impacting the ESX host. You can also restore, or you could also provision...<br /><br />So here's the process:<br /><br />1) Install the <a href="http://linux-ntfs.org/">Linux NTFS driver </a>if it's not already in your Linux build.<br /><br /><strong><em>Note:</em></strong> For RHEL and Fedora installs click on the <em><strong>About RedHat/FC RPMs</strong></em> link<br /><br />2) Mount the export onto your Linux server<br /><strong><em># mount xx.xx.xx.xx:/vol/nfstest /mnt/vmnfs</em></strong><br /><br />So now you can back up VMDK images, or you can drill into the <strong><em>.snapshot</em></strong> directory and back them up from there.<br /><br />The next step is to back up files within VMDKs by accessing the snapshot...and you get to pick which one. 
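Before the loopback mount in the next step, you need the byte offset of the partition inside the VMDK. A minimal sketch of the arithmetic, assuming the first partition starts at sector 63 as on a default Windows install (confirm with fdisk -lu):

```shell
# Offset for "mount -o offset=...": starting sector times 512 bytes per sector
start_sector=63
offset=$((start_sector * 512))
echo "$offset"    # prints 32256
```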
For this test, I select the <strong><em>hourly.3</em></strong> snapshot and the VM directory named <strong><em>testsnap</em></strong>.<br /><br />3) Mount the VMDK as a loopback mount, specifying the starting offset (32256) and the NTFS file system type<br /><br /><em><strong># mount /mnt/vmnfs/.snapshot/hourly.3/testsnap/nfs-flat.vmdk /mnt/vmdk -o ro,loop=/dev/loop2,offset=32256 -t ntfs </strong></em><br /><br />Here's your NTFS disk as seen from Linux:<br /><br /><strong><em># cd /mnt/vmdk</em></strong><br /><strong><em># ls -l </em></strong><br /><br />total 786844<br />dr-x------ 1 root root 0 Dec 19 03:03 013067c550e7cf93cc24<br />-r-------- 1 root root 0 Sep 11 2006 AUTOEXEC.BAT<br />-r-------- 1 root root 210 Dec 18 21:00 boot.ini<br />-r-------- 1 root root 0 Sep 11 2006 CONFIG.SYS<br />dr-x------ 1 root root 4096 Dec 18 21:10 Documents and Settings<br />-r-------- 1 root root 0 Sep 11 2006 IO.SYS<br />-r-------- 1 root root 0 Sep 11 2006 MSDOS.SYS<br />-r-------- 1 root root 47772 Mar 25 2005 NTDETECT.COM<br />-r-------- 1 root root 295536 Mar 25 2005 ntldr<br />-r-------- 1 root root 805306368 Mar 13 16:42 pagefile.sys<br />dr-x------ 1 root root 4096 Sep 11 2006 Program Files<br />dr-x------ 1 root root 0 Sep 11 2006 RECYCLER<br />dr-x------ 1 root root 0 Sep 11 2006 System Volume Information<br />dr-x------ 1 root root 0 Dec 19 00:35 temp<br />dr-x------ 1 root root 65536 Mar 13 17:41 WINDOWS<br />dr-x------ 1 root root 0 Sep 11 2006 wmpub<br /><br /><p>The nice thing about the loopback mount is that Linux will see a VMDK's content for <strong><em>any</em></strong> filesystem it recognizes...so now you can back up Windows <strong><em>and</em></strong> Linux VMs. </p><p>Here's a more in-depth <a href="http://www28.cplan.com/cbv_export/PS_IP43_288713_166-1_FIN_v2.pdf">presentation</a> on VMware over NFS, including the backup trick from Peter Learmonth, as well as a customer presentation from the VMworld breakout sessions. 
The login and password are provided below:</p><strong><em>user name: cbv_rep</em></strong><br /><strong><em>password: cbvfor9v9r</em></strong><br /><br /><p><br />Cheers</p>Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com7tag:blogger.com,1999:blog-27348612.post-70956462228606655242007-09-07T19:21:00.000-05:002007-09-08T18:37:25.322-05:00VMware over NFS<div align="left">My background is Fibre Channel, and since 2003 I've followed iSCSI very closely. In fact, for years I never paid much attention to other protocols until recently. For a long time I felt that FC was good for everything, which sounds weird if you consider who my employer is, but then again, NetApp didn't hire me for my CIFS or NFS prowess. I was hired to drive adoption of NetApp's Fibre Channel and iSCSI offerings, as well as to help prospects realize the virtues of a Unified Storage Architecture.</div><div align="left"></div><div align="left">Speaking of unified architectures leads me to VMware, which represents to servers exactly what NetApp represents to storage: a unified architecture with choices, flexibility, and centralized management, without shoving a specific protocol down someone's throat. </div><div align="left"></div><div align="left">Close to 90% of the VI3 environments today are deployed over FC, and of those, based on experience, I'd say that 90% are using VMFS, VMware's clustered filesystem. </div><div align="left"></div><div align="left">If you are dealing with 2-3 clustered ESX servers, these types of deployments are not very complex. However, the complexity starts to increase exponentially as the number of servers in a VMware Datacenter starts to multiply: RAID Groups, LUNs, LUN IDs, Zones, Zone management, HBAs, queue depths, VMFS Datastores, RDMs, multipathing settings, etc. </div><div align="left"></div><div align="left">Then the question comes up...VMFS LUNs or RDMs. 
How is my performance going to be with 8-10 VMs on a VMFS LUN and a single Disk I/O queue? What if I take the RDM route and later on I run out of LUNs? </div><div align="left"></div><div align="left">Way too many touch points, way too many things to pay attention to, way too many questions. </div><div align="left"></div><div align="left">Well, there's help...NFS. I've recently had the opportunity to play with NFS under VMware in my environment, and I can tell you, you are missing out if you do not at <em><strong>least</strong></em> consider it and test it for your environment. </div><div align="left"></div><div align="left">Here's what I have found out with NFS, and I'm not the only one:</div><div align="left"></div><div align="left"></div><ul><li><div align="left">Provisioning is a breeze</div></li><li><div align="left">You get the advantage of VMDK thin provisioning, since it's the default setting over NFS</div></li><li><div align="left">You can expand/decrease the NFS volume on the fly and realize the effect of the operation on the ESX server with the click of the datastore "refresh" button.</div></li><li><div align="left">You don't have to deal with VMFS or RDMs, so you have no dilemma here </div></li><li><div align="left">No single disk I/O queue, so your performance is strictly dependent upon the size of the pipe and the disk array. </div></li><li><div align="left">You don't have to deal with FC switches, zones, HBAs, and identical LUN IDs across ESX servers</div></li><li><div align="left">You can restore (at least with NetApp you can) multiple VMs, individual VMs, or files within VMs. 
</div></li><li><div align="left">You can instantaneously clone (NetApp FlexClone) a single VM or multiple VMs</div></li><li><div align="left">You can also back up whole VMs, or files within VMs </div></li></ul><p align="left">People may find this hard to believe, but the performance over NFS is actually better than FC or iSCSI, not only in terms of throughput but also in terms of latency. How can this be, people ask? FC is 4Gb and Ethernet is 1Gb. I would say that this is a rather simplistic approach to performance. What folks don't realize is that:</p><ul><li><div align="left">ESX server I/O is small-block and extremely random, which means that bandwidth matters little. IOs and response time matter a lot.</div></li><li><div align="left">You are not dealing with VMFS and a single managed disk I/O queue. </div></li><li><div align="left">You can have a single mount point across multiple IP addresses</div></li><li><div align="left">You can use link aggregation IEEE 802.3ad (NetApp multimode VIF with IP aliases)</div></li></ul><p align="left">Given that server virtualization has incredible ramifications on storage in terms of storage capacity requirements, storage utilization and thus storage costs, I believe that the time when folks will warm up to NFS is closer than we think. With NFS you are thin provisioning by default, and the VMDKs are thin as well. Plus, any modification to the size of the NFS volume in terms of capacity is easily and immediately realized on the host side. Additionally, if you consider the fact that on average a VMFS volume is around 70-80% utilized (actually, that may be high) and the VMDK is around 70%, you can easily conclude that your storage utilization is anywhere from 49-56% <em><strong>excluding RAID overhead</strong></em>, and then NFS starts to make a LOT of sense. </p><p align="left"><em><a href="http://www.vmware.com/vmworld/">VMworld</a></em> is next week and NetApp is a platinum sponsor. 
So, if you are attending, I would recommend you drop by <em>Booth 701</em> and take a look at some incredibly exciting demos that have been put together showcasing the latest NetApp innovations with ESX server as well as VDI. </p><p align="left">I'm hoping to upload the demo videos here next week, or to have links to them.</p>Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com48tag:blogger.com,1999:blog-27348612.post-82476909467948342632007-06-07T16:51:00.000-05:002008-12-09T04:17:36.194-06:00SnapDrive for Windows 5.0 - Thin Provisioning and Space ReclamationBack on July 11th 2006, I posted an article on <a href="http://storagefoo.blogspot.com/2006_07_09_archive.html">Thin Provisioning</a>. Today a reader made some very timely and appropriate comments around application support for thin provisioning, and alerting and monitoring.<br /><br />"I guess eventually OS and Apps have to start supporting thin provisioning, in terms of how they access the disk, and also in terms of instrumentation for monitoring and alerting"<br /><br />Back in that article I had written that I would not deploy thin provisioning for new applications for which I had no usage metrics, or for applications which would write, delete or quickly re-write data in a LUN. Here's why, up until now, I would avoid the latter scenario.<br /><br />The example below attempts to illustrate the point.<br /><br />Let's assume I have thinly provisioned a 100GB LUN to a Windows server.<br /><br />I now fill 50% of the LUN with data. Upon doing this, capacity utilization, from a filesystem standpoint, is 50%, and from an array perspective it is also 50%.<br /><br />I then proceed to completely fill the LUN. Now, the filesystem and array capacity utilization are both at 100%.<br /><br />Then I decide to delete 50% of the data in the LUN. What’s the filesystem and array capacity utilization? Folks are quick to reply that it’s at 50%, but that is a partially correct answer. 
The correct answer is that filesystem utilization is at 50% but array utilization is still at 100%. The reason is that even though NTFS has freed some blocks upon deleting half of the data in the LUN, from an array perspective these blocks still reside on the disk, as there is no way for the array to know that the data is no longer needed.<br /><br />So now, if more data is written to the LUN, there is no guarantee that the filesystem will use the exact same blocks it freed previously to write the new data. That means that in a Thin Provisioning scenario, this behavior may trigger a storage allocation on the array when in fact such an allocation may not be needed. So now, we’re back to square one in attempting to solve the exact same storage over-allocation challenge.<br /><br /><em><strong>SnapDrive for Windows 5.0</strong></em><br /><br /><a href="http://4.bp.blogspot.com/_T-u3Gtv5okQ/RmiLVS3QmxI/AAAAAAAAABM/ujat2OrMvDw/s1600-h/space_reclaim1.jpg"><img id="BLOGGER_PHOTO_ID_5073458178189990674" style="FLOAT: right; MARGIN: 0px 0px 10px 10px; CURSOR: hand" alt="" src="http://4.bp.blogspot.com/_T-u3Gtv5okQ/RmiLVS3QmxI/AAAAAAAAABM/ujat2OrMvDw/s400/space_reclaim1.jpg" border="0" /></a>With the introduction of SnapDrive for Windows 5.0, Network Appliance introduced a feature called Space Reclamation.<br />The idea is to provide integration between NTFS and WAFL via a mechanism that will notify WAFL when NTFS has freed blocks, so that WAFL, in turn, can reclaim these blocks and mark them as free.<br /><br />Within SnapDrive 5.0 the space reclamation process can be initiated either via the GUI or the CLI. 
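From the CLI, the operation is driven through SnapDrive's sdcli utility on the Windows host. A sketch only; the exact subcommand and flag names here are from memory and should be verified against the SnapDrive 5.0 documentation:

```shell
sdcli spacereclaimer analyze -d G     # report how much space could be reclaimed on mount point G:
sdcli spacereclaimer start -d G -t 60 # start reclamation on G:, bounded to a 60-minute window
```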
Upon initiating the space reclamation process, a pop-up window for the given LUN will inform the Administrator as to whether or not a space reclamation operation is needed and, if so, how much space will be reclaimed.<br /><br /><img id="BLOGGER_PHOTO_ID_5073454609072167634" style="FLOAT: right; MARGIN: 0px 0px 10px 10px; CURSOR: hand" alt="" src="http://1.bp.blogspot.com/_T-u3Gtv5okQ/RmiIFi3QmtI/AAAAAAAAAAs/b8QzAlf5s2I/s400/snapreclaim2.gif" border="0" />Additionally, the space reclamation process can be scheduled, and the time window for it to run can be specified in minutes, from 1 to 10080 minutes (7 days). Furthermore, there is no licensing requirement to use the Space Reclamation feature, as it is bundled with the recently released SnapDrive 5.0. However, Data ONTAP must be at version 7.2.1 or later.<br /><br />Performance is strictly dependent upon the number and the size of the LUNs that are under the space reclamation process.<br /><p align="center"><a href="http://4.bp.blogspot.com/_T-u3Gtv5okQ/RmiKzS3QmwI/AAAAAAAAABE/tq8sQWvhtas/s1600-h/space_reclain2.gif"><img id="BLOGGER_PHOTO_ID_5073457594074438402" style="FLOAT: right; MARGIN: 0px 0px 10px 10px; CURSOR: hand" alt="" src="http://4.bp.blogspot.com/_T-u3Gtv5okQ/RmiKzS3QmwI/AAAAAAAAABE/tq8sQWvhtas/s400/space_reclain2.gif" border="0" /></a></p><br /><br /><br /><br /><br /><br /><br /><br /><br />As a general rule, we recommend that the process be run during periods of low I/O activity and when Snapshot operations, such as snapshot create and snap restore, are not in use.<br /><br />While other competitors offer thin provisioning, Network Appliance, once again, has been the first to provide yet another innovative and important tool that helps our customers not only safely deploy thin provisioning but also realize the benefits that derive from it.Nick 
Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com6tag:blogger.com,1999:blog-27348612.post-19822445572687216452006-11-22T15:38:00.000-06:002007-09-09T08:31:33.450-05:00SnapDrive for Unix: Self Service Storage Management<p>I was recently reading an article regarding storage provisioning titled “<a href="http://storagemagazine.techtarget.com/magItem/0,291266,sid35_gci1222365_idx1,00.html">The Right Way to Provision Storage”</a>. What I took away from the article, as a reader, is that storage provisioning is a painful, time-consuming process involving several people representing different groups.<br /><br />The process, according to the article, pretty much goes like this:<br /><br /><strong>Step 1:</strong> DBA determines the performance requirement and number of LUNs and defers to the Storage person<br /><strong>Step 2:</strong> Storage person creates the LUN(s) and defers to the Operations person<br /><strong>Step 3:</strong> Operations person maps the LUN(s) to some initiator(s) and defers to the Server Admin<br /><strong>Step 4:</strong> Server Admin discovers the LUN(s) and creates the Filesystem(s). Then he/she informs the DBA, probably 3-4 days later, that his/her LUN(s) are ready to host the application<br /><br />I wonder how many requests per week these folks get for storage provisioning and how much of their time it consumes. I would guess, much more than they would like. A couple of years ago, an IT director of a very large and well-known financial institution told me “We get over 400 storage provisioning requests a week and it has become very difficult to satisfy them all in a timely manner”.<br /><br />Why does storage provisioning have to be so painful? It seems to me that one would get more joy out of getting a root canal than asking for storage to be provisioned. 
Storage provisioning should be a straightforward process, and the folks who own the data (Application Admins) should be directly involved in the process.<br /><br />In fact, they should be the ones doing the provisioning directly from the host, under the watchful eye of the Storage group, who will control the process by putting the necessary controls in place at the storage layer, restricting the amount of storage Application admins can provision and the operations they are allowed to perform. This would be self-service storage provisioning and data management.<br /><br />Dave Hitz on his <a href="http://blogs.netapp.com/dave/?month=5&day=19&year=2006">blog</a>, a few months back, described the above process and used the ATM analogy as an example.<br /><br />NetApp’s <a href="http://www.netapp.com/products/software/snapdrive-unix.html">SnapDrive for Unix</a> (Solaris/AIX/HP-UX/Linux) is similar to an ATM. It lets application admins manage and provision storage for the data they own. Through deep integration with various Logical Volume Managers and filesystem-specific calls, SnapDrive for Unix allows administrators to do the following with a single host command:<br /><br /><em>1) Create LUNs on the array<br />2) Map the LUNs to host initiators<br />3) Discover the LUNs on the host<br />4) Create Disk Groups/Volume Groups<br />5) Create Logical Volumes<br />6) Create Filesystems </em><em><br />7) Add LUNs to a Disk Group<br />8) Resize Storage<br />9) Create and Manage Snapshots<br />10) Recover from Snapshots<br />11) Connect to filesystems in Snapshots and mount them on the same host where the original filesystem was (or still is) mounted, or on a different host<br /><br /></em>The whole process is fast and, more importantly, very efficient. 
Furthermore, it masks the complexity of the various UNIX Logical Volume Managers and allows folks who are not intimately familiar with them to successfully perform various storage-related tasks.<br /><br />Additionally, SnapDrive for Unix provides snapshot consistency by making calls to filesystem-specific freeze/thaw mechanisms, providing image consistency and the ability to successfully recover from a Snapshot.<br /><br />Taking this a step further, SnapDrive for Unix provides the necessary controls at the storage layer and allows Storage administrators to specify who has access to what. For example, an administrator can specify any one, or a combination, of the following access methods.<br /><br /><em>◆ NONE − The host has no access to the storage system.<br />◆ CREATE SNAP − The host can create snapshots.<br />◆ SNAP USE − The host can delete and rename snapshots.<br />◆ SNAP ALL − The host can create, restore, delete, and rename snapshots.<br />◆ STORAGE CREATE DELETE − The host can create, resize, and delete storage.<br />◆ STORAGE USE − The host can connect and disconnect storage.<br />◆ STORAGE ALL − The host can create, delete, connect, and disconnect storage.<br />◆ ALL ACCESS − The host has access to all the SnapDrive for UNIX operations.<br /></em><br />Furthermore, SnapDrive for Unix is tightly integrated with NetApp’s <a href="http://www.netapp.com/products/software/snapmanager-oracle.html">SnapManager for Oracle </a>product on several Unix platforms, which allows application admins to manage Oracle-specific datasets. 
Currently, SnapDrive for Unix supports Fibre Channel, iSCSI and NFS.<br /><br />SnapDrive for Unix uses HTTP/HTTPS as its transport protocol, with password encryption, and makes calls to Data ONTAP’s APIs for storage management tasks.<br /><br />There’s also a widely deployed Windows version of <a href="http://www.netapp.com/products/software/snapdrive-windows.html">SnapDrive</a> that integrates with Microsoft’s Logical Disk Manager/NTFS and VSS and allows admins to perform similar tasks. Furthermore, SnapDrive for Windows is tightly integrated with NetApp’s <a href="http://www.netapp.com/products/software/snapmanager-exchange.html">SnapManager for Exchange </a>and <a href="http://www.netapp.com/products/software/snapmanager-sql.html">SnapManager for SQL </a>products, which allow administrators to obtain instantaneous backups and near-instantaneous restores of their Exchange or SQL Server database(s).<br /><br />Below are a couple of examples from my lab server showing what it takes, using SnapDrive for Unix, to provision storage on a Solaris host with Veritas Foundation Suite installed.<br /><br /><em><strong>Example 1:</strong><br /></em><br />In this example I’m creating 2 LUNs, 2GB in size each, on a controller named <em>filer-a</em>, on a volume<em> named /vol/boot</em>.<a href="http://photos1.blogger.com/blogger2/3716/3338/1600/sd3.jpg"><img style="FLOAT: left; MARGIN: 0px 10px 10px 0px; CURSOR: hand" alt="" src="http://photos1.blogger.com/blogger2/3716/3338/400/sd3.jpg" border="0" /></a><br />The LUNs are named<em> <strong>lun1</strong></em><strong> </strong>and <em><strong>lun2</strong></em>. I then create a Veritas disk group named <em><strong>dg1</strong></em>. On that disk group I create a Veritas volume named <strong><em>testvol</em>.</strong> On volume <em><strong>testvol</strong></em>, I then create a filesystem with <em><strong>/test</strong></em> as the mount point. 
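The single host command behind Example 1 looks roughly like this (a sketch; the flag spellings are from memory and should be checked against the SnapDrive for Unix man page):

```shell
# Create two 2GB LUNs on filer-a:/vol/boot, build Veritas disk group dg1
# with a volume and filesystem on top, and mount it at /test -- one command:
snapdrive storage create -fs /test -fstype vxfs -dg dg1 \
    -lun filer-a:/vol/boot/lun1 lun2 -lunsize 2g
```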
By default, and unless instructed otherwise via a <strong>nopersist</strong> option, SnapDrive will also make entries in the Solaris /etc/vfstab file.<br /></p><p>The following is what the filesystem looks like, and what Veritas sees, immediately after the above process has completed:<br /><br /><a href="http://photos1.blogger.com/blogger2/3716/3338/1600/veritas.jpg"><img style="FLOAT: left; MARGIN: 0px 10px 10px 0px; CURSOR: hand" alt="" src="http://photos1.blogger.com/blogger2/3716/3338/400/veritas.jpg" border="0" /></a><br /><br /><br /><br /><br /><br /><br /></p><p><strong><em>Example 2: </em></strong></p><p>Below, I take a snapshot of the Veritas filesystem and name the snapshot <strong>test_snap</strong>. I then make an inquiry to the array to obtain a list of consistent snapshots for my <strong>/test</strong> filesystem. </p>This reveals that I have taken 3 different snapshots at different points in time, and I can recover from any one of them. I can also connect to any one of them and mount the filesystem.<br /><br /><a href="http://photos1.blogger.com/blogger2/3716/3338/1600/veritas2.jpg"><img style="FLOAT: left; MARGIN: 0px 10px 10px 0px; CURSOR: hand" alt="" src="http://photos1.blogger.com/blogger2/3716/3338/400/veritas2.jpg" border="0" /></a><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><strong><em>Example 3</em></strong><br /><br />Here I'm connecting to the filesystem from the most recent snapshot, test_snap, and I'm mounting a <strong>space-optimized clone</strong> of the original filesystem as it was at the time the snapshot was taken. 
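In command form, Examples 2 and 3 are roughly the following (a sketch; verify the exact syntax against the SnapDrive for Unix documentation):

```shell
# Example 2: take a consistent snapshot of the /test filesystem
snapdrive snap create -fs /test -snapname test_snap

# Example 3: connect to that snapshot, mounting a space-optimized
# clone of /test at /test_copy
snapdrive snap connect -fs /test /test_copy \
    -snapname filer-a:/vol/boot:test_snap
```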
Ultimately, I will end up with 2 copies of the filesystem.<a href="http://photos1.blogger.com/blogger2/3716/3338/1600/veritas3.jpg"><img style="FLOAT: left; MARGIN: 0px 10px 10px 0px; CURSOR: hand" alt="" src="http://photos1.blogger.com/blogger2/3716/3338/400/veritas3.jpg" border="0" /></a><br />The original one, named <strong><em>/test</em></strong>, and the one from the snapshot, which I will rename <strong>/test_copy</strong>. Both filesystems are mounted on the same Solaris server (they don't have to be) and are under Veritas Volume Manager control.<br /><br /><p>This is how simple and easy it is to provision and manage storage using NetApp's SnapDrive. Frankly, it seems to me that a lengthy process explaining the "proper" way to provision storage adds extra layers of human intervention and unnecessary complexity; it's inefficient and time-consuming. </p><br /><p><br /></p><br /><br /><div align="left"></div><br /><br /><div align="left"></div>Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com5tag:blogger.com,1999:blog-27348612.post-1158252219936863152006-09-14T11:28:00.000-05:002007-09-09T08:31:53.034-05:00The Emergence of OS Native Multipathing SolutionsIn today’s business environment, High Availability is not an option. It is a business necessity and is essential in providing Business Continuity. Data is the lifeblood of a business. Can you imagine a financial firm losing connectivity to its business-critical database in the middle of the day?<br /><br />This is where Multipathing, or path failover, solutions address High Availability and Business Continuity: not only do they eliminate single points of failure between the server and the storage, but they also help achieve better performance by balancing the load (I/O load or LUN load) across multiple paths.<br /><br />Most new servers bought today by customers connect into SANs. 
Furthermore, most of these servers have high availability and redundancy requirements and thus connect to highly available, redundant fabrics and disk arrays. When any component in the data path fails, failover to the surviving data path occurs non-disruptively and automatically.<br />So the premise of Multipathing, or path failover, is to provide redundant server connections to storage and:<br /><ul><li>Provide path failover in the event of a path failure </li><li>Monitor I/O paths and provide alerts on critical events </li></ul><p><br />Over the years, administrators have recognized this need, and so after the purchase of a server they would also purchase a 3rd party multipathing solution, typically from their storage vendor. Apart from the fact that these 3rd party solutions were not designed as part of the Operating System, and some did not integrate particularly well, they also did not interoperate well with multipathing solutions from other storage vendors that needed to be installed on the same server. In essence, storage-vendor-specific multipathing solutions solved one problem while creating another. This problem lasted for years and was addressed only recently. </p><p><br />Over the past 2-3 years a flurry of OS native multipathing solutions has emerged. Thus, today’s multipathing solution distribution model has changed drastically. Multipathing solutions can be distributed either as: </p><ul><li>3rd party software (Symantec/Veritas DMP, PowerPath, HDLM, SDD, RDAC, SANPath etc.) </li><li>Embedded in the Operating System (Solaris MPxIO, AIX MPIO, Windows MPIO, Linux Device Mapper-Multipath, HP-UX PVLinks, VMware ESX Server, Netware via NSS). 
</li><li>As an HBA vendor device driver that works with most, if not all, storage arrays (i.e. Qlogic’s Linux/Netware failover driver, Windows QLDirect) </li><li>As an HBA vendor device driver (Emulex MultiPulse) available only to OEMs, who in turn incorporate the technology into their own products via calls made to the HBA APIs provided by the HBA vendor. </li></ul><p><br />Increasingly, the trend is toward the deployment of OS native multipathing solutions. In fact, with the exception of one Operating System, a substantial server/storage vendor has all but abandoned support of their traditional Multipathing solution for their newer storage arrays, in favor of the OS native ones.<br /><br />There are two drivers behind this trend. Cost is one reason customers elect to deploy OS native multipathing solutions. After all, you can’t beat “free”. A secondary, but equally important, driver is to achieve better interoperability among various vendors’ storage devices that happen to provision the same server(s): one driver stack and one set of HBAs talk to everybody.<br /><br />From a Windows standpoint, it is important to note that Microsoft is strongly encouraging all storage vendors to support its MPIO specification. Network Appliance supports this specification with a Device Specific Module (DSM) for our disk subsystems. It’s equally important to note that Windows MPIO enables the co-existence of multiple storage vendor DSMs within the same server. 
In fact, the current approach is similar to what Symantec/Veritas has done over the years with the Array Support Library (ASL), which provides vendor disk subsystem attributes and multipathing information to the Symantec/Veritas Device Discovery Layer (DDL) and Dynamic Multipathing (DMP) components.<br /><br />Early last year Microsoft indicated they were considering the development of a Generic DSM for Fibre Channel (a Generic DSM for iSCSI already exists) that will support all storage vendors as long as they (storage vendors) comply with SCSI Primary Commands - 3 (SPC-3). Furthermore, Microsoft, at the time, indicated that a Generic DSM would be incorporated into the Windows Vista release.<br /><br />Network Appliance’s primary multipathing approach is to support all OS native multipathing solutions, as well as some popular 3rd party (i.e. Symantec/VxDMP) solutions, across all supported Operating Systems. Depending on customer demand, certification with disk array vendor specific multipathing solutions is always a possibility, assuming the necessary Customer Support Agreements are in place.<br /><br /><br /></p>Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com3tag:blogger.com,1999:blog-27348612.post-1157521806574327582006-09-06T02:26:00.000-05:002007-09-09T08:36:05.759-05:00Installing RHEL on SATA using an Adaptec 1210SA ControllerI have a Supermicro server in my lab with an Adaptec 1210SA controller connecting to a couple of SATA drives I use for testing. Given that Adaptec does not provide an RHEL driver, I had a hard time installing the OS until I had an epiphany a week ago: Adaptec may not provide an RHEL driver for the 1210SA card, but they do provide a driver for the 2020SA card. 
Here's how I got around this little problem:<br /><br />1) Go to the Adaptec site and <a href="http://www.adaptec.com/en-US/downloads/rh/rhel_3?productId=AAR-2020SA&dn=Adaptec+Serial+ATA+RAID+2020SA">download</a> the RHEL driver for the 2020SA card.<br />2) Download and install the <a href="http://www.chrysocome.net/rawwrite">RAWWRITE </a>binary for Windows<br /><br />3) After downloading the RHEL package, unzip it, select the driver image based on the server's architecture, and use RAWWRITE to copy it onto a floppy.<br /><br />4) Power on the server, insert RHEL CD #1 into the CDROM, and at the boot prompt type: <strong>linux dd</strong><br /><strong></strong><br />5) During the install you will be asked if you want to install additional drivers. Insert the floppy and select "Yes".<br /><br />At this point the driver will be loaded and you can proceed with the OS installation.<br /><br />I need to stress that this is not the recommended way of doing things but rather a workaround I use for Lab purposes only. I don't even use this system for demos. If you are considering placing such a server in production, I would highly recommend that you purchase a controller with support for the OS version you need to install.Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com0tag:blogger.com,1999:blog-27348612.post-1157485955713807102006-09-05T14:02:00.000-05:002007-10-27T19:49:00.560-05:00VMware ESX 3.0.0 SAN BootingOne of the ways enterprises with large numbers of servers are reducing costs and enabling greater storage consolidation today is by deploying diskless servers that boot from the SAN (FC or IP). 
While this technique is not new, the introduction of the Bladeserver, which provides greater manageability, reduced HW costs, and simpler cable management, as well as power, cooling and real-estate savings, has further accelerated the adoption of SAN booting.<br /><br />Booting from the SAN provides several advantages:<br /><br /><br /><ul><li><strong>Disaster Recovery</strong> - Boot images stored on disk arrays can be easily replicated to remote sites where standby servers of the same HW type can boot quickly, minimizing the negative effect a disaster can have on the business.</li><li><strong>Snapshots -</strong> Boot images in snapshots can be quickly reverted back to a point-in-time, saving the time and money of rebuilding a server from scratch.</li><li><strong>Quick deployment of Servers -</strong> Master Boot images stored on disk arrays can be easily cloned using Netapp's FlexClone capabilities, providing rapid deployment of additional physical servers.</li><li><strong>Centralized Management -</strong> Because the Master image is located in the SAN, upgrades and patches are managed centrally and are installed only on the master boot image, which can then be cloned and mapped to the various servers. No more multiple upgrades or patch installs. </li><li><strong>Greater Storage consolidation -</strong> Because the boot image resides in the SAN, there is no need to purchase internal drives.</li><li><strong>Greater Protection -</strong> Disk arrays provide greater data protection, availability and resiliency features than servers. For example, Netapp's RAID-DP functionality provides additional protection in the event of a dual drive failure. 
RAID-DP with SyncMirror also protects against disk drive enclosure failure, loop failure, cable failure, back-end HBA failure, or any 4 concurrent drive failures.</li></ul><p>Having mentioned the advantages, it's only fair that we also mention the disadvantages, which, even though outnumbered, still exist: </p><ul><li><strong>Complexity -</strong> SAN Booting is a more complex process than booting from an internal drive. In certain cases, the troubleshooting process may be a bit more difficult, especially if a coredump file cannot be obtained.</li><li><strong>Variable Requirements -</strong> The requirements and support will vary from array vendor to array vendor, and specific configurations may not even be supported. The requirements will also vary based on the type of OS that is being loaded. Always consult with the disk array vendor before you decide to boot from the fabric. </li></ul><p>One of the most popular platforms that lends itself to booting from the SAN is VMware ESX server 3.0.0. One reason is that VMware does not support booting from internal IDE or SATA drives. The second reason is that more and more enterprises have started to deploy ESX 3.0.0 on diskless blade servers, consolidating hundreds of physical servers into a few blades in a single blade chassis with the deployment of VMware's server virtualization capabilities. </p><p>The new ESX 3.0.0 release has made significant advancements in supporting boot from the SAN, as the multiple and annoying requirements from the previous release have been addressed. 
</p><p>Here are some differences between the 2.5.x and 3.0.0 versions with regards to the SAN booting requirements:</p><p></p><br /><a href="http://photos1.blogger.com/blogger/1237/2879/1600/esx1.0.jpg"><img style="MARGIN: 0px 10px 10px 0px; CURSOR: hand" alt="" src="http://photos1.blogger.com/blogger/1237/2879/1600/esx1.0.jpg" border="0" /></a><br />If you are going to be booting ESX server from the SAN, I highly recommend that, prior to making any HBA purchasing decisions, you contact your storage vendor and carefully review VMware's <a href="http://www.vmware.com/pdf/vi3_san_guide.pdf#search=%22MSCS%20SAN%20Boot%20support%20VMware%20ESX%22">SAN Compatibility Guide for ESX server 3.0</a>. What you will find is that certain model Emulex and Qlogic HBAs are not supported for SAN booting, as well as certain OEM'd/rebranded versions of Qlogic HBAs.<br /><br />The setup process is rather trivial; however, there are some things you will need to be aware of in order to achieve higher performance and non-disruptive failovers should HW failures occur:<br /><br /><p align="left"><strong>1)</strong> Enable the BIOS on <em>only 1</em> HBA. You only need to enable the BIOS on the 2nd HBA should you need to reboot the server while the original HBA used for booting, the cable, or the FC switch has failed. In this scenario, you would use Qlogic's Fast!UTIL to select the Active HBA, enable the BIOS, scan the BUS to discover the boot LUN, and assign the WWPN and LUN ID to the active HBA. However, when both HBA connections are functional, only one needs to have its BIOS enabled.</p><p align="left"><strong>2)</strong> One important option that needs to be modified is the <strong>Execution Throttle/Queue Depth</strong>, which signifies the maximum number of outstanding commands that can execute on any one HBA port. The default for ESX 3.0.0 is 32. 
The value you use is dependent on a couple of factors: </p><ul><li><div align="left">Total Number of LUNs exposed thru the Array Target Port(s)</div></li><li><div align="left">Array Target Port Queue Depth</div></li></ul><p align="left">The formula to determine the value is: <em><strong>Queue Depth = Target Queue Depth / Total number of LUNs mapped</strong></em>. This formula guarantees that a fast load on every LUN will not flood the Target Port, resulting in QFULL conditions. For example, if a Target Port has a queue depth of 1024 and 64 LUNs are exposed thru that port, then the Queue Depth on each host should be set to 16. This is the safest approach and guarantees no QFULL conditions, because 64 LUNs x 16 = the Target Port Queue Depth.</p><p align="left">If, using the same formula, you only consider the LUNs mapped to one Host at a time, then the potential for QFULL conditions exists. Using the above example, let's assume that we have a total of 64 LUNs and 4 ESX hosts, each of which has 16 LUNs mapped. </p><p align="left">Then the calculation becomes: Queue Depth = 1024 / 16 = 64. But a fast load on all 64 LUNs produces: 64 x 64 = 4096, which is much greater than the Queue Depth of the physical Array Target Port. This will most certainly generate a QFULL condition. </p><p align="left">As a rule of thumb, after the queue depth calculation, always allow some room for future expansion in case more LUNs need to be created and mapped. Thus, consider setting the queue depth value a bit lower than the calculated one. How low is strictly dependent on future growth and requirements. As an alternative, you could use <a href="http://storagefoo.blogspot.com/2006/05/dynamic-queue-management.html">Netapp's Dynamic Queue Depth </a>Management solution, which allows queue depth management from the array side rather than the host. 
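The sizing rules above can be sketched as shell arithmetic. This is just an illustration using the example figures from this post (a 1024-deep target port, 64 LUNs, 4 hosts with 16 LUNs each); plug in your own array's numbers:

```shell
# Safe sizing: divide the target port queue depth by ALL LUNs exposed
# through that port, not just the LUNs one host sees.
target_port_qdepth=1024
total_luns=64

qdepth=$(( target_port_qdepth / total_luns ))
echo "Per-LUN host queue depth: $qdepth"        # 16; 64 LUNs x 16 = 1024

# Naive sizing: each host divides only by its own 16 LUNs...
per_host_luns=16
hosts=4
naive_qdepth=$(( target_port_qdepth / per_host_luns ))

# ...so a fast load on every LUN of every host can oversubscribe the port:
worst_case=$(( naive_qdepth * per_host_luns * hosts ))
echo "Worst-case outstanding commands: $worst_case vs port limit $target_port_qdepth"
```

With the safe formula the port can never see more outstanding commands than it can queue; the naive per-host formula allows 4096 against a 1024-deep port, which is exactly the QFULL scenario described above.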
</p><p align="left"><strong><em>To Change the Queue Depth on a Qlogic HBA:</em></strong></p><p align="left"><em>2a) Create a copy of /etc/vmware/esx.conf</em> </p><p align="left"><em>2b) Locate the following entry for each HBA:</em></p><p align="left"><strong>/device/002:02.0/name = "QLogic Corp QLA231x/2340 (rev 02)" </strong></p><p align="left"><strong>/device/002:02.0/options = "" </strong></p><p align="left"><em>2c) Modify as follows:</em></p><p align="left"><strong>/device/002:02.0/name = "QLogic Corp QLA231x/2340 (rev 02)"</strong></p><p align="left"><strong>/device/002:02.0/options = "ql2xmaxqdepth=xxx"</strong> </p><p align="left"><em>2d) Reboot</em></p><p align="left"><em>Where xxx is the queue depth value.</em> </p><p align="left"></p><p align="left"><strong>3)</strong> Another important option that will need modification using Fast!UTIL is the <em><strong>PortDownRetryCount</strong></em> parameter. This value will need to be set to the value recommended by your storage vendor. This setting specifies the number of times the adapter's driver retries a command to a port returning a port down status. For ESX server this value is 2*n + 5, where <em>n</em> is the value of <strong>PortDownRetryCount</strong> from the HBA BIOS. You can change this value directly in the HBA, or you can do it after you've installed ESX by editing the /etc/vmware/esx.conf file. 
Upon editing the file, locate the "options=" entry under the HBA model you are using and make the following change:</p><p align="left"><span style="color:#000000;"><em>3a) Create a copy of /etc/vmware/esx.conf</em></span></p><p align="left"><span style="color:#000000;"><em>3b) Locate the following entry for each HBA:</em></span></p><p align="left"><strong>/device/002:02.0/name = "QLogic Corp QLA231x/2340 (rev 02)"<br />/device/002:02.0/options = ""</strong></p><p align="left"><span style="color:#000000;"><em>3c) Modify as follows:</em></span></p><p align="center"><strong>/device/002:02.0/name = "QLogic Corp QLA231x/2340 (rev 02)"<br />/device/002:02.0/options = "qlport_down_retry=xxx"</strong> </p><p align="left"><em>3d) Reboot</em></p><p align="center"></p><p align="center"><em>Where xxx is the value recommended by your storage vendor. The equivalent setting for Emulex HBAs is "lpfc_nodedev_tmo". The default is 30 seconds.</em></p><p align="left"><br />In closing, before you decide what your setup will be, you will need to decide whether or not booting from the SAN makes sense for you and whether your storage vendor supports the configuration(s) you have in mind. In general, if you do not want to independently manage large server farms with internal drives, if you are deploying diskless blades, or if you would like to take advantage of disk array based snapshots and cloning techniques for rapid recovery and deployment, then you are a candidate for SAN booting. 
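Steps 2a-2d and 3a-3d above can be combined into a single pass over the file. Here is a minimal sketch, assuming GNU sed and run against a throwaway copy of esx.conf rather than the live file; the device entry (002:02.0) and the values 16 and 14 are illustrative only, so substitute the entries from your own file and the values your storage vendor recommends:

```shell
# Stand-in for /etc/vmware/esx.conf so the sketch is safe to run anywhere.
cfg=/tmp/esx.conf.demo
cat > "$cfg" <<'EOF'
/device/002:02.0/name = "QLogic Corp QLA231x/2340 (rev 02)"
/device/002:02.0/options = ""
EOF

# 2a/3a: keep a backup copy before editing.
cp "$cfg" "$cfg.bak"

# 2c + 3c in one edit: fill the empty options string for that device
# entry with both the queue depth and the port-down retry count.
sed -i 's|^\(/device/002:02.0/options = \)""|\1"ql2xmaxqdepth=16 qlport_down_retry=14"|' "$cfg"

grep options "$cfg"
# 2d/3d: on a real system a reboot is still required for the qla2xxx
# driver to pick the new options up.
```

As in the manual steps, the options line ends up as a single quoted string holding both driver parameters; editing the live /etc/vmware/esx.conf this way should only be done with the backup from step 2a/3a in hand.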
</p><p align="left"></p>Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com12tag:blogger.com,1999:blog-27348612.post-1157475204525725692006-09-05T11:08:00.000-05:002007-09-09T08:36:47.107-05:00IBM Bladecenter iSCSI Boot SupportThere has been a lot of demand lately to boot blade servers using the integrated NICs without the use of iSCSI HBAs.<br /><br />IBM has partnered with Microsoft to enable this capability for the IBM HS20 (Type 8843) Blades, and Netapp has recently announced support for it.<br /><br /><strong><em>Here are the requirements:</em></strong><br /><br />Blade type: HS20 MT8843<br />BIOS: 1.08<br />HS Blade Baseboard/Management Controller: 1.16<br />Windows 2003 SP1 w/ KB902113 Hot Fix<br />Microsoft iSCSI initiator with Integrated boot support: 2.02<br />Netapp DataONTAP: >= 7.1.1<br />Netapp iSCSI Windows Initiator Support Kit 2.2 (available for download from the <a href="http://now.netapp.com/">Netapp NOW </a>site)<br /><br />One thing to be aware of is that the <em>Microsoft iSCSI initiator version 2.02 with Integrated Boot support</em> is a <strong>different binary</strong> from the standard <em>Microsoft iSCSI initiator 2.02. 
</em><br /><em></em><br />To obtain the MS iSCSI initiator 2.02 with Boot support binary, follow this <a href="https://login.live.com/login.srf?wa=wsignin1.0&amp;rpsnv=10&amp;ct=1157474910&amp;rver=4.0.1531.0&amp;wp=MCMBI&amp;wreply=https:%2F%2Fconnect.microsoft.com%2Finvitationentry.aspx&amp;lc=1033&amp;id=64416">link </a>and provide the following invitation code: <strong><em>ms-8RR8-6k43</em></strong><br /><br />The IBM BIOS and BMC updates can be downloaded from here:<br /><a href="http://www-307.ibm.com/pc/support/site.wss/document.do?lndocid=MIGR-64042">www-307.ibm.com/pc/support/site.wss/document.do?lndocid=MIGR-64042</a> or here<br /><br /><a href="http://www-03.ibm.com/servers/eserver/support/bladecenter/hs20/downloadinghwwindows.html">http://www-03.ibm.com/servers/eserver/support/bladecenter/hs20/downloadinghwwindows.html</a><br /><br />You can find instructions for the process here:<br /><br /><a href="ftp://ftp.software.ibm.com/pc/pccbbs/pc_servers_pdf/iscsi_boot_san_configuration_guide.pdf">ftp://ftp.software.ibm.com/pc/pccbbs/pc_servers_pdf/iscsi_boot_san_configuration_guide.pdf</a>Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com1tag:blogger.com,1999:blog-27348612.post-1157043457872076462006-08-31T10:19:00.000-05:002007-09-09T08:37:02.115-05:00Linux Native Multipathing (Device Mapper-Multipath)Over the past couple of years a flurry of OS Native multipathing solutions have become available. As a result, we are seeing a trend towards these solutions and away from vendor specific multipathing software.<br /><br />The latest OS Native multipathing solution is Device Mapper-Multipath (DM-Multipath), available with Red Hat Enterprise Linux 4.0 U2 and SuSE SLES 9.0 SP2.<br /><br />I had the opportunity to configure it in my lab a couple of days ago and I was pleasantly surprised at how easy it was to configure. 
Before I show how it's done, let me talk a little about how it works.<br /><br />The multipathing layer sits above the protocols (FCP or iSCSI) and determines whether the devices discovered on the target represent separate devices or are just separate paths to the same device. In this case, Device Mapper (DM) is the multipathing layer for Linux.<br /><br />To determine which SCSI devices/paths correspond to the same LUN, the DM initiates a SCSI Inquiry. The inquiry response, among other things, carries the LUN serial number. Regardless of the number of paths a LUN is associated with, the serial number for the LUN will always be the same. This is how multipathing SW determines which and how many paths are associated with each LUN.<br /><br />Before you get started you want to make sure the following things are loaded:<br /><br /><ul><li>device-mapper-1.01-1.6 RPM is loaded </li><li>multipath-tools-0.4.5-0.11</li><li>Netapp FCP Linux Host Utilities 3.0 </li></ul><p>Make a copy of the /etc/multipath.conf file. Edit the original file and make sure you have only the following entries uncommented. If you don't have the Netapp section then add it.</p><p><br /><strong>defaults {<br />user_friendly_names yes<br />}<br />#<br />devnode_blacklist {<br />devnode "sd[a-b]$"<br />devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"<br />devnode "^hd[a-z]"<br />devnode "^cciss!c[0-9]d[0-9]*"<br />}</strong></p><p><br /><strong>devices {<br />device {<br />vendor "NETAPP "<br />product "LUN"<br />path_grouping_policy group_by_prio<br />getuid_callout "/sbin/scsi_id -g -u -s /block/%n"<br />prio_callout "/opt/netapp/santools/mpath_prio_ontap /dev/%n"<br />features "1 queue_if_no_path"<br />path_checker readsector0<br />failback immediate<br />}<br />}</strong></p>The devnode_blacklist includes devices for which you do not want multipathing enabled. So if you have a couple of local SCSI drives (i.e. sda and sdb), the first entry in the blacklist will exclude them. 
Same for IDE drives (hd).<br /><br /><br />Add the multipath service to the boot sequence by entering the following:<br /><br /><strong>chkconfig --add multipathd<br />chkconfig multipathd on</strong><br /><br />Multipathing on Linux is Active/Active with a Round-Robin algorithm.<br /><br />The path_grouping_policy is group_by_prio. It assigns paths to Path Groups based on path priority values. Each path is given a priority (high value = high priority) based on a callout program written by Netapp Engineering (part of the FCP Linux Host Utilities 3.0).<br /><br />The priority values for each path in a Path Group are summed to obtain a group priority value. The paths belonging to the Path Group with the higher priority value are used for I/O.<br /><br />If a path fails, the value of the failed path is subtracted from the Path Group priority value. If the Path Group priority value is still higher than the values of the other Path Groups, I/O will continue within that Path Group. If not, I/O will switch to the Path Group with the highest priority.<br /><br />Create and map some LUNs to the host. 
If you are using the latest Qlogic or Emulex drivers, then run the respective utilities they provide to discover the LUNs:<br /><br /><br /><br /><ul><li><div align="left"><strong>qla2xxx_lun_rescan all (QLogic)</strong></div></li><li><div align="left"><strong>lun_scan_all (Emulex)</strong></div></li></ul><br />To view a list of multipathed devices:<br /><strong></strong><br /><strong># multipath -d -l </strong><br /><strong></strong><br /><strong>[root@rhel-a ~]# multipath -l</strong><br /><strong></strong><br /><strong>360a9800043346461436f373279574b53</strong><br /><strong>[size=5 GB][features="1 queue_if_no_path"][hwhandler="0"]</strong><br /><strong>\_ round-robin 0 [active] \</strong><br /><strong>\_ 2:0:0:0 sdc 8:32 [active] </strong><br /><strong>\_ 3:0:0:0 sde 8:64 [active]</strong><br /><strong>\_ round-robin 0 [enabled] </strong><br /><strong>\_ 2:0:1:0 sdd 8:48 [active] </strong><br /><strong>\_ 3:0:1:0 sdf 8:80 [active]</strong><br /><br />The above shows 1 LUN with 4 paths. Done. It's that easy to set up.Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com33tag:blogger.com,1999:blog-27348612.post-1155924885655269472006-08-18T11:13:00.000-05:002007-09-08T13:05:31.564-05:00VMware ESX 3.0.0I've recently started playing with the newly released version (3.0.0) of ESX Server. I've been running ESX 2.5.3 in my lab for a while now, and so I decided to upgrade to 3.0.0 to get a feel for the changes made. More importantly, I wanted to see the iSCSI implementation.<br /><br />I've been booting ESX 2.5.3 over an FC SAN in my lab and I have a few Windows 2003 virtual machines as well as a RHEL 4.0U2 virtual machine. The upgrade process took me about 30 minutes and was flawless.<br /><br />Setting up the ESX iSCSI SW initiator was a breeze, and after I was done I connected my existing VMs via iSCSI thru the ESX layer. 
Because there's no multipathing available for iSCSI in the 3.0.0 release, as there is with Fibre Channel, I used NIC Teaming to accomplish virtually the same thing. The whole process didn't take more than 10-15 minutes.<br /><br />With the 3.0.0 version of ESX, VMware does not support booting ESX server over iSCSI; however, they do support VMs residing on iSCSI LUNs. Even though you could connect an iSCSI HBA (i.e. Qlogic 4010/4050/4052) and boot the ESX server, the status of the iSCSI HBA for this release is deemed <strong><em>"experimental only". </em></strong>Support for the iSCSI HBA should be in the 3.0.1 release. I also hear that iSCSI multipathing support will be available in that release as well.<br /><br />So if you have a whole bunch of diskless blades you want to boot over iSCSI with VMware ESX, you'll be able to get it done in the 3.0.1 release.<br /><br />I also noticed that some of the restrictions in terms of supported FC HBAs for SAN booting have been lifted with the 3.0.0 release. For example, you can now use Emulex &amp; Qlogic HBAs, whereas before only the Qlogic 23xx was supported. Additionally, RDMs (Raw Device Mappings) are now supported in conjunction with SAN booting, whereas before they were not.<br /><br />Further SAN booting restrictions that have been lifted include booting from the lowest number WWN and lowest number LUN. The restriction that remains is that you cannot boot ESX without a Fabric, meaning you can't boot ESX via a direct connection to a disk array. Well, I believe you can; it's just that VMware won't support it.<br /><br />One thing that I have yet to figure out is why VMware would allow and support an ESX install on internal IDE/ATA drives but not on internal SATA drives. I've tried to install ESX on a server with an Adaptec 1210SA controller and during setup it couldn't find my disk. So it looks like a driver issue. 
Poking around on the net, I found someone who used an LSI MegaRaid 150-2 controller and was successful in installing ESX on a SATA RAID 5 partition.<br /><br />That made me curious, so I spent $20 on Ebay, got an LSI Megaraid 150-2 controller, and was successful in installing ESX. Like I said before, this is not supported by VMware, which is bizarre, but for testing purposes it works just fine.<br /><br />One thing to watch out for:<br /><ul><li>VMware does not currently support MSCS with Windows 2003 SP1. SP1 has incorporated some changes that will not allow MSCS to function properly with any ESX version at this time. VMware has been working with Microsoft on a resolution but has no ETA for a fix.</li></ul><p></p>Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com0tag:blogger.com,1999:blog-27348612.post-1155837828713349692006-08-17T12:45:00.000-05:002006-08-17T13:03:49.430-05:00Back from vacationI haven't written for a while since my family and I went on vacation to Greece, which is where I'm originally from. I always love to head on over there this time of the year and spend time with family and friends. My kids thoroughly enjoy the beaches, and every summer they make new friends, plus they get to learn the language.<br /><br />The trip over was a breeze; however, the return coincided with the London events, and even though we didn't travel thru London but rather thru Zurich, we felt the pain.<br /><br />For those of you that travel with small kids, you know what I'm talking about, especially when you have to wait for over an hour to go thru security screening. It got even worse in NY, where we had to sit for 3 1/2 hours on the tarmac. 
By the time we got to Dallas we needed another vacation.<br /><br />At least we made it back safely, and that's what matters.Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com0tag:blogger.com,1999:blog-27348612.post-1152643751875707222006-07-11T13:46:00.001-05:002007-09-09T08:29:56.845-05:00Thin ProvisioningI've recently read several articles on Thin Provisioning, and one thing that immediately jumped out at me was that each article describes Thin Provisioning as over-provisioning of existing physical storage capacity.<br /><br />While this can be accomplished with Thin Provisioning, it's not necessarily the point of Thin Provisioning. Thin Provisioning is also about intelligent allocation of existing physical capacity rather than over-allocation. If I have purchased 1TB of storage, and I know only a portion of it will be used initially, then I could thin provision LUNs totaling 1TB while on the back-end the physical allocation happens on application writes. There's no overallocation in this scheme, and furthermore, I have the ability to re-purpose unallocated capacity if need be.<br /><br />The big problem with storage allocation is that it's directly related to forecasting, which is risky at best. In carving up storage capacity, too much may be given to one group and not enough to another. The issue here is that storage re-allocation is difficult; it takes time and resources, and in most cases it requires application downtime. That's why most users request more capacity than they would typically need on day one. Thus capacity utilization becomes an issue.<br /><br />Back to the overallocation scheme. 
In order to do overallocation, you have to have 2 things in place to address the inherent risk associated with such a practice and avoid getting calls at 3am:<br /><br />1) A robust monitoring and alerting mechanism<br />2) Automated Policy Space Management<br /><br />Without these, thin provisioning represents a serious risk and requires constant monitoring. That's why with DataONTAP 7.1 we have extended monitoring and alerting within the Operations Manager to include thinly provisioned volumes and also introduced automated Policy Space Management (vol autosize and snap autodelete).<br /><br />Another thing I've just read is that when thin provisioning a Windows LUN, a format will trigger physical allocation equal to the size of the LUN. That's not accurate, and to prove that point I have created a 200MB Netapp Volume. <a href="http://photos1.blogger.com/blogger/1237/2879/1600/lun.jpg"><img style="FLOAT: left; MARGIN: 0px 10px 10px 0px; CURSOR: hand" alt="" src="http://photos1.blogger.com/blogger/1237/2879/400/lun.jpg" border="0" /></a><br />Furthermore, inside that Volume I have created a thinly provisioned LUN (100MB), mapped it to a Windows server, and formatted it. It's worth noting that the <em><strong>"Used"</strong></em> column of the Volume that hosts this particular LUN is 3MB, depicting overhead after the format, while the LUN itself (/vol/test/mylun), as shown in the picture, is 100MB. Below is the LUN view from the server's perspective and further proof that the LUN is indeed formatted (Drive E:\).<br /><a href="http://photos1.blogger.com/blogger/1237/2879/1600/format.jpg"><img style="FLOAT: left; MARGIN: 0px 10px 10px 0px; CURSOR: hand" alt="" src="http://photos1.blogger.com/blogger/1237/2879/400/format.jpg" border="0" /></a><br />Personally, I would not implement Thin Provisioning for new apps for which I have no usage patterns at all. 
I would also not implement it for applications that quickly delete chunks of data within the LUN(s) and write new data. Whenever you delete data from a LUN on the host, the disk array doesn’t know the data has been deleted. The host basically doesn’t tell - or rather, SCSI doesn’t have a way to tell. Furthermore, when I delete xMB of data from a LUN and write new data into it, NTFS can write this data anywhere. That means that some previously freed blocks may be re-used, but it also means that blocks never used before can also be touched. The latter will trigger a physical allocation on the array.Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com1tag:blogger.com,1999:blog-27348612.post-1149890098670754622006-06-09T16:39:00.000-05:002007-09-09T08:37:24.153-05:00The State of VirtualizationStorage Virtualization is the logical abstraction of physical storage devices, enabling the management and presentation of multiple, disparate devices as a single storage pool, regardless of the devices’ physical layout and complexity.<br /><br />As surprising as it may seem, Storage Virtualization is not a new concept and has existed for years within disk subsystems as well as on hosts. For example, RAID represents virtualization achieved within RAID arrays in that it reduces the management and administration of multiple physical disks to a few virtual ones. 
Host based Logical Volume Managers (LVM) represent another example of a virtualization engine that’s been around for years and accomplishes similar tasks.<br /><br /><a href="http://photos1.blogger.com/blogger/1237/2879/1600/SNIA.jpg"><img style="FLOAT: left; MARGIN: 0px 10px 10px 0px; CURSOR: hand" alt="" src="http://photos1.blogger.com/blogger/1237/2879/400/SNIA.jpg" border="0" /></a><br />The promise of storage virtualization is to cut costs by reducing complexity, enabling better and more efficient capacity utilization, masking the inherent interoperability issues caused by the loose interpretation of the existing standards, and, finally, providing an efficient way to manage large quantities of storage from disparate storage vendors.<br /><br />The logical abstraction layer can reside in servers, intelligent FC switches, appliances or in the disk subsystem itself. These methods are commonly referred to as host based, array based, controller based, appliance based and switch based virtualization. Additionally, each of these methods is implemented differently by the various storage vendors and is sub-divided into two categories: in-band and out-of-band virtualization. Just to make things even more confusing, yet another terminology has surfaced over the past year or so: split-path vs shared-path architectures. It is no surprise that customers are confused and have been reluctant to adopt virtualization despite the promise of the technology.<br /><br />So let's look at the different virtualization approaches and how they compare and contrast.<br /><br /><strong><em>Host Based – Logical Volume Manager<br /></em></strong><br />LVMs have been around for years via 3rd party SW (i.e. Symantec) or as part of the Operating System (i.e. HP-UX, AIX, Solaris, Linux). They provide tasks such as disk partitioning, RAID protection, and striping. Some of them also provide Dynamic Multipathing drivers (i.e. Symantec Volume Manager). 
As is typical with any software implementation, the burden of processing falls squarely on the shoulders of the CPU; however, these days the impact is much less pronounced due to the powerful CPUs available in the market. The overall performance of an LVM is very dependent on how efficient the Operating System is or how well 3rd party volume managers have been integrated with the OS. While LVMs are relatively simple to install, configure and use, they are server resident software, meaning that for large environments multiple installation and configuration instances, as well as multiple repetitive management tasks, will need to be performed. An advantage of a host based LVM is that it is independent of the physical characteristics of external disk subsystems: even though these may have various performance characteristics and complexities, the LVM can still handle and partition LUNs from all of them.<br /><br /><br /><strong><em>Disk Array Based<br /></em></strong><br />Similar to LVMs, disk arrays have been providing virtualization for years by implementing various RAID techniques, such as creating Logical Units (LUNs) that span multiple disks within or across RAID Groups by partitioning the array disks into chunks and then re-assembling them into LUs. All this work is done by the disk array controller, which is tightly integrated with the rest of the array components and provides cache memory and cache mirroring, as well as interfaces that satisfy a variety of protocols (e.g. FC, CIFS, NFS, iSCSI). 
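The chunk-and-reassemble scheme just described can be sketched in a few lines. The chunk size and disk count below are illustrative assumptions, not taken from any particular array:

```python
# Sketch of chunk-based LUN virtualization: a logical LUN address is
# mapped onto (disk, offset) by striping fixed-size chunks across the
# disks of a RAID group. Chunk size and disk count are assumptions.
CHUNK_SIZE = 64 * 1024      # 64 KB chunks
DISKS = 8                   # disks in the RAID group

def map_lun_block(logical_byte):
    """Map a logical byte offset in the LUN to (disk index, byte offset on disk)."""
    chunk = logical_byte // CHUNK_SIZE          # which chunk of the LUN
    disk = chunk % DISKS                        # chunks rotate across the disks
    stripe = chunk // DISKS                     # full stripes already laid down
    offset = stripe * CHUNK_SIZE + (logical_byte % CHUNK_SIZE)
    return disk, offset

# The first chunk lands on disk 0, the next on disk 1, and so on.
print(map_lun_block(0))            # -> (0, 0)
print(map_lun_block(64 * 1024))    # -> (1, 0)
```

The host sees one contiguous LU; the controller does this translation on every I/O, which is exactly the abstraction layer the section is describing.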
These types of Disk arrays virtualize their own disks and do not necessarily provide attachments for virtualizing 3rd party external storage arrays, thus Disk Array virtualization differs from Storage Controller virtualization.<br /><br /><br /><strong><em>Storage Controller Based<br /></em></strong><br />Storage controller virtualization is similar to Disk array based virtualization in that it performs the exact same function, with the difference that the controller has the ability to connect to, and virtualize, various 3rd party external storage arrays. An example of this would be the <a href="http://www.netapp.com/products/virtualization/">Netapp V-Series</a>. From that perspective the Storage controller has the widest view of the fabric in that it represents a central consolidation point for various resources dispersed within the fabric. All this, while still providing multiple interfaces that also satisfy different requirements (e.g. CIFS, NFS, iSCSI).<br /><br /><strong><em>Appliance Based</em></strong><br /><br />Fabric based virtualization comes in several flavors. It can be implemented out-of-band within an intelligent switching platform using switching blades. It can also be implemented in-band or out-of-band using an external Appliance. In-band is used as a means to denote the position of the virtualization engine relative to the data flow. In-band appliances tend to split the fabric in two, providing a Host view on one side of the Fabric and a storage view on the other side. To the storage arrays, the Appliance appears as an Initiator (Host) establishing sessions and directing traffic between the hosts and the disk array. In-Band virtualization appliances send information in the form of metadata with regards to the location of the data using the same path as the one used to transport the data. This is referred to as a “Shared Path” architecture. The opposite is called “Split Path”. 
The theory is that separating the paths provides higher performance; however, there is no real-world evidence presented to date that validates this point.<br /><br />An out-of-band Appliance implementation separates the data path (thru the switch) from the control path (thru the appliance) and requires agents on each host that will maintain the mappings generated by the appliance. While the data path to the disk is shorter in this scheme, the installation and maintenance of host agents does place a burden on the administrator in terms of maintenance, management and OS concurrency.<br /><br /><strong><em>Switch Based</em></strong><br /><br />Switch based virtualization requires the deployment of intelligent blades or line cards that occupy slots in FC director class switches. One advantage they have is that these blades are tightly integrated with the switch port blades. On the other hand they do occupy director slots. These blades run virtualization SW primarily on Linux or Windows Operating systems. The performance of this solution is strictly dependent upon the performance of the blade since in reality the blade is nothing more than a server. However, there are blade implementations that utilize specialized ASICs to cope with any performance issues.<br /><br /><strong><em>Conclusion</em></strong><br /><br />The current confusion in the market is partially created by the many implementation strategies as well as by “clear-as-mud” white papers and marketing materials regarding the various implementation methods. 
Regardless of which method you choose to implement, testing it in your labs is the only way to find out if the solution’s worth the price.Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com2tag:blogger.com,1999:blog-27348612.post-1148578928867019522006-05-27T08:45:00.000-05:002007-09-09T08:37:46.592-05:00VTL Part 2<a href="http://photos1.blogger.com/blogger/1237/2879/1600/VTLpic.0.jpg"><img style="FLOAT: left; MARGIN: 0px 10px 10px 0px; CURSOR: hand" alt="" src="http://photos1.blogger.com/blogger/1237/2879/320/VTLpic.jpg" border="0" /></a><br />It's evident that VTLs are becoming popular backup and recovery targets. Since Netapp, among others, has also jumped onto the bandwagon, I figured I'd talk a little bit about the NearStore VTL offering.<br /><br />A year ago Netapp announced the acquisition of Alacritus. At the time Alacritus was a privately held company out of Pleasanton, CA and Netapp first partnered with Alacritus around December 2004. Together they offered a solution comprising a Netapp Nearline storage array and the Alacritus VTL package. Less than 6 months later Netapp decided to own the technology, so it acquired Alacritus.<br /><br /><strong><em>Alacritus Background</em></strong><br /><br />As mentioned above, Alacritus was a privately held company. Alacritus has been in the VTL business and in the general backup business a lot longer than people think. The principals at Alacritus have been together for 15 years and are responsible for several backup innovations. They are the ones who, with Netapp, co-developed the Network Data Management Protocol (NDMP). They developed BudTool, which was the first open systems backup application. They developed Celestra, which was the first server-less backup product. They pioneered XCOPY, the extended copy SCSI command. In 2001, Alacritus developed the 1st VTL and has been delivering it since then, before other VTL competitors were even incorporated. 
Alacritus' strategy at the time was to sell the solution thru OEMs and resellers in Japan, most notably Hitachi.<br /><br /><strong><em>Technology</em></strong><br /><strong><em></em></strong><br />There are several technological innovations within the Netapp NearStore VTL delivering key benefits to customers but I'll only address 3-4 of them as I don't want to write an essay.<br /><br /><ul><li><em><strong>Continuous Self-tuning</strong></em> - The NearStore VTL continuously and dynamically load balances backup streams across *all* available resources (disks) thus maintaining optimal system performance without developing hot spots. That means that backup streams are load balanced across all the Disk Drives across all Raid Groups for a Virtual Library, which in turn means that Virtual Tapes do not reside at fixed locations. That provides the ability to load balance traffic based on the most available drives. Ultimately, what this means is that customers do not have to take any steps to manually tune the VTL.</li></ul><p></p><ul><li><strong><em>Smart Sizing</em></strong> - Smart sizing is based on the fact that all data compresses differently. Since data compresses at different rates, the amount of data that will fit into a tape changes from backup to backup. If you take into account that a Virtual Tape eventually will be written to a Physical Tape you want to make absolutely sure that the amount of data on the Virtual Tape will fit onto the Physical Tape. To address this, most VTL vendors make the capacity of the Virtual Tape equal to the Native capacity of the Physical Tape. The NearStore VTL offers a unique approach. By using high-speed statistical sampling of the backup stream, and by having knowledge of the Tape Drive's compression algorithm, it determines how well the data will compress when it gets to the Tape drive, and adjusts the size of the Virtual Tape accordingly to closely match the compressed capacity of the Physical Tape drive. 
As a result of this, customers obtain significantly higher physical media utilization rates compared to other VTLs. As an example, consider a backup of 400GB and a tape cartridge with a native capacity of 200GB. A typical VTL will need 2 Virtual Tapes each with a 200GB native capacity. If the Physical Drive compresses at a 2:1 ratio, each 200GB Virtual Tape will occupy only half of a Physical Tape's capacity, plus you'll need 2 Physical tapes to export to. With Smart Sizing, the Virtual Tape size will be adjusted to 1 Virtual Tape of a 400GB size. At a 2:1 drive compression ratio, you only need 1 Physical Tape of 200GB that will be fully utilized. The point is lower cost by purchasing and managing fewer tapes. </li></ul><p></p><ul><li><strong><em>Data Protection - </em></strong>There are 2 mechanisms that enable Data protection within the NearStore VTL. RAID and Hot Sparing is one. The second mechanism is called <em>Journaled Object Store (JOS)</em>. All metadata is Journaled, ensuring the data integrity of committed writes, even in the event of an unclean shutdown. Metadata is stored in multiple places and the data on each disk is self-describing. What that means is that in the event of a catastrophic failure where the appliance's metadata is completely lost, data that is still available on disk can be accessed. One thing of importance is that other VTLs will lose all data if their metadata ever becomes inaccessible. <p></p></li></ul><p></p><ul><li><strong><em>Pass-Thru Restores - </em></strong>When a physical tape is selected for a restore, it automatically gets imported as a virtual tape and data is copied in the background. However, if a specific file is requested that has not been copied to the virtual tape yet, the NearStore VTL will use a pass-thru mechanism, select the specific file from the physical tape and restore it. 
After the specific process has been completed, it will continue importing the rest of the image.</li></ul><p>One thing that our customers find important is that Netapp owns the technology, without 3rd party dependencies controlling the development or providing 2nd or 3rd level support of the core technology. </p><p></p><p></p>Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com0tag:blogger.com,1999:blog-27348612.post-1148497454312794462006-05-24T13:35:00.000-05:002007-09-09T08:38:02.107-05:00VTL & Tape: A Symbiotic RelationshipA lot has been written over the past year about advantages and disadvantages of tape. One thing for sure though is that Tape's not going anywhere anytime soon, for various reasons, some of which are included below:<br /><br /><ol><li>Tape is deeply entrenched in the Enterprise </li><li>Tape's a cost-effective long-term storage medium</li><li>Backup applications understand Tape and perform their best when streaming to a Tape drive rather than a filesystem.</li><li>Tape can be easily moved offsite for vaulting purposes. </li></ol><p>But Tape has some distinct disadvantages, some of which include:</p><ol><li>Tapes are unreliable and susceptible to environmental conditions (e.g. heat/humidity).</li><li>You won't know of a bad tape until you attempt to recover from it.</li><li>Sharing Tape drives requires additional software and adds cost and complexity. </li><li>Streaming to a tape drive is not simple, especially with incremental backups. And while it can be done via multiplexing, the latter has a significant effect on recovery, since all interleaved streams must be read by the backup server.</li><li>In order to share Tape libraries between servers additional software must be purchased, adding cost as well as complexity. </li></ol><p></p><p>One approach that customers have been using to address the above issues is to back up to a conventional disk array using D2D backup. 
However, what they find is that this approach adds additional configuration steps, in that they would still have to provision storage to the backup application using the disk vendor's provisioning tools, still have to create RAID Groups, still have to create LUNs, still have to make decisions regarding cache allocations and finally they still have to manage it. </p><p>Then, reality sets in...Disk is not easily shared between servers and Operating systems without a Shared SAN filesystem or by carving and managing multiple LUNs to multiple servers/apps. All this means additional cost, complexity and management overhead. Addressing a challenge by making it more challenging is not what people are looking for. This is where the VTL comes into play. </p><p>An integrated appliance with single or dual controllers and disk behind it, that looks and feels like tape but is...Disk. Disk that Emulates Tape Libraries, with Tape drives, slots, Entry/Exit ports and Tape cartridges. Backup applications, since their inception, were designed with Tape in mind, not disk. They know Tape, they perform very well with tape. They know little about disk and in some cases do not integrate at all with disk, nor do they perform optimally with disk. </p><p>The VTL on the other hand appears to the Backup SW as one or more Tape Libraries of different types and characteristics (drive type, slots #, capacities). They also eliminate the need to worry about streaming, regardless of the backup you are taking (full/incremental), since disk is inherently faster than tape. This also means that you don't have to multiplex, thus making your recovery fast. </p><p>You could also easily share a single VTL among multiple servers, providing each server with its own dedicated Tape library, drives, slots, robot. Essentially, what you end up with is a centrally located and managed Virtual Library that looks, feels and behaves like a dedicated physical library to each of your servers. 
</p><p>Another benefit of the VTL is that it is easily integrated with a real Physical Tape library. In fact, the majority of the implementations require it by positioning the VTL in front of a Physical Tape library. The VTL will then emulate the specific tape library with its associated characteristics such as number of drives, slots, barcodes, robot etc. After a backup has completed, you then have 2 choices with regard to Physical Tape creation. </p><p><strong><em>Traditional Physical Tape Creation Approach</em></strong></p><p>Using this approach, the backup server is responsible for direct physical tape creation. In other words, the backup server controls the copy process as well as providing reporting capabilities incorporated into the backup sw. However, the backup server must process every tape twice, which can increase the time required to create offsite tapes. Since the data path goes thru the backup server, this process will require specific windows that do not coincide with regular backup windows. This method allows for the independent tracking of physical and virtual tapes, but the process is slower from a performance perspective. Every VTL vendor supports this method.</p><p><strong><em>VTL Direct Tape Creation Approach</em></strong></p><p>Under this scenario, after the backup to the Virtual Tape is complete, the backup application will issue an eject to the virtual tape based on an aging policy. At this point, the Virtual Tape contents are copied to the Physical tape, in the background, using the same barcodes. Upon completion, the virtual tape is deleted from the virtual library. The benefit of this approach is that the backup server is not involved in the process. The requirement with this approach is that the VTL must be 100% compatible with the Backup application media management and be able to write the backup in the backup application's native format. 
Netapp's Nearstore VTL offers this approach as well as the Traditional Method, while others offer one or the other. </p><p>There are many more useful features a VTL provides. One that I find extremely useful is the ability to create Shadow Tapes. What is a Shadow Tape? </p><p>When you export a Virtual Tape, in parallel with the creation of the Physical Tape, the VTL creates a shadow tape that is stored in a shadow vault. The backup application continues to manage the Physical tape while the shadow tape is invisible. If you later import the Physical Tape, the shadow Tape is moved from the vault into the library, which makes it available for reading immediately. The VTL manages the retention and expiration of shadow tapes. </p><p>VTLs are packed with many more features, some of which I'll be addressing in the next couple of days as a follow-up to this writeup, as well as give an overview of Netapp's Nearstore VTL story. </p><p></p>Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com1tag:blogger.com,1999:blog-27348612.post-1147992647916782702006-05-19T00:02:00.000-05:002007-09-09T08:38:16.374-05:00FlexVols: Flexible Data Management<p>If you're managing Storage, you've most likely experienced some of these issues. Too much storage is allocated and not used by some applications, while other apps are getting starved. Because application reconfiguration is not a trivial process (it's time- and resource-consuming, and it requires application downtime), most folks end up buying more disk. </p><p>The root of the problem with data management is that it relies heavily on forecasting, and getting the forecast right all the time is an impossible task. Another issue with Data Management is that there are too many hidden costs associated with it. Costs that can include configuration changes, training, backup/restore, and data protection, etc. </p><p>In addition, there's risk. 
Reconfigurations are risky in that they can potentially impact reliability. DataONTAP 7G with FlexVols addresses all of the above issues plus some more. </p><p>DataONTAP 7G virtualizes volumes in Netapp and Non-Netapp storage systems (V-Series) by creating an abstraction layer that separates the physical relationship between volumes and disks. A good analogy I read from a Clipper Group report was comparing capacity allocated to FlexVols versus other traditional approaches, to a wireless phone versus a landline. While every phone has a unique number, the wireless phone can be used anywhere, whereas the landline resides in a fixed location and cannot be moved easily.<a href="http://photos1.blogger.com/blogger/1237/2879/1600/flexvols.0.jpg"><img style="FLOAT: left; MARGIN: 0px 10px 10px 0px; CURSOR: hand" alt="" src="http://photos1.blogger.com/blogger/1237/2879/400/flexvols.0.jpg" border="0" /></a> </p><p>FlexVols are created on top of a large pool of disks called an <em>Aggregate</em>. You can have more than one aggregate if you want. Flexvols are striped across every disk in the aggregate and have their own attributes, which are independent of each other. For example, they can have their own snapshot schedule or their own replication schedule. They can also be increased or decreased in size <em>on the fly</em>. They also have another very important attribute. Space that is allocated to a flexvol but not used can be taken away, <em>on the fly</em>, and re-allocated to another flexvol that needs it. The Aggregate(s) can also be increased in size <em>on the fly</em>. </p><p>Flexvols can also be cloned using our <em>FlexClone</em> technology, which I'll address another day. But just so everyone understands, a Flexclone represents a space-efficient point-in-time copy (read/write) of the parent Flexvol but can also be turned into a fully independent Flexvol itself.</p><p>Another important aspect of the flexvols is size granularity. 
Ranging in size from 20MB up to 16TB, flexvols give users the ability to manage data sets according to their size while, at the same time, obtaining the performance of hundreds of disks. Couple that with DataONTAP's FlexShare Class of Service and you have a very elegant solution for application consolidation within the same aggregate. By deploying 7G, the days of wasting drive capacity in order to obtain performance are gone. </p><p>Another very useful feature of 7G is the ability to do Thin Provisioning as well as provide Automated Policy Space Management in order to address unforeseen events that can be caused by sudden spikes in used capacity.</p><p>I'll be writing more on the last two subjects pretty soon, so stay tuned.</p><p></p>Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com0tag:blogger.com,1999:blog-27348612.post-1147397442205786652006-05-11T19:49:00.000-05:002007-09-09T08:38:30.573-05:00The Kilo-Client Project: iSCSI for the Masses...A little bit over a year ago Netapp Engineering was challenged to build a large scale test bed in order to exploit and test various configurations and extreme conditions under which our products are deployed by our customers. Thus, the Kilo-Client project was born.<br /><br /><a href="http://photos1.blogger.com/blogger/1237/2879/1600/grid.0.jpg"><img style="FLOAT: right; MARGIN: 0px 0px 10px 10px; CURSOR: hand" alt="" src="http://photos1.blogger.com/blogger/1237/2879/320/grid.0.jpg" border="0" /></a>Completed in early 2006, the Kilo-Client project is, most likely, the World's Largest iSCSI SAN, with 1,120 diskless blades booting off the SAN and providing support for various Operating Systems (Windows, Linux, Solaris) and multiple applications (Oracle, SAS, SAP etc). 
In addition, Kilo-Client incorporates various Netapp technological innovations such as:<br /><br /><em><strong>SnapShot</strong></em> - A disk-based point-in-time copy<br /><em><strong>LUNClone</strong></em> - A space-optimized read/write LUN<br /><em><strong>FlexClone</strong></em> - A space-optimized read/write Volume<br /><em><strong>SnapMirror</strong></em> - Replication of Volumes/qtrees/LUNs<br /><em><strong>Q-Tree</strong></em> - A logical container within a volume used to group files or LUNs.<br /><strong><em>SnapRestore</em></strong> - Near-instantaneous recovery of a Volume or a LUN to a previous PIT version.<br /><br />Today, the Kilo-Client project serves not only as an Engineering test bed but also as a facility where our customers can test their applications under a variety of scenarios and conditions. For more information on the <a href="http://www.netapp.com/go/techontap/tot-march2006/0306tot_kilo.html">Kilo-Client project </a>click the link.<br /><em></em><br />You may also want to consider registering for the <a href="http://www.netapp-web.com/ontap/registration.aspx?ecode=20050917002232613&dcode=10055">Tech ONTAP Newsletter </a>since there's a ton of valuable information that gets posted on it on a monthly basis, from Best Practices to new technology demos, tips/tricks and Engineering interviews.Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com0tag:blogger.com,1999:blog-27348612.post-1147312000068251012006-05-10T20:34:00.000-05:002007-09-09T08:38:44.575-05:00iSCSI: Multipathing Options MenuA question that I get asked frequently revolves around iSCSI multipathing options and how folks can provide redundancy and be able to route I/O around various failed components residing in the data path.<br /><br />Contrary to what has been available for Fibre Channel, iSCSI offers multiple choices to select from, each of which has various characteristics. 
So here are your options, most of which are available across all Operating systems that provide iSCSI support today:<br /><br /><br /><em><strong>1) Link Aggregation - IEEE 802.3ad</strong></em><br /><strong><em></em></strong><br />Link Aggregation, also known as Teaming or Trunking, is a well known and understood standard networking technique deployed to provide redundancy and high-availability access for NFS, CIFS, as well as other types of traffic. The premise is the ability to logically link multiple physical interfaces into a single interface, thus providing redundancy and higher availability. Link aggregation is not dependent on storage but rather on a capable Gigabit Ethernet driver.Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com0tag:blogger.com,1999:blog-27348612.post-1146806036563020252006-05-07T10:33:00.000-05:002007-09-09T08:39:05.337-05:004Gb FC Gains MomentumVarious next generation 4Gb Fibre Channel components began rolling out around mid 2005 with a moderate success rate, primarily because vendors were ahead of the adoption curve. A year later 4Gb FC has gained considerable momentum, with almost every vendor having a 4Gb offering. With the available tools and infrastructure in place, backward compatibility, as well as component availability at or near the same price points as 2Gb, 4Gb is a very well positioned technology.<br /><br />The initial intention with 4Gb was for deployment inside the rack for connecting enclosures to controllers inside the array. However, initial deployments utilized 4Gb FC as Interswitch Links (ISL) in Edge to Core Fabrics or in topologies with considerably low traffic locality. For these types of environments 4Gb FC greatly increased performance, while at the same time decreasing ISL oversubscription ratios. 
Additionally, it decreased the number of trunks deployed, which translates to lower switch port burn rates, thus lowering the cost per port.<br /><br />As mentioned above, backwards compatibility is one of its advantages, since 4Gb FC leverages the same 8B/10B encoding scheme, speed negotiation, cabling and SFPs as 1Gb/2Gb. Incremental performance of 4Gb over 2Gb also allows for higher QoS for demanding applications and lower latency. Preserving existing investments in disk subsystems by being able to upgrade them to 4Gb, thus avoiding fork-lift upgrades, is an added bonus, even though with some vendor offerings fork-lift upgrades and subsequent data migrations will be necessary.<br /><br />Even though most vendors have 4Gb disk array offerings, no array vendor that I know of ships 4Gb drives thus far; however, I expect this to change. Inevitably, the question becomes "What good is a 4Gb FC front-end without 4Gb drives?"<br /><br />With a 4Gb front-end you can still take advantage of cache (medical imaging, video rendering, data mining applications) and RAID parallelism to provide excellent performance. There are some other benefits though, like higher fan-in ratios per Target Port, thus lowering the number of switch ports needed. For servers and applications that deploy more than 2 HBAs, you have the ability to reduce the number of HBAs on the server, free up server slots, and still get the same performance at a lower cost, since the cost per 4Gb HBA is nearly identical to that of a 2Gb.<br /><br />But what about disk drives? To date, there's one disk drive manufacturer with 4Gb drives on the market, Hitachi. Looking at the specs of a Hitachi <a href="http://www.hitachigst.com/hdd/support/15k147/15k147.htm">Ultrastar 15K147 </a>4Gb drive versus a Seagate <a href="http://www.seagate.com/cda/products/discsales/enterprise/family/1,1086,662,00.html">ST3146854FC</a> 2Gb drive, the interface speed is the major difference. 
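A quick back-of-the-envelope service-time calculation shows why that difference matters less than it might seem for small random I/O. The seek time and media rate below are figures typical of 15K-class drives, used here purely as assumptions, not specs for either model:

```python
# Rough service-time model for one 4 KB random read on a 15K RPM FC drive.
# Seek time and media transfer rate are illustrative, typical 15K figures.
avg_seek_ms = 3.5                          # typical 15K avg read seek
rot_latency_ms = 0.5 * 60_000 / 15_000     # half a revolution = 2.0 ms
io_kb = 4

media_rate_mb_s = 75                       # sustained transfer from media
media_xfer_ms = io_kb / 1024 / media_rate_mb_s * 1000

# Interface transfer time at 2 Gb/s vs 4 Gb/s (~200 vs ~400 MB/s payload)
xfer_2gb_ms = io_kb / 1024 / 200 * 1000
xfer_4gb_ms = io_kb / 1024 / 400 * 1000

total_2gb = avg_seek_ms + rot_latency_ms + media_xfer_ms + xfer_2gb_ms
total_4gb = avg_seek_ms + rot_latency_ms + media_xfer_ms + xfer_4gb_ms
print(f"2Gb: {total_2gb:.2f} ms, 4Gb: {total_4gb:.2f} ms")
# Doubling the interface speed shaves only ~0.01 ms off a ~5.6 ms I/O.
```

Seek plus rotational latency dominate; the interface contributes almost nothing for small I/O, which is the point the next paragraph makes.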
Disk drive performance is primarily controlled by the Head Disk Assembly (HDA) via metrics such as avg. seek time, RPM, and transfer rate from media. Interface speed has little relevance if there are no improvements in the above metrics. The bottom line is that characterizing a disk drive as high performance strictly based on its interface speed can lead to the wrong conclusion.<br /><br />Another thing to take into consideration, with regards to 4Gb drive adoption, is that most disk subsystem vendors source drives from multiple drive manufacturers in order to be able to provide the market with supply continuity. Mitigating the risk of drive quality issues that could potentially occur with a particular drive manufacturer is another reason. I suspect that until we see 4Gb drive offerings from multiple disk drive vendors the current trend will continue.Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com0tag:blogger.com,1999:blog-27348612.post-1146707517353450512006-05-03T19:29:00.000-05:002007-09-09T08:39:20.500-05:00iSCSI Performance and DeploymentWith the popularity and proliferation of iSCSI, a lot of questions are being asked regarding iSCSI performance and when to consider deployment.<br /><br />iSCSI performance is one of the most misunderstood aspects of the protocol. Looking at it purely from a bandwidth perspective, Fibre Channel at 2/4Gbit certainly appears much faster than iSCSI at 1Gbit. However, before we proceed further, let's define two important terms: <em>Bandwidth</em> and <em>Throughput</em><br /><br /><strong><em>Bandwidth:</em></strong> The amount of data transferred over a specific time period. This is measured in KB/s, MB/s, or GB/s<br /><br /><strong><em>Throughput:</em></strong> The amount of work accomplished by the system over a specific time period. 
This is measured in IOPS (I/Os per second) or TPS (transactions per second)<br /><br />There is a <em>significant</em> difference between the two, in that Throughput involves varying I/O sizes, which have a direct effect on Bandwidth. Consider an application that requires 5000 IOPS at a 4k block size. That translates to a bandwidth of 20MB/s. Now consider the same application but at a 64k I/O size. That's a bandwidth of 320MB/s.<br /><br />Is there any doubt as to whether or not iSCSI is capable of supporting a 5000 IOPS, 20MB/s application? How about at 5000 IOPS and 40MB/s using a SQL Server 8k page size?<br /><br />Naturally, as the I/O size increases the interconnect with the smaller bandwidth will become a bottleneck sooner than the interconnect with the larger one. So, I/O size and application requirements make a big difference as to when to consider an iSCSI deployment.<br /><br />If you are dealing with bandwidth-intensive applications such as backup, video/audio streaming, or large-block sequential I/O Data Warehouse databases, iSCSI is probably not the right fit, at this time.<br /><br />Tests that we have performed internally, as well as tests performed by 3rd party independent organizations such as the <a href="http://www.enterprisestrategygroup.com/_documents/Report/Attachment1ID21.pdf">Enterprise Storage Group</a>, confirm that the performance difference between FC and iSCSI is negligible when deployed with small-block OLTP type applications. 
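The arithmetic behind the block-size examples above is worth making explicit. A minimal sketch, using decimal units (1 MB = 1000 KB) to match the article's round numbers:

```python
# Bandwidth follows directly from throughput and I/O size:
#   bandwidth (MB/s) = IOPS * I/O size (KB) / 1000
def bandwidth_mb_s(iops, io_size_kb):
    return iops * io_size_kb / 1000

print(bandwidth_mb_s(5000, 4))    # -> 20.0 MB/s  (4k blocks)
print(bandwidth_mb_s(5000, 8))    # -> 40.0 MB/s  (SQL Server 8k pages)
print(bandwidth_mb_s(5000, 64))   # -> 320.0 MB/s (64k blocks)
```

Same IOPS, sixteen times the bandwidth: which is why I/O size, not protocol, usually decides whether a 1Gbit iSCSI link is a bottleneck.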
Having said that, there are also documented tests conducted by a 3rd party independent organization, <a href="http://www.netapp.com/ftp/veritest-netapp-comp-analysis2005.pdf">Veritest</a>, where iSCSI outperformed an equivalent array identically configured with FC using Best Practices documentation deployed by both vendors, in conjunction with an OLTP type of workload.<br /><br />At the end of the day, always remember that the <em>application requirements</em> dictate protocol deployment.<br /><br />Another question that gets asked frequently is whether or not iSCSI is ready for mission critical applications.<br /><br />iSCSI has come a long way since 2003. The introduction of host-side clustering, multipathing support and SAN booting capabilities from various OS and storage vendors provides a vote of confidence that iSCSI can certainly be considered for mission critical applications. Additionally, based on deployments, Netapp has proven over the past 3 years that a scalable, simple to use array with Enterprise class reliability, when coupled with the above mentioned features, can safely be the iSCSI platform for <a href="http://www.netapp.com/library/cs/navitaire.pdf">mission-critical applications</a>. <a href="http://www.netapp.com/library/cs/leuven.pdf">Exchange</a> is a perfect example of a mission critical application (it is considered as such by lots of Enterprises) that is routinely deployed over iSCSI these days.Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com0tag:blogger.com,1999:blog-27348612.post-1146599244226710262006-05-02T12:49:00.000-05:002007-09-09T08:39:36.028-05:00Dynamic Queue ManagementWhen we (Netapp) rolled out Fibre Channel support almost 4 years ago, one of our goals was to simplify the installation, configuration, data and protocol management as well as provide deep application integration. 
In short, we wanted to make sure the burden of routine day-to-day tasks does not fall squarely on the shoulders of the administrator.<br /><br />One of the things we paid particular attention to was host-side and target-side queue depth management. Setting host queue depths is a much more complicated process than various disk subsystem vendors' documentation makes it out to be, and it requires specific knowledge of application throughput and response times in order to decide what the host queue depth should be set to.<br /><br />All SAN devices suffer from queue-depth-related issues. The issue is that everybody parcels out finite resources (queues) from a common source (the array target port) to a set of initiators (HBAs) that consider these resources to be independent. As a result, on occasion, initiators can easily monopolize I/O to a target port, thus starving other initiators in the fabric.<br /><br />Every vendor document I've seen explicitly specifies what the host queue depth should be set to. How is that possible, when in order to do this you need knowledge of the application's specific I/O requirements and response time? Isn't that what Little's Law is all about (N = X * R)?<br /><br />It's simply a "<em>shot in the dark</em>" approach, hoping that the assigned queue depth will provide adequate application performance. But what if it doesn't? Well, then, a lot of vendors will give it another go...another "<em>shot in the dark</em>". 
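To make the Little's Law point concrete, here is a hypothetical sizing sketch. With N = X * R, an application driving X IOPS at a mean response time R keeps roughly N I/Os in flight, and that is the queue depth it actually needs. The workload numbers below are illustrative, not taken from any vendor document:

```python
# Little's Law: N = X * R, where N is the mean number of outstanding
# I/Os (the queue depth actually needed), X is throughput in IOPS,
# and R is mean response time in seconds. Illustrative numbers only.
def required_queue_depth(iops, response_time_ms):
    return iops * (response_time_ms / 1000.0)

# A hypothetical OLTP workload: 5000 IOPS at 5 ms mean response time
print(required_queue_depth(5000, 5))   # 25.0 outstanding I/Os
```

Without measuring X and R for the real application, any fixed queue depth written into a configuration file is exactly the "shot in the dark" described above.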
In the process of setting the appropriate host queue depth, and depending on the OS, they will edit the appropriate configuration file, make the change, and ask the application admin to take an outage and reboot the host.<br /><br />The above procedure stems from two things: a) poor planning, without knowing what the application requirements are, and b) inadequate protocol management features.<br /><br />To address this challenge we decided to implement Dynamic Queue Management and move queue depth management from the host to the array's target port.<br /><br /><strong>So what is Dynamic Queue Management?</strong><br /><br />Simply put, Dynamic Queue Management manages queue depths from the array side. By <em>monitoring</em> application response times and QFULL conditions on a per-LUN basis, it dynamically adjusts the queue depth based on the application's requirements. In addition, it can be configured to:<br /><br /><br /><ol><li><div align="left">Limit the number of I/O requests a given initiator sends to a target port</div></li><li><div align="left">Prevent initiators from flooding target ports and starving other initiators of LUN access</div></li><li><div align="left">Ensure that initiators have guaranteed access to queue resources</div></li></ol><p align="left">With Dynamic Queue Management, Data ONTAP calculates the total number of command blocks available and reserves the appropriate number for an initiator or a group of initiators, based on the percentage you specify (0-99%). You can also specify a reserve queue pool from which an initiator can borrow queues when the application needs them. On the host side, we set the queue depth to its maximum value. </p><p align="left">The benefit of this practice is that it takes the guesswork out of the picture and guarantees that the application will perform at its maximum level without unnecessary host-side reconfigurations, application shutdowns, or host reboots. 
<em>Look Ma', No Hands!!!</em></p><p align="left">Several of our competitors claim that we're new to the FC SAN market. While I won't disagree, I will augment that statement by saying that we're also...wiser, and we've addressed challenges in a 4-year span that others haven't since 1997. After all, there's nothing mystical or cryptic about implementing a protocol that's been around for several years. </p>Nick Triantoshttp://www.blogger.com/profile/09961325835631522741noreply@blogger.com0
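The percentage-based reservation described above can be illustrated with a short sketch. This is a model of the idea only, not Data ONTAP's actual implementation; the class name, igroup names, and command-block total are all hypothetical:

```python
# Illustrative sketch of percentage-based queue reservation on an
# array target port (NOT Data ONTAP's actual implementation).
class TargetPortQueues:
    def __init__(self, total_command_blocks):
        self.total = total_command_blocks
        self.reserved = {}          # initiator group -> guaranteed blocks

    def reserve(self, igroup, percent):
        """Guarantee a percentage (0-99) of command blocks to an igroup."""
        assert 0 <= percent <= 99
        blocks = self.total * percent // 100
        self.reserved[igroup] = blocks
        return blocks

    @property
    def shared_pool(self):
        # Unreserved blocks form a pool initiators may borrow from.
        return self.total - sum(self.reserved.values())

# Hypothetical port with 2048 command blocks and two initiator groups
port = TargetPortQueues(total_command_blocks=2048)
port.reserve("exchange_servers", 40)   # 819 blocks guaranteed
port.reserve("sql_servers", 30)        # 614 blocks guaranteed
print(port.shared_pool)                # 615 blocks left to borrow from
```

Because the guarantees live at the target port, no initiator group can starve the others, and adjusting a reservation touches only the array, with no host reboot required.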