Tag Archives: storage

Bakbone and Falconstor

In breaking news, Quest has purchased BakBone for $55m and stock options. Seeing as we have quite a lot of clients locally running NetVault, it will be interesting to see how this purchase affects the product line-up going forward. Buyouts sometimes work, and other times fall down badly. Let’s try to be optimistic about this one. I’ll be chatting to local contacts as well as BakBone UK to see what the story is.

On another note, one of my favourite companies, FalconStor, is tying up with HP to provide any-to-any array replication. Considering existing tie-ups with IBM and LSI Logic, FalconStor is starting to gain quite a bit of traction in the market.

Virtualisation part 3: VMware backup scenarios

Backups in a virtualisation environment take on a whole new meaning and are typically complex ( as opposed to the simple picture the vm vendors would like to portray ), because you are now dealing with shared SAN storage, vm images instead of files, very specific requirements around backup hardware and setup, 3rd party backup agents, and multiple backup methods from VMware themselves. So much for simplifying your IT infrastructure. But it’s not all doom and gloom – I’ll try to break things down so that VMware backup is not as dark an art as it appears to be.

VCB

VMware provides a backup/DR API in the form of VMware Consolidated Backup ( VCB ) which allows you to either do backups with the VMware service console tools or integrate with a 3rd party backup tool. Doing backups via the service console is not for the faint of heart, so most VMware users will go the 3rd party route. This is done by installing VCB and the 3rd party backup server on a machine ( vm or physical ), configuring VCB to talk to vCenter or ESX, and then setting up your backups via the 3rd party tool’s VMware agent/plugin.

VCB works by generating a snapshot of a vm and then storing it in a specified location on the VCB proxy machine. The VCB proxy can run on a physical machine or in a vm. If running in a vm, make sure you have sufficient space to store the vm images that are generated at backup time – as a rule of thumb, allow 3 to 4x the space of the largest vm you have.
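As a rough illustration of what the proxy actually does, the sketch below drives an image-level export with the vcbMounter utility that ships with VCB. The install path, host name, credentials, VM name and export directory are placeholders, and the exact flags vary between VCB releases, so treat this as a starting point rather than a recipe:

```python
# Hypothetical sketch: drive an image-level VCB export from the proxy.
# Paths, host, credentials and VM name are placeholders - check the vcbMounter
# options shipped with your VCB release before relying on this.
import subprocess

VCB_MOUNTER = r"C:\Program Files\VMware\VMware Consolidated Backup Framework\vcbMounter.exe"

def export_vm_image(vc_host, user, password, vm_name, export_dir):
    """Ask VCB to snapshot the VM and dump a full image into export_dir."""
    cmd = [
        VCB_MOUNTER,
        "-h", vc_host,            # vCenter or ESX host
        "-u", user,
        "-p", password,
        "-a", f"name:{vm_name}",  # identify the VM by display name
        "-r", export_dir,         # destination directory on the VCB proxy
        "-t", "fullvm",           # image-level backup ( "file" for file-level )
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    export_vm_image("vcenter.example.local", "backupuser", "secret",
                    "accounting-vm", r"D:\vcb-exports\accounting-vm")
```

The 3rd party tool’s VMware plugin essentially automates this step for you and then sweeps the exported directory into its normal backup catalogue.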

VCB Modes

There are 3 main modes of backup using the VCB proxy, depending on your hardware setup/design. Let’s take a look at each of these in turn and their pros and cons.

SAN mode

ESX needs to have its VMFS stored on shared FC SAN storage or iSCSI shared disk. Backups are offloaded to a physical VCB proxy which is also connected to the shared storage. In this mode, the LUNs exposed to the ESX servers need to be exposed to the VCB proxy machine as well.

SCSI Hot-add mode

In SCSI Hot-Add mode, you set up one of your virtual machines as a VCB proxy and use it to back up other virtual machines residing on storage visible to the ESX server that hosts the VCB proxy virtual machine. This mode eliminates the need for a dedicated physical machine for your VCB proxy and does not require you to expose SAN LUNs to the Windows VCB proxy.

In this mode, you can use Consolidated Backup to protect virtual disks on any type of storage available to your ESX host, including NAS or local storage. The exception is that it cannot back up disks that are independent disks, physical compatibility RDMs or IDE disks ( this applies to ESX 4 and ESXi 4 ).

Consolidated Backup creates a snapshot of the virtual disk to be protected and hot-adds the snapshot to the VCB proxy, allowing it to access the virtual machine’s disk data. The VCB proxy reads the data through the I/O stack of the ESX host. If your ESX servers only have local disk, you need a VCB proxy on each ESX host.

LAN ( NBD ) mode

In this mode, Consolidated Backup uses an over-the-network protocol to access the virtual disk. The ESX host reads the data from the storage device and sends it across a network channel to the VCB proxy. A limitation is that VMs should not be over 1TB in size.

You can also host the VCB proxy in a vm in this mode, but remember the issues relating to tape connectivity. A separate ‘backup’ network can be used between a physical VCB proxy and the ESX hosts to separate normal and backup traffic.

Types of backups

There are a number of ways of doing the backup:

  • image level, where one backs up the vm and all its associated files
  • file level ( via VCB or a 3rd party agent ), where you back up the contents of the vm at the file level; this can be combined with full and incremental/differential backups ( note that file-level backups via VCB are only supported on Windows platforms )

In addition, you can load a backup agent in the vm and treat it as a normal backup client. Most likely you will use a combination of image-level VCB backups and agent-based file-level backups. Note that VCB cannot be used to back up clustered vms – you will need a 3rd party cluster-aware backup tool for this scenario.

Tape connectivity/VTL

Tape can be connected in a variety of ways for the purposes of VMware backup, mostly depending on the storage mechanism you are using with VMware.

  • connect the tape device to a standalone VCB proxy using FC, SAS or SCSI – this gives you the flexibility of breaking backups out of your VMware space
  • connect the device to your FC fabric if you make use of shared storage for ESX VMFS
  • connect a tape device to an ESX host using pass-through SCSI

In addition, you can use a 2-stage backup solution with disk-to-disk ( d2d ) as the first stage and tape as the second. Using a VTL is quite useful as it allows you to send more data flows to the backup server than you have physical tape drives. Tape devices are expensive and as such you may only have one or two. With a VTL, you can simulate as many drives as you’d like and send that many flows concurrently to the backup server. You need to be reasonable with this of course, especially if you are using Ethernet for your transport.
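To put rough numbers to the Ethernet caveat, here is a back-of-the-envelope sketch; the usable link rate is an assumption, not a measured figure:

```python
# Back-of-the-envelope: how concurrent VTL flows share a backup network.
# The usable link rate below is an assumption for illustration only.
GBE_USABLE_MB_S = 110   # roughly what a single GbE link sustains in practice

def per_flow_throughput(link_mb_s, concurrent_flows):
    """Each concurrent backup flow gets an even share of the link."""
    return link_mb_s / concurrent_flows

for flows in (2, 4, 8, 16):
    share = per_flow_throughput(GBE_USABLE_MB_S, flows)
    print(f"{flows:2d} flows over GbE -> ~{share:.0f} MB/s each")

# Many virtual drives keep the backup server busy, but past a point the
# transport ( not the number of drives ) becomes the bottleneck, so 10GbE
# or FC transports change the maths considerably.
```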

VMware Data Recovery

Data Recovery ( DR ) is a new product with vSphere 4 that does VCB-style backups via the vCenter console and provides de-duplication of data. There are some fairly severe restrictions with DR, so it’s probably a tool you will use in conjunction with VCB and a 3rd party tool.

DR is a d2d-only tool, so you will need disk in your SAN ( you can also use NAS or iSCSI ) to store the backup data, and no tape devices are supported. It always de-dupes – you have no option in this regard. DR is an appliance that runs as a vm and provides what are essentially incremental-forever backups. You can have up to 8 concurrent backups in flight, however you can only use two simultaneous backup destinations. VSS is supported on certain Windows platforms and application-level integrity is provided under certain conditions. Stores can be up to 1TB in size and you can use a maximum of 2 stores with each de-dupe appliance.

Conclusion

Both VCB and DR provide for backups of running vms, however they do so under different circumstances. Most often you will use VCB with a 3rd party tool, and perhaps DR when you need fast restore capability for vms. A 3rd party tool gives you the flexibility of backup/restore outside of your immediate VMware environment, while the VMware tools give you tightly integrated capabilities within VMware.

On-line storage – safe or not?

Cloud computing has become a major buzzword this year and entails the provisioning of application and storage services within a distributed system operating on the Internet. Think Google Apps ( Wave, Docs, GMail, etc. ), T-Mobile SideKick, Amazon Elastic Compute Cloud ( EC2 )/Web Services, Salesforce, Bittorrent and many others. These can be put into the following categories:

  • peer-to-peer ( Skype, BOINC )
  • web applications ( Twitter, Facebook, MySpace, YouTube )
  • security services ( Messagelabs, Mimecast, Purewire, Scan Safe )
  • software as a service ( Google Apps, Zoho )
  • software + services ( MS On-line Services )
  • storage
    • content distribution ( Bittorrent, Akamai )
    • synchronisation ( Dropbox, Live Mesh )

The one big issue I have with these services is that you end up abdicating responsibility ( sort of ) for your data and services. There have been a number of high-profile problems with cloud computing this year, three of which have already been discussed on this blog ( Microsoft/Danger’s Sidekick fiasco, Palm/Sprint’s webOS data loss and GMail’s numerous outages ).

So, do you depend on your service provider to provide you with invincible data integrity, or do you take responsibility for your own data and services? Just because it’s someone else’s responsibility if you host with an on-line service doesn’t mean it’s not an issue. The problem of data integrity and application availability is not solved by an on-line service, it’s simply moved elsewhere.

There are pros and cons each way, whether running your own services or using an Internet-based one. My personal view is that you can make use of on-line services, however you also need to make provision for a failure of those services. It doesn’t matter who you are – failures happen, and some of the largest companies in the world have had issues with both data and application services.

Get yourself one or two external USB Hard Disks, purchase ( or download ) a backup application and store your data on them. And do it regularly. That way you know you’ve done your duty and your data is ( generally ) safe. This way, someone else’s problem doesn’t become yours.

Intel’s SSD updates and firmware issues

It seems that Intel just can’t get it right when it comes to SSD firmware updates. First they introduced a subsector remap algorithm on the G1 which increased performance, but this was shown to cause fragmentation and performance loss over time. That was fixed with a firmware update, but then the new G2 appeared with more improvements and a lockout issue involving BIOS passwords. This latest issue bricks G2 units that have been upgraded to the newest firmware … this doesn’t bode well for confidence in SSDs. I’m sure it will get sorted out over time but Intel isn’t doing anyone any favours.

Virtualisation part 2: Storage

The first part of the series focussed on the OS layer of virtualisation. This second part will focus on storage in relation to server virtualisation.

Storage on its own, is a minefield of standards, specifications, technologies, protocols and incompatibilities. Add to this the concept of virtualisation and you’re looking at an area that’s difficult to manage let alone comprehend. The primary motivators when selecting storage are I/O performance, throughput and size vs. cost. But first one needs to understand the available choices so that you can make an informed decision. Let’s take a brief tour around storage technologies and then we’ll discuss how these align with virtualisation.

Core storage types:

1. DAS ( Direct Attached Storage ) – typically comprises either server-internal disk or an external storage array directly connected to the server via SCSI, SAS or Fibre. This type of storage provides a block-level protocol on top of which one installs standard RAID/logical drive/partition/file-system structures. Typical internal and external units will provide RAID along with drive sparing and automatic rebuild, hot-swap functions and good throughput. A 7-disk RAID using 1TB SATA disks will give you around 300MB/s of throughput and you can extrapolate performance from there ( a rough extrapolation is sketched after this list ). While Fibre is expensive, it provides the best interface throughput, although SAS and UW SCSI will be fine for entry-level deployments. Security in this scenario is determined by the host OS using the storage device. Some DAS external units include multiple host controllers and ports, allowing for link/controller fail-over and cluster setups.

2. NAS ( Network Attached Storage ) – designed to provide storage over a network using protocols like CIFS, NFS and iSCSI. The device itself will run some form of standard or proprietary file-system with back-end RAID ( eg. Netapp with WAFL, and Linux-based devices with ext3/4, reiserfs, xfs and jfs ) which in turn is provisioned using the network protocol. CIFS is often aligned closely with the Windows platform, however most other OS’s are able to connect to CIFS shares. NFS is mostly used in Unix environments and can provide good throughput if correctly tuned at both the OS and NAS levels. iSCSI takes block storage at the device level and shares it over a network, at which point it is seen as block-level storage by a client using an iSCSI initiator. A NAS device by default provides for very fine-grained file serving ie. you can carve up the physical disk using RAID and logical partitions, and then carve things up further by directory sharing. This is almost storage virtualisation in a way, as one can make efficient usage of available physical space. Growing shared network file-systems is also easy to do with most ( if not all ) NAS devices by either adding physical space to the logical drives ( and growing the share ) or just adding additional logical spaces to the shared area. Security is provided at both the physical and shared levels, providing granular although more complex capabilities. Performance can be aggregated by taking multiple GbE ports on the NAS device and creating bonds/teams ( OS/NAS level ) or aggregates ( network level – EtherChannel or 802.3ad ). Maximum throughput is often limited by the network’s abilities unless you are using enterprise-level devices which may have support for many network interfaces. Many enterprise-level devices offer additional features in respect of storage types, with SAN Fibre connectivity ( using block-level protocols like FC ); however, even entry-level systems offer features like iSCSI ( a block-level protocol encapsulated over IP networks ), replication ( rsync ), backup, AD integration and uPnP functions. Note that there is a wide difference in performance between consumer NAS devices and enterprise systems.

3. SAN ( Storage Area Network ) – many incorrectly assume they have a SAN when using Fibre/Fibre Channel-connected storage. I would go so far as to say this is basically DAS even if you are using an FC switch. SANs typically involve multiple FC devices ( storage subsystems and other devices like tape libraries ), FC switches and FC-attached clients all on a separate Fibre Channel network which incorporates zoning functions within the FC fabric to control the flow of data between endpoints. SANs deliver high performance using link aggregation, high core switching throughputs and I/O rates and the FC family of protocols which were designed with storage in mind. These are block level protocols providing block devices to end points. While the core is designed specifically around the networking component of FCP ( this is changing with virtualisation being pulled into the core ), the storage endpoints provide enterprise features like block-level mirroring and synchronisation, snapshots, remote replication and storage virtualisation ( using in and out-of-band solutions ).

4. Storage Virtualisation ( don’t confuse this with storage used within server virtualisation ) – is provided within all storage systems, ranging from simple RAID/logical drives to enterprise-level SV solutions built on SAN fundamentals. In the enterprise space, SV is taken to be a form of meta-data controller that manages one or more storage devices and the clients that access them. SV is provided in both in-band ( where a controller is placed in-line between the storage and the clients ) and out-of-band ( the controller is part of the switching infrastructure and not in-line ) forms. An example of in-band is FalconStor’s IPStor solution, which provides the whole gamut of storage virtualisation features including synchronous mirroring, de-duplication, CDP, replication and virtual carving of all back-end disk subsystems. IBM’s SVC follows this route as well. For out-of-band solutions, StoreAge’s SVM product provides all the above features without the performance limit that is common with in-band solutions. SVM can aggregate multiple storage back-ends in parallel to provide massively scalable solutions to end-point clients, especially for streaming requirements. Most storage subsystem vendors also provide limited support for storage virtualisation within their own storage subsystems.
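On the DAS throughput point from item 1 above, here is a rough way to extrapolate; the per-spindle figures are assumptions, so treat the output as ballpark only:

```python
# Ballpark DAS throughput: data spindles x sustained per-disk rate.
# Per-disk rates are assumptions; real arrays vary with controller,
# cache and stripe size.
SUSTAINED_MB_S = {"sata_7200": 50, "sas_15k": 90}

def raid_throughput(disks, raid_level, disk_type="sata_7200"):
    """Very rough sequential throughput estimate for a single RAID group."""
    parity = {"raid0": 0, "raid5": 1, "raid6": 2, "raid10": disks // 2}[raid_level]
    data_disks = disks - parity
    return data_disks * SUSTAINED_MB_S[disk_type]

# The 7-disk SATA RAID 5 mentioned above: 6 data disks x ~50 MB/s ~= 300 MB/s
print(raid_throughput(7, "raid5"))              # ~300
print(raid_throughput(12, "raid5", "sas_15k"))  # scale up from there
```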

A quick note here: the base ( boot ) disk for VMs will always come from the host server itself, which in turn gets its storage from the storage systems, but this doesn’t stop you from provisioning storage directly to a VM as non-boot disks, eg. you can run an iSCSI initiator inside a VM to get block-level disks from an iSCSI storage server that is separate from the storage systems used to provide base disks to the host servers. Or you could use NFS/CIFS to connect to shares on a NAS server …
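As a quick sketch of that last point, assuming a Linux guest with open-iscsi installed ( the portal address and target IQN below are made-up examples ):

```python
# Sketch: attach an iSCSI LUN directly inside a Linux guest using open-iscsi.
# Portal IP and target IQN are placeholders; adjust for your environment.
import subprocess

PORTAL = "192.168.10.50:3260"                    # iSCSI storage server
TARGET = "iqn.2009-10.local.storage:vm-data01"   # example target name

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. discover the targets offered by the portal
run(["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL])
# 2. log in to the target; the LUN then appears as a normal block device
run(["iscsiadm", "-m", "node", "-T", TARGET, "-p", PORTAL, "--login"])

# From here the guest partitions/formats the new /dev/sdX like any local
# disk, completely independently of the datastore holding its boot disk.
```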

File and Block protocols

There are 2 types of protocols when it comes to providing storage to clients.

File-level protocols ( used in NAS devices ) provide shared volumes via a network mechanism ( eg. CIFS or NFS ) which inherently allows simultaneous multi-client access. These protocols include file-locking features to enable simultaneous file access without corruption. CIFS is the more recent name for Microsoft’s SMB connection-orientated protocol, which has been in use since the days of the original IBM PC. It has always been a moving target with regards to specifications and, in its more recent SMB2 form, breaks compatibility with the former version. NFS has historically been used by Unix-based OS’s, has good connectionless support, can be used over UDP or TCP and provides backwards compatibility across the later server versions ( 2, 3 and 4 ). Performance is reasonable, with the option of bonding multiple 1GbE links together, or you can use 10GbE. Security is granular ( if a bit complex ) with share and file-level features, while the shared nature of file protocols makes it a very flexible mechanism. By masking the nature of the back-end file-system, NAS devices allow clients to concentrate on the files themselves without having to worry about file-system semantics and management.

Block-level protocols export block devices to clients, where they are seen as local disks. These protocols include FCP and iSCSI ( amongst others ). FCP is a storage-specific protocol designed with high-throughput, low-latency storage networking in mind. It runs over a fibre network ( in the core ) and has numerous features including zoning, multi-link aggregation with multi-path fail-over and core-based data movers. iSCSI encapsulates the SCSI storage protocol over IP, providing a flexible and cost-effective method of linking remote block devices to clients. Block protocols are by nature not shareable, hence the specific support for zoning in FCP fabrics and host-group LUN mappings in all/most storage devices. Windows systems, more specifically, will overwrite or corrupt data on LUNs connected to more than one client unless there is specific support in the OS for this ( eg. cluster quorum disks ). One needs to use a cluster-aware file-system ( eg. GFS, Lustre or others ) to provide simultaneous block device access to more than one client. SANs do NOT provide for data sharing – they only provide physical device sharing, something which is commonly misunderstood.

The rule is if you need shared data access, use a file-level protocol. For one-to-one high-performance data device access, use block-level protocols.

The virtualisation section

When deploying virtualisation, one needs to take into consideration the number of virtual servers ( and VMs ) per storage device as well as the type of loads that will be running. For loads that require low latencies, it’s important to look at storage that provides high I/O rates as well as non-network storage protocols – this will typically be DAS FC units or SAN-type storage with multi-path fail-over and aggregated links at the virtual server. For more moderate and general loads, one can use SAS/SCSI-connected DAS, entry-level SAN or even iSCSI/NFS solutions ( higher latencies and lower performance ). You can also decide between SATA, SAS and Fibre HDDs within the storage subsystems, or use a combination of these for tiered-performance VMs.

Storage virtualisation solutions make these types of decisions a snap as they inherently implement the concept of storage performance tiering, making it simple to provision logical storage to a client from predefined performance pools ( eg. SATA disk for high-capacity, low performance requirement and FC disk for high-performance, lower capacity requirement ). Something as simple as individual HDD rotational speeds ( eg. 10K rpm vs 15K ) can also make a big difference in terms of latency and I/O rates. SV solutions also include some features specifically designed for server virtualisation like CDP of live VMs.
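To put rough numbers to the rotational speed point, a back-of-the-envelope sketch ( the seek times are assumed typical figures, not vendor specs ):

```python
# Rough per-spindle IOPS from average seek time plus rotational latency.
# Seek times below are assumed typical figures for illustration only.
def disk_iops(rpm, avg_seek_ms):
    rotational_latency_ms = (60_000 / rpm) / 2   # half a revolution on average
    return 1000 / (avg_seek_ms + rotational_latency_ms)

for label, rpm, seek in [("7.2K SATA", 7200, 8.5),
                         ("10K FC/SAS", 10_000, 4.5),
                         ("15K FC/SAS", 15_000, 3.5)]:
    print(f"{label}: ~{disk_iops(rpm, seek):.0f} IOPS per spindle")

# Roughly 80 vs 130 vs 180 IOPS per spindle - the gap multiplies across a
# RAID set, which is why tiering by drive type pays off for busy VMs.
```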

As a guideline, the following shows an increasing performance/cost list that you can tentatively base your decisions on:

  • NFS NAS
  • iSCSI NAS
  • accelerated iSCSI NAS
  • UW SCSI/SAS DAS
  • Fibre DAS
  • enterprise iSCSI NAS
  • FC SAN with SATA storage
  • FC SAN with SAS storage
  • FC SAN with Fibre storage
  • FC SAN with multiple virtualised storage subsystems

The availability/backup section

While backup and recovery is critical in any organisation, it is doubly so when running virtualised systems. The failure of a single virtual server host will bring down all the VMs running on it, resulting in far more issues than with non-virtualised systems. The way to solve this is to make use of HA virtualisation solutions ( eg. VMware HA, clustered RHEL 5.4 with KVM and GFS, or XenServer HA ) – at the heart of all of these is centrally available storage à la SAN, multi-controller DAS or shared iSCSI/NFS.

VMware’s VMFS is a cluster-aware file-system that can span multiple host servers and be provisioned from any block-level source ( and ESX datastores can now be NFS too ). Red Hat’s GFS, although not specifically designed with virtualisation in mind, provides similar features.

Most backup software vendors provide agents/plugins to back up either VMware VM contents ( standard file-system agent ), standalone VMs, or via the VCB proxy. For KVM and Xen, you can make use of standard file-system solutions, snapshot solutions at the OS level ( eg. LVM ) or storage-level snapshots. To go one step further, you can combine this with synchronous mirroring at the storage level to provide remote active HA hosts or passive standby hosts.
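For the KVM/Xen case, here is a minimal sketch of the LVM snapshot approach. It assumes the guest’s disk sits on an LVM logical volume; the VG/LV names, snapshot size and target path are placeholders, and the result is a crash-consistent image unless you quiesce the guest first:

```python
# Sketch: crash-consistent backup of a KVM/Xen guest disk via an LVM snapshot.
# VG/LV names, snapshot size and target path are placeholders.
import datetime
import subprocess

VG, LV = "vg_vmstore", "accounting-disk0"
SNAP = f"{LV}-snap"
TARGET = "/backup/vm-images"

def run(cmd):
    subprocess.run(cmd, check=True)

stamp = datetime.date.today().isoformat()

# 1. take a copy-on-write snapshot of the running guest's logical volume
run(["lvcreate", "--snapshot", "--size", "5G", "--name", SNAP, f"/dev/{VG}/{LV}"])
try:
    # 2. stream the frozen snapshot to the backup area ( dd piped into gzip )
    with open(f"{TARGET}/{LV}-{stamp}.img.gz", "wb") as out:
        dd = subprocess.Popen(["dd", f"if=/dev/{VG}/{SNAP}", "bs=4M"],
                              stdout=subprocess.PIPE)
        subprocess.run(["gzip", "-c"], stdin=dd.stdout, stdout=out, check=True)
        dd.stdout.close()
        dd.wait()
finally:
    # 3. drop the snapshot so it doesn't fill up and invalidate itself
    run(["lvremove", "-f", f"/dev/{VG}/{SNAP}"])
```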

It’s critical to back up both the VM and its contents using your chosen methods – VM level for catastrophic VM failure, and VM contents ( file level ) for file restoration. In addition, if you are using your virtualisation vendor’s storage functions, make sure you follow the recommended capacity planning w.r.t. storage – running out of space on a host file-system will likely take your VMs and host system down.

Fragmentation

Remember that with virtualisation, you now have file fragmentation at 2 levels – the VM file as well as the guest OS. This can cause havoc with performance, and it’s important if using Windows as a virt host to perform defragmentation regularly ( Diskeeper has a good on-line tool ). Unix file-system based host virt solutions tend to fare much better in this regard, but it’s still important to keep as much space open as possible on the VMFS so that it does not become an issue.

Conclusion

This is a complex topic which I’ve barely touched on in this article however I hope I’ve given you some ideas of where to start. Try and keep it simple and logical, and most of all keep the data safe.

Backups? We don’t need no stinking backups!

Here are two stories ( just this week ) of tragic data loss, or nearly …

My wife’s aunt phoned last night and indicated her daughter’s MacBook had been stolen while at a client. Simple: call the insurance, claim for a new laptop and restore your data. Not so fast. No backups … Considering this is a work machine with all her business-related info on it, it’s a serious loss.

Earlier this week, a client ran out of space on the VMFS of their virtualisation server while taking a snapshot of a VM ( their accounting server ). This caused the VM to refuse to start up again. The first issue here was that the amount of free space on the VMFS was quite low ( 8% compared to the 50% recommended by VMware ). Skimp on disk space and you end up in trouble. The second issue is that there were no proper ( VM or client ) backups of the VM – lose the VM, lose your data! This particular client doesn’t even have a backup solution in place ( despite continued calls to implement one ). Luckily the VM was restored, but this was a very close call.

I hear about ( and have to deal with ) data loss scenarios all the time, and I still wonder why people wait until a critical point before doing anything about it. Surely prevention is better than cure? This is not even a cost issue, as there are many solutions ( especially for non-corporates ) that are free. I use my Linux-based Synbak solution for many clients ( including those with large numbers of users and large amounts of data ), and it provides a very secure and reliable way of backing up servers and desktops across a whole host of platforms. Depending on your data set size, you can use cheap USB-connected drives as the target, and take them off-site if need be.

The moral of the story is you can never have enough backups of your data. If you think you are vulnerable, do something about it now before it’s too late. Take responsibility for your data!

Local insurance company loses client data

So it’s not just the Americans who are poor at client data security – the South Africans have got in on the act as well. Local insurance firm Zurich SA said it had lost a tape containing client information. Apparently the backup tape was lost during a routine tape transfer to a data storage centre in August ( why are we only hearing about this now? ). KPMG is leading an investigation into the loss and expects to release a report at the end of November.

Data loss for Sidekick users Part 3

Internetnews.com mentioned on the 16th of October that the bulk of the data had been recovered: “Microsoft today reports it recovered the majority of lost customer data for Sidekick owners amid a flurry of lawsuits filed yesterday over the recent server failure caused a service outage and data loss.” This is not quite the truth: no data has actually been restored to users yet, and it appears the problem may have been a failed firmware update to a storage subsystem which in turn caused database corruption. And so the saga drags on, almost 2 weeks after Sidekick users were left without their information. Daniel Eran Dilger at RoughlyDrafted Magazine has written a biting piece on the saga – although he may be a bit hysterical about it all, the fact is that this problem happened under Microsoft’s watch and no amount of spin doctoring or newspeak will change the fact that Sidekick users don’t have their data.

Anyone worth their weight in IT gold knows to back up, back up again, and back up again. When you’re finished there, go back up again. And for safety’s sake, just make another copy. Period!!! Systems fail, firmware updates fail, management makes requests that put your IT at risk, some gamma ray from the sun flips a bit. The point is that problems will happen – accept this and build IT systems that cater for them.

Virtualisation part 1: OS

I had an interesting question from a business colleague of mine today – please spell out what types of virtualisation are available. It’s good to know that, even though it’s fairly pervasive these days, people still ask what the reality is. Because running virtualisation for production loads can have a big effect on systems management, resource utilisation and performance. And it’s a damn sight more complex – anyone who tells you otherwise is fooling themselves. Even Scott Lowe, whom many would consider to be one of the most knowledgeable in the field, would profess to learning all the time.

The reason is that virt touches every single part of your IT infrastructure, from networking to storage to servers. And a host of vendors’ equipment works with your virt in a host of different ways. From a systems management point of view, this either requires very good communication between all IT members, or someone who knows everything, which is both dangerous ( a single point of responsibility and failure ) and rare. Virtualisation does not automatically mean you’ll suddenly have a whole bunch of spare boxes in the corner. In addition, the ( apparent ) ease of rolling out virt means that VMs can now spread like bunnies, and suddenly resource utilisation and performance go out the window.

So it’s a careful balancing act that needs to be managed very well to get the most benefit from it. And although virt is now a household name, there are in fact very few people who understand it from top to bottom. Consider the delicate balance between servers, networking and storage, where one innocuous setting can bring performance to its knees. Storage virt in itself is a complex topic; fold that in with OS and networking virt, and things go up a notch to where it’s a bit of wizardry or art.

What to consider when using virt?

  • performance ( read I/O here ) requirements of different VMs across multiple host nodes
  • virtualise in the storage/network, or are the VM solution’s provided facilities enough?
  • storage virt: in or out of band meta-data controllers; host, network or storage-based
  • storage: FC, iSCSI or NFS? ALUA options from your vendor?
  • host nodes: fast CPU ( scale up ) or multi-core/socket ( scale out )
  • timing sensitive applications
  • type of virtualisation: full, para, jails or kernel core
  • SRM/virt management tools
  • networking: VM vendors’ options, bonding, multi-path or virt networking, jumbo frames, VLANs
  • backup: how do I provide for failure in a virtual environment

Most of this is an article for another day but here I’d like to cover the VM types available and how they compare to each other. Let’s look at the 4 main types of virtualisation:

  • Full virt means that the VM application virtualises the whole infrastructure of an x86 ( or other ) machine. Also known as binary translation ( although not exactly the same thing ), it provides virtualisation of any OS that would normally run on the host platform, without extra assistance or modification. A downside is that this method has a fairly high CPU overhead ( 15 – 25% ) as part of the translation process. Examples include VirtualBox, Parallels, Microsoft Virtual PC ( now included with Windows 7 ) and VMware Workstation/Server. Some now have support for CPU virt extensions which can speed things up a bit, especially for 64-bit environments.
  • Hybrid/para-virt/hardware-assisted virt provides a software interface to guest systems that is similar to full virt but not exactly the same, using features of the host CPU to speed up certain VM functions. The guest OS needs some modification, or drivers for I/O need to be supplied. Typically this will provide performance within 5 – 10% of raw hardware, however this depends on the guest OS setup. Xen and VMware ESX, for example, can provide both full and para-virt, however there is a performance penalty for running full virt under these ( as there is on other platforms ). Xen in full-virt ( HVM ) mode also requires CPU vm extensions. Microsoft Hyper-V, and IBM’s PowerVM to a degree, follow the same path.
  • Native virt is what I like to call kernel-core virt, as there is special support built into the host OS itself which gives the guest direct access to all the hardware that the host OS has access to. The advantage of this is a broad spectrum of hardware support as well as close-to-raw performance for guest VMs. One does require drivers for non-native guest OS’s though. Examples of this are KVM ( fairly new in the Linux landscape ) and IBM P-series/PowerVM. KVM in particular is being touted for timing-sensitive apps due to its very good performance. As with hardware-assisted virt, CPU vm extensions are required ( a quick way to check for these is sketched after this list ).
  • Operating system-level virt provides for isolated user-space instances of the host OS, also called jails, containers, VPSs or VEs. There is little to no overhead here as the guests use the host OS’s normal system calls, and no specific hardware support is required. A disadvantage of this method is that only copies of the host OS platform can be run – other OS’s are not supported. Examples include FreeVPS, Linux-VServer, Solaris containers and OpenVZ. This is a favourite of ISP hosting companies for obvious reasons ( think 1000 containers on a single machine only running web servers ).
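As mentioned above, several of these methods depend on CPU virt extensions. A quick way to check for them on a Linux host ( this only inspects /proc/cpuinfo; the flags can still be disabled in the BIOS ):

```python
# Quick check ( Linux only ): does this host expose hardware virt extensions?
# Looks for Intel VT-x ( "vmx" ) or AMD-V ( "svm" ) in the CPU flags.
def host_virt_extensions(cpuinfo_path="/proc/cpuinfo"):
    flags = set()
    with open(cpuinfo_path) as f:
        for line in f:
            if line.startswith("flags"):
                flags.update(line.split(":", 1)[1].split())
    if "vmx" in flags:
        return "Intel VT-x"
    if "svm" in flags:
        return "AMD-V"
    return None

ext = host_virt_extensions()
print(ext or "no hardware virt extensions visible - KVM and Xen HVM guests won't run")
```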

In case you’re wondering ( yes, what happened to VMware ESX/vSphere? ), ESX straddles the line between full and para-virt, while Xen straddles the line between para and kernel virt. VMware ( also known as a binary translator ), for example, traps certain critical code instructions and emulates them in software. Binary translation can incur a large performance overhead in comparison to a virtual machine running on natively virtualised architectures – ESX minimises this to an extent using the VMware Tools suite, which provides accelerated drivers for certain functions ( network, video ). Any OS can be emulated, but performance is only a given on those supported.

As virtualisation is implemented, it’s important to realise that the underlying server components are of far greater importance from a reliability and availability point of view than with non-virtualised systems. A single host server failure will lead to the downing of all VMs running on that system. There is no longer a one-to-one relationship between server hardware and the platforms running on it, so choose wisely when implementing virtualisation and make use of the high-availability solutions provided by the virt vendors if you need highly available systems.

That, in a nutshell, summarises the OS component of enterprise virtualisation. It’s easy to use virt off the cuff, however it takes a lot more effort, planning and design to get it working well in an enterprise arena.

If you are new to virt, take a look at something like Sun’s VirtualBox which has a very low entry barrier, and can give one a good idea of what to expect. If you are an experienced virt person, then you’re welcome to follow this series and either learn something or teach me something through the comments section.

Corrupted Intel SSDs?

Intel has indicated that there is the possibility of corruption on X25-M and X18-M SSD units related to setting a drive BIOS password. The problem is triggered if a user has set a password, changes or removes the password and then reboots the host computer. Shipments of these SSDs have been suspended, however here in SA it’s likely that some have already been purchased. It’s recommended that you contact your local Intel supplier if you have purchased one of these units in the last few weeks.

Cluster and distributed filesystems

This is a surprisingly large ecosystem for an area of OS’s that would, on the face of it, seem very specialised. I think there are a few facts that are bringing these features to the mainstream:

1. we’re looking at new IT innovations like cloud computing

2. management interfaces to provide these features are being developed and simplified

3. features have stabilised

Distributed filesystems include GlusterFS, Lustre, GFS2 and Hadoop, while clustered filesystems include OCFS2, GFS2 and DRBD. Red Hat Enterprise includes GFS2, allowing clustered storage solutions through simple GUI tools, and DRBD is now available in the canned Openfiler storage solution ( not simple to set up, but there is a good tutorial at HowtoForge ).

GlusterFS uses a very interesting technique of aggregating storage bricks ( using items like iSCSI, DAS, AoE and SAN-supplied LUNs ) in mirrored or other RAID configurations providing redundancy for data across your storage environment. Setup does not involve kernel modules or changes so it’s fairly straightforward.

ZFS is also included in the latest FreeNAS rc snapshot meaning you can slice and dice your storage exactly how you want it with writable snapshots, storage pooling and simple administration.

The upside of all these storage applications is that you can provide non-stop storage across an enterprise at a very low cost and simultaneously aggregate performance for those high-bandwidth apps. Virtualisation of storage, previously the domain of high-end commercial companies like NetApp, StoreAge and FalconStor, is now available for little to no cost through a whole host of FOSS and commercial OSS software.

Backups, backups and more backups

Why is it that when it comes to backup, most people just draw a blank? Although it can be a complex topic, surely it’s not useful to stick your head in the sand. But that seems to be the trend – until something happens and data is lost. Remember that you are responsible for your own data – no one else, not even your IT support staff.

Take the time to identify all the different types of data that you work with on a daily basis and create a backup regimen that includes all of this data. Schedule your backups for a time when all your applications are closed, as most backup applications cannot deal with data that is open and in use. Get into the habit of checking that the backups have run and that there were no errors.

Windows workstations are difficult to back up as a user’s data is spread through a number of locations within the user’s profile directory. Make sure that you’ve found the correct location for each application that needs to be backed up ( a rough sketch of this approach follows the list below ). Backing up the whole profile directory is also not always possible, for 2 reasons:

  1. certain files ( eg. ntuser.dat ) cannot be backed up using most backup applications
  2. there may be data on your machine outside of your profile directory
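As a rough sketch of the selective approach ( the folder list and target drive are examples only; locked or in-use files such as ntuser.dat are simply skipped and reported ):

```python
# Sketch: copy selected folders from a Windows profile to a backup target,
# skipping files that are locked/in use. Paths below are examples only.
import datetime
import os
import shutil

PROFILE = os.path.expanduser("~")
FOLDERS = ["Documents", "Desktop", r"AppData\Local\Microsoft\Outlook"]  # example locations
TARGET = r"E:\backups"   # eg. an external USB disk

stamp = datetime.date.today().isoformat()
skipped = []

for folder in FOLDERS:
    src_root = os.path.join(PROFILE, folder)
    for dirpath, _dirs, files in os.walk(src_root):
        rel = os.path.relpath(dirpath, PROFILE)
        dst_dir = os.path.join(TARGET, stamp, rel)
        os.makedirs(dst_dir, exist_ok=True)
        for name in files:
            try:
                shutil.copy2(os.path.join(dirpath, name), dst_dir)
            except OSError:                 # locked/in-use files ( eg. ntuser.dat )
                skipped.append(os.path.join(rel, name))

print(f"Done; {len(skipped)} files skipped ( open/locked ) - review the list and test a restore!")
```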

Lastly, make sure that you can actually recover your data by testing restores thoroughly. A backup is only as good as the restore you can perform from it.