Storage Advisors

Write-back cache: Battery vs Disk

Thursday 28th June 2007 - 00:00


Question to the Storage Advisors, from anonymous: Which is better: (a) backup battery for cache as found on OEM RAID controllers or (b) writing cache content to one or more disk drives?

Good question. For those not versed in the dark arts of cache write-back strategies, we're talking about methods for protecting user data that has been written by the OS but hasn't actually made it to the media yet. It's common for a disk controller to improve write performance by accepting the data from the OS and reporting that it's been written to disk, when in reality it's still in controller memory (or in a log on disk, as suggested by the poster's question). This technique is referred to as "write-back" because the data is written to its final location in the background. The opposite of write-back is "write-through", where the controller really does write the data to disk before telling the OS that it's finished.

[Note that controllers aren't the only things that have a write-back cache - the OS and drives also have one. But to avoid complicating a somewhat simple question, let's just ignore those other caches for now. The OS has ways of protecting itself, and drive caches should be disabled if a write-back controller is being used.]

It's very important that this unwritten controller data is protected because it's the only copy of that data. The OS thinks that the data is written to disk and therefore purges it from memory or wherever it came from. If a power failure occurs before this write-back data is written to disk, then it's permanently lost. With large caches we're talking about hundreds of MB of data. And it can be even worse than that, because the missing writes could be to a file-system structure or database, resulting in massive corruption and loss of files that aren't even being accessed. It can be a real mess. And the user won't know about it until they read back corrupted data - which isn't always obvious.
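To make the risk concrete, here is a minimal toy sketch of the two policies. The `ToyController` class and its method names are made up for illustration and do not correspond to any real controller API; it only models "acknowledged" versus "actually on the media".

```python
class ToyController:
    """Toy model of a controller cache (hypothetical, for illustration only)."""

    def __init__(self, write_back: bool):
        self.write_back = write_back
        self.cache = {}  # block number -> data held only in controller memory
        self.disk = {}   # block number -> data that has reached the media

    def write(self, block: int, data: bytes) -> str:
        if self.write_back:
            # Write-back: acknowledge now, move the data to disk later.
            self.cache[block] = data
        else:
            # Write-through: the media is updated before the acknowledgement.
            self.disk[block] = data
        return "ack"

    def flush(self) -> None:
        # Background step that moves cached writes to their final location.
        self.disk.update(self.cache)
        self.cache.clear()

    def power_loss(self) -> None:
        # With no battery and no log, unflushed cache contents simply vanish.
        self.cache.clear()


wb = ToyController(write_back=True)
wb.write(7, b"payroll")      # the OS is told "ack" and forgets the data
wb.power_loss()
print(7 in wb.disk)          # False

wt = ToyController(write_back=False)
wt.write(7, b"payroll")
wt.power_loss()
print(7 in wt.disk)          # True
```

In the write-back case the OS received its acknowledgement, yet the block never reached the disk. That gap is exactly what a battery or a disk log is meant to close.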

So, on to the question: What's the best way to protect this unwritten data? The most common approach is to simply put a battery on the disk controller. If power is lost to the system, including to the drives, the controller memory will transition to battery-backed mode and preserve any write-back data that hadn't made it to disk yet. The battery is typically selected to provide at least 72 hours of backup time - protecting data across a weekend.

An alternative method, as suggested by the poster, is to save this write-back data to disk. There are different ways to implement this, but the most common is to "simply" store the data in a transaction log on the disk. Now, note that the data is typically stored on the disk (in either method - battery or log) using some form of RAID, protecting against data loss due to a drive failure. RAID-5 is a pretty commonly selected RAID level, but has very poor random write performance - a problem which just happens to be greatly alleviated by some form of write-back cache. So for this example, let's assume that RAID is being used. This means that the write-back data being logged to disk should also be protected from disk failure. The easiest way to do this is to simply write the log file to two disks. (Some users prefer RAID-6, which protects against two drive failures, in which case the transaction log should be written to three disks!)
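The "one more log copy than the number of drive failures you want to survive" rule can be sketched roughly as follows. The `MirroredLog` class and `log_copies_needed` helper are hypothetical names for illustration, and in-memory lists stand in for the log areas on separate disks.

```python
def log_copies_needed(drive_failures_tolerated: int) -> int:
    # RAID-5 tolerates 1 failure -> 2 log copies; RAID-6 tolerates 2 -> 3.
    return drive_failures_tolerated + 1


class MirroredLog:
    """Toy transaction log written identically to several disks."""

    def __init__(self, copies: int):
        # One list per log disk; a real log would live on the media itself.
        self.copies = [[] for _ in range(copies)]

    def append(self, block: int, data: bytes) -> None:
        # Every cached write is logged to every copy before it is acknowledged.
        for log in self.copies:
            log.append((block, data))

    def replay(self, failed_disks=frozenset()):
        # After a power loss, recover from the first log copy that survived.
        for index, log in enumerate(self.copies):
            if index not in failed_disks:
                return list(log)
        raise RuntimeError("all log copies lost")


log = MirroredLog(log_copies_needed(1))   # RAID-5 style: two copies
log.append(42, b"new data")
print(log.replay(failed_disks={0}))       # [(42, b'new data')]
```

With two copies, the log still replays after any single disk failure; losing both log disks is the case the RAID-6 "three disks" remark is guarding against.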

OK, now let's look at the pros and cons of the controller-based battery and disk-based log approaches.

Backup Protection Time: A battery has a limited backup time - around 72 hours, as previously pointed out. A transaction log on disk, however, can last almost indefinitely, i.e., for the lifetime of the drive. So here the advantage clearly goes to disk-based logs. (BTW, some folks are looking at ways to automatically move the controller cache data to a more permanent storage device, like CompactFlash, allowing controller backup times similar to transaction log backup times. So this will eventually become moot.)

Life Expectancy: Another issue with batteries is that they don't last forever. They eventually degrade and fail, lasting maybe a few years before they need to be replaced. Drives obviously don't have this issue.

Capacity: This one is really a nit, but I figured I'd list it to be complete. If a controller has 256MB of memory, for example, then the transaction log will require 2×256MB of disk space, or 512MB. With 1TB drives, this one is a big fat "don't care".
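For anyone who wants to plug in their own cache size, the arithmetic above works out like this. It is a back-of-the-envelope sketch: the function names are illustrative, and a decimal terabyte (1,000,000MB) is assumed.

```python
def log_space_mb(cache_mb: int, log_copies: int) -> int:
    # Each log copy must be able to hold the full controller cache.
    return cache_mb * log_copies


def share_of_drive(space_mb: float, drive_tb: float = 1.0) -> float:
    # Assumes decimal units: 1TB = 1,000,000MB.
    return space_mb / (drive_tb * 1_000_000)


total = log_space_mb(cache_mb=256, log_copies=2)
print(total)                              # 512
print(f"{share_of_drive(total):.4%}")     # 0.0512%
```

Even charging the whole 512MB against a single 1TB drive (in practice it is split across the two log disks), the log takes roughly five hundredths of a percent of the capacity.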

Cost: Batteries and the associated circuitry probably add about $100 to the user-cost of a controller, while 512MB of disk space for the log is practically free. $100 might be a big deal for a home user (who probably doesn't need RAID or write-back cache anyway), but it's just another nit for serious IT folks. Once you add up the price of the motherboard, OS, drives, etc., $100 is in the noise.

Performance: So far the advantage has clearly gone to the disk log, but performance is probably the most important factor when choosing a cache backup protection method. With a battery-backed controller there are no additional steps to protect the data in cache. "It just works." Of course there is a lot of magic in the hardware design to make it "just work", but that has no effect on the performance. On the other hand, with disk-based logs the data has to be written to two different disks. This will probably entail two seeks, assuming that the drives had been servicing requests in some other section of the media. And eventually, that logged data will have to be read back from disk and moved to its permanent location - causing two more seeks and reads. So now a single OS write will cause four additional IOs to the disks.

So how the heck do we figure out the performance hit due to these four IOs? Let's try this crude method:

Assume that RAID-5 is being used. Therefore each random OS write will cause four disk IOs - two reads and two writes. With disk-based logging there are four additional IOs to log and "un-log" the data, for a total of eight IOs. Using this approach we can see that disk logging has twice as many IOs as controller battery-protected cache, so you get about a 2X difference in performance. Of course real performance modeling will be more complex than this, so just squint at the numbers and figure that the difference is anywhere from 50% to 150%. That's a big dang difference.
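The crude IO count above can be written down as a few lines of arithmetic. This is only the same back-of-the-envelope model, not a performance simulation, and the constant and function names are illustrative.

```python
RAID5_IOS_PER_RANDOM_WRITE = 4   # two reads + two writes, per the model above
LOG_COPIES = 2                   # log mirrored to two disks


def log_overhead_ios(copies: int = LOG_COPIES) -> int:
    # One write per copy to log the data, one read per copy to "un-log" it.
    return 2 * copies


def ios_per_os_write(disk_log: bool) -> int:
    base = RAID5_IOS_PER_RANDOM_WRITE
    return base + log_overhead_ios() if disk_log else base


battery = ios_per_os_write(disk_log=False)
logged = ios_per_os_write(disk_log=True)
print(battery, logged, logged / battery)   # 4 8 2.0
```

The model treats every IO as equally expensive and ignores queuing, sequential log writes, and any batching of the un-log step, which is part of why the real-world difference is better read as a range than as exactly 2X.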

The bottom line is that most users who are concerned with performance aren't concerned with saving $100, so battery-backed cache is clearly the winner.

Enjoy,
TT
