On 2012-01-02 21:45, lee@frumble.claremont.edu wrote:
> I would configure the drives in a RAID-5 array with 3 for data, 1 for
> parity, and 1 hot spare drive. I'd also ensure that the system hosting
> the controller card and the drive enclosure were on a UPS and,
> preferably, in a temperature-controlled room.

The chances of not being able to recover the RAID after a single drive
failure are too high. You are likely to find an uncorrectable error
elsewhere when you go to rebuild the lost array member. These consumer
drives have uncorrectable read error rates of 1 in 10^14 bits ... and
they hold 10^13 bits of data. That is not enough of a margin for me!

> If you have the budget, you could build out dual RAID-5 setups
> and either mirror them via RAID-1 or layer ZFS on top of both.
> [ZFS sounds quite nice; thanks to whoever for mentioning it.]

There are few reasons to layer ZFS on top of hardware RAID, and many
reasons not to. Its correction and repair mechanisms are much stronger
than RAID-5's, not to mention that it resolves the RAID-5 "write hole"
and stripe-write performance issues via variable stripe size.
Additionally, ZFS can and will keep all those drives busy with
intelligent IO reordering, which it can only do reliably if there is
not a RAID controller underneath "lying" about what spindles exist and
what data has made it to disk!

In my experience, a RAID controller will offline a device on the first
encountered read error. At that point, the entire device is
inconsistent and invalid (as more writes occur on the other members),
and a "rebuild" requires a simultaneous, reliable read of the entire
surface of all the other disks. A strenuous task, to say the least,
happening at just the wrong time. In contrast, when ZFS encounters a
block read error, it retries from a redundant copy wherever one exists
(possibly on the same device), and immediately re-writes the offending
block.
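(As an aside, the error-rate margin above is easy to check with a
back-of-envelope calculation. This assumes the quoted spec of 1 URE per
10^14 bits, independent bit errors, and 10^13 bits per drive; real
drives cluster errors, so treat the number as rough.)

```python
# Back-of-envelope: chance of hitting at least one uncorrectable read
# error (URE) while rebuilding a 5-drive RAID-5 from consumer disks.
# Assumes 1 URE per 10^14 bits read, independent errors, and 10^13
# bits (~1.25 TB) of data per drive -- rough numbers only.
ure_rate = 1e-14          # probability any single bit read is uncorrectable
bits_per_drive = 1e13     # data held per drive
surviving_drives = 4      # drives that must be read perfectly to rebuild

bits_to_read = surviving_drives * bits_per_drive
p_clean = (1 - ure_rate) ** bits_to_read   # every bit reads back correctly
p_ure = 1 - p_clean                        # at least one URE during rebuild

print(f"chance of hitting a URE during rebuild: {p_ure:.0%}")  # ~33%
```

Roughly a one-in-three chance of a failed rebuild. Not enough margin.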
For most drives, this fixes it. The block is added to the internal bad
list, and the re-write causes an immediate reallocation from the spares
area. The original device is not taken offline, and the replacement
block can come from the very same device (by default, all metadata
blocks are allocated in two places, so the filesystem structure can
survive significant damage even on a single drive).

Frankly, it's odd to me: the whole idea that you need a complicated
"super reliable" controller, with its own OS, firmware, and
battery-backed RAM no less, to keep the storage system consistent in
the face of a crash, with no input or advice from the OS or filesystem
layer, all so you can present the "illusion" of a single dumb block
device... It seems so brittle.

ZFS is fantastic. One backup strategy feasible under ZFS is a 3-way
mirror with a rotating member that you pull and take to the vault. ZFS
will "resilver" the out-of-date device by copying only the data needed
to bring it up to date since the last transaction group recorded on
the device. If you do this with normal RAID and a 3TB drive, you'll
tie up your storage system for 12-16 hours while ~50MB/s moves from
disk 1 to disk 2. Additionally, ZFS can "stream" the difference
between two snapshots, so you can have a master taking snapshots every
minute or every month and "sending" the differences to a slave, which
"receives" them into its own filesystem -- just like log shipping of a
database. Since ZFS supports countless filesystems per storage pool,
you can create one per user, one per installed app, one per database
instance, etc., to make this kind of backup and data management easier.

I have used ZFS for years on my FreeBSD servers, and I have to say
that it's very comforting to snapshot the entire system before doing
something like an OS or database upgrade.
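(For the curious, the snapshot-shipping workflow looks roughly like
this. Pool, dataset, and host names here are made up -- substitute
your own.)

```shell
# Take a snapshot of a dataset on the master (names are hypothetical).
zfs snapshot tank/users@2012-01-02

# First time: ship the full snapshot to the slave.
zfs send tank/users@2012-01-02 | ssh slave zfs receive backup/users

# Thereafter: ship only the difference between two snapshots,
# just like log shipping of a database.
zfs snapshot tank/users@2012-01-03
zfs send -i tank/users@2012-01-02 tank/users@2012-01-03 \
    | ssh slave zfs receive backup/users
```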
I trust the reliability mechanisms, so that's all the backup I need
before proceeding to wipe out the system, and I can recover by
changing one line in the boot loader config file to mount the snapshot
as root. I have a SATA enclosure and buy drives in pairs from
different manufacturers, and just add them to the pool as a mirrored
pair. I.e., "zpool add pool0 mirror /dev/ada3 /dev/ada4" is all that's
required to grow the pool, and the new space is available to every FS.
I enable ZFS compression, so I don't feel that I must "squeeze" more
space out with parity RAID.

It's also comforting that I can take these disks and plug them
directly into another FreeBSD, Solaris, Linux, or MacOS machine and
get my data. ZFS is freely available on those OSes, and these are just
standard disks with standard GPT partitions. I have lost important
data in the past because a replacement could not be found for a failed
controller, or because the replacement machine overwrote the very RAID
array it was supposed to recover, not "knowing" there was an array on
those disks. With ZFS I don't have to match the controller model,
controller firmware version, OS driver, or OS version. ZFS itself is
versioned and upgradeable in place, so moving to a newer OS is
graceful.

Joe

--
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist