On 2012-01-04 12:28, alan.b.pearce@stfc.ac.uk wrote:
>> In contrast, when ZFS encounters a block read error, it retries from a
>> redundant copy wherever one exists (possibly on the same device), and
>> immediately re-writes the offending block. For most drives, this fixes
>> it. The block is added to the internal bad list, and the re-write
>> causes an immediate reallocation from the spares area. The original
>> device is not taken offline, and the replacement block can come from
>> the very same device (by default all metadata blocks are allocated in
>> two places, so the filesystem structure can survive significant damage
>> even on a single drive).

> I would be worried if the RAID controller wasn't doing this to the
> drive when it detected an error. AFAIK all modern high-speed drives
> have the necessary on-board intelligence to remap a faulty sector, and
> a RAID controller should be making sure the drive does it, and then
> reporting to the OS that an error has happened and been cleared.
>
> If you have seen instances where an error on a RAID drive has caused a
> RAID set to fall over, then I suspect the controller has been doing
> exactly what you describe ZFS as doing, but the OS probably has no
> means of logging any errors reported by the RAID controller once the
> sectors have been repaired, or else no-one has been doing regular
> audits of the logs to spot drives accumulating soft errors and replace
> them before they fall over in a big way.

Rewriting single blocks may be a feature of more modern or well-designed
controllers; I am not that knowledgeable about their feature sets. The
background "patrol read" also helps, because it gives the drive a chance
to reallocate marginal sectors with correctable read errors as the scan
touches them.

I have personally had RAID5 rebuilds fail because the rebuild turned up
further errors on the other devices (probably in files that hadn't been
read in years, or even in unallocated space!).

An array might report such an error to its own control/monitoring
software, but that is outside the purview of the filesystem's operation.

One thing is for sure: it doesn't normally report the error to the
filesystem layer. That is the whole point, to shield the filesystem from
IO-layer errors. Most filesystems DO NOT handle an IO error by
recovering from it, but you can design one that does. The solution is to
drop the painfully maintained abstraction of a single, contiguous,
100%-error-free block device and move block-IO error handling into the
filesystem itself, and ZFS was designed exactly that way (rough sketch
of the idea in the P.S. below). I think there are even videos of some
engineers putting a zpool on a large number of USB drives and then
randomly destroying, replacing, and shuffling the drives while the
filesystem dealt with it in real time and kept operating.

Joe
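
P.S. For anyone curious, here is a rough Python sketch of that kind of
self-healing read path. It is only an illustration of the idea as I
understand it: the Device and Pool classes, the dict-backed "disks", and
the CRC32 checksum are inventions for the sketch, not ZFS internals.

import random
import zlib


class Device:
    """A toy block device: block number -> bytes, and it can 'rot'."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, blkno, data):
        self.blocks[blkno] = data

    def read(self, blkno):
        return self.blocks.get(blkno)

    def corrupt(self, blkno):
        """Simulate a latent read error / bit rot on one copy."""
        if blkno in self.blocks:
            self.blocks[blkno] = bytes(random.randrange(256)
                                       for _ in self.blocks[blkno])


class Pool:
    """Stores every block on `copies` devices, each copy checksummed."""

    def __init__(self, devices, copies=2):
        self.devices = devices
        self.copies = copies
        # Expected checksums live "elsewhere", roughly the way ZFS keeps a
        # block's checksum in its parent block rather than next to the data.
        self.checksums = {}

    def write(self, blkno, data):
        self.checksums[blkno] = zlib.crc32(data)
        for dev in self.devices[:self.copies]:
            dev.write(blkno, data)

    def read(self, blkno):
        expected = self.checksums[blkno]
        good, bad_devs = None, []
        for dev in self.devices[:self.copies]:
            data = dev.read(blkno)
            if data is not None and zlib.crc32(data) == expected:
                good = data            # found a verified copy
            else:
                bad_devs.append(dev)   # remember which copies need repair
        if good is None:
            raise IOError("no good copy of block %d anywhere" % blkno)
        for dev in bad_devs:
            dev.write(blkno, good)     # the self-healing step: re-write it
        return good


if __name__ == "__main__":
    pool = Pool([Device("usb0"), Device("usb1")])
    pool.write(7, b"hello, piclist")
    pool.devices[0].corrupt(7)                   # one copy goes bad...
    assert pool.read(7) == b"hello, piclist"     # ...the read still works...
    assert pool.devices[0].read(7) == b"hello, piclist"  # ...and got repaired

The interesting part is read(): any copy that fails its checksum gets
re-written from a verified copy on the spot, which is the behaviour the
quoted paragraph describes.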