> In contrast, when ZFS encounters a block read error, it
> retries from a redundant copy anywhere it exists (maybe on the same device), and
> immediately re-writes the offending block. For most drives, this fixes it. The block
> is added to the internal bad list, and the re-write causes an immediate reallocation
> from the spares area. The original device is not taken offline, and the replacement
> block can come from the very same device (by default all metadata blocks are
> allocated in two places, so the filesystem structure can survive significant damage
> even on a single drive).

I would be worried if the RAID controller wasn't doing this to the drive when it detected an error. AFAIK all modern high-speed drives have the necessary on-board intelligence to remap a faulty sector to a spare, and a RAID controller should be making sure the drive does it, and then reporting to the OS that an error has happened and been cleared.

If you have seen instances where an error on a RAID drive has caused a RAID set to fall over, then I suspect the controller has been doing exactly what you describe ZFS as doing, but either the OS has no means of logging the errors reported by the RAID controller once the sectors have been repaired, or else no-one has been doing regular audits of the logs to spot drives accumulating soft errors and to change a drive before it falls over in a big way.

-- 
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist
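P.S. The self-healing read path the quoted text describes (try each redundant copy until one passes its checksum, then re-write the bad copies so the drive reallocates those sectors) can be sketched roughly like this. This is a toy in-memory model, not ZFS code; the `Device` class and checksum scheme are invented for illustration:

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class Device:
    """Toy block device: a block may be silently corrupted."""
    def __init__(self, blocks):
        self.blocks = list(blocks)

    def read(self, i):
        return self.blocks[i]

    def write(self, i, data):
        # A real drive would remap a failing sector to its spares
        # area on this re-write; here we just store the data.
        self.blocks[i] = data

def self_healing_read(copies, index, expected_sum):
    """Read block `index`, trying each redundant copy in turn.

    On a checksum mismatch, remember the bad copy and keep trying;
    once a good copy is found, re-write it over every bad copy
    (the re-write is what triggers reallocation on a real drive).
    """
    bad = []
    for dev in copies:
        data = dev.read(index)
        if checksum(data) == expected_sum:
            for bad_dev in bad:
                bad_dev.write(index, data)  # heal the damaged copy
            return data
        bad.append(dev)
    raise IOError("all redundant copies failed checksum")

# Two mirrored devices; corrupt the block on the first one.
good = b"metadata block"
d1 = Device([b"garbage!!!"])
d2 = Device([good])
data = self_healing_read([d1, d2], 0, checksum(good))
assert data == good
assert d1.read(0) == good  # the bad copy has been re-written
```

The key point the sketch makes is that the repair happens inline with the read, with no device taken offline.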
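P.P.S. The "regular audits of logs" point can be as simple as watching the SMART reallocated/pending sector counters with smartmontools. A rough sketch: the attribute table below is sample data for illustration (on a real system you would feed in the output of `smartctl -A /dev/sda`), but the attribute names are the standard SMART ones:

```shell
# Sample 'smartctl -A' attribute lines (made-up values for illustration).
sample='  5 Reallocated_Sector_Ct   0x0033 100 100 036 Pre-fail Always - 12
197 Current_Pending_Sector  0x0012 100 100 000 Old_age  Always - 3'

# Pull the raw value (last field) of the attributes worth alarming on.
realloc=$(printf '%s\n' "$sample" | awk '/Reallocated_Sector_Ct/ {print $NF}')
pending=$(printf '%s\n' "$sample" | awk '/Current_Pending_Sector/ {print $NF}')

echo "reallocated=$realloc pending=$pending"
if [ "$realloc" -gt 0 ] || [ "$pending" -gt 0 ]; then
    echo "WARNING: drive is accumulating soft errors - consider replacing it"
fi
```

Run from cron and mailed when the counters move, something this small is enough to catch a drive going soft before the RAID set falls over.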