> In contrast, when ZFS encounters a block read error, it
> retries from a redundant copy anywhere it exists (maybe on the same device), and
> immediately re-writes the offending block. For most drives, this fixes it. The block
> is added to the internal bad list, and the re-write causes an immediate reallocation
> from the spares area. The original device is not taken offline, and the replacement
> block can come from the very same device (by default all metadata blocks are
> allocated in two places, so the filesystem structure can survive significant damage
> even on a single drive).

I would be worried if the RAID controller wasn't doing this to the drive when it detected an error. AFAIK all modern high-speed drives have the necessary on-board intelligence to remap a faulty sector to a spare, and a RAID controller should be making sure the drive does it, and then reporting to the OS that an error has happened and been cleared.

If you have seen instances where an error on a RAID drive has caused a RAID set to fall over, then I suspect the controller has been doing exactly what you describe ZFS as doing, but either the OS has no means of logging the errors reported by the RAID controller once the sectors have been repaired, or else no-one has been doing regular audits of the logs to spot drives accumulating soft errors and to change a drive before it falls over in a big way.

-- 
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist
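P.S. The self-healing read path the quoted text describes (try each redundant copy until one passes its checksum, then re-write the bad copies so the drive reallocates those sectors) can be sketched roughly like this. This is a toy in-memory model, not ZFS code; the `Device` class and checksum scheme are invented for illustration:

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class Device:
    """Toy block device: a block may be silently corrupted."""
    def __init__(self, blocks):
        self.blocks = list(blocks)

    def read(self, i):
        return self.blocks[i]

    def write(self, i, data):
        # A real drive would remap a failing sector to its spares
        # area on this re-write; here we just store the data.
        self.blocks[i] = data

def self_healing_read(copies, index, expected_sum):
    """Read block `index`, trying each redundant copy in turn.

    On a checksum mismatch, remember the bad copy and keep trying;
    once a good copy is found, re-write it over every bad copy
    (the re-write is what triggers reallocation on a real drive).
    """
    bad = []
    for dev in copies:
        data = dev.read(index)
        if checksum(data) == expected_sum:
            for bad_dev in bad:
                bad_dev.write(index, data)  # heal the damaged copy
            return data
        bad.append(dev)
    raise IOError("all redundant copies failed checksum")

# Two mirrored devices; corrupt the block on the first one.
good = b"metadata block"
d1 = Device([b"garbage!!!"])
d2 = Device([good])
data = self_healing_read([d1, d2], 0, checksum(good))
assert data == good
assert d1.read(0) == good  # the bad copy has been re-written
```

The key point the sketch makes is that the repair happens inline with the read, with no device taken offline.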
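P.P.S. The "regular audits of logs" point can be as simple as watching the SMART reallocated/pending sector counters with smartmontools. A rough sketch: the attribute table below is sample data for illustration (on a real system you would feed in the output of `smartctl -A /dev/sda`), but the attribute names are the standard SMART ones:

```shell
# Sample 'smartctl -A' attribute lines (made-up values for illustration).
sample='  5 Reallocated_Sector_Ct   0x0033 100 100 036 Pre-fail Always - 12
197 Current_Pending_Sector  0x0012 100 100 000 Old_age  Always - 3'

# Pull the raw value (last field) of the attributes worth alarming on.
realloc=$(printf '%s\n' "$sample" | awk '/Reallocated_Sector_Ct/ {print $NF}')
pending=$(printf '%s\n' "$sample" | awk '/Current_Pending_Sector/ {print $NF}')

echo "reallocated=$realloc pending=$pending"
if [ "$realloc" -gt 0 ] || [ "$pending" -gt 0 ]; then
    echo "WARNING: drive is accumulating soft errors - consider replacing it"
fi
```

Run from cron and mailed when the counters move, something this small is enough to catch a drive going soft before the RAID set falls over.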