On 2012-01-04 12:28, alan.b.pearce@stfc.ac.uk wrote:
>> In contrast, when ZFS encounters a block read error, it retries from a
>> redundant copy wherever one exists (possibly on the same device), and
>> immediately re-writes the offending block. For most drives, this fixes
>> it. The block is added to the internal bad list, and the re-write
>> causes an immediate reallocation from the spares area. The original
>> device is not taken offline, and the replacement block can come from
>> the very same device (by default all metadata blocks are allocated in
>> two places, so the filesystem structure can survive significant damage
>> even on a single drive).

> I would be worried if the RAID controller wasn't doing this to the
> drive when it detected an error. AFAIK all modern high-speed drives
> have the necessary on-board intelligence to remap a faulty sector, and
> a RAID controller should be making sure the drive does it, and then
> reporting to the OS that an error has happened and been cleared.
>
> If you have seen instances where an error on a RAID drive has caused a
> RAID set to fall over, then I suspect the controller has been doing
> exactly what you describe ZFS as doing, but the OS probably has no
> means of logging any errors reported by the RAID controller once the
> sectors have been repaired, or else no-one has been doing regular
> audits of the logs to spot drives accumulating soft errors and replace
> them before they fall over in a big way.

Rewriting single blocks may be a feature of more modern or well-designed
controllers; I am not that knowledgeable about their feature sets. The
background "patrol read" also helps, because it gives the drive a chance
to reallocate marginal sectors with correctable read errors as the scan
touches them.

I have personally had RAID5 rebuilds fail because the rebuild turned up
further errors on the other devices (probably in files that hadn't been
read in years, or even in unallocated space!).

An array might report such an error to its own control/monitoring
software, but that is outside the purview of the filesystem's operation.

One thing is for sure: it doesn't normally report the error to the
filesystem layer. That is the whole point, to shield the filesystem from
IO-layer errors. Most filesystems DO NOT handle an IO error by
recovering from it, but you can design one that does. The solution is to
drop the painfully maintained abstraction of a single, contiguous,
100%-error-free block device and move block-IO error handling into the
filesystem itself, and ZFS was designed exactly that way (rough sketch
of the idea in the P.S. below). I think there are even videos of some
engineers putting a zpool on a large number of USB drives and then
randomly destroying, replacing, and shuffling the drives while the
filesystem dealt with it in real time and kept operating.

Joe
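
P.S. For anyone curious, here is a rough Python sketch of that kind of
self-healing read path. It is only an illustration of the idea as I
understand it: the Device and Pool classes, the dict-backed "disks", and
the CRC32 checksum are inventions for the sketch, not ZFS internals.

import random
import zlib


class Device:
    """A toy block device: block number -> bytes, and it can 'rot'."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, blkno, data):
        self.blocks[blkno] = data

    def read(self, blkno):
        return self.blocks.get(blkno)

    def corrupt(self, blkno):
        """Simulate a latent read error / bit rot on one copy."""
        if blkno in self.blocks:
            self.blocks[blkno] = bytes(random.randrange(256)
                                       for _ in self.blocks[blkno])


class Pool:
    """Stores every block on `copies` devices, each copy checksummed."""

    def __init__(self, devices, copies=2):
        self.devices = devices
        self.copies = copies
        # Expected checksums live "elsewhere", roughly the way ZFS keeps a
        # block's checksum in its parent block rather than next to the data.
        self.checksums = {}

    def write(self, blkno, data):
        self.checksums[blkno] = zlib.crc32(data)
        for dev in self.devices[:self.copies]:
            dev.write(blkno, data)

    def read(self, blkno):
        expected = self.checksums[blkno]
        good, bad_devs = None, []
        for dev in self.devices[:self.copies]:
            data = dev.read(blkno)
            if data is not None and zlib.crc32(data) == expected:
                good = data            # found a verified copy
            else:
                bad_devs.append(dev)   # remember which copies need repair
        if good is None:
            raise IOError("no good copy of block %d anywhere" % blkno)
        for dev in bad_devs:
            dev.write(blkno, good)     # the self-healing step: re-write it
        return good


if __name__ == "__main__":
    pool = Pool([Device("usb0"), Device("usb1")])
    pool.write(7, b"hello, piclist")
    pool.devices[0].corrupt(7)                   # one copy goes bad...
    assert pool.read(7) == b"hello, piclist"     # ...the read still works...
    assert pool.devices[0].read(7) == b"hello, piclist"  # ...and got repaired

The interesting part is read(): any copy that fails its checksum gets
re-written from a verified copy on the spot, which is the behaviour the
quoted paragraph describes.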