>
>> Harold Hallikainen wrote:
>>>>> I've already written a UTF-8 to 16 bit Unicode converter for
>>>>> another project, so I guess I'll take that and drive an if to pass
>>>>> through ASCII and a switch case to handle CP1252 above 0x7f. A lot
>>>>> of the codes line up, but several don't.
>>>>>
>>>>>
>>>> Exactly 32 don't and they are all in one continuous block (0x80 to
>>>> 0x9F), so a small table for that block might be another option to
>>>> consider.
>>>>
>>> Excellent!
>>>
>> Unfortunately I've just realised I was wrong. While the exceptions are
>> in one contiguous block in Windows-1252, they are spread all over the
>> place in Unicode, so my advice would have been good for a 1252-to-Unicode
>> converter but not for a Unicode-to-1252 converter.
>>
>
> OK, I'm at home and all that is at work... I guess I'll go back to my
> switch case.

Erk. That's rather sub-optimal: unless the compiler is very clever, it will
be running up to 32 tests per character.

You can still use the table approach: build a table with one entry per
remapped character (at most 32), where each entry lists the Unicode code
point AND the CP1252 code, and keep the list sorted by Unicode.

Then do a 'phone book' (binary) search, which takes log2(n) steps: compare
against the middle entry to decide whether to recurse into the top or the
bottom half, until you either find the Unicode value or run out of entries.
If you find it, replace it with the associated CP1252 code.
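Here is a minimal C sketch of that table and the 'phone book' search. It assumes 16-bit input code points, as produced by a UTF-8 to 16-bit Unicode converter, and it leaves the choice of substitute for unmappable characters to the caller, signalled here by -1. The names `cp1252_reverse` and `unicode_to_cp1252` are hypothetical. Only 27 of the 32 bytes from 0x80 to 0x9F are actually assigned in Windows-1252 (0x81, 0x8D, 0x8F, 0x90 and 0x9D are unused), so the table ends up with 27 entries. The search uses a loop instead of recursion, but it does the same halving.

```c
#include <stdint.h>

/* One entry per Unicode character that CP1252 places in 0x80..0x9F,
   sorted by Unicode value so it can be binary-searched. */
typedef struct {
    uint16_t unicode;  /* Unicode code point (all fit in 16 bits) */
    uint8_t  cp1252;   /* matching Windows-1252 byte */
} cp1252_map_t;

static const cp1252_map_t cp1252_reverse[] = {
    {0x0152, 0x8C}, {0x0153, 0x9C}, {0x0160, 0x8A}, {0x0161, 0x9A},
    {0x0178, 0x9F}, {0x017D, 0x8E}, {0x017E, 0x9E}, {0x0192, 0x83},
    {0x02C6, 0x88}, {0x02DC, 0x98}, {0x2013, 0x96}, {0x2014, 0x97},
    {0x2018, 0x91}, {0x2019, 0x92}, {0x201A, 0x82}, {0x201C, 0x93},
    {0x201D, 0x94}, {0x201E, 0x84}, {0x2020, 0x86}, {0x2021, 0x87},
    {0x2022, 0x95}, {0x2026, 0x85}, {0x2030, 0x89}, {0x2039, 0x8B},
    {0x203A, 0x9B}, {0x20AC, 0x80}, {0x2122, 0x99},
};

/* Hypothetical helper: returns the CP1252 byte for code point u,
   or -1 if CP1252 has no equivalent (the caller picks a substitute). */
int unicode_to_cp1252(uint16_t u)
{
    int lo = 0;
    int hi = (int)(sizeof cp1252_reverse / sizeof cp1252_reverse[0]) - 1;

    /* ASCII and U+00A0..U+00FF are identical in CP1252: pass through. */
    if (u < 0x80 || (u >= 0xA0 && u <= 0xFF))
        return u;
    /* U+0080..U+009F are C1 controls; treated as unmappable here
       (assumption: strict mapping, no pass-through). */
    if (u <= 0x9F)
        return -1;

    /* 'Phone book' search: compare with the middle entry and keep
       only the half that can still contain u. */
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (cp1252_reverse[mid].unicode == u)
            return cp1252_reverse[mid].cp1252;
        if (cp1252_reverse[mid].unicode < u)
            lo = mid + 1;   /* u can only be in the top half */
        else
            hi = mid - 1;   /* u can only be in the bottom half */
    }
    return -1;              /* not in the table: no CP1252 equivalent */
}
```

For example, `unicode_to_cp1252(0x20AC)` (the euro sign) gives 0x80, and a character such as U+4E2D gives -1.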

That only takes about 5 tests per character (log2 of 32 is 5), and it is
super-quick if you unroll the loop.
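Here is a sketch of one way to unroll it. It assumes the sorted table is padded to exactly 32 entries with 0xFFFF sentinels, so the five halving steps need no loop counter and no bounds checks. It also assumes the pass-through ranges below U+0100 have already been handled. As a further assumption, it returns '?' for characters with no CP1252 equivalent. `cp1252_reverse32` and `lookup_unrolled` are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    uint16_t unicode;  /* Unicode code point */
    uint8_t  cp1252;   /* matching Windows-1252 byte */
} cp1252_map_t;

/* 27 real entries sorted by Unicode, padded to 32 with 0xFFFF sentinels
   (mapped to '?') so the search below always takes exactly five steps. */
static const cp1252_map_t cp1252_reverse32[32] = {
    {0x0152, 0x8C}, {0x0153, 0x9C}, {0x0160, 0x8A}, {0x0161, 0x9A},
    {0x0178, 0x9F}, {0x017D, 0x8E}, {0x017E, 0x9E}, {0x0192, 0x83},
    {0x02C6, 0x88}, {0x02DC, 0x98}, {0x2013, 0x96}, {0x2014, 0x97},
    {0x2018, 0x91}, {0x2019, 0x92}, {0x201A, 0x82}, {0x201C, 0x93},
    {0x201D, 0x94}, {0x201E, 0x84}, {0x2020, 0x86}, {0x2021, 0x87},
    {0x2022, 0x95}, {0x2026, 0x85}, {0x2030, 0x89}, {0x2039, 0x8B},
    {0x203A, 0x9B}, {0x20AC, 0x80}, {0x2122, 0x99},
    /* padding up to 32 entries */
    {0xFFFF, '?'}, {0xFFFF, '?'}, {0xFFFF, '?'}, {0xFFFF, '?'}, {0xFFFF, '?'},
};

/* Hypothetical helper: returns the CP1252 byte for u, or '?' if u is
   not one of the remapped characters. */
uint8_t lookup_unrolled(uint16_t u)
{
    const cp1252_map_t *t = cp1252_reverse32;
    unsigned i = 0;

    /* Each test halves the candidate range: 32 -> 16 -> 8 -> 4 -> 2 -> 1. */
    if (t[i + 16].unicode <= u) i += 16;
    if (t[i + 8].unicode <= u)  i += 8;
    if (t[i + 4].unicode <= u)  i += 4;
    if (t[i + 2].unicode <= u)  i += 2;
    if (t[i + 1].unicode <= u)  i += 1;

    /* i is now the last entry with unicode <= u; it is a hit only if equal. */
    return (t[i].unicode == u) ? t[i].cp1252 : '?';
}
```

The five `if` lines are the five tests. A sixth comparison then confirms whether the entry found is an exact match.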

Of course, you could hand-optimize the case statement as a tree of nested
IFs that act like the 20-questions game (although in this case you should
only need about five questions), which is really a pre-optimized version of
the phone book search.
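Below is one possible hand-built tree of nested IFs for the same 27 code points. The split points are an arbitrary choice that roughly halves the remaining values at each level. The leaves finish with a few equality tests, so any input costs a handful of comparisons instead of a walk through all 27 cases. `lookup_tree` is a hypothetical name, and -1 means "not one of the remapped characters".

```c
#include <stdint.h>

/* Hypothetical helper: returns the CP1252 byte for u, or -1 if u is
   not one of the 27 remapped characters. */
int lookup_tree(uint16_t u)
{
    if (u < 0x2018) {                     /* Latin letters, modifiers, dashes */
        if (u < 0x017D) {
            if (u < 0x0160) {
                if (u == 0x0152) return 0x8C;
                if (u == 0x0153) return 0x9C;
            } else {
                if (u == 0x0160) return 0x8A;
                if (u == 0x0161) return 0x9A;
                if (u == 0x0178) return 0x9F;
            }
        } else if (u < 0x02C6) {
            if (u == 0x017D) return 0x8E;
            if (u == 0x017E) return 0x9E;
            if (u == 0x0192) return 0x83;
        } else {
            if (u == 0x02C6) return 0x88;
            if (u == 0x02DC) return 0x98;
            if (u == 0x2013) return 0x96;
            if (u == 0x2014) return 0x97;
        }
    } else if (u < 0x2022) {              /* quotation marks and daggers */
        if (u < 0x201C) {
            if (u == 0x2018) return 0x91;
            if (u == 0x2019) return 0x92;
            if (u == 0x201A) return 0x82;
        } else if (u < 0x2020) {
            if (u == 0x201C) return 0x93;
            if (u == 0x201D) return 0x94;
            if (u == 0x201E) return 0x84;
        } else {
            if (u == 0x2020) return 0x86;
            if (u == 0x2021) return 0x87;
        }
    } else {                              /* bullet, ellipsis, euro, trade mark */
        if (u < 0x2039) {
            if (u == 0x2022) return 0x95;
            if (u == 0x2026) return 0x85;
            if (u == 0x2030) return 0x89;
        } else {
            if (u == 0x2039) return 0x8B;
            if (u == 0x203A) return 0x9B;
            if (u == 0x20AC) return 0x80;
            if (u == 0x2122) return 0x99;
        }
    }
    return -1;
}
```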

And if all this seems overcomplicated, going from up to 32 tests per
character down to about 5 (roughly a six-fold speed-up) is usually worth it.
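The quoted thread also covered the opposite direction, CP1252 to Unicode. There, the contiguous 0x80 to 0x9F block means a plain 32-entry array indexed by `byte - 0x80` is enough, with no searching at all. Here is a minimal sketch. `cp1252_high` and `cp1252_to_unicode` are hypothetical names, and returning U+FFFD (the Unicode replacement character) for the five undefined bytes is an assumption.

```c
#include <stdint.h>

/* Unicode code point for each CP1252 byte 0x80..0x9F.
   0xFFFD marks the bytes CP1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D). */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, /* 80-87 */
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD, /* 88-8F */
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, /* 90-97 */
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178, /* 98-9F */
};

/* Hypothetical helper: returns the Unicode code point for CP1252 byte b. */
uint16_t cp1252_to_unicode(uint8_t b)
{
    if (b >= 0x80 && b <= 0x9F)
        return cp1252_high[b - 0x80];  /* one indexed load, no searching */
    return b;  /* all other bytes have the same value in Unicode (Latin-1) */
}
```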

-- 
Jeremy Lee BCompSci (Hons)
 The Unorthodox Engineers
  www.unorthodox.com.au

-- 
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist