> >> Harold Hallikainen wrote:
>>>>> I've already written a UTF-8 to 16 bit Unicode converter for
>>>>> another project, so I guess I'll take that and drive an if to pass
>>>>> through ASCII and a switch case to handle CP1252 above 0x7f. A lot
>>>>> of the codes line up, but several don't.
>>>>>
>>>> Exactly 32 don't and they are all in one continuous block (0x80 to
>>>> 0x9F) so a small table for that block might be another option to
>>>> consider.
>>>>
>>> Excellent!
>>>
>> Unfortunately I've just realised I was wrong, while the exceptions are
>> in one contiguous block in windows-1252 they are spread all over the
>> place in unicode so my advice would have been good for a 1252 to
>> unicode converter but not for a unicode to 1252 converter.
>>
>
> OK, I'm at home and all that is at work... I guess I'll go back to my
> switch case.

Erk. Rather sub-optimal: unless the compiler is very clever, the switch
will run up to 32 tests per character.

You can still use the table approach. Build your table of 32 entries,
where each entry lists the Unicode code point AND the CP1252 code, and
sort the list by Unicode value. Then do a 'phone book' log(n) search
(compare against the middle entry to decide whether to continue in the
top or bottom half) to find out whether the code point is in the list.
If it is, replace it with the associated CP1252 code. That takes only
about 5 tests per character, and is super-quick if you unroll the loop.

Of course, you could hand-optimize the case statement as a tree of
nested IFs that acts like the 20-questions game (although in this case
you should only need five questions), which is really a pre-optimized
version of the phone book search.

And if all this seems overcomplicated, speeding up string processing by
about 6 times is usually worth it.

--
Jeremy Lee BCompSci (Hons)
The Unorthodox Engineers
www.unorthodox.com.au

--
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist
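
The sorted-table 'phone book' search described in the reply can be sketched
in C roughly as follows. This is only an illustration under some assumptions:
the names (cp1252_entry, cp1252_high, unicode_to_cp1252) and the '?' fallback
are made up for the example, the input is assumed to be a 16-bit code point as
produced by the UTF-8 converter mentioned in the thread, and the table lists
only the 27 positions in 0x80 to 0x9F that windows-1252 actually assigns (the
other 5 of the 32 are unassigned, so they are left out here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One table entry: a Unicode code point and its CP1252 byte. */
struct cp1252_entry {
    uint16_t unicode;
    uint8_t  cp1252;
};

/* Assigned CP1252 bytes in 0x80..0x9F, sorted by Unicode code point. */
static const struct cp1252_entry cp1252_high[] = {
    {0x0152, 0x8C}, {0x0153, 0x9C}, {0x0160, 0x8A}, {0x0161, 0x9A},
    {0x0178, 0x9F}, {0x017D, 0x8E}, {0x017E, 0x9E}, {0x0192, 0x83},
    {0x02C6, 0x88}, {0x02DC, 0x98}, {0x2013, 0x96}, {0x2014, 0x97},
    {0x2018, 0x91}, {0x2019, 0x92}, {0x201A, 0x82}, {0x201C, 0x93},
    {0x201D, 0x94}, {0x201E, 0x84}, {0x2020, 0x86}, {0x2021, 0x87},
    {0x2022, 0x95}, {0x2026, 0x85}, {0x2030, 0x89}, {0x2039, 0x8B},
    {0x203A, 0x9B}, {0x20AC, 0x80}, {0x2122, 0x99},
};

#define CP1252_HIGH_COUNT (sizeof cp1252_high / sizeof cp1252_high[0])

/* Catch accidental edits to the table at compile time (C11). */
static_assert(CP1252_HIGH_COUNT == 27, "expected 27 assigned codes");

/* Assumption for this sketch: unmappable code points become '?'. */
#define CP1252_FALLBACK '?'

uint8_t unicode_to_cp1252(uint16_t u)
{
    /* ASCII and 0xA0..0xFF have the same value in both encodings. */
    if (u < 0x80 || (u >= 0xA0 && u <= 0xFF))
        return (uint8_t)u;

    /* Binary search over the half-open range [lo, hi). */
    size_t lo = 0, hi = CP1252_HIGH_COUNT;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp1252_high[mid].unicode == u)
            return cp1252_high[mid].cp1252;
        if (cp1252_high[mid].unicode < u)
            lo = mid + 1;   /* continue in the upper half */
        else
            hi = mid;       /* continue in the lower half */
    }
    return CP1252_FALLBACK;
}
```

With 27 entries the loop probes the table at most five times, which matches
the "five questions" estimate; unrolling it by hand gives the nested-IF tree
mentioned in the reply.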