>
>>
>>> Harold Hallikainen wrote:
>>>>>> I've already written a UTF-8 to 16 bit Unicode converter for
>>>>>> another project, so I guess I'll take that and drive an if to pass
>>>>>> through ASCII and a switch case to handle CP1252 above 0x7f. A lot
>>>>>> of the codes line up, but several don't.
>>>>>>
>>>>> Exactly 32 don't and they are all in one continuous block (0x80 to
>>>>> 0x9F) so a small table for that block might be another option to
>>>>> consider.
>>>>>
>>>> Excellent!
>>>>
>>> Unfortunately I've just realised I was wrong: while the exceptions are
>>> in one contiguous block in windows-1252, they are spread all over the
>>> place in unicode, so my advice would have been good for a 1252 to
>>> unicode converter but not for a unicode to 1252 converter.
>>>
>>
>> OK, I'm at home and all that is at work... I guess I'll go back to my
>> switch case.
>
> Erk. Rather sub-optimal, since unless the compiler is very clever, it will
> be running 32 tests per character.
>
> You can still use the table approach: Build your table of 32 entries,
> where each entry lists the unicode AND ISO code, and the list is sorted by
> unicode.
>
> Then do a 'phone book' Log(n) search (use the middle value to determine if
> you recurse for the top or bottom block) to find if the unicode is in the
> list. If so, then replace with the associated ISO code.
>
> Only takes about 5 tests per character, and is super-quick if you unroll
> the loop.
>
> Of course, you could hand-optimize the case statement as a tree of nested
> IFs that act like the 20-questions game (although in this case you should
> only need five questions), which is really a pre-optimized version of the
> phone book search.
>
> And if all this seems overcomplicated, speeding up string processing by 6
> times is usually worth it.
>
> --
> Jeremy Lee BCompSci (Hons)
> The Unorthodox Engineers
> www.unorthodox.com.au

Thanks!
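For reference, both directions discussed above can be sketched in C. This is a minimal sketch, not Harold's code: the function and table names (`cp1252_to_unicode`, `unicode_to_cp1252`, `cp1252_80_9f`, `ucs_to_cp1252`) are hypothetical, and it assumes a C99 compiler with `<stdint.h>` (older PIC compilers may need their own fixed-width typedefs). CP1252 to Unicode is the easy case: one 32-entry table covers the 0x80..0x9F block, and every other byte has the same value in Unicode. Unicode to CP1252 uses Jeremy's sorted table plus 'phone book' binary search. The sketch treats the five positions Windows-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D) as mapping to U+FFFD, so the reverse table has 27 entries rather than 32; a half-open binary search over 27 entries runs at most five iterations.

```c
#include <stddef.h>
#include <stdint.h>

/* CP1252 bytes 0x80..0x9F -> Unicode. 0xFFFD marks the five positions
   treated as unassigned in this sketch (0x81, 0x8D, 0x8F, 0x90, 0x9D). */
static const uint16_t cp1252_80_9f[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, /* 0x80 */
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD, /* 0x88 */
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, /* 0x90 */
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178  /* 0x98 */
};

/* Hypothetical helper: bytes outside 0x80..0x9F (ASCII and 0xA0..0xFF)
   have the same value in Unicode, so only the middle block needs a table. */
uint16_t cp1252_to_unicode(uint8_t b)
{
    if (b >= 0x80 && b <= 0x9F)
        return cp1252_80_9f[b - 0x80];   /* one lookup, no searching */
    return b;
}

/* Reverse direction: the same 27 characters, sorted by Unicode code
   point so they can be binary-searched. */
typedef struct {
    uint16_t ucs;     /* Unicode code point (BMP) */
    uint8_t  cp1252;  /* matching Windows-1252 byte */
} ucs_map_entry;

static const ucs_map_entry ucs_to_cp1252[] = {
    {0x0152, 0x8C}, {0x0153, 0x9C}, {0x0160, 0x8A}, {0x0161, 0x9A},
    {0x0178, 0x9F}, {0x017D, 0x8E}, {0x017E, 0x9E}, {0x0192, 0x83},
    {0x02C6, 0x88}, {0x02DC, 0x98}, {0x2013, 0x96}, {0x2014, 0x97},
    {0x2018, 0x91}, {0x2019, 0x92}, {0x201A, 0x82}, {0x201C, 0x93},
    {0x201D, 0x94}, {0x201E, 0x84}, {0x2020, 0x86}, {0x2021, 0x87},
    {0x2022, 0x95}, {0x2026, 0x85}, {0x2030, 0x89}, {0x2039, 0x8B},
    {0x203A, 0x9B}, {0x20AC, 0x80}, {0x2122, 0x99}
};

#define UCS_MAP_LEN (sizeof ucs_to_cp1252 / sizeof ucs_to_cp1252[0])

/* Hypothetical helper: returns the CP1252 byte (0..255), or -1 if the
   code point has no CP1252 equivalent. */
int unicode_to_cp1252(uint16_t cp)
{
    /* ASCII and U+00A0..U+00FF pass straight through. */
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF))
        return cp;

    /* 'Phone book' search over the half-open range [lo, hi). */
    size_t lo = 0, hi = UCS_MAP_LEN;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        uint16_t key = ucs_to_cp1252[mid].ucs;
        if (key == cp)
            return ucs_to_cp1252[mid].cp1252;
        if (key < cp)
            lo = mid + 1;   /* target can only be in the upper half */
        else
            hi = mid;       /* target can only be in the lower half */
    }
    return -1;  /* e.g. C1 controls U+0080..U+009F, or CJK */
}
```

A caller would typically substitute `'?'` when `unicode_to_cp1252` returns -1. On a PIC the two tables would normally be placed in program memory; the qualifier for that is compiler-specific, so it is left out here. Unrolling the loop, as Jeremy suggests, amounts to writing out the five comparisons by hand, which is the same "20 questions" tree of nested IFs.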
I did something like that (binary search) to look up the Unifont bitmap for a Unicode character when I was storing part of the Unifont table in internal flash. I've since moved it to an SPI flash, so I just index directly to the bitmap. Anyway, I'll look at all of this this week. Thanks for the comments!

Harold

FCC Rules Updated Daily at http://www.hallikainen.com - Advertising opportunities available!

--
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist