I did some messing with the postbot.asp archive viewer program today:
http://www.piclist.com/techref/postbot.asp

Basically, I wanted emails that use quoted-printable (QP) encoding, or that
have no line wrapping, to stop being so hard to read.

First, I removed the <PRE> tag around the content of the email and instead:
	Translated all occurrences of "  " to " &nbsp;"
	Translated all tabs to " &nbsp; &nbsp; &nbsp; &nbsp;"
	Added a <BR> before all carriage returns
	Wrapped the post in <TT> to get the non-proportional spacing
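
For anyone who wants to try the same thing outside ASP, the four steps above
can be sketched roughly like this in Python. The function name is made up for
illustration, and this is only a guess at the behaviour described, not the
actual postbot.asp code:

```python
def plain_to_html(body):
    """Rough sketch of the non-QP formatting steps, applied in the listed order."""
    # Two spaces become a space plus a non-breaking space, so runs survive in HTML.
    body = body.replace("  ", " &nbsp;")
    # Each tab becomes four space/non-breaking-space pairs.
    body = body.replace("\t", " &nbsp; &nbsp; &nbsp; &nbsp;")
    # Put a <BR> in front of every carriage return.
    body = body.replace("\r", "<BR>\r")
    # <TT> gives the non-proportional font that <PRE> used to provide.
    return "<TT>" + body + "</TT>"


print(plain_to_html("a  b\tc\r\nd"))
```

Unlike <PRE>, this lets the browser wrap long lines while still keeping
indentation and column alignment mostly intact.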

The exception is when the encoding is quoted-printable. In that case I:
	replace chr(13)+chr(13)+chr(10) with chr(13)+chr(10)
	replace "=20"+chr(13)+chr(10)+">" with " "+chr(13)+">"
	replace "="+chr(13)+chr(10) with " "
	replace "=20"+chr(13)+chr(10) with " "
	decode the QP (see below)
	replace chr(13) with "<BR>"
	wrap it in <TT>
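
The quoted-printable steps above could look something like the following in
Python. This is a sketch under assumptions: the function names are
hypothetical, and decode_qp_simple is only a stand-in for the real decode
step, turning each =XX escape into one character.

```python
import re

CR, LF = "\r", "\n"


def decode_qp_simple(text):
    """Hypothetical stand-in for the decode step: each =XX becomes one character."""
    return re.sub(r"=([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), text)


def qp_to_html(body, decode=decode_qp_simple):
    """Apply the listed replacements in order, then decode and wrap in <TT>."""
    body = body.replace(CR + CR + LF, CR + LF)                  # collapse doubled CRs
    body = body.replace("=20" + CR + LF + ">", " " + CR + ">")  # keep breaks before quoted lines
    body = body.replace("=" + CR + LF, " ")                     # soft line break
    body = body.replace("=20" + CR + LF, " ")                   # encoded trailing space
    body = decode(body)                                         # remaining =XX escapes
    body = body.replace(CR, "<BR>")
    return "<TT>" + body + "</TT>"


print(qp_to_html("Hello=20\r\n> quoted=\r\nmore=3D1\r\nend"))
```

One thing worth checking: RFC 2045 treats "=" at the end of a line as a soft
line break that is removed with nothing put in its place, so replacing it with
a space (as in the third step) may leave a stray space where a long word was
split.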

When decoding QP, I've added some support for translating UTF-8 encoded
characters into &#n; numeric character references.
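
As an illustration of that translation, here is a minimal Python sketch. It
assumes the QP body is plain ASCII with =XX escapes and that the charset
really is UTF-8; the function name is invented for the example.

```python
import re


def qp_utf8_to_entities(text):
    """Decode =XX escapes to bytes, read them as UTF-8, emit &#n; for non-ASCII."""
    # Turn each =XX escape into the single raw byte it stands for.
    raw = re.sub(rb"=([0-9A-Fa-f]{2})",
                 lambda m: bytes([int(m.group(1), 16)]),
                 text.encode("ascii"))
    # Multi-byte UTF-8 sequences collapse into one code point each;
    # invalid sequences become U+FFFD instead of raising an error.
    decoded = raw.decode("utf-8", errors="replace")
    # ASCII passes through, everything else becomes a numeric reference.
    return "".join(ch if ord(ch) < 128 else "&#%d;" % ord(ch) for ch in decoded)


print(qp_utf8_to_entities("caf=C3=A9 costs =E2=82=AC5"))
```

The two bytes C3 A9 come out as &#233; and the three bytes E2 82 AC as &#8364;.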

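It also handles Latin-1 OK, since every Latin-1 byte value is the same number
as its Unicode code point, so one byte maps straight to one &#n;. (Strictly,
only the ASCII range is shared with UTF-8; bytes 128-255 take two bytes there.)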
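
A quick way to see how Latin-1 bytes relate to code points and to UTF-8, as a
small Python sketch (the function name is hypothetical):

```python
def latin1_to_entities(raw):
    """Each Latin-1 byte value is already its Unicode code point."""
    return "".join(chr(b) if b < 128 else "&#%d;" % b for b in raw)


# 0xE9 is e-acute in Latin-1, and its code point is also 0xE9 (233).
print(latin1_to_entities(b"caf\xe9"))
# The same character takes two bytes when encoded as UTF-8.
print("\u00e9".encode("utf-8"))
```

So a Latin-1 message needs no multi-byte handling at all, but its high bytes
are not valid UTF-8 on their own and have to be handled by charset.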

I have not added windows-1252 because I can't see doing a database lookup
for each bloody character. Is there a "cute" trick for translating 1252 to
Unicode without a table?
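
One possible answer: windows-1252 only differs from Latin-1 in the 32 bytes
from 0x80 to 0x9F, and five of those are unassigned, so a 27-entry in-memory
table should be enough and no database lookup is needed. A sketch in Python
(function names are made up; unassigned bytes are shown as "?" here, which is
an arbitrary choice):

```python
# windows-1252 bytes 0x80-0x9F that differ from Latin-1, mapped to code points.
CP1252_HIGH = {
    0x80: 0x20AC, 0x82: 0x201A, 0x83: 0x0192, 0x84: 0x201E, 0x85: 0x2026,
    0x86: 0x2020, 0x87: 0x2021, 0x88: 0x02C6, 0x89: 0x2030, 0x8A: 0x0160,
    0x8B: 0x2039, 0x8C: 0x0152, 0x8E: 0x017D, 0x91: 0x2018, 0x92: 0x2019,
    0x93: 0x201C, 0x94: 0x201D, 0x95: 0x2022, 0x96: 0x2013, 0x97: 0x2014,
    0x98: 0x02DC, 0x99: 0x2122, 0x9A: 0x0161, 0x9B: 0x203A, 0x9C: 0x0153,
    0x9E: 0x017E, 0x9F: 0x0178,
}


def cp1252_to_codepoint(b):
    """Return the code point for one byte, or None for the five unassigned bytes."""
    if 0x80 <= b <= 0x9F:
        return CP1252_HIGH.get(b)
    return b  # everywhere else the byte value is the code point, as in Latin-1


def cp1252_to_entities(raw):
    out = []
    for b in raw:
        cp = cp1252_to_codepoint(b)
        if cp is None:
            out.append("?")          # unassigned in windows-1252
        elif cp < 128:
            out.append(chr(cp))      # plain ASCII passes through
        else:
            out.append("&#%d;" % cp)
    return "".join(out)


print(cp1252_to_entities(b"\x93hi\x94"))
```

In ASP the same idea would presumably be a small array or a Select Case on the
byte value.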

Mostly, I'm posting this because I have no clue on this stuff. I've set up a
page at
http://www.massmind.org/techref/language/html/charset.htm and I would
appreciate any corrections.

Can you guys play with the archive reader and see if it is still working ok?

---
James Newton: PICList webmaster/Admin
mailto:jamesnewton@piclist.com  1-619-652-0593 phone
http://www.piclist.com/member/JMN-EFP-786
PIC/PICList FAQ: http://www.piclist.com


