That's very interesting! I wonder, though, if you could re-run it without any
quoted emails? I suspect that if you ran it against the emails as-is, you'd get
5 copies of any particular phrase in a thread just from the same email being
quoted in several other emails.

Would be fun to run some statistical analysis against the whole piclist
archive. I'd like to get a Google Trends-like interface so I can see the rise
and fall of each PIC type, particular problems, and other topics...

-Adam

On 4/14/08, Martin wrote:
> I've been indexing piclist emails for the past few weeks. Here are the
> top 100 most popular phrases according to my scoring algorithm. It's
> "sort of" interesting. I don't know what I expected. Maybe when there
> are several thousand emails in the database it will look more
> interesting. A "phrase" here is defined as three unlikely words placed
> near each other. Often-occurring words such as "and the I is" etc. are
> not included for obvious reasons. The highest scoring phrases are mostly
> gibberish like: "psychotherapists psychopaths homeopaths" and "cpi ctaf
> displayitem" or "zhmezhme hansjurgen fajs".

-- 
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist
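
For what it's worth, here is a rough sketch of the quote-stripping idea from the reply above. It is not Martin's indexer, just an illustration. It assumes quoted lines start with ">", attribution lines look like "On <date>, <name> wrote:", and the list footer follows a "-- " delimiter line. The names strip_quoted and ATTRIBUTION are made up for this example.

```python
import re

# Assumed attribution format, e.g. "On 4/14/08, Martin wrote:"
ATTRIBUTION = re.compile(r"^On .+ wrote:\s*$")


def strip_quoted(body: str) -> str:
    """Return an email body without quoted lines, attribution lines, or footer."""
    kept = []
    for line in body.splitlines():
        # "-- " on its own line conventionally starts the signature/footer.
        if line.rstrip() == "--":
            break
        # Lines beginning with ">" are text quoted from an earlier email.
        if line.lstrip().startswith(">"):
            continue
        # Drop the "On ..., X wrote:" line that introduces the quote.
        if ATTRIBUTION.match(line.strip()):
            continue
        kept.append(line)
    return "\n".join(kept).strip()


if __name__ == "__main__":
    message = (
        "Could you re-run it without quoted emails?\n"
        "\n"
        "On 4/14/08, Martin wrote:\n"
        "> I've been indexing piclist emails.\n"
        "-- \n"
        "http://www.piclist.com\n"
    )
    print(strip_quoted(message))
```

Running each message through something like this before phrase scoring should keep a phrase from being counted once per reply in a thread. Replies that quote in some other style (top-posted "-----Original Message-----" blocks, for instance) would need extra rules.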