On Fri, 26 Aug 2005, James Newton, Host wrote:

> The piclist.com archive also has this email, complete with attachment at
> http://www.piclist.com/techref/piclist/2005/08/26/155656a.txt

Ok,

> And by the way, that number at the end is the time the post left the mit.edu
> mail server so it is in every header of every copy of the email that is sent.
> What that means is that you can always find the archive copy from the email
> based on that date and time. It is also NOT sequential... So you can't just
> generate "old number"+1 and know that you will be able to rip the next
> email. You have to get the listing for today's posts, then spider that list
> to retrieve the posts... And then you might find a spider trap or...
> twelve....

very nice, but ...

>> (the article appears with the attachment, alas renamed to .bin, at:
>
> The PICList archive does not rename the attachment. It opens in Adobe Reader
> with one click. The PICList archive DOES, however, put the text of your
> email in a separate text file because that is how the email was actually
> formatted.
>
>>
>
> How did you get from your email to that "66893" number? I guess you browsed
> today's posts on gmane?

By clicking on the direct link that appears at the bottom of the page when
looking at an article. The link is a permalink (it leads back to that article
every time), and incrementing it works, but does not necessarily yield
articles from the piclist. This is not about ripping, it's about access.

Lars has reacted to my email and the attachment is still .bin but its name is
.pdf as it should be. The .bin suffix prevents automatic opening of possibly
virus-infected attachments by clueless users.

>> (I will email Lars@gmane about the attachment suffix)
>
> Lars is a good guy. I wonder if he gets as few donations for the support of
> his copy of the PICList as I do?
>
> I also sort of wonder how he pays his hosting bill...
> It took me about 5
> minutes to put together an index page that referenced all of today's posts
> and then rip it with wget. His server is FAST. Which is amazing when you
> consider how many posts he is archiving between all those email lists. I was
> amazed at how wget could just request one page after another and they came
> up really quick. You could get a LOT of posts that way in a short time. I
> kept expecting some sort of rate limiting system to kick in, but nothing
> did. His thread list comes up a little slower, but you don't need that to
> rip.

Read the FAQ at gmane.

> Then it took a few seconds with "Search and Replace for Windows" from
> http://www.funduc.com to change all the "",s to "@"'s and then I ran my
> Perl script to extract emails (which I have 'cause I study how people rip
> emails in order to stop them doing it) and that gave me a nice list of email
> addresses from the people who posted today.

Again, read the FAQ. Whoever subscribed the piclist had the option to request
encrypted emails but did not.

> Try that with the piclist.com archive.... Go on. Try it! Tell me what you
> find. If you do manage to rip some emails, let me know in private so I can
> patch the hole? But I don't think you will get many, I've done a lot to
> secure it. Of course, all the work I did to secure it is pretty much useless
> when anyone can subscribe to the list and host an archive with no security.

Your fortress has only one wall, with a strong gate in it? So all that is
needed is a walk around it (to another site)? ;-) Remember when the lion is
after you, you do not need to break the world record, you only need to outrun
your friends.

Peter
--
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist
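
For reference, the date-and-time archive naming James describes can be sketched in a few lines of Python. This is only an illustration under assumptions: that the six digits in the example URL are hours, minutes and seconds, that the trailing "a" is a fixed suffix (its real meaning is not stated in the thread), and that the timestamp is already in whatever time zone the archive uses. The names ARCHIVE_ROOT and archive_url are made up for this sketch.

```python
from datetime import datetime

# Base path taken from the example URL quoted in the thread.
ARCHIVE_ROOT = "http://www.piclist.com/techref/piclist"


def archive_url(sent: datetime, suffix: str = "a") -> str:
    """Build an archive URL from the time a post left the mail server.

    Assumption: the layout is YYYY/MM/DD/HHMMSS + suffix + ".txt",
    inferred only from the single example URL in the thread.
    """
    return f"{ARCHIVE_ROOT}/{sent:%Y/%m/%d}/{sent:%H%M%S}{suffix}.txt"


# Timestamp read off the example URL: 26 Aug 2005, 15:56:56.
print(archive_url(datetime(2005, 8, 26, 15, 56, 56)))
```

Because the name depends on the send time rather than a counter, adding one to it does not give the next post, which is the point James makes about the numbers not being sequential.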