> > Naturally, it's not hard to spot the Google spider, and give it
> > access to an abridged PDF (ya don't want the lot ending up in the
> > cache!). AKA 'doorway' pages. Google takes a dim view of this,
> > and banishes those it catches. I'm not sure how it catches them
> > though; a second spider disguised as a browser?
>
> It really must be.
>
> Search for any of my pages, and a decent number will have weird
> crap in the google results from my "put up the webserver's logs"
> background.
>
> For a while I had some code that would detect the google spider
> and simply disable that stuff. But I noticed that every new page I
> put up would work... then about a week or two later that log crap
> would show up again.
>
> Google definitely has second spiders.

Why don't you use robots.txt like you're supposed to (see the
robots.txt sketch below)? That's exactly the sort of thing that gets
you kicked out of Google. Serving up a different result to the Google
spider than what a browser would see means you're trying to rig the
system. Browsers see spam, spider sees keywords. Tsk, naughty!

Anyway, I doubt the spider runs Javascript, so it may not have even
noticed unless you were doing it server-side.

Tony

--
http://www.piclist.com PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist
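
The detection code described above presumably boiled down to a
User-Agent check, something like this sketch (Python, purely
illustrative; the "googlebot" substring and the show_log_background
flag are guesses, not the poster's actual code):

    import os

    def is_google_spider(environ):
        # The advertised Googlebot announces itself in the User-Agent
        # header, so a simple substring check is enough to catch it.
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        return "googlebot" in ua

    # In a CGI-style setup the request headers arrive through the
    # environment; disable the log background when the spider calls.
    show_log_background = not is_google_spider(os.environ)

A second spider that sends a browser User-Agent sails straight past a
check like this, which would explain the log crap reappearing a week
or two after each new page went up.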
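
The sanctioned fix is a couple of lines in robots.txt at the site
root; the /logs/ path here is just a stand-in for wherever the log
stuff actually lives:

    User-agent: Googlebot
    Disallow: /logs/

Or, to keep every well-behaved crawler out of it:

    User-agent: *
    Disallow: /logs/

Unlike User-Agent sniffing, this doesn't serve different content to
different visitors, so there's nothing for a disguised second spider
to catch you at.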