Adobe Acrobat does exactly that. It will produce a multipage PDF from a web page, following links up to the depth you specify. IIRC you can also restrict it to the same site or let it cross to other sites.

2010/9/23 Dr Skip

> Greetings all,
>
> I've got a task that needs automation, but I have yet to find the answer
> (and I don't have any interns to help me in the form of forced labor ;)
>
> I have to convert sets of html files that are manuals into something
> printable or readable on an ebook reader (kindle for instance). These
> pages (html pages) are in the hundreds, and typically in a simple tree
> with a contents page linking to chapter pages linking to pages. Some have
> next> functionality, but that doesn't change things. Readers like the
> kindle don't like web hierarchies or local (stored on the reader) html -
> they like pdf, text, mobi, etc files. They were created a while ago.
> Rewriting is not an option, and I have to do quite a few like this.
>
> The 'manual' way would be individually printing each page perhaps to a
> pdf, and merging. Even with drag-drop and a few clicks each it could
> drive one to insanity! Canon once had an app that could take a web page,
> fetch all the links on the page, and cat them in order into one doc. I
> can't find anything like that now, but that's the idea!
>
> I can put the pages on a server and mirror them locally for a tree, or I
> can mirror them and put in one directory (spider tool doing the link
> translation). Either way, Cat'ing html files doesn't work, obviously. I
> just need a way to get the text of the pages into one file, in original
> link order and without garbling paragraph (and heading) format (nothing
> fancy, just paragraphs should still be paragraphs and headings should be
> lines of their own). No graphics at this point, but I may need that
> later, but either way is good.
>
> Just in case I'm not being clear, the tool would have to take a web page
> (url or html file) and output at least the text to a file, followed by
> the text of the file in the first link on the page, followed by the text
> of the next link, and so on. A page break between each might be nice
> too. ;)
>
> Any ideas? Write to me off list if this is a bore to folks. However, I
> think it might be useful to a lot of us who deal with engineering
> documentation and I don't know why someone hasn't come up with something
> or if they have, why it's so hard to find. Hopefully someone knows where
> one is and I'm just not looking in the right places. Google has not been
> my friend...
>
> Thanks in advance,
> Skip
>
>
> --
> http://www.piclist.com PIC/SX FAQ & list archive
> View/change your membership options at
> http://mailman.mit.edu/mailman/listinfo/piclist
>

--
Ariel Rocholl
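
For anyone who would rather script it than use Acrobat, here is a minimal sketch of the tool described in the quoted message, using only the Python standard library. Everything named here is my own invention for illustration: `TextAndLinks`, `flatten`, the `fetch` callable and `max_depth` are hypothetical, not part of any existing tool. It also assumes that "original link order" means depth-first (contents page, then the first chapter and its pages, then the next chapter), that a form feed character is an acceptable page break, and that the short list of block tags below is enough to keep paragraphs and headings on lines of their own.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

# Assumption: these tags start/end a "paragraph" for our purposes.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "title",
              "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}  # their contents are not readable text


class TextAndLinks(HTMLParser):
    """Collect text blocks and link targets from one HTML page."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []      # finished paragraphs/headings, in page order
        self.links = []       # href values, in page order
        self._current = []    # text fragments of the block in progress
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def _flush(self):
        # Collapse runs of whitespace so each block becomes one clean line.
        text = " ".join("".join(self._current).split())
        if text:
            self.blocks.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        if tag in BLOCK_TAGS:
            self._flush()
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip_depth:
            self._current.append(data)

    def close(self):
        super().close()
        self._flush()  # keep any trailing text outside a block tag


def flatten(start, fetch, max_depth=2):
    """Return the text of `start`, then of each linked page, depth-first.

    `fetch(url)` must return the page's HTML as a str, or None to skip it.
    Pages are separated by a form feed; each page is visited only once.
    """
    seen = set()
    pages = []

    def visit(url, depth):
        url = urldefrag(url)[0]  # "ch1.html#top" and "ch1.html" are one page
        if url in seen:
            return
        seen.add(url)
        html = fetch(url)
        if html is None:
            return
        parser = TextAndLinks()
        parser.feed(html)
        parser.close()
        pages.append("\n\n".join(parser.blocks))
        if depth < max_depth:
            for href in parser.links:
                visit(urljoin(url, href), depth + 1)

    visit(start, 0)
    return "\n\f\n".join(pages)
```

For a locally mirrored tree, `fetch` could be a small function that opens the file at the given relative path and returns its contents, or returns None for anything outside the mirror; that keeps the crawl from wandering off to other sites. The resulting text file could then be fed to whatever converter the reader needs. Links text such as "next>" will still show up in the output, so some manual cleanup is likely.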