Can you save it as a text file, and from that, print it?

CR

----- Original Message -----
From: "Dr Skip"
To: "Microcontroller discussion list - Public."
Sent: Thursday, September 23, 2010 10:15 AM
Subject: [EE] eng manuals from web pages to single file

> Greetings all,
>
> I've got a task that needs automation, but I have yet to find the answer
> (and I don't have any interns to help me in the form of forced labor ;)
>
> I have to convert sets of html files that are manuals into something
> printable or readable on an ebook reader (kindle for instance). These
> pages (html pages) are in the hundreds, and typically in a simple tree
> with a contents page linking to chapter pages linking to pages. Some have
> next> functionality, but that doesn't change things. Readers like the
> kindle don't like web hierarchies or local (stored on the reader) html -
> they like pdf, text, mobi, etc files. They were created a while ago.
> Rewriting is not an option, and I have to do quite a few like this.
>
> The 'manual' way would be individually printing each page perhaps to a
> pdf, and merging. Even with drag-drop and a few clicks each it could
> drive one to insanity! Canon once had an app that could take a web page,
> fetch all the links on the page, and cat them in order into one doc. I
> can't find anything like that now, but that's the idea!
>
> I can put the pages on a server and mirror them locally for a tree, or I
> can mirror them and put in one directory (spider tool doing the link
> translation). Either way, Cat'ing html files doesn't work, obviously. I
> just need a way to get the text of the pages into one file, in original
> link order and without garbling paragraph (and heading) format (nothing
> fancy, just paragraphs should still be paragraphs and headings should be
> lines of their own). No graphics at this point, but I may need that
> later, but either way is good.
>
> Just in case I'm not being clear, the tool would have to take a web page
> (url or html file) and output at least the text to a file, followed by
> the text of the file in the first link on the page, followed by the text
> of the next link, and so on. A page break between each might be nice
> too. ;)
>
> Any ideas? Write to me off list if this is a bore to folks. However, I
> think it might be useful to a lot of us who deal with engineering
> documentation and I don't know why someone hasn't come up with something
> or if they have, why it's so hard to find. Hopefully someone knows where
> one is and I'm just not looking in the right places. Google has not been
> my friend...
>
> Thanks in advance,
> Skip
>
> --
> http://www.piclist.com PIC/SX FAQ & list archive
> View/change your membership options at
> http://mailman.mit.edu/mailman/listinfo/piclist
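For what it's worth, the tool described in the quoted message (text of the start page, then the text of each linked page in link order, with a page break between them) could be roughed out in a short script. The following is only a sketch under some assumptions: the pages are already mirrored locally as .htm/.html files with relative links, only the Python standard library is used, and a form feed stands in for the page break. The names parse_page and collect_manual are made up for illustration and do not refer to any existing tool.

```python
import os
import re
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit

# Tags that force a line break so paragraphs and headings stay on their own lines.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "title",
              "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}   # content that is never visible text
PAGE_BREAK = "\f"                 # assumption: form feed between pages


class PageParser(HTMLParser):
    """Collects visible text and href targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.chunks, self.links, self.skip_depth = [], [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            # Source line wraps become spaces; only block tags create breaks.
            self.chunks.append(re.sub(r"\s+", " ", data))


def parse_page(html):
    """Return (text, links): one block per line, blank line between blocks."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    lines = (" ".join(raw.split()) for raw in "".join(parser.chunks).split("\n"))
    return "\n\n".join(line for line in lines if line), parser.links


def collect_manual(start_path):
    """Walk local pages depth-first in link order; return all text joined."""
    seen, pages = set(), []
    stack = [os.path.abspath(start_path)]
    while stack:
        path = stack.pop()
        if path in seen or not os.path.isfile(path):
            continue
        seen.add(path)
        with open(path, encoding="utf-8", errors="replace") as fh:
            text, links = parse_page(fh.read())
        pages.append(text)
        targets = []
        for href in links:
            parts = urlsplit(href)
            # Skip external links, same-page anchors, and non-HTML targets.
            if parts.scheme or parts.netloc:
                continue
            if not parts.path.lower().endswith((".htm", ".html")):
                continue
            target = os.path.join(os.path.dirname(path), unquote(parts.path))
            targets.append(os.path.abspath(target))
        stack.extend(reversed(targets))  # reversed so the first link pops first
    return PAGE_BREAK.join(pages)
```

With a mirrored tree whose contents page is, say, mirror/index.html (a hypothetical path), writing the result of collect_manual("mirror/index.html") to a .txt file would give one file to print or convert. Each page is visited only once, so "next" and "up" links do not cause duplicates, but the link text itself (including navigation links) still appears in the output and may need trimming.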