I know this may be a tall order. I was unable to find a script online that does this. The script that comes with Dreamweaver is supposed to be usable outside of DW, but I couldn't implement it.
Basically, we have content managers who edit their content in MS Word, versions 97 through XP. The content can be copied and pasted into the htmlArea, but when you view the HTML you get... well, you get what the Office developers try to pass off as HTML.
So I'm looking for a command I can run against the text input that strips out all that MS Word junk. Can this be added to htmlArea? Or does anyone know of a cleanWordHTML(text); function?
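For illustration only, here is a minimal sketch of what a string-based cleanWordHTML(text) could look like. The function name comes from the question above; the cleanup rules (which attributes and tags to drop) are assumptions, not a tested list of everything Word emits, and regular expressions like these can also touch body text that happens to look like markup.

```javascript
// Hypothetical helper: the rules below are assumptions for illustration,
// not a complete list of what MS Word puts into pasted HTML.
function cleanWordHTML(text) {
  return text
    // Drop HTML comments, including Word's conditional comments.
    .replace(/<!--[\s\S]*?-->/g, '')
    // Drop namespaced Office tags such as <o:p> and </o:p>, keeping any text inside.
    .replace(/<\/?\w+:[^>]*>/g, '')
    // Drop class, style, and lang attributes (quoted or unquoted values).
    .replace(/\s+(?:class|style|lang)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)/gi, '')
    // Unwrap SPAN and FONT tags but keep their contents.
    .replace(/<\/?(?:span|font)\b[^>]*>/gi, '')
    // Remove paragraphs that are now empty or hold only whitespace / &nbsp;.
    .replace(/<p\b[^>]*>(?:\s|&nbsp;)*<\/p>/gi, '');
}

var dirty = '<p class=MsoNormal style="margin:0in"><span lang=EN-US>Hello<o:p></o:p></span></p>' +
            '<p class=MsoNormal>&nbsp;</p>';
var clean = cleanWordHTML(dirty);
```

Because it works on a string, this kind of function can run before the content ever reaches the editor's document, so it does not need to know the parent tags in advance.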
Re: [Goudinov] Feature request - MS Word Cleanup
OK, here is what I have. It seems to work fairly well. The only problem is that pasted MS Word content has no overall wrapper tag such as BODY, so for this to work you have to know all the parent tags. For now I'm only checking for P, OL, and UL. I am also removing all SPAN tags. I'm doing it this way because there doesn't seem to be an overall TAGS collection; you have to provide the sTag parameter for document.all.tags(sTag). (Ugh @ this forum's spacing.)
Code
function cleanEmptyTag(oElem) {
    // Clean the children first, walking backwards because a child may remove itself.
    if (oElem.hasChildNodes()) {
        for (var k = oElem.children.length - 1; k >= 0; k--) {
            if (oElem.children[k] != null) {
                cleanEmptyTag(oElem.children[k]);
            }
        }
    }
    // Drop the class attribute if the element has one.
    var oAttribs = oElem.attributes;
    if (oAttribs != null && oAttribs.getNamedItem('class') != null) {
        oAttribs.removeNamedItem('class');
    }
    // Clear any inline styles.
    oElem.style.cssText = '';
    // Remove the element entirely if nothing is left inside it.
    if (oElem.innerHTML == '' || oElem.innerHTML == ' ') {
        oElem.outerHTML = '';
    }
}
Here is the code I've placed into the CUSTOM1 area
Re: [Dave] Feature request - MS Word Cleanup
No probs. It still needs some work: sometimes it fails in different places with an "unknown runtime error", but pasting into a fresh box and then running the format usually works fine.
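One way to narrow down where the runtime error comes from is to wrap each cleanup call in try/catch and record which tag failed. This is only a debugging sketch: cleanSafely is a hypothetical name, and it assumes the cleanup function is passed in and that each element exposes tagName.

```javascript
// Hypothetical debugging wrapper: runs cleanFn on every element and
// collects failures instead of stopping at the first runtime error.
function cleanSafely(elements, cleanFn) {
  var failures = [];
  // Walk backwards so removals do not shift the elements still to be visited.
  for (var i = elements.length - 1; i >= 0; i--) {
    var el = elements[i];
    // Read the tag name up front, since cleaning may detach the element.
    var tag = el.tagName;
    try {
      cleanFn(el);
    } catch (e) {
      failures.push({ tagName: tag, message: e.message });
    }
  }
  return failures;
}
```

In the editor this could be called with something like document.all.tags('P') and the cleanup function from the earlier post; the returned list shows which tags raised errors, which may help explain why a fresh box behaves differently.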