interactivetools.com

Feature request - MS Word Cleanup

Goudinov
User

Nov 6, 2002, 8:51 AM

Post #1 of 10 (233 views)
Feature request - MS Word Cleanup

I know this may be a tall order. I was unable to find a script online that does this. The cleanup script that comes with Dreamweaver is supposedly usable outside of DW, but I couldn't implement it.

Basically we have content managers that edit their content in MS Word versions 97 - XP. The content can be copied and pasted into the htmlArea, but when you view HTML you get... well, you get what the Office developers try to pass off as HTML.

So I'm looking for a command I can run against the text input that strips out all that MS Word junk. Can this be added to htmlArea? Or does anyone know of a cleanWordHTML(text); function?

Thanks :)


JavaRanger
New User

Nov 6, 2002, 9:59 AM

Post #2 of 10 (228 views)
Re: [Goudinov] Feature request - MS Word Cleanup

Are you using XML? If so, I have a parser that might be able to help you. It converts entities and strips the problem characters MS Word produces.


Goudinov
User

Nov 6, 2002, 11:04 AM

Post #3 of 10 (222 views)
Re: [JavaRanger] Feature request - MS Word Cleanup

I'm not really using XML in the application, but I do have access to powerful XML and RegEx objects on the server.
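
Since RegEx objects are available on the server, one possible starting point is a purely string-based cleanup. The sketch below is only an illustration under stated assumptions, not a tested Word filter: the function body, the list of stripped attributes (class, style, lang), and the sample input are all hypothetical, and regular expressions only approximate real HTML parsing.

```javascript
// Hypothetical sketch of a string-based cleanWordHTML(text).
// Nothing here comes from htmlArea; the rules are assumptions about typical Word output.
function cleanWordHTML(html) {
    var out = html;
    // Drop HTML comments, including the conditional comments Word emits.
    out = out.replace(/<!--[\s\S]*?-->/g, '');
    // Drop namespaced Office tags such as <o:p> and </o:p>, keeping their contents.
    out = out.replace(/<\/?[a-z]+:[^>]*>/gi, '');
    // Unwrap SPAN and FONT tags, keeping what is inside them.
    out = out.replace(/<\/?(span|font)\b[^>]*>/gi, '');
    // Inside each remaining tag, remove class, style and lang attributes
    // (double-quoted, single-quoted or unquoted values).
    out = out.replace(/<[^>]+>/g, function (tag) {
        return tag.replace(/\s+(class|style|lang)\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/gi, '');
    });
    // Remove paragraphs that are empty or hold only whitespace or &nbsp;.
    out = out.replace(/<p\b[^>]*>(\s|&nbsp;)*<\/p>/gi, '');
    return out;
}

// Sample input written by hand to resemble pasted Word markup (an assumption).
var sample = "<p class=MsoNormal style='margin:0in'>" +
    "<span lang=EN-US style=\"font-size:12pt\">Hello <b>world</b><o:p></o:p></span></p>" +
    "<p class=MsoNormal>&nbsp;</p>";
var cleaned = cleanWordHTML(sample);
```

The order matters in this sketch: wrappers are removed before empty paragraphs are checked, so a paragraph containing only an empty SPAN is also dropped.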


skarbratt
New User

Nov 6, 2002, 11:28 AM

Post #4 of 10 (215 views)
Re: [Goudinov] Feature request - MS Word Cleanup

Try using RemoveFormat.

More info on Microsoft's homepage: http://msdn.microsoft.com/library/default.asp?url=/workshop/author/dhtml/reference/commandids.asp



Johan Skårbratt


Goudinov
User

Nov 6, 2002, 12:25 PM

Post #5 of 10 (211 views)
Re: [skarbratt] Feature request - MS Word Cleanup

RemoveFormat has the opposite effect of what I want: it removes all formatting except MS Word's proprietary formatting.

I was hoping to use RemoveParaFormat to get rid of some of it, but it does nothing; it is not yet implemented.


Goudinov
User

Nov 6, 2002, 2:46 PM

Post #6 of 10 (203 views)
Re: [Goudinov] Feature request - MS Word Cleanup

The form doesn't keep my indenting, but here's what I've got so far.

Basically you need a routine that...
  1. Removes all SPAN tags while maintaining the innerHTML
  2. Similarly removes all FONT tags
  3. Gets a handle on each P tag and....
    1. Recursively looks for children tags and...
      1. Removes the tag if it's empty or...
      2. Removes the style and class attributes from the tag
    2. Removes the P tag if it's empty or...
    3. Removes the style and class attributes from the P tag
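
The steps above can be sketched without a browser by running them over a plain object tree. This is only an illustration under stated assumptions: the node shape ({ tag, attrs, children }), the names cleanNodes, isEmpty and VOID_TAGS, and the exemption for BR, IMG and HR are hypothetical and are not part of htmlArea. Unlike the outline, it strips style and class from every tag rather than only from P tags and their children.

```javascript
// Hypothetical node shape: { tag: 'P', attrs: { 'class': 'MsoNormal' }, children: [...] }
// Text is represented as plain strings inside `children`.
var VOID_TAGS = { BR: true, IMG: true, HR: true };  // never treated as "empty" (assumption)

// True when the children hold no element and no visible text.
function isEmpty(kids) {
    for (var i = 0; i < kids.length; i++) {
        if (typeof kids[i] !== 'string') { return false; }          // a surviving element
        if (kids[i].replace(/\s|\u00a0/g, '') !== '') { return false; } // real text
    }
    return true;
}

// Returns a cleaned copy of a list of sibling nodes.
function cleanNodes(nodes) {
    var out = [];
    for (var i = 0; i < nodes.length; i++) {
        var node = nodes[i];
        if (typeof node === 'string') {          // text: keep as-is
            out.push(node);
            continue;
        }
        // Clean the children first so emptiness is judged after cleanup.
        var kids = cleanNodes(node.children || []);
        if (node.tag === 'SPAN' || node.tag === 'FONT') {
            out = out.concat(kids);              // steps 1 and 2: unwrap, keep contents
            continue;
        }
        if (!VOID_TAGS.hasOwnProperty(node.tag) && isEmpty(kids)) {
            continue;                            // remove the tag if it is empty
        }
        // Otherwise copy every attribute except style and class.
        var attrs = {};
        var source = node.attrs || {};
        for (var name in source) {
            if (source.hasOwnProperty(name) && name !== 'style' && name !== 'class') {
                attrs[name] = source[name];
            }
        }
        out.push({ tag: node.tag, attrs: attrs, children: kids });
    }
    return out;
}

// Hand-written sample tree (an assumption about what pasted Word content looks like).
var tree = [
    { tag: 'P', attrs: { 'class': 'MsoNormal', style: 'margin:0in' }, children: [
        { tag: 'SPAN', attrs: { lang: 'EN-US' }, children: [
            'Hello ', { tag: 'B', attrs: {}, children: ['world'] }
        ] }
    ] },
    { tag: 'P', attrs: { 'class': 'MsoNormal' }, children: [
        { tag: 'FONT', attrs: {}, children: ['\u00a0'] }
    ] }
];
var result = cleanNodes(tree);
```

Children are cleaned before their parent is tested, so a P that contains only an empty FONT is recognised as empty and dropped.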



Code
  

//
// CUSTOM BUTTONS START HERE
//

// Custom1
else if (cmdID == 'custom1') {
    //alert("Hello, I am custom button 1!");

    // Unwrap every SPAN, keeping its contents. Walk backwards so replacing
    // an element does not shift the ones still to be visited.
    var oTags = editdoc.all.tags("SPAN");
    if (oTags != null) {
        for (var i = oTags.length - 1; i >= 0; i--) {
            //alert(i);
            oTags[i].outerHTML = oTags[i].innerHTML;
        }
    }

    // Unwrap every FONT the same way.
    oTags = editdoc.all.tags("FONT");
    if (oTags != null) {
        for (var i = oTags.length - 1; i >= 0; i--) {
            //alert(i);
            oTags[i].outerHTML = oTags[i].innerHTML;
        }
    }

    // For each P: remove every specified attribute, clear inline styles,
    // and delete the paragraph if it is empty.
    oTags = editdoc.all.tags("P");
    if (oTags != null) {
        //alert(oTags.length);
        for (var i = oTags.length - 1; i >= 0; i--) {
            //alert(i);
            var oAttribs = oTags[i].attributes;
            if (oAttribs != null) {
                for (var j = oAttribs.length - 1; j >= 0; j--) {
                    var oAttrib = oAttribs[j];
                    if (oAttrib.nodeValue != null) {
                        if (oAttrib.nodeValue.length > 0) {
                            if (oAttrib.specified) {
                                oAttribs.removeNamedItem(oAttrib.nodeName);
                            }
                        }
                    }
                    //alert(oAttrib.nodeName);
                }
            }
            oTags[i].style.cssText = '';
            if (oTags[i].innerHTML == " " || oTags[i].innerHTML.length == 0) {oTags[i].outerHTML = '';}
            //if (oTags[i].innerHTML.length == 0) {oTags[i].outerHTML = '';}

            // Disabled: recursive cleanup of child tags (loop bound corrected to kids.length).
            /*if (oTags[i]) {
                var kids = oTags[i].children;
                if (kids != null) {
                    for (var k = kids.length - 1; k >= 0; k--) {
                        var kid = kids[k];
                        //alert(kid);
                        if (kid != null) {
                            cleanEmptyTag(kid);
                            //alert(kid.tagName);
                        }
                    }
                }
            }*/
        }
    }
}


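// This causes a happy Stack Overflow: it calls itself on the same element
// instead of on that element's children, so the recursion never ends.
function cleanEmptyTag(oElem) {
    if (oElem.children) {
        var tmp = oElem;
        cleanEmptyTag(tmp);
    }
    if (oElem.innerHTML == '' || oElem.innerHTML == ' ') {
        oElem.outerHTML = '';
    }
}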



Goudinov
User

Nov 7, 2002, 6:54 AM

Post #7 of 10 (179 views)
Re: [Goudinov] Feature request - MS Word Cleanup

OK, here is what I have. It seems to work fairly well. The only problem is that pasted MS Word content has no overall wrapper tag like BODY, so for this to work you have to know all the parent tags. For now I'm just checking for P, OL, and UL, and I'm also removing all SPAN tags. I do it this way because there doesn't seem to be an overall TAGS collection; you have to provide the sTag parameter for document.all.tags(sTag). (ugh @ this forum's spacing)


Code
  

function cleanEmptyTag(oElem) {
    // Clean the children first, last to first, so removals do not shift
    // the ones still to be visited.
    if (oElem.hasChildNodes()) {
        var tmp = oElem;
        for (var k = tmp.children.length - 1; k >= 0; k--) {
            if (tmp.children[k] != null) {
                cleanEmptyTag(tmp.children[k]);
            }
        }
    }

    // Remove the class attribute.
    var oAttribs = oElem.attributes;
    if (oAttribs != null) {
        for (var j = oAttribs.length - 1; j >= 0; j--) {
            var oAttrib = oAttribs[j];
            if (oAttrib.nodeValue != null) {
                oAttribs.removeNamedItem('class');
            }
        }
    }

    // Clear inline styles, then drop the element if nothing is left in it.
    oElem.style.cssText = '';
    if (oElem.innerHTML == '' || oElem.innerHTML == ' ') {
        oElem.outerHTML = '';
    }
}

Here is the code I've placed into the CUSTOM1 area:


Code
  

// Custom1
else if (cmdID == 'custom1') {
    //alert("Hello, I am custom button 1!");

    var oTags = editdoc.all.tags("SPAN");
    if (oTags != null) {
        for (var i = oTags.length - 1; i >= 0; i--) {
            oTags[i].outerHTML = oTags[i].innerHTML;
        }
    }

    /*oTags = editdoc.all.tags("FONT");
    if (oTags != null) {
        for (var i = oTags.length - 1; i >= 0; i--) {
            //alert(i);
            oTags[i].outerHTML = oTags[i].innerHTML;
        }
    }*/

    oTags = editdoc.all.tags("P");
    if (oTags != null) {
        for (var i = oTags.length - 1; i >= 0; i--) {
            cleanEmptyTag(oTags[i]);
        }
    }

    oTags = editdoc.all.tags("OL");
    if (oTags != null) {
        for (var i = oTags.length - 1; i >= 0; i--) {
            cleanEmptyTag(oTags[i]);
        }
    }

    oTags = editdoc.all.tags("UL");
    if (oTags != null) {
        for (var i = oTags.length - 1; i >= 0; i--) {
            cleanEmptyTag(oTags[i]);
        }
    }
}
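
The P, OL and UL loops above differ only in the tag name, so they could be folded into one helper that takes a list of names. A minimal sketch, assuming the document object exposes IE's all.tags(name) the way editdoc does in this thread; forEachTag and the stub document used to exercise it are hypothetical names, not part of htmlArea.

```javascript
// Hypothetical helper: call fn on every element with one of the given tag names.
function forEachTag(doc, tagNames, fn) {
    for (var n = 0; n < tagNames.length; n++) {
        var oTags = doc.all.tags(tagNames[n]);
        if (oTags == null) { continue; }
        // Walk backwards, as the original loops do, so that removing an
        // element does not shift the ones still to be visited.
        for (var i = oTags.length - 1; i >= 0; i--) {
            fn(oTags[i]);
        }
    }
}

// Stand-in for editdoc so the helper can be exercised outside a browser (assumption).
var stubDoc = {
    all: {
        tags: function (name) {
            if (name === 'P') { return ['p1', 'p2']; }
            if (name === 'UL') { return ['u1']; }
            return null;
        }
    }
};
var visited = [];
forEachTag(stubDoc, ['P', 'OL', 'UL'], function (el) { visited.push(el); });
```

In the editor itself the call would presumably read forEachTag(editdoc, ["P", "OL", "UL"], cleanEmptyTag); adding another parent tag then means adding one string to the list.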



Goudinov
User

Nov 7, 2002, 7:55 AM

Post #8 of 10 (173 views)
Re: [Goudinov] Feature request - MS Word Cleanup

cleanEmptyTag didn't work on tables, so I made this...


Code
  

function cleanTable(oElem) {
    // Clear inline styles on the table itself.
    oElem.style.cssText = '';

    // Remove the class attribute.
    var oAttribs = oElem.attributes;
    if (oAttribs != null) {
        for (var j = oAttribs.length - 1; j >= 0; j--) {
            var oAttrib = oAttribs[j];
            if (oAttrib.nodeValue != null) {
                oAttribs.removeNamedItem('class');
            }
        }
    }

    // Clear inline styles on every row.
    var oTR = oElem.rows;
    if (oTR != null) {
        for (var r = oTR.length - 1; r >= 0; r--) {
            oTR[r].style.cssText = '';
        }
    }

    // Clear inline styles on every cell.
    var oTD = oElem.cells;
    if (oTD != null) {
        for (var t = oTD.length - 1; t >= 0; t--) {
            oTD[t].style.cssText = '';
        }
    }
}

Then call it after the other cleanups:


Code
  

oTags = editdoc.all.tags("TABLE");
if (oTags != null) {
    for (var i = oTags.length - 1; i >= 0; i--) {
        //oTagsTABLE[i].style.cssText = '';
        cleanTable(oTags[i]);
    }
}



Dave
Enthusiast / Moderator


Nov 8, 2002, 4:26 PM

Post #9 of 10 (147 views)
Re: [Goudinov] Feature request - MS Word Cleanup

Thanks for sharing your code! : )

Dave Edis - Senior Developer
interactivetools.com


Goudinov
User

Nov 8, 2002, 4:29 PM

Post #10 of 10 (144 views)
Re: [Dave] Feature request - MS Word Cleanup

No probs. It still needs some work: sometimes it fails in different places with an "unknown runtime error", but pasting into a fresh box and then running the format usually works fine.
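
One way to narrow down where the intermittent failure happens is to run each cleanup pass inside its own try/catch, so a pass that throws is recorded and the remaining passes still run. A minimal sketch, assuming the passes are wrapped as named functions; runSteps and the { name, run } step shape are hypothetical and do not come from htmlArea.

```javascript
// Hypothetical wrapper: run every step, collecting the names and messages
// of the steps that threw instead of stopping at the first error.
function runSteps(steps) {
    var failures = [];
    for (var i = 0; i < steps.length; i++) {
        try {
            steps[i].run();
        } catch (e) {
            failures.push(steps[i].name + ': ' + (e.message || e));
        }
    }
    return failures;
}

// Stand-in steps; the middle one simulates the error described in this post.
var ran = [];
var failures = runSteps([
    { name: 'span',  run: function () { ran.push('span'); } },
    { name: 'p',     run: function () { throw new Error('Unknown runtime error'); } },
    { name: 'table', run: function () { ran.push('table'); } }
]);
```

The returned list names the pass that failed, which is a starting point for finding the markup that triggers the error.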

 
 
 


Copyright © 1999-2003 interactivetools.com, inc.