This is SerkanYersen's TypePad Profile.
SerkanYersen
Recent Activity
Jeff, I have ported your code to JavaScript. Thank you, it helped so much.

    function cleanWord(str) {
      // Get rid of unnecessary tag spans (comments and title)
      str = str.replace(/<!--[\s\S]+?-->/gi, '');
      str = str.replace(/<title>[\s\S]+?<\/title>/gi, '');
      // Get rid of classes and styles
      str = str.replace(/\s?class=\w+/gi, '');
      str = str.replace(/\s+style='[^']+'/gi, '');
      // Get rid of unnecessary tags
      str = str.replace(/<(meta|link|\/?o:|\/?style|\/?div|\/?st\d|\/?head|\/?html|body|\/?body|\/?span|!\[)[^>]*?>/gi, '');
      // Get rid of empty paragraph tags
      str = str.replace(/(<[^>]+>)+&nbsp;(<\/\w+>)/gi, '');
      // Remove bizarre v: attribute attached to <img> tag
      str = str.replace(/\s+v:\w+="[^"]+"/gi, '');
      // Remove extra lines
      str = str.replace(/(\r?\n){2,}/g, '');
      // Fix entities (global regexes, so every occurrence is replaced)
      str = str.replace(/&ldquo;/g, '"');
      str = str.replace(/&rdquo;/g, '"');
      str = str.replace(/&mdash;/g, '–');
      return str;
    }
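A note on the entity replacements: in JavaScript, `replace` with a plain string pattern only replaces the first match, so a regex with the `g` flag is needed to catch every occurrence. Here is a minimal sketch of the difference. The `fixEntities` helper name and the sample input are made up for illustration and are not part of the original code.

```javascript
// Hypothetical sample input, invented for illustration.
var sample = '&ldquo;one&rdquo; and &ldquo;two&rdquo;';

// A string pattern replaces only the first occurrence.
var firstOnly = sample.replace('&ldquo;', '"');

// Hypothetical helper: global regexes replace every occurrence.
function fixEntities(str) {
  return str
    .replace(/&ldquo;/g, '"')
    .replace(/&rdquo;/g, '"');
}

console.log(firstOnly);           // only the first &ldquo; is converted
console.log(fixEntities(sample)); // all curly-quote entities are converted
```

The same idea applies to any other entity you want to normalize: add one more global `replace` call per entity.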
Commented Apr 22, 2010 on Cleaning Word's Nasty HTML at Coding Horror
SerkanYersen is now following The Typepad Team
Apr 22, 2010