{"id":271,"date":"2009-01-29T15:02:25","date_gmt":"2009-01-29T19:02:25","guid":{"rendered":"http:\/\/www.lilithebowman.com\/blog\/2009\/01\/29\/remove-ms-word-styles-from-pasted-content\/"},"modified":"2009-01-29T15:14:56","modified_gmt":"2009-01-29T19:14:56","slug":"remove-ms-word-styles-from-pasted-content","status":"publish","type":"post","link":"https:\/\/www.lilithebowman.com\/blog\/2009\/01\/remove-ms-word-styles-from-pasted-content\/","title":{"rendered":"Remove MS Word Styles from Pasted Content"},"content":{"rendered":"<p>While building WYSIWYG editor fields for CMS engines, I&#8217;ve often run into clients pasting in content from Microsoft Word. The paste brings along all kinds of unwanted formatting that either carries over the ugliness of the original document or breaks the web layout and semantic correctness completely.<\/p>\n<p>I&#8217;ve come up with this function to remove the extra formatting from the HTML produced by WYSIWYG editors such as TinyMCE.<\/p>\n<pre class=\"code\" style=\"display: block; width: 100%; height: 250px; overflow: scroll;\">\r\n\t\/**\r\n\t* Remove HTML tags, including invisible text such as style and\r\n\t* script code, and embedded objects.  
Replace removed\r\n\t* elements with a space to prevent word joining, then strip Word\r\n\t* class names, inline styles, and HTML comments.\r\n\t*\/\r\n\tfunction strip_html_tags( $text )\r\n\t{\r\n\t$text = preg_replace(\r\n\tarray(\r\n\t\/\/ Remove invisible content\r\n\t'@&lt;head[^&gt;]*?&gt;.*?&lt;\/head&gt;@siu',\r\n\t'@&lt;style[^&gt;]*?&gt;.*?&lt;\/style&gt;@siu',\r\n\t'@&lt;script[^&gt;]*?&gt;.*?&lt;\/script&gt;@siu',\r\n\t'@&lt;object[^&gt;]*?&gt;.*?&lt;\/object&gt;@siu',\r\n\t'@&lt;embed[^&gt;]*?&gt;.*?&lt;\/embed&gt;@siu',\r\n\t'@&lt;applet[^&gt;]*?&gt;.*?&lt;\/applet&gt;@siu',\r\n\t'@&lt;noframes[^&gt;]*?&gt;.*?&lt;\/noframes&gt;@siu',\r\n\t'@&lt;noscript[^&gt;]*?&gt;.*?&lt;\/noscript&gt;@siu',\r\n\t'@&lt;noembed[^&gt;]*?&gt;.*?&lt;\/noembed&gt;@siu',\r\n\t\/\/ Remove Word class names, inline styles, and HTML comments\r\n\t'\/class=&quot;[^&quot;]*mso[^&quot;]*&quot;\/i',\r\n\t'\/style=&quot;[^&quot;]*&quot;\/i',\r\n\t'\/&lt;!--.*?--&gt;\/s',\r\n\t),\r\n\tarray(\r\n\t' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '', '', ''\r\n\t),\r\n\t$text );\r\n\t$text = str_replace( &quot;&amp;lt;!--&quot;, &quot;&lt;!--&quot;, $text );\r\n\t$text = str_replace( &quot;--&amp;gt;&quot;, &quot;--&gt;&quot;, $text );\r\n\t$text = str_replace( &quot;&lt;style&gt;&quot;, &quot;&quot;, $text );\r\n\t$text = str_replace( &quot;&lt;\/style&gt;&quot;, &quot;&quot;, $text );\r\n\t\r\n\treturn strip_tags( $text, '&lt;address&gt;&lt;blockquote&gt;&lt;del&gt;&lt;div&gt;&lt;h1&gt;&lt;h2&gt;&lt;h3&gt;&lt;h4&gt;&lt;h5&gt;&lt;h6&gt;&lt;ins&gt;&lt;p&gt;&lt;a&gt;&lt;b&gt;&lt;i&gt;&lt;u&gt;&lt;img&gt;&lt;pre&gt;&lt;dl&gt;&lt;dt&gt;&lt;dd&gt;&lt;li&gt;&lt;ol&gt;&lt;ul&gt;&lt;table&gt;&lt;tr&gt;&lt;th&gt;&lt;td&gt;&lt;caption&gt;&lt;abbr&gt;&lt;acronym&gt;&lt;span&gt;&lt;strong&gt;&lt;em&gt;' );\r\n\t} \/\/ end strip_html_tags\r\n<\/pre>\n<p>Do you guys have any ideas for improving it?<\/p>\n<p>P.S. I had a hell of a time even getting this pasted into WordPress. 
I guess something might need to be done there.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>While creating WYSIWYG editor fields for CMS engines I&#8217;ve often had the issue of clients pasting in files from Microsoft Word which somehow applies all kinds of unwanted formatting that either just carries over the ugliness of their original document or screws up the web layout and semantic correctness completely. I&#8217;ve come up with this [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[23],"class_list":["post-271","post","type-post","status-publish","format-standard","hentry","category-daily-musings","tag-remove-ms-word-styles-from-pasted-content"],"_links":{"self":[{"href":"https:\/\/www.lilithebowman.com\/blog\/wp-json\/wp\/v2\/posts\/271","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.lilithebowman.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.lilithebowman.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.lilithebowman.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.lilithebowman.com\/blog\/wp-json\/wp\/v2\/comments?post=271"}],"version-history":[{"count":5,"href":"https:\/\/www.lilithebowman.com\/blog\/wp-json\/wp\/v2\/posts\/271\/revisions"}],"predecessor-version":[{"id":276,"href":"https:\/\/www.lilithebowman.com\/blog\/wp-json\/wp\/v2\/posts\/271\/revisions\/276"}],"wp:attachment":[{"href":"https:\/\/www.lilithebowman.com\/blog\/wp-json\/wp\/v2\/media?parent=271"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.lilithebowman.com\/blog\/wp-json\/wp\/v2\/categories?post=271"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.lilithebowman.com\/blog\/wp-json\/wp\/v2\/tags?post=271"}],"curies":[{"name":"wp","href":"https:\
/\/api.w.org\/{rel}","templated":true}]}}