Examples of find and replace regular expressions for HTML

The Doc Converter Pro template system supports regular expressions (regex), a powerful but sometimes complex feature for HTML processing. Note that Doc Converter Pro uses .NET regular expressions.

.NET regular expressions are largely the same as those in other common regex flavors, but we recommend consulting third-party sites for guidance on syntax. To learn regular expressions, we recommend this website: https://www.regular-expressions.info

The examples below show find and replace usage for HTML processing in Doc Converter Pro.

You need some basic knowledge of regular expressions (regex) to understand these examples. To learn and test regular expressions online, go to https://regex101.com.

Note: by default, Doc Converter Pro runs regex commands in single-line and case-insensitive mode. In single-line mode, the dot (.) also matches line breaks.
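
To illustrate what these two default modes change, here is a minimal sketch in Python. It is an approximation, not Doc Converter Pro's own engine: Python's re module stands in for .NET, with re.DOTALL assumed to correspond to single-line mode and re.IGNORECASE to case-insensitive mode. Python also writes the first group as \1 where the .NET replace text uses $1.

```python
import re

# Assumption: these Python flags approximate the Doc Converter Pro defaults.
# re.DOTALL     ~ single-line mode (the dot also matches line breaks)
# re.IGNORECASE ~ case-insensitive mode
FLAGS = re.DOTALL | re.IGNORECASE

html = "<SPAN>first\nsecond</SPAN>"
pattern = r"<span>(.+?)</span>"

# With both modes on, the upper-case, multi-line span is matched.
with_defaults = re.sub(pattern, r"\1", html, flags=FLAGS)

# Without them, the pattern does not match and the text is unchanged.
without_flags = re.sub(pattern, r"\1", html)

print(repr(with_defaults))
print(repr(without_flags))
```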

Find every <span>…</span> tag with content and remove only the span tags, leaving the inner text. The regex (.+?) matches any text, or even HTML, inside the span tag. Parentheses (…) define a capturing group whose matched text can be reused in the replace text. $1 in the replace text inserts the content captured by the first group, here (.+?).
To also match empty content (no characters at all), use (.*?) instead, with * in place of +.

Find: <span>(.+?)</span>
Replace: $1
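
The difference between (.+?) and (.*?) can be sketched in Python as follows. This is an illustration under assumptions: Python's re module stands in for the .NET engine, the flags imitate the default modes, and \1 is Python's spelling of $1. The sample HTML is made up for the example.

```python
import re

# Assumed equivalents of single-line and case-insensitive mode.
FLAGS = re.DOTALL | re.IGNORECASE

html = "<p><span>Hello</span> <span></span>world</p>"

# (.+?) needs at least one character, so the empty span is left alone.
plus_result = re.sub(r"<span>(.+?)</span>", r"\1", html, flags=FLAGS)

# (.*?) also accepts zero characters, so the empty span is unwrapped too.
star_result = re.sub(r"<span>(.*?)</span>", r"\1", html, flags=FLAGS)

print(plus_result)  # <p>Hello <span></span>world</p>
print(star_result)  # <p>Hello world</p>
```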


Find every <p…>…</p> tag in HTML that has attributes (class, style, id, etc.) and remove these attributes. The regex [^<>]+ matches one or more characters that are not < or >. This basic pattern is used very often in HTML processing.

Find: <p[^<>]+>(.+?)</p>
Replace: <p>$1</p>
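
A small Python sketch of this replacement, assuming Python's re module as a stand-in for the .NET engine (\1 in place of $1, flags imitating the default modes). The sample HTML is hypothetical.

```python
import re

# Assumed equivalents of single-line and case-insensitive mode.
FLAGS = re.DOTALL | re.IGNORECASE

html = '<p class="intro" style="color:red">First</p><p>Second</p>'

# [^<>]+ requires at least one character after "<p", so only the
# paragraph that carries attributes is rewritten.
result = re.sub(r"<p[^<>]+>(.+?)</p>", r"<p>\1</p>", html, flags=FLAGS)

print(result)  # <p>First</p><p>Second</p>
```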


Find every <p…> tag in HTML with any or no attributes (class, style, id, etc.) and add a class="className" attribute to it at the end (as the last attribute).

Find: (<p[^<>]*)>
Replace: $1 class="className">
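
The same idea as a Python sketch, assuming Python's re module in place of the .NET engine (\1 instead of $1). The sample HTML is made up. The second call shows a side effect worth knowing: because [^<>]* accepts any characters after <p, the pattern also matches other tags whose names start with p, such as <pre>.

```python
import re

# Assumed equivalents of single-line and case-insensitive mode.
FLAGS = re.DOTALL | re.IGNORECASE
PATTERN = r"(<p[^<>]*)>"
REPLACEMENT = r'\1 class="className">'

html = '<p>One</p><p id="x">Two</p>'
result = re.sub(PATTERN, REPLACEMENT, html, flags=FLAGS)
print(result)

# Side effect: <pre> also starts with "<p" and is therefore matched.
pre_result = re.sub(PATTERN, REPLACEMENT, "<pre>code</pre>", flags=FLAGS)
print(pre_result)
```

If that side effect matters, one possible tightening (not part of the original example) is to put a word boundary after the tag name: (<p\b[^<>]*)>.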


Find all style attributes in HTML and remove them. The regex [^<>"]* matches any text that does not contain <, > or " characters.

Find: style="[^<>"]*"
Replace: leave this empty, which means deleting matched text
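
As a Python sketch (an approximation using Python's re module rather than the .NET engine, with made-up sample HTML), an empty replacement string deletes each match:

```python
import re

# Assumed equivalents of single-line and case-insensitive mode.
FLAGS = re.DOTALL | re.IGNORECASE

html = '<p style="color:red">A</p><td class="c" style="width:10px">B</td>'

# An empty replacement deletes the matched attribute. The space that
# preceded the attribute is not part of the match, so it stays behind.
result = re.sub(r'style="[^<>"]*"', "", html, flags=FLAGS)

print(result)  # <p >A</p><td class="c" >B</td>
```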


A more advanced example, which can be tested only in Doc Converter Pro, finds every <p…> tag in HTML and removes the style attribute only from these <p> tags. The First find regex selects the <p…> tags, and the Find and Replace are then applied only within the text it matched.

Find: style="[^<>"]*"
Replace: leave this empty, which means deleting matched text
First find regex: <p[^<>]+>
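
Outside Doc Converter Pro, this two-stage behavior can be approximated with a replacement callback. The Python sketch below is an assumption about how the First find regex narrows the search, not the product's actual implementation. The names FIRST_FIND, FIND and strip_style are hypothetical.

```python
import re

# Assumed equivalents of single-line and case-insensitive mode.
FLAGS = re.DOTALL | re.IGNORECASE

FIRST_FIND = re.compile(r"<p[^<>]+>", FLAGS)     # stage 1: select <p ...> tags
FIND = re.compile(r'style="[^<>"]*"', FLAGS)     # stage 2: style attribute


def strip_style(match):
    """Apply the inner find/replace only to the text stage 1 matched."""
    return FIND.sub("", match.group(0))


html = '<p style="color:red">A</p><td style="width:10px">B</td>'
result = FIRST_FIND.sub(strip_style, html)

# The <td> keeps its style because stage 1 never selected it.
print(result)  # <p >A</p><td style="width:10px">B</td>
```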


Find all <td> tags in HTML that contain a <p> tag and remove these <p> tags, keeping their content.

Find: (<td[^<>]*>\s*)<p[^<>]*>\s*(.+?)</p>\s*</td>
Replace: $1$2</td>
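
A Python sketch of this two-group replacement, assuming Python's re module in place of the .NET engine (\1 and \2 instead of $1 and $2) and a hypothetical table cell as input:

```python
import re

# Assumed equivalents of single-line and case-insensitive mode.
FLAGS = re.DOTALL | re.IGNORECASE

html = '<td class="c">\n  <p class="x">Cell text</p>\n</td>'

# Group 1 keeps the opening <td ...> with its trailing whitespace,
# group 2 keeps the paragraph content; the <p> wrapper is dropped.
result = re.sub(
    r"(<td[^<>]*>\s*)<p[^<>]*>\s*(.+?)</p>\s*</td>",
    r"\1\2</td>",
    html,
    flags=FLAGS,
)

print(repr(result))
```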


Delete all <span> tags in HTML with their inner content.

Find: <span.*?</span>
Replace: leave this empty, which means deleting matched text
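
A Python sketch of this deletion, again assuming Python's re module as a stand-in for the .NET engine, with invented sample text. The second call shows a limit of the lazy .*? match: with nested spans it stops at the first </span>, so the outer closing tag is left behind.

```python
import re

# Assumed equivalents of single-line and case-insensitive mode.
FLAGS = re.DOTALL | re.IGNORECASE
PATTERN = r"<span.*?</span>"

html = 'Keep <span class="note">drop this</span> and <SPAN>this\ntoo</SPAN> text'
result = re.sub(PATTERN, "", html, flags=FLAGS)
print(repr(result))

# Nested spans: the match ends at the first closing tag.
nested_result = re.sub(PATTERN, "", "<span>a<span>b</span>c</span>", flags=FLAGS)
print(repr(nested_result))
```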