Conversion templates guide

The core of the Doc Converter Pro web app is its template system. You can select and edit the built-in templates, duplicate them, and create your own custom templates. Click the Templates menu item to see all templates.

Create a new template

To create a new template, click the light blue Create New Template button and start by naming your template. Optionally, enter some notes describing what the template does.

The next step is to select the Output Format Type from the radio group list. You can select HTML5, XHTML, HTML4 (old format – not recommended), HTML_FIXED (fixed layout – the HTML looks exactly the same in a browser as in your document), TXT, DOCX, DOTX, PDF, ODT, RTF, or save each document page as an image.

Output File Naming – Doc Converter Pro can automatically rename output files to make them web friendly. By default, output files are named in a web-friendly way: lowercase, ASCII characters only (no special characters), and spaces replaced with a dash. You can set your own replacement character if needed. The output file extension is also set according to the output format. For instance, the HTML output format type will set the file extension to ‘.html’.
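The default naming rules above can be approximated in a few lines of Python. This is only a sketch of the described behaviour, not Doc Converter Pro's actual code: the function name web_friendly_name, its parameters, and the choice to drop (rather than replace) special characters are assumptions made for illustration.

```python
import re
import unicodedata
from pathlib import Path


def web_friendly_name(filename: str, separator: str = "-", extension: str = ".html") -> str:
    """Hypothetical helper: lowercase, ASCII-only name with spaces replaced by a separator."""
    stem = Path(filename).stem
    # Split accented letters into base letter + accent, then drop anything non-ASCII.
    ascii_stem = unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    # Assumption: special characters are removed rather than replaced.
    cleaned = re.sub(r"[^a-z0-9\s-]", "", ascii_stem.lower())
    # Collapse runs of whitespace or dashes into a single separator.
    slug = re.sub(r"[\s-]+", separator, cleaned).strip(separator)
    return slug + extension
```

With these assumptions, web_friendly_name("My Annual Report (2023).docx") returns "my-annual-report-2023.html", and passing separator="_" gives "my_annual_report_2023.html".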

Create Index HTML File – You can create an index HTML file with links to the output files created using the split page option. This option is in the Template Overview tab – Converting format section.

Process HTML or TXT input file by conversion engine – If you tick this option, your input HTML or text files will be processed by our Converter, which means you can use all the customization options in the templates. If you untick it, options such as Image Output Folder, CSS, Image, and Metadata options will not work, and only the find and replace/delete commands will be run on your files. Leaving it unticked is therefore the safer choice if you find that your code is being changed in unwanted ways.

Conversion Options

Body content only – useful for importing into a CMS – If you tick this option, Doc Converter Pro will remove the <head>…</head> section of the HTML, and just leave the body tag inner content. This is useful if you want to paste HTML into a content management system (CMS) or a template.

Plain Text Conversion – Remove all formatting – This will remove all the formatting, e.g. styles, font size, font type, etc. This is a good option if you want clean HTML or intend to use your own CSS styles. It will not remove bold and italic formatting tags.

Convert bold and italic tags to strong and em – With this option ticked Doc Converter Pro will convert <b> tags to <strong> and <i> to <em>.

Delete empty HTML paragraphs – If you tick this option, Doc Converter Pro will remove any paragraphs <p…>…</p> containing only white space characters (spaces, tabs, new lines, etc.) or &nbsp; or &#xa0; entities.
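As a rough illustration of what counts as an "empty" paragraph here, the following Python sketch removes paragraphs whose content is only white space or non-breaking-space entities. The pattern and the function name delete_empty_paragraphs are assumptions for this example; the converter's own matching rules may differ.

```python
import re

# A <p ...> tag whose content is only white space, &nbsp; or &#xa0; entities.
# \b stops the pattern from matching other tags that start with "p", such as <pre>.
EMPTY_PARAGRAPH = re.compile(r"<p\b[^<>]*>(?:\s|&nbsp;|&#xa0;)*</p>", re.IGNORECASE)


def delete_empty_paragraphs(html: str) -> str:
    """Hypothetical helper: remove empty paragraphs, keep everything else."""
    return EMPTY_PARAGRAPH.sub("", html)
```

For example, '<p>Hi</p><p class="x"> &nbsp; </p>' becomes '<p>Hi</p>'.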

Convert web addresses and emails to links – Ticking this option will convert any web or email text addresses to clickable hyperlinks.

Save text boxes with text as HTML – Doc Converter Pro will convert Word document text boxes that contain text to HTML. Please note that any border of the text box will be removed, but it can be added back via a custom CSS style rule in the template if needed. When this option is turned off, text boxes will be saved as image files. Please note that this is an experimental feature of Doc Converter Pro and it may not produce good results for advanced text boxes.

Relative font size – When enabled, font sizes will be output in relative (em) units when saving to HTML. In many existing documents (HTML, EPUB) font sizes are specified in relative units.

Show page numbers in a table of contents – When this option is enabled, Doc Converter Pro will show page numbers in a table of contents. By default, this option is turned off, because the converted HTML document is just one page, so page numbers are not useful.

Convert document legacy form fields (controls) to text – When enabled, legacy form fields (text inputs, drop-downs) will be saved as normal text. Otherwise, the HTML will contain text inputs as <input> elements and drop-down form fields as <select> elements.

Convert document field codes to plain text – Converts any document field codes into static text. Otherwise, they are output as <input> HTML elements.

Delete empty new lines from output – Doc Converter Pro will delete empty new lines (more than one new line in a row) found in the HTML or TXT files.

HTML list mode – Specifies how list labels are exported to HTML:

  • Auto – Outputs list labels in auto mode. Uses HTML elements when possible.
  • Html tags – Outputs all list labels only as HTML elements.
  • As text – Outputs all list labels as inline text.

Math equations mode – Specifies how math equations will be exported to HTML: as images, MathML, or text.

Indent or compress HTML – Compress (minify) the HTML output (smaller but less readable HTML) or indent it (prettify: slightly larger output, but easier to read in a source editor). By default, Doc Converter Pro does not compress or indent HTML output. You can also specify your own indentation characters; the default is one tab character.

Common conversion options for HTML and TXT formats:

Remove comments – When enabled, all document comments will be removed from the HTML or TXT output file.

Show tracking changes – When enabled, Doc Converter Pro will show any tracked changes from the document in the converted HTML or TXT.

Ignore case during conversion – If you tick this option, Doc Converter Pro will ignore case when looking for code. For example, if ignore case is turned OFF and you ask Doc Converter Pro to find the code <p class="HEADING_1">, it will only find <p class="HEADING_1"> and not <p class="heading_1"> (note that in the second example HEADING_1 is in lowercase). Turning ignore case ON means that Doc Converter Pro will ignore the case of the code, so in our example it would find both <p class="HEADING_1"> and the lowercase <p class="heading_1">.

Headers and footers mode – Specifies how Doc Converter Pro generates headers and footers in output HTML. By default, the first section header and last section footer are generated in the HTML output. You can select None and no headers or footers will be added to the output HTML. If you choose the Per section option, Doc Converter Pro will add headers and footers for every input document section.
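The difference between the two settings can be reproduced with any regular expression engine. The Python sketch below is purely illustrative (count_matches is a made-up helper, and Python's re module stands in for the converter's own search):

```python
import re


def count_matches(needle: str, text: str, ignore_case: bool) -> int:
    """Hypothetical helper: count literal occurrences of needle in text."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape makes the needle match literally, not as a pattern.
    return len(re.findall(re.escape(needle), text, flags))


html = '<p class="HEADING_1">Title</p><p class="heading_1">Intro</p>'
needle = '<p class="HEADING_1">'
```

Here count_matches(needle, html, ignore_case=False) finds 1 occurrence, while ignore_case=True finds 2.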

TXT conversion option:

Preserve table layout when saving as plain text – Enabling this option helps to preserve the table layout when saving to text format. For example, data in columns will stay on the same row.

Find and Replace/Delete the text

This section allows you to find and replace or delete any text or regular expression. A simple example would be to find some text and replace it with some other text. A more advanced example: to find all <span…>…</span> tags in an HTML file and remove them while keeping their inner content, enter the regular expression <span[^<>]*>(.+)</span> in the find field and $1 in the replace field. $1 means that whatever text was matched by (.+) is used as the replacement value. Regular expressions are a very powerful tool for find and replace in text files.

Doc Converter Pro supports .NET Regular Expressions (RegEx), see this page for more information.
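To experiment with the span example outside the app, here is a small Python sketch. Python's re module is used as a stand-in for the .NET engine: the replacement reference is written \1 in Python instead of $1. The names SPAN_GREEDY, SPAN_LAZY, and unwrap_spans are made up for this example. One thing worth knowing is that (.+) is greedy, so when a line contains several spans it matches from the first opening tag to the last closing tag. A lazy group, (.+?), stops at the nearest closing tag instead.

```python
import re

# The pattern from the text above (greedy) and a lazy variant.
SPAN_GREEDY = re.compile(r"<span[^<>]*>(.+)</span>")
SPAN_LAZY = re.compile(r"<span[^<>]*>(.+?)</span>")


def unwrap_spans(html: str, pattern: re.Pattern = SPAN_LAZY) -> str:
    """Hypothetical helper: remove span tags but keep their inner content."""
    # \1 is Python's spelling of the $1 reference used in the replace field.
    return pattern.sub(r"\1", html)


sample = '<p><span class="a">one</span> and <span>two</span></p>'
```

With the lazy pattern, unwrap_spans(sample) gives '<p>one and two</p>'. With SPAN_GREEDY the result is '<p>one</span> and <span>two</p>', because the single greedy match swallowed the inner tags. Neither pattern handles nested spans or spans that run across several lines.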

Image conversion options

Images folder – optionally enter your own images folder name where all images from the input document will be saved. If you do not enter anything here images will be saved in a folder name based on the input document name.

Image files prefix – optionally set your own custom image file name prefix, for example: image, picture, etc. Otherwise, output image file names are based on the input document file name.

Image type – if there are images in your document you can select what image format you want to convert them to. We recommend using JPG or PNG, but you can also select GIF, BMP, or WMF image formats. There is also an option to control the JPG compression level (Jpeg Image quality): the higher it is, the better the image quality, but the file size increases accordingly. Generally, for photos, you should use JPG, and for clipart or graphics use PNG.

Embed images in HTML – You can embed images directly into your HTML file so you do not need separate image files – this is a great way to make files self-contained. Please note that if your document has many or large images we do not recommend using this option, because the HTML file will be very large and impractical for websites. Embedding images is fine if your document has a few small or medium-sized images, or if you do not plan to put the HTML online.

Keep <img> tag width and height attributes – this option will make Doc Converter Pro keep the width and height <img> tag attributes in the HTML. Un-ticking it will remove those attributes from the converted HTML.

Highest image quality – this option generally increases the quality of images. By default, the image DPI is set to 96dpi, but you can change this in the Images DPI drop-down selector. 96dpi is the recommended image quality for web images; at higher values your images will be larger and slower to load. The high-quality option slows down conversion slightly, so if you are converting lots of files and image quality is not important to you, you can experiment with turning this option off.

Scale images to shape size – by default Doc Converter Pro will scale images to their shape size from the source document. Turn this option off if you want to save images in their original size from the document. Keep in mind that if the ‘Keep <img> tag width and height attributes’ option is enabled then the browser will resize images to their shape size.

Do not save any image files at all – if you tick this option we will remove all images from the document before we convert it to HTML. This will speed up document conversions.

CSS Options

No CSS – If you tick this option, no CSS will be used in the HTML file – which means that nearly all text formatting will be removed. Only basic formatting like bold, underline, and emphasis will be left in the HTML.

Inline CSS – All CSS is put into the style attribute of each HTML tag, not into a <style> tag.

Normal CSS – The common CSS is put into the head <style> tag section of the HTML file and sometimes in the tag style attribute.

Save CSS in a separate file – This option is similar to Normal CSS, but all the common CSS will be put into a separate CSS file linked to the HTML file.

Save CSS in a separate custom file name – This will put all the common CSS into a separate custom CSS file name that you will specify.

Delete CSS rules not being used in HTML – This option is useful with Normal CSS and both Save CSS options; it removes any CSS rules that are not used in the output HTML.

CSS own files – You can add your own CSS files to the converted HTML by entering one file name per line.

CSS own styles – You can add your own CSS styles to the converted HTML.

CSS styles to keep or remove – You can specify which particular CSS styles to keep or remove from the HTML file. You can even enter simple regular expressions here. For instance, to remove any margin-top/left/right/bottom you could enter margin-[a-z]+ regular expression.
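To see which property names a pattern such as margin-[a-z]+ would catch, you can try it against a sample style value. The Python sketch below assumes the pattern is compared against the whole property name; that assumption, and the helper name remove_css_properties, are illustrative only and may not match how the app applies the setting.

```python
import re


def remove_css_properties(style: str, name_pattern: str) -> str:
    """Hypothetical helper: drop declarations whose property name matches name_pattern."""
    kept = []
    # Naive split on ";" (good enough for simple style values).
    for declaration in style.split(";"):
        if not declaration.strip():
            continue
        name = declaration.split(":", 1)[0].strip()
        # Assumption: the pattern must match the whole property name.
        if re.fullmatch(name_pattern, name, re.IGNORECASE):
            continue
        kept.append(declaration.strip())
    return "; ".join(kept)
```

For the value "margin-top: 0pt; color: red; margin-left:4px; margin: 0" and the pattern margin-[a-z]+, the result is "color: red; margin: 0". The shorthand margin property is kept, because it has no "-side" suffix for the pattern to match.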

CSS Find and Replace

CSS find and replace options allow you to find and replace, rename or delete any CSS rule names in the output HTML document. In order to know CSS class names or tag ids you would have to view and analyze the HTML output source code first.

Split Document per Tag or Page

This section allows you to select options in order to split your converted document into separate HTML files per page or per HTML tag like h1.

Split per page – This will convert each page of your document to its own HTML file. For example, if your document has 4 pages, Doc Converter Pro will create 4 separate .html pages, one for each page.

Split per tag – This will split the HTML output file into smaller files by a tag, for example heading tag: h1. If you enter h1 Doc Converter Pro will look for each <h1> tag and it will place all the code from that tag onwards, into its own file.
By default, Doc Converter Pro will name split files with the file name and the tag inner text. For example, if your file is called ‘catalogue’ then files will be called catalogue_introduction.html, catalogue_products.html, etc.
If you tick the numbered output file names then the names would be catalogue_1.html, catalogue_2.html, etc.
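The split-per-tag behaviour described above can be pictured with the following Python sketch. It is a simplified model under stated assumptions, not the converter's implementation: the function split_per_tag and its parameters are hypothetical, the rule for turning the tag text into a file name is a guess, and anything before the first matching tag is simply dropped here.

```python
import re


def split_per_tag(html: str, tag: str, base_name: str, numbered: bool = False) -> dict:
    """Hypothetical helper: map output file names to the HTML starting at each <tag>."""
    name = re.escape(tag)
    heading = re.compile(rf"<{name}\b[^<>]*>(.*?)</{name}>", re.IGNORECASE | re.DOTALL)
    matches = list(heading.finditer(html))
    files = {}
    for index, match in enumerate(matches, start=1):
        # Each piece runs from this tag up to the next matching tag (or the end of the file).
        end = matches[index].start() if index < len(matches) else len(html)
        if numbered:
            suffix = str(index)
        else:
            # Assumption: the tag's inner text is lowercased and made file-name safe.
            inner_text = re.sub(r"<[^<>]+>", "", match.group(1))
            suffix = re.sub(r"[^a-z0-9]+", "-", inner_text.lower()).strip("-")
        files[f"{base_name}_{suffix}.html"] = html[match.start():end]
    return files
```

Splitting "<h1>Introduction</h1><p>Hello</p><h1>Products</h1><p>List</p>" on h1 with the base name catalogue yields catalogue_introduction.html and catalogue_products.html. With numbered=True the names are catalogue_1.html and catalogue_2.html.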

Create index HTML file – You can create an index HTML file with links to the files created using the split option.

Encoding options

By default, all Doc Converter Pro template files are encoded as UTF-8. Most English web pages are encoded in UTF-8. If you are cleaning existing HTML files that are encoded differently, e.g. ANSI, then you will need to change the template encoding setting to match the encoding of the file you are cleaning.

Load and Save file encoding – These options allow you to set the input and output file character encoding. If you don’t need to change encoding just leave these options set to the AUTO value.

Save files with UTF8 BOM marker – If you tick this option, Doc Converter Pro will save all files with the UTF-8 byte order mark (BOM), a special marker at the beginning of a text file that is used to detect UTF-8 file encoding.
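If you want to check whether a file was saved with the marker, the UTF-8 BOM is the three bytes EF BB BF at the very start of the file. The short Python sketch below shows the difference using the standard "utf-8-sig" codec; the helper name encode_utf8 is made up for the example.

```python
import codecs


def encode_utf8(text: str, with_bom: bool) -> bytes:
    """Hypothetical helper: encode text as UTF-8, optionally prefixed with the BOM."""
    # Python's "utf-8-sig" codec writes the BOM before the encoded text.
    return text.encode("utf-8-sig" if with_bom else "utf-8")


def has_utf8_bom(data: bytes) -> bool:
    """Check whether raw file bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)
```

encode_utf8("hi", with_bom=True) produces b"\xef\xbb\xbfhi", while with_bom=False produces just b"hi".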

New line type – You can select a new line type: default Windows (\r\n), Linux or Mac OSX (\n) or pre-OSX Mac (\r).

Convert special chars to HTML Entities – these options will convert output file non-ASCII or special chars to their equivalent HTML entities. If we take as an example the copyright symbol © this as a numbered entity would be: &#169; and as a named entity it would be: &copy;. If you do not know what option to select then just select do not convert.
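The difference between numbered and named entities can be illustrated with Python's standard entity table. This is a sketch only: the function to_entities is hypothetical, and it assumes that only non-ASCII characters are converted, which may not match every option in the app.

```python
from html.entities import codepoint2name


def to_entities(text: str, named: bool) -> str:
    """Hypothetical helper: replace non-ASCII characters with HTML entities."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)  # plain ASCII is left as-is
        elif named and code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")  # e.g. &copy;
        else:
            out.append(f"&#{code};")  # e.g. &#169;
    return "".join(out)
```

to_entities("© 2024", named=False) gives "&#169; 2024" and to_entities("© 2024", named=True) gives "&copy; 2024". Characters without a named entity fall back to the numbered form.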

Delete Tags or Attributes

Tags to delete – This will delete all specified tags in the HTML output file, but it will leave the content that was in these tags. To delete the content in a given tag tick the remove tag with the content option.

Delete all tag attributes – If you add a tag such as p, all attributes will be removed from every <p> tag in the HTML file. For example: <p class="className"> would become <p>.

Delete all attributes – This option deletes listed tag attributes from the HTML file. For example, if you added the style attribute then it would delete all of the style attributes across all tags in the HTML file.
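The two attribute options differ in scope: one strips every attribute from the listed tags, the other strips the listed attributes from every tag. The following Python sketch models both with regular expressions. The function names are hypothetical, and the patterns are deliberately simple (they assume attribute values do not contain < or >).

```python
import re


def delete_all_tag_attributes(html: str, tag: str) -> str:
    """Hypothetical helper: remove every attribute from the given tag."""
    # \b keeps "p" from also matching tags such as <pre>.
    return re.sub(rf"<({re.escape(tag)})\b[^<>]*>", r"<\1>", html, flags=re.IGNORECASE)


def delete_attribute(html: str, attribute: str) -> str:
    """Hypothetical helper: remove one named attribute from all tags."""
    attr = re.compile(
        rf"""\s+{re.escape(attribute)}\s*=\s*(?:"[^"]*"|'[^']*'|[^\s<>]+)""",
        re.IGNORECASE,
    )
    # Only look inside tags, so matching text in the page content is untouched.
    return re.sub(r"<[^<>]+>", lambda m: attr.sub("", m.group(0)), html)
```

delete_all_tag_attributes('<p class="className">Text</p>', "p") returns '<p>Text</p>', and delete_attribute('<p style="color:red" class="a">x</p>', "style") returns '<p class="a">x</p>'.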

Delete empty tags – This option allows you to remove any empty tags. For example, if you enter span it will delete all empty <span> tags. Ticking the ‘Delete all empty tags’ option will automatically delete any empty tags in the HTML file without you having to specify the tags.

Metadata and Page Title

Metadata options – allow you to add/edit the metadata information contained in the <head> section of the HTML file. For example, if you enter the author text ‘Mark Smith’ then this will be inserted into the HTML as <meta name="author" content="Mark Smith">.
If you enter the #AUTHOR# string in the Meta Author field then the author metadata from the input document will be inserted in the content attribute of <meta name="author" content="…">. Please note that if your document does not have any metadata then we cannot add it automatically.

Page title – If you fill in this field, Doc Converter Pro will change the <title>…</title> tag in the HTML file. For example, if you enter ‘About us’ it will change the title to <title>About us</title>. If you enter the #TITLE# variable, Doc Converter Pro will get the title metadata from the input document. If you enter #FILENAME# then Doc Converter Pro will use the file name for the title text. For example, if your file is called ‘my file.doc’ then the title tag will be <title>my file</title>.

Please note that metadata variables can be used in the metadata section and in Find and Replace/Delete.
Variable names that can be used as parameters:

#TITLE#, #AUTHOR#, #SUBJECT#, #KEYWORDS#, #CATEGORY#, #COMMENTS#, #COMPANY#, #CREATIONDATE#, #LASTSAVETIME#, #MANAGER#, #PAGES#, #REVISIONNUMBER#, #FILENAME#, #PAGESPLITTAGTEXT#
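Conceptually, each #NAME# variable is a placeholder that is swapped for a value taken from the input document. The Python sketch below models that substitution. It is an illustration only: expand_variables and its arguments are hypothetical, and leaving unknown or missing variables untouched is an assumption about what happens when the document has no such metadata.

```python
import re
from pathlib import Path


def expand_variables(template: str, metadata: dict, filename: str) -> str:
    """Hypothetical helper: replace #NAME# placeholders with document metadata."""
    values = {key.upper(): value for key, value in metadata.items()}
    # #FILENAME# is the input file name without its extension.
    values["FILENAME"] = Path(filename).stem

    def substitute(match: re.Match) -> str:
        # Assumption: placeholders with no known value are left unchanged.
        return values.get(match.group(1), match.group(0))

    return re.sub(r"#([A-Z]+)#", substitute, template)
```

For instance, expand_variables("<title>#FILENAME#</title>", {}, "my file.doc") returns "<title>my file</title>".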

Replace HTML header or footer

Specify your own HTML header (from HTML file start to <body> tag) and footer (from </body> tag till the end of file).

Add HTML before or after tags

Add any HTML before or after start or end tags in the converted files. For instance, you can add custom metadata, style links, or script links in the <head> tag, or add some JavaScript code before the closing </body> tag.

CSV export options

Doc Converter Pro can export converted files to the CSV format that you can then import into a database or Excel. For instance, you can create a CSV file in the WordPress posts table format, and then import that CSV file into WordPress with a CSV import plugin.
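To give an idea of what such a CSV file looks like, here is a Python sketch that writes converted HTML into two columns named after the WordPress posts table fields post_title and post_content. The column choice and the helper export_posts_csv are assumptions for illustration; the columns your import plugin expects, and the layout Doc Converter Pro actually produces, depend on your template settings.

```python
import csv
import io


def export_posts_csv(documents: list) -> str:
    """Hypothetical helper: build CSV text from (title, html) pairs."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=["post_title", "post_content"])
    writer.writeheader()
    for title, html in documents:
        # The csv module quotes fields containing commas, quotes, or new lines.
        writer.writerow({"post_title": title, "post_content": html})
    return buffer.getvalue()


csv_text = export_posts_csv([("About us", '<p>Hello, "world"</p>')])
```

HTML content usually contains commas and quotation marks, so it is worth letting a CSV library handle the quoting rather than joining fields by hand.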