
How to Convert HTML to Word in C#

In modern .NET development, there’s often a need to convert web content into editable Word documents. Whether you’re archiving web articles or generating reports from HTML templates, having a dependable way to transform HTML into well-formatted Word files is crucial. In this article, we’ll explore several practical approaches to converting HTML to Word using C#, including techniques for both static HTML files and dynamically generated HTML content.

Getting Your Environment Ready

First, we need a library for the job. Open-source alternatives such as the Open XML SDK exist, but they require you to map each HTML tag to a Word element yourself, which is time-consuming. We’ll use Free Spire.Doc here because it handles that translation for us. To get started, install the package from NuGet via the Package Manager Console:

PM> Install-Package FreeSpire.Doc

1. Preparation: Creating a Sample HTML File

Let’s assume we have a standard HTML file with some styling, a heading, a paragraph, and a table.

Sample: input.html

<!DOCTYPE html>
<html>
<head>
    <style>
        body { font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; }
        .header { color: #2e74b5; text-align: center; }
        table { width: 100%; border-collapse: collapse; }
        th, td { border: 1px solid #ddd; padding: 8px; }
        th { background-color: #f2f2f2; }
    </style>
</head>
<body>
    <h1 class="header">Quarterly Sales Report</h1>
    <p>This document was generated automatically from the <b>Web Portal</b>.</p>
    <table>
        <tr>
            <th>Product</th>
            <th>Quantity</th>
            <th>Status</th>
        </tr>
        <tr>
            <td>Cloud Subscription</td>
            <td>142</td>
            <td style="color: green;">Completed</td>
        </tr>
    </table>
</body>
</html>

2. Converting an HTML File to Word in C#

The process to convert an existing HTML file to Word is straightforward. Simply load the HTML file with LoadFromFile and then save it as a Word file with SaveToFile. The library automatically parses the HTML and maps the CSS styles to Word's formatting.

using Spire.Doc;
using Spire.Doc.Documents; // XHTMLValidationType is defined in this namespace

namespace HtmlToWordExample
{
    class Program
    {
        static void Main(string[] args)
        {
            // Initialize a new Document object
            Document document = new Document();

            // Load the HTML file from the disk
            // We use XHTMLValidationType.None to ensure the parser doesn't 
            // crash on minor syntax errors in the HTML.
            document.LoadFromFile("input.html", FileFormat.Html, XHTMLValidationType.None);

            // Save the result as a modern .docx file
            document.SaveToFile("OutputReport.docx", FileFormat.Docx2016);

            document.Close();
        }
    }
}
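
The same two calls extend naturally to a whole folder of HTML files. The following is a minimal sketch rather than a definitive implementation: the class name HtmlBatchConverter, the method ConvertFolder, and the folder layout are hypothetical, and it uses only the LoadFromFile, SaveToFile, and Close calls shown above.

```csharp
using System;
using System.IO;
using Spire.Doc;
using Spire.Doc.Documents;

// Hypothetical helper, not part of Spire.Doc
public static class HtmlBatchConverter
{
    // Converts every *.html file in inputDir to a .docx file in outputDir.
    // Returns the number of files converted.
    public static int ConvertFolder(string inputDir, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        int converted = 0;

        foreach (string htmlPath in Directory.EnumerateFiles(inputDir, "*.html"))
        {
            // Keep the file name, swap the extension
            string docxPath = Path.Combine(
                outputDir,
                Path.GetFileNameWithoutExtension(htmlPath) + ".docx");

            Document document = new Document();
            try
            {
                document.LoadFromFile(htmlPath, FileFormat.Html, XHTMLValidationType.None);
                document.SaveToFile(docxPath, FileFormat.Docx2016);
                converted++;
            }
            finally
            {
                // Release the document even if loading or saving throws
                document.Close();
            }
        }

        return converted;
    }
}
```

The try/finally block is a design choice: one malformed file still raises an exception, but the document it was using is closed before the exception propagates.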

3. Converting a Dynamic HTML String to Word in C#

In many web applications, HTML content is stored in a database or generated dynamically as a string. You can add this string directly to a Word paragraph with the library’s AppendHTML method, like this:

public void ExportHtmlString(string htmlString)
{
    Document doc = new Document();

    // Word documents are organized into sections. We must add one first.
    Section section = doc.AddSection();

    // Append the raw HTML string into the section
    section.AddParagraph().AppendHTML(htmlString);

    // Export to Docx
    doc.SaveToFile("DynamicOutput.docx", FileFormat.Docx2016);
    doc.Close();
}
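
When the HTML is assembled from application data, it helps to encode the values before they reach the converter so that characters such as & or < do not break the markup. The sketch below assumes a hypothetical ReportHtmlBuilder class and a simple (product, quantity, status) row shape modeled on the sample table. It relies only on the standard .NET WebUtility.HtmlEncode method.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical helper for illustration; the row shape mirrors the sample table
public static class ReportHtmlBuilder
{
    // Builds an HTML document containing a heading and a table of rows.
    public static string Build(
        string title,
        IEnumerable<(string Product, int Quantity, string Status)> rows)
    {
        var html = new StringBuilder();
        html.Append("<html><body>");
        html.Append("<h1>").Append(WebUtility.HtmlEncode(title)).Append("</h1>");
        html.Append("<table>");
        html.Append("<tr><th>Product</th><th>Quantity</th><th>Status</th></tr>");

        foreach (var row in rows)
        {
            // Encode text values so special characters cannot break the markup
            html.Append("<tr>")
                .Append("<td>").Append(WebUtility.HtmlEncode(row.Product)).Append("</td>")
                .Append("<td>").Append(row.Quantity).Append("</td>")
                .Append("<td>").Append(WebUtility.HtmlEncode(row.Status)).Append("</td>")
                .Append("</tr>");
        }

        html.Append("</table></body></html>");
        return html.ToString();
    }
}
```

The resulting string can then be passed straight to the method above, for example ExportHtmlString(ReportHtmlBuilder.Build("Quarterly Sales Report", rows)).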

4. Advanced Handling: Page Breaks, Headers, Footers, and Page Numbers

Real-world conversion often requires more than a direct dump of content. You may need to control how the HTML is converted, for example by managing page breaks or by adding headers, footers, and page numbers.

4.1 Forcing Page Breaks

If you want to ensure certain HTML elements start on a new page in Word, you can achieve this in two ways:

a. Use a CSS style “page-break-before: always” in your HTML source.

public void GenerateMultiPageReport()
{
    Document doc = new Document();
    Section section = doc.AddSection();

    // The CSS style 'page-break-before: always' on the <br> element
    // inserts a physical page break in Word at that point.
    // Inside a verbatim string (@"..."), double quotes are escaped as "".
    string htmlContent = @"
        <html>
            <h1>Page 1</h1>
            <p>This content is on the first page.</p>
            <br style=""page-break-before: always"" />
            <h1>Page 2</h1>
            <p>This content starts on a fresh page!</p>
        </html>";

    section.AddParagraph().AppendHTML(htmlContent);
    doc.SaveToFile("MultiPageReport.docx", FileFormat.Docx2013);
    doc.Close();
}

b. Insert the page break manually after adding the HTML, using the library’s AppendBreak method. The line below appends a break to the second paragraph (index 1) of the first section:

doc.Sections[0].Paragraphs[1].AppendBreak(BreakType.PageBreak);
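
4.2 Adding Headers, Footers, and Page Numbers

Headers and footers are not part of the HTML body, so they are added to the Word section after the HTML has been appended. The following is a minimal sketch under stated assumptions: the class name HeaderFooterExample, the method Export, and the header text are hypothetical, and the HeadersFooters, AppendText, and AppendField members are used as described in the Spire.Doc documentation. Check the behavior against the version you have installed.

```csharp
using Spire.Doc;
using Spire.Doc.Documents;

// Hypothetical wrapper for illustration
public static class HeaderFooterExample
{
    public static void Export(string htmlString, string outputPath)
    {
        Document doc = new Document();
        Section section = doc.AddSection();
        section.AddParagraph().AppendHTML(htmlString);

        // Header: a centered line of text repeated on every page of the section
        Paragraph header = section.HeadersFooters.Header.AddParagraph();
        header.AppendText("Quarterly Sales Report");
        header.Format.HorizontalAlignment = HorizontalAlignment.Center;

        // Footer: "Page X of Y" built from Word fields that Word updates itself
        Paragraph footer = section.HeadersFooters.Footer.AddParagraph();
        footer.AppendText("Page ");
        footer.AppendField("page number", FieldType.FieldPage);
        footer.AppendText(" of ");
        footer.AppendField("number of pages", FieldType.FieldNumPages);
        footer.Format.HorizontalAlignment = HorizontalAlignment.Center;

        doc.SaveToFile(outputPath, FileFormat.Docx2016);
        doc.Close();
    }
}
```

Using fields instead of literal numbers means the page count stays correct even if the document is edited in Word afterwards.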

5. Key Considerations

When converting HTML to Word, remember that Word is not a web browser. Its handling of HTML and CSS is closer to an old version of Internet Explorer than to a modern browser, so content that looks good in a browser may render differently in Word. Addressing the following areas can prevent most rendering problems:

5.1 Fixing Image Path Issues

In HTML, images often use relative paths like <img src="logo.png">. When the conversion runs, the library might not know where that file is. The most reliable fix is to resolve the absolute path before passing the HTML to the library.

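// Requires: using System; using System.IO;
// htmlTemplate is assumed to hold the HTML markup loaded earlier.

// Define your base directory (e.g., your project's assets folder)
string baseDirectory = AppDomain.CurrentDomain.BaseDirectory;
string fullImagePath = Path.Combine(baseDirectory, "assets", "logo.png");

// Replace the relative src value with a full local path so the library can find it.
// Matching the whole attribute avoids touching other occurrences of "logo.png".
string finalHtml = htmlTemplate.Replace("src=\"logo.png\"", "src=\"" + fullImagePath + "\"");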
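
When a template contains several images, replacing each file name by hand becomes error-prone. The sketch below generalizes the idea with a regular expression that rewrites every relative src value in an <img> tag. The class HtmlImagePaths and its method are hypothetical, and a regex is only an approximation of HTML parsing, so a proper HTML parser is the safer choice for untrusted or unusual markup.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; uses only standard .NET APIs
public static class HtmlImagePaths
{
    // Captures: (1) everything up to the opening quote, (2) the quote, (3) the src value
    private static readonly Regex ImgSrc = new Regex(
        @"(<img\b[^>]*?\bsrc\s*=\s*)([""'])(.*?)\2",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Rewrites relative <img src> values to full local paths under baseDirectory.
    public static string ResolveRelativeImagePaths(string html, string baseDirectory)
    {
        return ImgSrc.Replace(html, match =>
        {
            string src = match.Groups[3].Value;

            // Leave empty values, URLs (http:, https:, data:, file:) and rooted paths untouched
            bool leaveAsIs = src.Length == 0
                || src.StartsWith("data:", StringComparison.OrdinalIgnoreCase)
                || Uri.TryCreate(src, UriKind.Absolute, out _)
                || Path.IsPathRooted(src);
            if (leaveAsIs)
            {
                return match.Value;
            }

            string fullPath = Path.GetFullPath(Path.Combine(baseDirectory, src));
            string quote = match.Groups[2].Value;
            return match.Groups[1].Value + quote + fullPath + quote;
        });
    }
}
```

A typical call would be HtmlImagePaths.ResolveRelativeImagePaths(htmlTemplate, Path.Combine(baseDirectory, "assets")) just before the HTML is handed to the library.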

5.2 Managing Fonts and Consistency

Word will render a font correctly only if it is installed on the machine opening the document. For maximum compatibility, stick to widely installed fonts such as Arial, Times New Roman, or Calibri.

If your branding requires specific non-standard fonts, you can embed them directly into the document. Note that this will increase the final file size.

// Enable font embedding for cross-platform consistency
document.EmbedFontsInFile = true;
// Manually add a private font to the document's font list
document.PrivateFontList.Add(new PrivateFontPath("CustomFont", @"C:\Fonts\CustomFont.ttf")); 

5.3 Compatibility Rules

To ensure your layout doesn't break, follow these rules:

  • Avoid Modern Layouts: Word does not support CSS Flexbox or Grid. For side-by-side content or complex alignments, use standard HTML <table> elements. They are the most stable way to manage structure in Word.

  • Prefer Inline Styles: While external stylesheets are sometimes supported, Word may ignore complex CSS selectors. For critical formatting (like background colors or specific widths), use inline styles: <td style="background-color: #f2f2f2;">.

  • Stick to Basic Tags: Modern semantic tags like <nav>, <article>, or <section> are often ignored or stripped. Stick to the classics: <p>, <h1>–<h6>, <table>, and <img>.

  • No Interactivity: Remember that JavaScript, buttons, and video elements will be stripped out or rendered as static, non-functional placeholders.
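
As an illustration of the first two rules, side-by-side content can be produced with a plain table and inline styles instead of Flexbox or Grid. The helper below is a hypothetical sketch (the class TableLayout and the chosen widths and padding are assumptions) that returns markup suitable for passing to AppendHTML.

```csharp
// Hypothetical helper for illustration; builds a two-column layout from a table
public static class TableLayout
{
    // Wraps two HTML fragments in a single-row table with inline styles only.
    public static string TwoColumns(string leftHtml, string rightHtml)
    {
        const string cellStyle = "width: 50%; vertical-align: top; padding: 8px;";

        return
            "<table style=\"width: 100%; border-collapse: collapse;\"><tr>" +
            "<td style=\"" + cellStyle + "\">" + leftHtml + "</td>" +
            "<td style=\"" + cellStyle + "\">" + rightHtml + "</td>" +
            "</tr></table>";
    }
}
```

Because every style is inline and the structure is a basic table, the layout does not depend on CSS features that Word may ignore.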

Conclusion

Converting HTML to Word in C# is a practical way to turn web content into editable, professional documents. This article explored how to handle both static HTML files and dynamic HTML strings, how to control layout elements such as page breaks, and how to avoid common rendering pitfalls involving image paths, fonts, and unsupported CSS.

By applying these techniques, you can streamline document generation and ensure your Word outputs accurately reflect the original HTML content.

Just remember: always validate the output with your real HTML content to ensure the final Word document matches your expectations.
