Convert Word DOCX to Markdown in Java: Basic Conversion and Export Options

Markdown is widely used in developer documentation, README files, static sites, internal knowledge bases, and Git-based content workflows. Source documents, however, are not always written in Markdown. In many teams, product specs, technical drafts, reports, and user manuals still arrive as Word documents.

This guide shows how to convert a Word document to Markdown in Java. We will start with the simplest DOCX-to-MD conversion, then add export options for images, lists, hyperlinks, underline formatting, tables, and Office Math equations.

The goal is not just to generate a .md file, but to understand which export settings matter when the Word document contains real-world formatting.

Prerequisites

Before starting, make sure you have:

  • JDK 8 or later
  • A Maven-based Java project
  • A DOCX file to test with
  • Basic familiarity with Java file paths

For the conversion API, this example uses Spire.Doc for Java. Add the Maven repository and dependency to your pom.xml:

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>14.7.0</version>
    </dependency>
</dependencies>

The version above is used as an example. In a real project, check the latest available version before publishing or deploying your application.

Part 1: Basic Word to Markdown Conversion

If your Word document is simple, the conversion takes only three steps:

  1. Create a Document object
  2. Load the Word file
  3. Save it as Markdown

Example Code

import com.spire.doc.*;

public class WordToMarkdownBasic {
    public static void main(String[] args) {
        Document document = new Document();

        try {
            document.loadFromFile("Data/toMarkdown.docx");
            document.saveToFile("toMarkdown_out.md", FileFormat.Markdown);
        } finally {
            document.close();
            document.dispose();
        }
    }
}

How It Works

The Document object represents the Word file in memory.

Document document = new Document();

Then the DOCX file is loaded from disk:

document.loadFromFile("Data/toMarkdown.docx");

Finally, the document is saved in Markdown format:

document.saveToFile("toMarkdown_out.md", FileFormat.Markdown);

This is enough for many plain documents, such as simple notes, short articles, or internal drafts.
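Before wiring the conversion into a larger job, it can help to confirm that the input file exists and to derive the output name from it instead of hard-coding both paths. The following is a minimal sketch using only the JDK; the class name MarkdownOutputPath and the method markdownNameFor are hypothetical helpers and are not part of Spire.Doc.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper (not part of Spire.Doc): validates the input path
// and derives the Markdown file name from the Word file name.
class MarkdownOutputPath {

    // "report.docx" -> "report.md"; names without a .docx suffix get ".md" appended.
    static String markdownNameFor(String docxName) {
        String lower = docxName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".docx")) {
            return docxName.substring(0, docxName.length() - ".docx".length()) + ".md";
        }
        return docxName + ".md";
    }

    public static void main(String[] args) {
        Path input = Paths.get(args.length > 0 ? args[0] : "Data/toMarkdown.docx");
        if (!Files.isRegularFile(input)) {
            System.err.println("Input file not found: " + input.toAbsolutePath());
            return;
        }
        Path output = input.resolveSibling(markdownNameFor(input.getFileName().toString()));
        // The two paths could then be passed to loadFromFile and saveToFile.
        System.out.println("Would convert " + input + " -> " + output);
    }
}
```

Failing early on a missing input file gives a clearer message than an exception raised from inside the conversion call.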

However, Word files often contain more than plain text. Images, numbered lists, hyperlinks, tables, underlined text, and equations may need additional handling. That is where Markdown export options become useful.

Part 2: Convert Word to Markdown with Export Options

The following example shows how to configure Markdown output more precisely.

It includes options for:

  • Saving images to a local folder
  • Using Base64 images if needed
  • Controlling list output
  • Preserving underline formatting
  • Exporting hyperlinks as reference-style links
  • Exporting Office Math as MathML
  • Saving tables as HTML inside Markdown

Full Example Code

import com.spire.doc.*;

public class WordToMarkdownWithOptions {
    public static void main(String[] args) {
        Document document = new Document();

        try {
            // Load the source Word document.
            document.loadFromFile("Data/toMarkdown.docx");

            // Option 1: Embed images as Base64 strings in the Markdown file.
            // This keeps everything in one file, but the Markdown file may become large.
            // document.getMarkdownExportOptions().setImagesAsBase64(true);

            // Option 2: Save extracted images to a local folder.
            // The generated Markdown file will reference images from this folder.
            document.getMarkdownExportOptions().setImagesFolder("D:\\Markdown\\Images");

            // Set an alias path for the image folder in the generated Markdown.
            // This is useful if the image folder path on your local machine
            // is different from the path used after publishing.
            // document.getMarkdownExportOptions().setImagesFolderAlias("Images");

            // Output bullet and numbered lists as plain text.
            document.getMarkdownExportOptions().setListOutputMode(MarkdownListOutputMode.Plain_Text);

            // Preserve underline formatting where possible.
            document.getMarkdownExportOptions().setSaveUnderlineFormatting(true);

            // Export hyperlinks as reference-style Markdown links.
            document.getMarkdownExportOptions().setLinkOutputMode(MarkdownLinkOutputMode.Reference);

            // Export Office Math equations as MathML.
            document.getMarkdownExportOptions().setOfficeMathOutputMode(MarkdownOfficeMathOutputMode.Math_ML);

            // Save tables as HTML inside the Markdown file.
            // This is often safer for complex Word tables.
            document.getMarkdownExportOptions().setSaveAsHtml(MarkdownSaveAsHtml.Tables);

            // Save the document as Markdown.
            document.saveToFile("toMarkdown_out.md", FileFormat.Markdown);
        } finally {
            document.close();
            document.dispose();
        }
    }
}

Understanding the Export Options

1. Export Images as Separate Files

Word documents often contain screenshots, diagrams, logos, or inline images. When converting to Markdown, you need to decide how those images should be stored.

This option saves extracted images to a local folder:

document.getMarkdownExportOptions().setImagesFolder("D:\\Markdown\\Images");

This is usually the better choice for documentation projects because the Markdown file stays readable and lightweight.

A typical output may look like this:

![image](Images/image1.png)

This approach is useful when:

  • You publish Markdown to a documentation site
  • You store Markdown in Git
  • You want to manage images separately
  • You need to replace or optimize images later

2. Embed Images as Base64

You can also embed images directly into the Markdown file:

document.getMarkdownExportOptions().setImagesAsBase64(true);

This keeps the Markdown file self-contained, but Base64 encoding adds roughly one third to the size of each image, and all of that text ends up inside the Markdown file itself.

Base64 images may be useful for:

  • Small one-off documents
  • Temporary file transfer
  • Cases where external image files are inconvenient

For long-term documentation, separate image files are usually easier to maintain.
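To get a feel for the size cost, the sketch below estimates how many characters an image occupies once it is Base64-encoded. It uses only java.util.Base64 from the JDK; the class name Base64SizeEstimate and the 300,000-byte stand-in image are illustrative assumptions, not values taken from a real document.

```java
import java.util.Base64;

// Illustrative estimate: Base64 turns every 3 input bytes into 4 output characters.
class Base64SizeEstimate {

    // Length of the padded Base64 text for a given number of raw bytes.
    static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] fakeImage = new byte[300_000]; // stand-in for a screenshot, not real image data
        int actual = Base64.getEncoder().encodeToString(fakeImage).length();
        System.out.println("Raw bytes:          " + fakeImage.length);
        System.out.println("Base64 characters:  " + actual);
        System.out.println("Formula estimate:   " + encodedLength(fakeImage.length));
    }
}
```

A document with a dozen screenshots of that size would therefore carry several megabytes of encoded text in the .md file, which is worth knowing before choosing this option for content stored in Git.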

3. Use an Image Folder Alias

The image folder on your local machine may not be the same as the path used after publishing.

For example, locally you may save images here:

document.getMarkdownExportOptions().setImagesFolder("D:\\Markdown\\Images");

But in the final Markdown file, you may want image references to use a cleaner relative path:

document.getMarkdownExportOptions().setImagesFolderAlias("Images");

This is helpful when the Markdown file will be uploaded to a blog, Git repository, or static site generator.
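To make the idea concrete, here is a plain-Java sketch of the path mapping that an alias implies: image references that start with the local folder are rewritten to start with the alias instead. This is not how the library implements the option. It is a hypothetical post-processing helper (ImageAliasSketch.applyAlias) that could also be used on Markdown that has already been exported with local paths.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical post-processing helper; it does not call Spire.Doc.
class ImageAliasSketch {

    // Matches ![alt](target); group 1 is the alt text, group 2 the target path.
    private static final Pattern IMAGE =
            Pattern.compile("!\\[([^\\]]*)\\]\\(([^)\\s]+)\\)");

    static String applyAlias(String markdown, String localFolder, String alias) {
        // Compare with forward slashes so Windows-style paths match too.
        String folder = localFolder.replace('\\', '/');
        Matcher m = IMAGE.matcher(markdown);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String original = m.group(2);
            String normalized = original.replace('\\', '/');
            // Only targets inside the local folder are rewritten; others stay untouched.
            String target = normalized.startsWith(folder + "/")
                    ? alias + normalized.substring(folder.length())
                    : original;
            String replacement = "![" + m.group(1) + "](" + target + ")";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String exported = "![image](D:/Markdown/Images/image1.png)";
        System.out.println(applyAlias(exported, "D:\\Markdown\\Images", "Images"));
    }
}
```

When the alias option does what you need, prefer it over post-processing, since it keeps the path logic in one place.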

4. Control List Output

Word lists and Markdown lists do not always map perfectly. Numbered lists, nested lists, and custom bullet styles can sometimes produce unexpected formatting.

This option exports lists as plain text:

document.getMarkdownExportOptions().setListOutputMode(MarkdownListOutputMode.Plain_Text);

This can be useful when the original order and wording matter more than Markdown list syntax.

For example, it may be suitable for:

  • Policy documents
  • Legal-style clauses
  • Exam papers
  • Structured internal documents

If you need renderers to treat the output as real Markdown lists, with automatic numbering and nesting, test the result before deciding whether Plain_Text is the best choice.
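One rough way to test this is to count how many lines in the generated file a typical Markdown renderer would still interpret as list items. The sketch below is an approximation based on common list markers (-, *, +, or a number followed by . or )); the class name ListMarkerCheck is hypothetical, and the exact text that Plain_Text mode produces is not assumed here.

```java
import java.util.regex.Pattern;

// Rough check: counts lines that start with a Markdown list marker.
class ListMarkerCheck {

    // Optional indentation, then a bullet or an ordered marker, whitespace, and content.
    private static final Pattern LIST_ITEM =
            Pattern.compile("^\\s*(?:[-*+]|\\d{1,9}[.)])\\s+\\S.*$");

    static long countListItems(String markdown) {
        long count = 0;
        for (String line : markdown.split("\\R")) {
            if (LIST_ITEM.matcher(line).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "- first\n1. second\n1\\. escaped, so plain text\nA normal paragraph";
        System.out.println("Lines that look like list items: " + countListItems(sample));
    }
}
```

Comparing this count between two exports of the same document, one with each list mode, shows quickly whether the setting changed what a renderer will see.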

5. Preserve Underline Formatting

Markdown does not have a standard underline syntax. Some Markdown processors render underline through inline HTML such as the <u> element, while others strip or escape inline HTML.

If the source Word document uses underline formatting for important content, enable:

document.getMarkdownExportOptions().setSaveUnderlineFormatting(true);

This is especially useful for:

  • Fill-in-the-blank exercises
  • Forms
  • Emphasized terms
  • Documents where underline has semantic meaning

If underline is only decorative, you may not need this option.

6. Export Links as Reference-Style Markdown

Markdown supports different hyperlink styles. This example uses reference-style links:

document.getMarkdownExportOptions().setLinkOutputMode(MarkdownLinkOutputMode.Reference);

A reference-style link looks like this:

Read the [documentation][1].

[1]: https://example.com

This keeps the main text cleaner, especially when the document contains many links.

Reference-style links are a good fit for:

  • Technical documentation
  • Research notes
  • Long-form tutorials
  • API guides

For short documents, inline links may be easier to read.
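Reference-style output has one failure mode worth checking: a link in the text whose label has no matching definition renders as literal brackets. The sketch below scans Markdown text for [text][label] usages and reports labels without a [label]: target definition. It is a simplified check written for this article (the class ReferenceLinkCheck is hypothetical) and it does not handle every edge case of the Markdown grammar, such as collapsed [label][] references.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified check for reference-style links; not a full Markdown parser.
class ReferenceLinkCheck {

    // A definition such as "[1]: https://example.com" at the start of a line.
    private static final Pattern DEFINITION =
            Pattern.compile("(?m)^ {0,3}\\[([^\\]]+)\\]:\\s*\\S+");
    // A usage such as "[documentation][1]"; group 1 is the label.
    private static final Pattern USAGE =
            Pattern.compile("\\[[^\\]]+\\]\\[([^\\]]+)\\]");

    static Set<String> undefinedLabels(String markdown) {
        Set<String> defined = new HashSet<>();
        Matcher d = DEFINITION.matcher(markdown);
        while (d.find()) {
            // Labels are matched case-insensitively.
            defined.add(d.group(1).toLowerCase(Locale.ROOT));
        }
        Set<String> missing = new TreeSet<>();
        Matcher u = USAGE.matcher(markdown);
        while (u.find()) {
            String label = u.group(1).toLowerCase(Locale.ROOT);
            if (!defined.contains(label)) {
                missing.add(label);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String sample = "Read the [documentation][1] and the [FAQ][2].\n\n[1]: https://example.com";
        System.out.println("Labels without a definition: " + undefinedLabels(sample));
    }
}
```

This kind of check is most useful after manual edits, when a paragraph has been moved or deleted and its definitions were left behind or lost.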

7. Export Office Math as MathML

If your Word document contains equations created with Office Math, you can export them as MathML:

document.getMarkdownExportOptions().setOfficeMathOutputMode(MarkdownOfficeMathOutputMode.Math_ML);

This can preserve mathematical structure better than plain text.

However, MathML support depends on where the Markdown will be rendered. Some platforms display MathML well, while others may show raw markup or require additional rendering support.

Before using this in production, test a few documents that contain real equations.

8. Save Tables as HTML

Markdown tables are simple and readable, but they have limitations. Word tables can be much more complex. They may contain:

  • Merged cells
  • Nested content
  • Multiple paragraphs in one cell
  • Special borders
  • Complex alignment

For these cases, saving tables as HTML inside the Markdown file can produce a more stable result:

document.getMarkdownExportOptions().setSaveAsHtml(MarkdownSaveAsHtml.Tables);

This puts raw HTML markup into the Markdown file, which makes it harder to read and edit by hand, but it can preserve complex tables more accurately.

Use this option when table structure is more important than Markdown purity.
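To confirm which form the tables took in a generated file, you can count pipe-style Markdown tables and HTML tables separately. The sketch below is a heuristic: it treats each delimiter row (for example |---|---|) as one pipe table and each <table opening tag as one HTML table. The class name TableFormatCount is hypothetical, and the check assumes HTML tables are written with a plain <table> element.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Heuristic counter for the two table forms that can appear in Markdown output.
class TableFormatCount {

    // Delimiter row of a pipe table, e.g. "|---|:---:|" or "--- | ---".
    private static final Pattern DELIMITER_ROW =
            Pattern.compile("^\\s*\\|?\\s*:?-+:?\\s*(?:\\|\\s*:?-+:?\\s*)*\\|?\\s*$");
    private static final Pattern HTML_TABLE =
            Pattern.compile("<table[\\s>]", Pattern.CASE_INSENSITIVE);

    static int countPipeTables(String markdown) {
        int count = 0;
        for (String line : markdown.split("\\R")) {
            // Require a pipe so that a horizontal rule ("---") is not counted.
            if (line.contains("|") && DELIMITER_ROW.matcher(line).matches()) {
                count++;
            }
        }
        return count;
    }

    static int countHtmlTables(String markdown) {
        Matcher m = HTML_TABLE.matcher(markdown);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "| a | b |\n|---|---|\n| 1 | 2 |\n\n<table><tr><td>x</td></tr></table>";
        System.out.println("Pipe tables: " + countPipeTables(sample));
        System.out.println("HTML tables: " + countHtmlTables(sample));
    }
}
```

If the HTML option is enabled and the pipe-table count is still above zero, the file is worth opening to see which tables were exported in which form.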

Common Issues and Fixes

Images Do Not Appear in the Markdown Preview

Check the following:

  • Were the images exported to the expected folder?
  • Does the Markdown file reference the correct relative path?
  • Is the Markdown file being previewed from the correct directory?
  • Did you move the .md file without moving the image folder?

A common fix is to keep the Markdown file and image folder together:

output
├── article.md
└── Images
    ├── image1.png
    └── image2.png
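The checklist above can be partly automated. The following sketch reads a Markdown file, collects the targets of its image references, and reports those that do not resolve to an existing file relative to the Markdown file's own folder. It uses only the JDK. The class name ImageReferenceCheck, the default path output/article.md, and the UTF-8 encoding are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds image references that point to missing files.
class ImageReferenceCheck {

    // Matches ![alt](target) and ![alt](target "title"); group 1 is the target.
    private static final Pattern IMAGE =
            Pattern.compile("!\\[[^\\]]*\\]\\(([^)\\s]+)[^)]*\\)");

    static List<String> imageTargets(String markdown) {
        List<String> targets = new ArrayList<>();
        Matcher m = IMAGE.matcher(markdown);
        while (m.find()) {
            targets.add(m.group(1));
        }
        return targets;
    }

    static List<String> missingImages(Path markdownFile) throws IOException {
        // Assumes the Markdown file is UTF-8 encoded.
        String text = new String(Files.readAllBytes(markdownFile), StandardCharsets.UTF_8);
        Path baseDir = markdownFile.toAbsolutePath().getParent();
        List<String> missing = new ArrayList<>();
        for (String target : imageTargets(text)) {
            // Skip embedded data URIs and remote URLs; only local paths are checked.
            if (target.startsWith("data:") || target.contains("://")) {
                continue;
            }
            if (!Files.exists(baseDir.resolve(target))) {
                missing.add(target);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Path markdownFile = Paths.get(args.length > 0 ? args[0] : "output/article.md");
        if (!Files.isRegularFile(markdownFile)) {
            System.err.println("Markdown file not found: " + markdownFile);
            return;
        }
        for (String target : missingImages(markdownFile)) {
            System.out.println("Missing image: " + target);
        }
    }
}
```

Targets that contain URL-encoded characters, such as %20 for a space, are not decoded by this sketch and would be reported as missing.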

The Markdown File Is Too Large

If you enabled Base64 images, the output file can become large quickly.

Instead of:

document.getMarkdownExportOptions().setImagesAsBase64(true);

Use:

document.getMarkdownExportOptions().setImagesFolder("output/Images");

This keeps the Markdown file easier to edit and review.
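If you are not sure whether embedded images are the cause, you can measure how much of the file they occupy. The sketch below assumes embedded images appear as data URIs of the form data:image/...;base64,..., which is the usual way to inline an image but is an assumption about the generated output. The class name Base64ImageScan is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Measures how much of a Markdown text consists of Base64 image data URIs.
class Base64ImageScan {

    private static final Pattern DATA_URI =
            Pattern.compile("data:image/[A-Za-z0-9.+-]+;base64,[A-Za-z0-9+/=]+");

    static int countEmbeddedImages(String markdown) {
        Matcher m = DATA_URI.matcher(markdown);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Total number of characters taken up by all data URIs.
    static long embeddedChars(String markdown) {
        Matcher m = DATA_URI.matcher(markdown);
        long total = 0;
        while (m.find()) {
            total += m.end() - m.start();
        }
        return total;
    }

    public static void main(String[] args) {
        String sample = "Intro ![a](data:image/png;base64,AAAA) outro";
        System.out.println("Embedded images: " + countEmbeddedImages(sample));
        System.out.println("Characters used: " + embeddedChars(sample) + " of " + sample.length());
    }
}
```

If the embedded share is small, the size problem lies elsewhere, for example in very large HTML tables.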

Tables Look Broken

If the source document contains complex Word tables, regular Markdown tables may not be enough.

Use HTML table output:

document.getMarkdownExportOptions().setSaveAsHtml(MarkdownSaveAsHtml.Tables);

This is often a better trade-off for documents where table structure must be preserved.

Equations Are Not Rendering Correctly

MathML output depends on the target renderer.

If equations do not display correctly, consider:

  • Testing another Markdown preview tool
  • Checking whether the publishing platform supports MathML
  • Converting equations to images for platforms with limited formula support
  • Using a platform-specific math format if available
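Before testing renderers, it helps to know whether a given output file contains MathML at all. The sketch below counts opening <math tags in Markdown text. It assumes the equations are written with an unprefixed <math> root element, which is the standard MathML root but has not been verified against the library's output. The class name MathMlScan is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Counts MathML root elements in Markdown text.
class MathMlScan {

    // An opening <math> tag, with or without attributes.
    private static final Pattern MATH_OPEN =
            Pattern.compile("<math[\\s>]", Pattern.CASE_INSENSITIVE);

    static int countMathElements(String markdown) {
        Matcher m = MATH_OPEN.matcher(markdown);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "Energy: <math xmlns=\"http://www.w3.org/1998/Math/MathML\"><mi>E</mi></math>";
        System.out.println("MathML elements found: " + countMathElements(sample));
    }
}
```

A count of zero for a document that is known to contain equations suggests the equations were exported in a different form and that the renderer is not the problem.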

Lists Do Not Match the Original Word Document

Word lists can include custom numbering, indentation, and nested styles. If Markdown list output looks inconsistent, try plain-text list output:

document.getMarkdownExportOptions().setListOutputMode(MarkdownListOutputMode.Plain_Text);

This may preserve the reading order more reliably, although the result may not behave like a normal Markdown list.

Final Thoughts

The basic Word-to-Markdown conversion is straightforward, but real documents usually require a few export decisions.

If your Word documents are simple, start with the basic converter. If they contain images, tables, links, or equations, enable only the options you actually need and test the output in your target Markdown platform before using it in production.
