How to Find and Highlight Text in Word Documents Using C# (String Search & Regex)

 In office automation, document auditing, and data extraction workflows, developers often need to locate specific keywords or complex patterns within large batches of Word documents and highlight them for quick review.

While the traditional Microsoft Office Interop approach can achieve this, it requires Microsoft Word to be installed on the server. This dependency often leads to stability issues, permission errors, and performance bottlenecks in headless or server-side environments.

A more robust alternative is using Free Spire.Doc for .NET, a standalone library that allows you to read, write, and manipulate Word documents without needing Microsoft Office installed. In this tutorial, we will walk through two practical examples demonstrating how to:

  1. Find and highlight exact strings (e.g., specific terms).

  2. Find and highlight complex patterns using Regular Expressions (Regex).

Prerequisites

To get started, add the Free Spire.Doc package to your .NET project via the NuGet Package Manager:

Install-Package FreeSpire.Doc

Alternatively, you can download the DLLs directly from the official website and add them as references manually.

Example 1: Finding and Highlighting Exact Strings

Let’s say you have a literary analysis document (input.docx) and need to locate every instance of the term "transcendentalism" and highlight it in yellow.

Implementation Logic

  1. Load the target Word document.

  2. Use the FindAllString method to retrieve all occurrences.

  3. Iterate through the results and apply a yellow highlight color.

  4. Save the modified document.

Complete Code Example

using System;
using System.Drawing;
using Spire.Doc;
using Spire.Doc.Documents;

namespace FindHighlightSimple
{
    class Program
    {
        static void Main(string[] args)
        {
            // 1. Initialize a new Document instance
            Document document = new Document();

            // 2. Load the source Word document
            // Ensure "input.docx" exists in your execution directory
            document.LoadFromFile("input.docx");

            Console.WriteLine("Searching for 'transcendentalism'...");

            // 3. Find all occurrences of the string
            // Parameters: 
            // "transcendentalism" -> The search term
            // false               -> Case-insensitive search
            // true                -> Match whole words only (prevents matching inside other words)
            TextSelection[] matches = document.FindAllString("transcendentalism", false, true);

            Console.WriteLine($"Found {matches.Length} matches.");

            // 4. Apply yellow highlighting to each match
            foreach (TextSelection selection in matches)
            {
                // GetAsOneRange() ensures the selection is treated as a single continuous range
                // even if it spans multiple formatting blocks.
                selection.GetAsOneRange().CharacterFormat.HighlightColor = Color.Yellow;
            }

            // 5. Save the output file
            string outputPath = "HighlightResult.docx";
            document.SaveToFile(outputPath, FileFormat.Docx);

            Console.WriteLine($"Success! Saved to: {outputPath}");
        }
    }
}

Explanation

  • FindAllString: Searches the whole document for a literal string, with flags for case sensitivity and whole-word matching. It returns a TextSelection array with one entry per match, each referring to the matched text inside the document.

  • GetAsOneRange(): Crucial for consistent formatting. If a found word is split across different internal XML nodes, this method merges them into a single range object so styles apply uniformly.

  • CharacterFormat.HighlightColor: Leverages standard .NET System.Drawing.Color values (e.g., Color.Yellow, Color.Red) to apply Word’s native highlighting.
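
Audits often track more than one term at a time. The following is a minimal sketch of how the same three calls could be wrapped in a reusable helper. The names KeywordHighlighter and HighlightAll are hypothetical, not part of Spire.Doc. The null check is a defensive assumption: FindAllString may return null rather than an empty array when nothing matches, so it is cheap to guard against.

```csharp
using System.Collections.Generic;
using System.Drawing;
using Spire.Doc;
using Spire.Doc.Documents;

static class KeywordHighlighter
{
    // Hypothetical helper: highlights each keyword in its own color and
    // returns the number of matches highlighted per keyword.
    public static Dictionary<string, int> HighlightAll(
        Document document, IDictionary<string, Color> keywordColors)
    {
        var counts = new Dictionary<string, int>();

        foreach (KeyValuePair<string, Color> entry in keywordColors)
        {
            // Same call as in the example above: case-insensitive, whole words only
            TextSelection[] matches = document.FindAllString(entry.Key, false, true);

            // Defensive assumption: treat a null result as "no matches"
            if (matches == null)
            {
                counts[entry.Key] = 0;
                continue;
            }

            foreach (TextSelection selection in matches)
            {
                selection.GetAsOneRange().CharacterFormat.HighlightColor = entry.Value;
            }

            counts[entry.Key] = matches.Length;
        }

        return counts;
    }
}
```

Called with a dictionary that maps "transcendentalism" to Color.Yellow and a second term to Color.LightGreen, the helper highlights both in one pass, and the returned counts can be logged before saving the document.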

Example 2: Using Regular Expressions for Pattern Matching

Real-world scenarios often require finding dynamic patterns rather than static text. For instance, you might need to identify all template placeholders like [Name], [Date], or [ID_123] to verify they have been filled.

Free Spire.Doc supports Regular Expressions (Regex) via the FindAllPattern method, making it easy to target complex structures.

Scenario

We want to find all placeholders formatted as [Word] (e.g., [Username], [Address]) and highlight them in light green with bold text.

Implementation Logic

  1. Define a Regex pattern to match brackets and alphanumeric content.

  2. Execute FindAllPattern against the document.

  3. Loop through matches to apply green highlighting and bold styling.

Complete Code Example

using System;
using System.Drawing;
using System.Text.RegularExpressions;
using Spire.Doc;
using Spire.Doc.Documents;

namespace FindHighlightRegex
{
    class Program
    {
        static void Main(string[] args)
        {
            // 1. Initialize Document
            Document document = new Document();

            // 2. Load the document containing placeholders
            document.LoadFromFile("Template.docx");

            Console.WriteLine("Scanning for placeholders using Regex...");

            // 3. Define the Regex pattern
            // \[   : Escaped left bracket
            // \w+  : One or more word characters (letters, digits, underscore)
            // \]   : Escaped right bracket
            Regex pattern = new Regex(@"\[\w+\]", RegexOptions.IgnoreCase);

            // 4. Find all matches based on the pattern
            TextSelection[] selections = document.FindAllPattern(pattern);

            Console.WriteLine($"Found {selections.Length} placeholders.");

            // 5. Apply formatting to each match
            foreach (TextSelection selection in selections)
            {
                var range = selection.GetAsOneRange();

                // Log the found text (optional)
                Console.WriteLine($"  - Detected: {range.Text}");

                // Apply Light Green highlight
                range.CharacterFormat.HighlightColor = Color.LightGreen;

                // Optional: Make the text bold for extra visibility
                range.CharacterFormat.Bold = true;
            }

            // 6. Save the result
            string outputPath = "RegexHighlightResult.docx";
            document.SaveToFile(outputPath, FileFormat.Docx);

            Console.WriteLine($"\nDone! Output saved to: {outputPath}");
        }
    }
}

Understanding the Regex

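  • @"\[\w+\]":

    • The @ symbol creates a verbatim string, so backslashes do not need to be doubled.
    • \[ and \] match the literal bracket characters; unescaped, [ and ] would define a character class instead.
    • \w+ matches any sequence of letters, digits, or underscores inside the brackets.
  • RegexOptions.IgnoreCase: Makes literal letters in the pattern case-insensitive. \w already matches both upper- and lowercase letters, so [NAME], [name], and [Name] are all caught here regardless; the flag starts to matter once the pattern contains literal text, such as a fixed prefix.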
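
Before running a pattern against a real document, it can help to check it against plain strings. The sketch below uses only System.Text.RegularExpressions and the placeholder pattern written as @"\[\w+\]" (escaped brackets around \w+). The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class PlaceholderPatternDemo
{
    // A literal '[', one or more word characters, then a literal ']'
    static readonly Regex Placeholder = new Regex(@"\[\w+\]", RegexOptions.IgnoreCase);

    // Returns every placeholder found in the text, in order of appearance
    public static List<string> FindPlaceholders(string text)
    {
        var found = new List<string>();
        foreach (Match match in Placeholder.Matches(text))
        {
            found.Add(match.Value);
        }
        return found;
    }
}
```

For the input "Dear [Name], your ID is [ID_123]; see [first name] and [].", FindPlaceholders returns [Name] and [ID_123]. [first name] is skipped because \w does not match the space, and [] is skipped because \w+ needs at least one character.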

Extending the Pattern

You can adapt the Regex for various data extraction tasks:

  • Dates: @"\d{4}-\d{2}-\d{2}" (matches 2023-10-01)

  • Emails: @"\w+@\w+\.\w+" (basic email matcher)

  • Variables: @"\$[A-Za-z]+" (matches variables like $Variable)
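
The three patterns, written as the verbatim strings @"\d{4}-\d{2}-\d{2}", @"\w+@\w+\.\w+", and @"\$[A-Za-z]+", can be tried on sample strings before passing them to FindAllPattern. This is a small sketch using plain .NET; PatternSamples and FirstMatch are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class PatternSamples
{
    public static readonly Regex IsoDate  = new Regex(@"\d{4}-\d{2}-\d{2}");
    public static readonly Regex Email    = new Regex(@"\w+@\w+\.\w+");
    public static readonly Regex Variable = new Regex(@"\$[A-Za-z]+");

    // Returns the first match of the pattern in the text, or null if there is none
    public static string FirstMatch(Regex pattern, string text)
    {
        Match match = pattern.Match(text);
        return match.Success ? match.Value : null;
    }
}
```

FirstMatch(PatternSamples.IsoDate, "Signed on 2023-10-01.") yields 2023-10-01, and the variable pattern finds $Amount in "Total: $Amount due" but nothing in "$100". The email pattern is deliberately basic: on "first.last@example.co.uk" it matches only last@example.co, so a stricter expression is worth considering for real address validation.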

Best Practices & Considerations

1. Required Namespaces

Ensure you include the following at the top of your file:

using Spire.Doc;
using Spire.Doc.Documents;
using System.Drawing;             // For Color definitions
using System.Text.RegularExpressions; // For Regex logic

2. Performance Tips

FindAllString and FindAllPattern are optimized for speed and handle documents with hundreds of pages efficiently. However, if you are performing heavy additional processing inside the foreach loop (e.g., network calls or database writes), consider batching operations or profiling memory usage first.
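
One way to read the batching advice is to collect the matched text inside the loop and do the expensive work afterwards, in fixed-size groups. The helper below is a sketch in plain .NET; MatchBatcher and InBatches are hypothetical names, and the right batch size depends on the downstream service.

```csharp
using System;
using System.Collections.Generic;

static class MatchBatcher
{
    // Splits a sequence of collected match texts into batches of at most batchSize items.
    // Because this is an iterator, the argument check runs when enumeration starts.
    public static IEnumerable<List<string>> InBatches(IEnumerable<string> items, int batchSize)
    {
        if (batchSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        }

        var batch = new List<string>(batchSize);
        foreach (string item in items)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<string>(batchSize);
            }
        }

        // Emit the final, possibly smaller, batch
        if (batch.Count > 0)
        {
            yield return batch;
        }
    }
}
```

In the highlighting loop, each range.Text value can be added to a list; after the loop, one database write or network call per batch replaces one call per match.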

Conclusion

Automating text highlighting in Word documents doesn't require Microsoft Office to be installed. By leveraging C# and Free Spire.Doc for .NET, you can build robust, server-friendly solutions for:

  • Exact Keyword Search: Ideal for spell-checking, compliance auditing, or terminology enforcement.

  • Regex Pattern Matching: Perfect for validating templates, extracting data fields, or flagging dynamic content.

This approach offers a lightweight, stable alternative to Interop, making it perfect for ASP.NET Core backends, Azure Functions, and Windows desktop utilities.

Have questions about implementing Word automation in your project? Leave a comment below!
