How to Convert PDF to Excel Using Java: Basic Conversion and Advanced Settings

PDF files are widely used across industries for reports, contracts, invoices, and other structured documents. When you need to extract data from a PDF for further analysis, converting it to Excel is a common requirement. This article demonstrates how to convert a PDF file to Excel using Java, and how to use custom settings to improve the conversion of complex PDFs.


1. Introduction to PDF to Excel Conversion

PDF files are designed for presenting content, not for editing or analyzing it. Converting a PDF to Excel lets you extract its tabular data and process it automatically. In Java, the Spire.PDF for Java library performs this conversion in a few API calls.

2. Prerequisites

Before starting the coding process, make sure you have the following dependencies installed:

  • Java Development Kit (JDK 1.8 or later): Download and install it from Oracle's official website.

  • Spire.PDF for Java: A powerful library for working with PDF files that allows you to easily convert PDFs to Excel or other formats.

Installing Spire.PDF for Java:

If you're using Maven, you can add the following dependency to your pom.xml file:

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>12.2.1</version>
    </dependency>
</dependencies>

If you are not using Maven, you can manually download the Spire.PDF for Java library and import the JAR file into your project.
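Whichever installation route you choose, it can help to confirm that the library is visible to your project before writing any conversion code. The sketch below is a hypothetical helper, not part of Spire.PDF: the names ClasspathCheck and isOnClasspath are made up for this illustration, and the only thing it assumes about the library is the com.spire.pdf.PdfDocument class used in the examples in this article.

```java
// Hypothetical helper (not part of Spire.PDF): reports whether a class can be located on the classpath.
class ClasspathCheck {

    // Returns true if the named class can be located by this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the examples in this article
        String spireClass = "com.spire.pdf.PdfDocument";
        System.out.println(isOnClasspath(spireClass)
                ? "Spire.PDF found on the classpath."
                : "Spire.PDF not found; check the Maven dependency or the imported JAR.");
    }
}
```

Run it with the same classpath as your project. If it reports that the class is missing, the dependency has probably not been resolved or the JAR has not been added to the build path.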

3. Basic PDF to Excel Example

Let’s start with a simple example that shows how to directly convert a PDF file to Excel.

Example Code - Basic PDF to Excel:

import com.spire.pdf.*;

public class PDFtoExcel {
    public static void main(String[] args) {
        // Create a PdfDocument instance
        PdfDocument pdf = new PdfDocument();

        // Load the PDF document
        pdf.loadFromFile("test.pdf");

        // Save as Excel file
        pdf.saveToFile("PDFToXLS.xlsx", FileFormat.XLSX);

        // Close the document
        pdf.close();
        
        System.out.println("PDF to Excel conversion completed!");
    }
}

Explanation:

  • Create PdfDocument Object: An instance of the PDF document is created using PdfDocument().

  • Load PDF File: The existing PDF file is loaded using loadFromFile().

  • Save as Excel: The PDF is converted and saved as an Excel file (.xlsx) using saveToFile().

  • Close Document: The PDF file is closed with close().
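The basic example assumes that test.pdf exists and really is a PDF. This article does not describe how loadFromFile() reacts to a missing or non-PDF file, so one option is to check the input yourself before loading it. The sketch below uses only the standard library; PdfPreflight, hasPdfHeader, and looksLikePdf are hypothetical names chosen for this illustration. It relies on the fact that PDF files normally begin with the bytes "%PDF-"; some files carry a few stray bytes before that header, so treat the check as a heuristic rather than a full validation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check (not part of Spire.PDF).
class PdfPreflight {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the given leading bytes start with the "%PDF-" signature.
    static boolean hasPdfHeader(byte[] leadingBytes) {
        if (leadingBytes == null || leadingBytes.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (leadingBytes[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns true if the path is a readable regular file whose first bytes are "%PDF-".
    static boolean looksLikePdf(Path file) throws IOException {
        if (!Files.isRegularFile(file) || !Files.isReadable(file)) {
            return false;
        }
        byte[] head = new byte[PDF_MAGIC.length];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until the buffer is full or EOF
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
        }
        return total == head.length && hasPdfHeader(head);
    }

    public static void main(String[] args) throws IOException {
        Path input = Paths.get(args.length > 0 ? args[0] : "test.pdf");
        System.out.println(input + (looksLikePdf(input)
                ? " looks like a PDF."
                : " is missing, unreadable, or does not start with a PDF header."));
    }
}
```

Calling looksLikePdf() before loadFromFile() lets you report a clear error for a wrong path or a mislabeled file instead of relying on whatever the library does in that case.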

4. Advanced PDF to Excel Conversion Settings

For PDFs containing complex tables and layouts, the default conversion might not fully preserve the original format. By adjusting the conversion settings, you can optimize the output. For example, you can control whether each page is converted into a separate worksheet, whether rotated text is preserved, and whether multi-line cells are split.

Available Options:

Each option is listed with its default value, followed by its description:

  • convertToMultipleSheet (default: true): Whether to convert each page of the PDF into a separate Excel worksheet.

  • rotatedText (default: true): Whether to preserve rotated text in the PDF. When enabled, text direction in Excel will match the original PDF.

  • splitCell (default: true): Controls whether cells containing multi-line text in the PDF should be split into multiple rows in Excel.

  • wrapText (default: true): Whether to enable text wrapping in Excel cells for long text.

  • overlapText (default: false): Whether to preserve overlapping text effects in the PDF. If enabled, overlapping text will be rendered similarly in Excel.

Example Code - Advanced PDF to Excel Conversion Settings:

import com.spire.pdf.FileFormat;
import com.spire.pdf.PdfDocument;
import com.spire.pdf.conversion.XlsxLineLayoutOptions;

public class PDFtoExcelAdvanced {
    public static void main(String[] args) {
        // Create a PdfDocument object
        PdfDocument pdf = new PdfDocument();

        // Load the PDF file
        pdf.loadFromFile("Sample.pdf");

        // Create custom conversion settings
        XlsxLineLayoutOptions layoutOptions = new XlsxLineLayoutOptions();
        
        // Set layout options
        layoutOptions.setConvertToMultipleSheet(true);  // Convert each page as a worksheet
        layoutOptions.setRotatedText(true);  // Preserve rotated text
        layoutOptions.setSplitCell(false);  // Do not split multi-line cells
        layoutOptions.setWrapText(true);  // Enable text wrapping in cells
        layoutOptions.setOverlapText(false);  // Do not preserve overlapping text

        // Apply the layout options
        pdf.getConvertOptions().setPdfToXlsxOptions(layoutOptions);

        // Convert and save as Excel file
        pdf.saveToFile("advanced_output.xlsx", FileFormat.XLSX);

        // Close the document
        pdf.close();

        System.out.println("Custom settings conversion of PDF to Excel completed!");
    }
}

Explanation:

  • Create XlsxLineLayoutOptions Object: This class is used to set custom conversion options, such as whether each page should be a separate worksheet, whether rotated text is preserved, etc.

  • Set Conversion Options: Set the layout options to ensure the Excel file retains the original PDF formatting as much as possible.

  • Apply Settings: The custom options are applied to the conversion process using getConvertOptions().setPdfToXlsxOptions(layoutOptions).

  • Save as Excel: The converted PDF is saved as an Excel file using saveToFile().

5. Use Cases

  • Financial Report Conversion: Convert PDF-formatted financial reports to Excel for data analysis and processing.

  • Contract Management: Convert contracts containing tables and complex layouts to Excel for easy extraction and management of data.

  • Batch Processing: When handling large volumes of PDF files, apply the same custom conversion settings to every file in a single run instead of converting the files one at a time by hand.
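The batch-processing case can be sketched with the standard library alone. The code below is an illustration under stated assumptions: BatchPdfToExcel, Converter, outputPathFor, listPdfs, and convertAll are hypothetical names, and the Converter callback stands in for the Spire.PDF calls shown in the earlier examples (loadFromFile(), saveToFile(), close()). The placeholder converter in main() only prints the file names and converts nothing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical batch driver (not part of Spire.PDF).
class BatchPdfToExcel {

    // Hypothetical callback; a real implementation would wrap the conversion calls shown earlier.
    interface Converter {
        void convert(Path pdf, Path xlsx) throws Exception;
    }

    // Maps e.g. "in/report.pdf" to "<outputDir>/report.xlsx".
    static Path outputPathFor(Path pdf, Path outputDir) {
        String name = pdf.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outputDir.resolve(base + ".xlsx");
    }

    // Lists regular files ending in ".pdf" (case-insensitive) directly inside inputDir, sorted by path.
    static List<Path> listPdfs(Path inputDir) throws IOException {
        try (Stream<Path> files = Files.list(inputDir)) {
            return files.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Converts every PDF in inputDir; a failure on one file does not stop the rest. Returns the failure count.
    static int convertAll(Path inputDir, Path outputDir, Converter converter) throws IOException {
        Files.createDirectories(outputDir);
        int failures = 0;
        for (Path pdf : listPdfs(inputDir)) {
            try {
                converter.convert(pdf, outputPathFor(pdf, outputDir));
            } catch (Exception e) {
                failures++;
                System.err.println("Failed: " + pdf + " (" + e.getMessage() + ")");
            }
        }
        return failures;
    }

    public static void main(String[] args) throws IOException {
        Path in = Paths.get(args.length > 0 ? args[0] : "pdfs");
        Path out = Paths.get(args.length > 1 ? args[1] : "xlsx");
        if (!Files.isDirectory(in)) {
            System.err.println("Input directory not found: " + in);
            return;
        }
        // Placeholder converter: only logs the mapping. Replace the lambda body with the real conversion.
        int failures = convertAll(in, out, (pdf, xlsx) -> System.out.println(pdf + " -> " + xlsx));
        System.out.println("Done, failures: " + failures);
    }
}
```

Keeping the conversion behind a small interface means the same loop can run with the basic conversion or with the custom layout options, and a single unreadable PDF is counted as a failure without aborting the whole batch.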

6. Conclusion

This article introduced how to convert PDF files to Excel using Java, along with how to optimize the conversion results using custom settings. By selecting the appropriate options, you can ensure that the conversion retains as much of the original layout and formatting as possible. Whether dealing with simple conversions or complex files with multiple pages, rotated text, or overlapping elements, applying these settings can help you efficiently complete the PDF to Excel conversion task.
