
How to Extract and Delete PDF Attachments Using Java: A Complete Guide

 

In modern software development and document management, working with PDF files is a common task. Beyond reading text and images, developers often need to handle PDF attachments: extracting embedded files, retrieving attachment information, or deleting attachments in bulk.

This guide walks you through extracting and managing PDF attachments in Java with practical examples: extracting all attachments, handling individual attachments, retrieving attachment metadata, and safely deleting attachments.

The examples use Spire.PDF for Java, but the core ideas can be applied to other Java PDF libraries as well. By the end of this tutorial, you'll be able to efficiently manage PDF attachments with Java.




Why Manage PDF Attachments?

PDF attachments are often critical in enterprise scenarios:

  • Reports and data files: Embedded Excel or Word documents in PDF reports.

  • Contracts and proof documents: Scanned contracts or authorization letters attached as files.

  • Multimedia content: PDFs may include images, audio, or even video attachments.

Common operations developers perform on PDF attachments include:

  1. Extracting attachments: Save embedded files locally for further processing.

  2. Getting attachment info: Retrieve filename, description, creation date, and modification date.

  3. Deleting attachments: Remove attachments in bulk to reduce file size or clear sensitive information.

We’ll cover each of these operations with Java code examples.

Setup and Preparation

Before we start, ensure the following:

  • Include the PDF library

    If you use Maven, add the following to your pom.xml:

    <repositories>
        <repository>
            <id>com.e-iceblue</id>
            <name>e-iceblue</name>
            <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
        </repository>
    </repositories>
    <dependencies>
        <dependency>
            <groupId>e-iceblue</groupId>
            <artifactId>spire.pdf</artifactId>
            <version>12.3.9</version>
        </dependency>
    </dependencies>
    
  • Prepare test PDFs

    Your PDF files should contain one or more attachments. You can add attachments using Adobe Acrobat or other PDF editing tools.
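The examples below write their results into an output/ folder relative to the working directory, and FileOutputStream and FileWriter throw an exception if that folder does not exist. A minimal sketch that creates it up front using only the JDK's java.nio.file API; the class and method names are our own, not part of Spire.PDF:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PrepareOutputDir {
    /** Creates the directory and any missing parents; returns its path. */
    static Path ensureDir(String dir) throws IOException {
        // createDirectories does nothing if the directory already exists
        return Files.createDirectories(Paths.get(dir));
    }

    public static void main(String[] args) throws IOException {
        Path out = ensureDir("output");
        System.out.println("Output folder ready: " + out.toAbsolutePath());
    }
}
```

Running it once before the examples (or calling ensureDir at the start of each one) avoids a FileNotFoundException on the first write.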

1. Extracting All PDF Attachments

When a PDF contains multiple attachments, extracting all attachments is the most common requirement. Here’s how to do it:

import com.spire.pdf.PdfDocument;
import com.spire.pdf.attachments.*;
import java.io.*;

public class ExtractAllAttachments {
    public static void main(String[] args) throws Exception {
        // Make sure the target folder exists; FileOutputStream does not create it
        File outputDir = new File("output");
        outputDir.mkdirs();

        PdfDocument pdf = new PdfDocument();
        pdf.loadFromFile("data/template_Pdf_2.pdf");

        // Document-level attachments stored in the PDF
        PdfAttachmentCollection attachments = pdf.getAttachments();

        for (int i = 0; i < attachments.getCount(); i++) {
            PdfAttachment attachment = attachments.get(i);
            String fileName = attachment.getFileName();

            // Write the attachment's bytes; try-with-resources closes the stream
            try (BufferedOutputStream output = new BufferedOutputStream(
                    new FileOutputStream(new File(outputDir, fileName)))) {
                output.write(attachment.getData());
            }
        }

        int count = attachments.getCount();
        pdf.close();
        pdf.dispose();

        System.out.println("Extracted " + count + " attachment(s) to the output folder.");
    }
}

Explanation:

  • PdfDocument: Loads the PDF file.

  • PdfAttachmentCollection: Represents the collection of attachments.

  • BufferedOutputStream: Efficiently writes attachment data to local files.

  • Loop through attachments and save each file to the output/ directory.

This method is ideal for PDFs with multiple attachments.
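One caveat with the loop above: attachment names are not guaranteed to be unique, so two attachments named report.xlsx would overwrite each other in output/. A minimal JDK-only sketch of one way to avoid that, appending (1), (2), and so on to the base name; uniquePath is a hypothetical helper, not a Spire.PDF method:

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class UniqueNames {
    /**
     * Returns dir/name if that file does not exist yet; otherwise
     * dir/"base (n).ext" with the smallest free n starting at 1.
     */
    static Path uniquePath(Path dir, String name) {
        Path candidate = dir.resolve(name);
        if (!Files.exists(candidate)) {
            return candidate;
        }
        // Split "report.xlsx" into "report" and ".xlsx"; names without a dot get no extension
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        String ext = dot > 0 ? name.substring(dot) : "";
        for (int n = 1; ; n++) {
            candidate = dir.resolve(base + " (" + n + ")" + ext);
            if (!Files.exists(candidate)) {
                return candidate;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("attachments");
        Path first = uniquePath(dir, "report.xlsx");
        Files.write(first, new byte[] {1});
        Path second = uniquePath(dir, "report.xlsx");
        System.out.println(first.getFileName() + " / " + second.getFileName());
    }
}
```

Inside the extraction loop, you could compute the target with uniquePath(Paths.get("output"), fileName) and write it with Files.write(target, attachment.getData()). The existence check and the write are separate steps, so this is not safe when several processes write into the same folder at once.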

2. Extracting a Single PDF Attachment

Sometimes, you only need the first attachment or a specific attachment by index:

import com.spire.pdf.PdfDocument;
import com.spire.pdf.attachments.*;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ExtractSingleAttachment {
    public static void main(String[] args) throws IOException {
        PdfDocument pdf = new PdfDocument();
        pdf.loadFromFile("data/deleteAllAttachments.pdf");

        PdfAttachmentCollection attachments = pdf.getAttachments();

        if (attachments.getCount() > 0) {
            // Index 0 is the first attachment; change the index to pick another one
            PdfAttachment attachment = attachments.get(0);

            // Create output/ if needed, then write the bytes in one call (Files.write closes the file)
            Path outputDir = Files.createDirectories(Paths.get("output"));
            Files.write(outputDir.resolve(attachment.getFileName()), attachment.getData());

            System.out.println("Extracted the first PDF attachment: " + attachment.getFileName());
        } else {
            System.out.println("No attachments found in the PDF.");
        }

        pdf.close();
        pdf.dispose();
    }
}

Tip:

  • You can further filter by attachment.getFileName() if needed.
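For example, to extract only spreadsheets and Word files you could test each name before writing it. A small case-insensitive extension check as a hypothetical plain-Java helper (the sample names in main are made up):

```java
import java.util.Locale;

public class AttachmentFilter {
    /** True if fileName ends with one of the given extensions (given without the dot), ignoring case. */
    static boolean hasExtension(String fileName, String... extensions) {
        if (fileName == null) {
            return false;
        }
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String ext : extensions) {
            if (lower.endsWith("." + ext.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical attachment names
        String[] names = {"Q3-Report.XLSX", "contract.pdf", "notes.docx", "README"};
        for (String name : names) {
            if (hasExtension(name, "xlsx", "docx")) {
                System.out.println("Would extract: " + name);
            }
        }
    }
}
```

In the extraction loop, the write would then be wrapped in if (AttachmentFilter.hasExtension(attachment.getFileName(), "xlsx", "docx")) { ... }.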

3. Retrieving PDF Attachment Information

The following example reads the filename, description, creation date, and modification date of the first attachment and writes them to a text file:

import com.spire.pdf.PdfDocument;
import com.spire.pdf.attachments.*;
import java.io.*;

public class GetAttachmentInfo {
    public static void main(String[] args) throws IOException {
        PdfDocument pdf = new PdfDocument();
        pdf.loadFromFile("data/deleteAllAttachments.pdf");

        PdfAttachmentCollection attachments = pdf.getAttachments();

        if (attachments.getCount() > 0) {
            PdfAttachment attachment = attachments.get(0);
            StringBuilder info = new StringBuilder();
            info.append("Filename: ").append(attachment.getFileName()).append("\n");
            info.append("Description: ").append(attachment.getDescription()).append("\n");
            info.append("Creation Date: ").append(attachment.getCreationDate()).append("\n");
            info.append("Modification Date: ").append(attachment.getModificationDate()).append("\n");

            writeStringToTxt(info.toString(), "output/AttachmentInfo.txt");

            System.out.println("Attachment info written to output/AttachmentInfo.txt");
        } else {
            System.out.println("No attachments found in the PDF.");
        }

        pdf.close();
        pdf.dispose();
    }

    private static void writeStringToTxt(String content, String fileName) throws IOException {
        // Append mode (true): each run adds to the end of the file instead of overwriting it
        try (FileWriter writer = new FileWriter(fileName, true)) {
            writer.write(content);
        }
    }
}

Notes:

  • getDescription(): Gets attachment description.

  • getCreationDate() and getModificationDate(): Retrieve timestamps.
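When these values are appended to a StringBuilder, Java calls their toString(), which for java.util.Date uses the JVM's default time zone, so the same PDF can produce different text on different machines. Assuming the getters return java.util.Date (check the return type in your Spire.PDF version), a small formatter that prints ISO-8601 in UTC and treats null defensively as "not set"; the class name is our own:

```java
import java.time.format.DateTimeFormatter;
import java.util.Date;

public class AttachmentDates {
    // ISO-8601 instant in UTC, e.g. 2024-05-01T09:30:00Z
    private static final DateTimeFormatter ISO_UTC = DateTimeFormatter.ISO_INSTANT;

    /** Formats a possibly-null date independently of the JVM's default time zone. */
    static String format(Date date) {
        return date == null ? "(not set)" : ISO_UTC.format(date.toInstant());
    }

    public static void main(String[] args) {
        System.out.println(format(new Date(0L))); // 1970-01-01T00:00:00Z
        System.out.println(format(null));         // (not set)
    }
}
```

In GetAttachmentInfo this would look like info.append("Creation Date: ").append(AttachmentDates.format(attachment.getCreationDate())).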

4. Deleting All PDF Attachments

To delete all PDF attachments in Java, use the following approach:

import com.spire.pdf.PdfDocument;
import com.spire.pdf.attachments.*;

public class DeleteAllAttachments {
    public static void main(String[] args) {
        PdfDocument pdf = new PdfDocument();
        pdf.loadFromFile("data/deleteAllAttachments.pdf");

        PdfAttachmentCollection attachments = pdf.getAttachments();
        attachments.clear(); // Delete all attachments

        pdf.saveToFile("output/deleteAllAttachments.pdf");

        pdf.close();
        pdf.dispose();

        System.out.println("All attachments deleted from PDF and saved to output/deleteAllAttachments.pdf");
    }
}
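If you want to keep some attachments and delete only others, remove them by index from the highest index down: removing while counting upward shifts the remaining items left and skips the one right after each removal. The sketch below shows the pattern on a plain java.util.List standing in for the attachment collection; removeMatching and the sample names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class SelectiveRemoval {
    /**
     * Removes every element matching the predicate, walking from the last
     * index to the first so a removal never shifts an element not yet visited.
     * Returns how many elements were removed.
     */
    static <T> int removeMatching(List<T> items, Predicate<T> shouldRemove) {
        int removed = 0;
        for (int i = items.size() - 1; i >= 0; i--) {
            if (shouldRemove.test(items.get(i))) {
                items.remove(i); // int argument selects remove-by-index
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        // Hypothetical attachment names standing in for the PDF's collection
        List<String> names = new ArrayList<>(Arrays.asList("a.xlsx", "b.xlsx", "c.pdf"));
        int n = removeMatching(names, s -> s.endsWith(".xlsx"));
        System.out.println(n + " removed, remaining " + names);
    }
}
```

With Spire.PDF, the same loop would run from attachments.getCount() - 1 down to 0, test attachments.get(i).getFileName(), and remove matches by index before calling saveToFile. This assumes your version of PdfAttachmentCollection provides a removeAt(int) method; check its API reference.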

5. Common Issues and Solutions

Q1: Extracted file is empty

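  • Check that attachments.getCount() is greater than zero and that attachment.getData() returns a non-empty array before writing. Also note that getAttachments() returns document-level attachments; files attached through file attachment annotations on a page are stored separately and do not appear in this collection.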

Q2: Filename contains invalid characters

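  • Attachment names come from the PDF and may contain characters your file system rejects (for example : or ? on Windows) or path segments such as ../. Sanitize the name before using it as an output path.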
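A minimal sanitizer in plain Java, assuming you want every attachment in one flat output folder: it keeps only the last path segment (so a name like ../../x.txt cannot escape output/), replaces characters Windows rejects and control characters with _, strips trailing dots and spaces, and falls back to a default name. The class name and the "attachment" fallback are our own choices:

```java
public class FileNames {
    /** Turns an attachment name from a PDF into a safe single file name. */
    static String sanitize(String name) {
        if (name == null) {
            return "attachment";
        }
        // Keep only the last path segment, whichever separator was used
        int cut = Math.max(name.lastIndexOf('/'), name.lastIndexOf('\\'));
        String base = name.substring(cut + 1);

        // Replace control characters and characters Windows forbids in file names
        StringBuilder sb = new StringBuilder(base.length());
        for (char c : base.toCharArray()) {
            sb.append(c < 0x20 || "<>:\"/\\|?*".indexOf(c) >= 0 ? '_' : c);
        }

        // Windows also rejects names ending in dots or spaces; this also turns ".." into ""
        String cleaned = sb.toString().replaceAll("[. ]+$", "");
        return cleaned.isEmpty() ? "attachment" : cleaned;
    }

    public static void main(String[] args) {
        String[] samples = {"../../etc/passwd", "Q3: sales?.xlsx", ".."};
        for (String s : samples) {
            System.out.println(s + " -> " + sanitize(s));
        }
    }
}
```

It does not handle Windows reserved names such as CON or NUL; add that check if the extracted files may end up on Windows.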

Q3: PDF file size doesn’t reduce after deleting attachments

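  • clear() removes the document-level attachments only. Files attached through page annotations remain, and the file may still contain other unused objects. Check for attachment annotations, and consider running the result through a PDF optimization or compression tool.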

Q4: High memory usage during extraction

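  • attachment.getData() returns the whole attachment as a byte array, so each attachment is held in memory while it is written. Process attachments one at a time, call getData() once per attachment, do not keep the arrays after writing them, and raise the heap size (-Xmx) if a single attachment is very large.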

6. Best Practices

  1. Always back up PDFs before deleting attachments or running batch operations.

  2. Use try-with-resources to automatically close streams.

  3. Organize attachments in folders based on PDF or document type.

  4. Log operations for auditing and debugging.
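Practices 1, 3, and 4 can be combined in a few lines of JDK code: copy the PDF to a .bak file before modifying it, create one output folder per source PDF, and log each step with java.util.logging. A sketch with hypothetical helper names; the demo in main works on a temporary placeholder file rather than a real PDF:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.logging.Logger;

public class SafeBatch {
    private static final Logger LOG = Logger.getLogger(SafeBatch.class.getName());

    /** Copies the PDF to "<file name>.bak" in the same folder, replacing an older backup. */
    static Path backup(Path pdf) throws IOException {
        Path bak = pdf.resolveSibling(pdf.getFileName() + ".bak");
        Files.copy(pdf, bak, StandardCopyOption.REPLACE_EXISTING);
        LOG.info("Backed up " + pdf + " to " + bak);
        return bak;
    }

    /** Returns outRoot/<PDF name without extension>, creating the folder if needed. */
    static Path folderFor(Path outRoot, Path pdf) throws IOException {
        String name = pdf.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        Path folder = Files.createDirectories(outRoot.resolve(base));
        LOG.info("Attachments of " + pdf + " go to " + folder);
        return folder;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a temporary placeholder file instead of a real PDF
        Path dir = Files.createTempDirectory("batch");
        Path pdf = Files.write(dir.resolve("report.pdf"), new byte[] {1, 2, 3});
        backup(pdf);
        folderFor(dir.resolve("output"), pdf);
    }
}
```

In a batch job, you would call backup before loading each PDF for deletion and write its attachments into the folder returned by folderFor.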

7. Conclusion

This guide demonstrated how to extract and delete PDF attachments using Java, including:

  • Extracting all attachments from a PDF.

  • Extracting a single attachment by index.

  • Retrieving attachment information such as filename, description, and dates.

  • Deleting all attachments from a PDF.

Mastering these techniques enables developers to efficiently manage PDF attachments with Java, whether for enterprise reporting, contract management, or automated document workflows.
