How to Set PDF Document Properties Using Python

PDF is a common format for contracts, reports, eBooks, and other business documents. Beyond its visible content, each PDF carries property information (metadata) such as a title, author, and keywords. Setting these properties appropriately makes documents easier to manage, archive, and search. This article shows how to set standard and custom PDF document properties with Python.

1. Introduction to PDF Document Properties

PDF document properties fall into two categories: standard properties and custom properties. Standard properties are the predefined metadata fields that any PDF file can carry, while custom properties let users add their own name-value pairs as needed.

1.1 Standard Document Properties

Common standard properties include:

  • Title: The name or description of the document, helping to identify the file content.

  • Author: The creator of the document.

  • Subject: The subject or purpose of the document.

  • Keywords: Used for file retrieval and classification.

  • Creation Date: The date the document was created.

  • Modification Date: The date the document was last modified.

  • Creator: The application that created the original document, such as a word processor.

  • Producer: The software that converted the document to PDF or wrote the PDF file.

These standard properties help file management systems organize and retrieve documents effectively.
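For background, the PDF specification stores the two date properties as text strings of the form D:YYYYMMDDHHmmSS followed by a UTC offset. The following is a minimal sketch of that format using only the Python standard library. The helper name to_pdf_date is hypothetical, and a PDF library will normally do this conversion for you.

```python
from datetime import datetime, timezone, timedelta


def to_pdf_date(dt: datetime) -> str:
    """Format a timezone-aware datetime as a PDF date string."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("expected a timezone-aware datetime")
    total_minutes = int(offset.total_seconds() // 60)
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    # The offset is written as HH'mm' after the sign
    return f"D:{dt:%Y%m%d%H%M%S}{sign}{hours:02d}'{minutes:02d}'"


created = datetime(2023, 4, 1, 9, 30, 0, tzinfo=timezone(timedelta(hours=8)))
print(to_pdf_date(created))  # D:20230401093000+08'00'
```

Seeing the raw format can help when you inspect a PDF's metadata with a text-based tool and need to recognize the date fields.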

1.2 Custom Document Properties

Custom properties are user-added information, such as:

  • Order number

  • Customer information

  • Project number

  • Document version

Custom properties provide flexible storage for business data, making file management and retrieval easier.
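Custom properties are stored as name-value pairs of text, so it can help to normalize business data before writing it. The sketch below is one possible way to do that in plain Python. The function build_custom_properties and the reserved-name check are assumptions made for illustration, not part of any PDF library.

```python
# Names used by the standard properties, which a custom property should not reuse
STANDARD_NAMES = {
    "Title", "Author", "Subject", "Keywords",
    "Creator", "Producer", "CreationDate", "ModDate",
}


def build_custom_properties(record: dict) -> dict:
    """Return a name -> text mapping suitable for use as custom properties."""
    result = {}
    for name, value in record.items():
        name = str(name).strip()
        if not name:
            raise ValueError("custom property names must not be empty")
        if name in STANDARD_NAMES:
            raise ValueError(f"{name!r} is a standard property name")
        # Custom property values are stored as text
        result[name] = str(value)
    return result


props = build_custom_properties({"Order Number": "ORD-20230401", "Document Version": 3})
print(props)  # {'Order Number': 'ORD-20230401', 'Document Version': '3'}
```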

2. Prerequisites

Before writing Python code, make sure you have the following dependencies installed:

  • Python 3.x: You can download and install it from python.org.

  • Spire.PDF: A Python library for manipulating PDF files.

To install the Spire.PDF library, use the following command:

pip install spire.pdf

3. Setting Standard PDF Document Properties Using Python

Next, we'll use the Spire.PDF library to set the standard document properties of a PDF. Assume we have a PDF file and want to modify its title, author, subject, and other basic information.

Example Code:

from spire.pdf import *
from spire.pdf.common import *

# Create a PdfDocument object and load an existing PDF file
pdf = PdfDocument()
pdf.LoadFromFile("example.pdf")

# Get the PDF document's properties object
properties = pdf.DocumentInformation

# Set standard document properties
properties.Author = "Li Hua"
properties.Creator = "PDF Creation Tool"
properties.Keywords = "Annual Report; Company Growth; Finance"
properties.Subject = "2022 Financial Summary Report"
properties.Title = "Company Annual Financial Report 2022"
properties.Producer = "PDF Generator"

# Dates use Spire's own DateTime type (from spire.pdf.common), not Python's datetime.
# Assumption: the property names mirror the .NET API (CreationDate, ModificationDate);
# verify them against your installed Spire.PDF version.
properties.CreationDate = DateTime.get_Now()
properties.ModificationDate = DateTime.get_Now()

# Save the modified PDF file
pdf.SaveToFile("output/Updated_Standard_Properties.pdf")
pdf.Close()

print("Standard document properties have been set!")

Explanation:

  • Create PdfDocument object: A new PDF object is created using PdfDocument().

  • Load PDF file: The existing PDF file is loaded with LoadFromFile().

  • Get document properties: The metadata object is retrieved using DocumentInformation.

  • Set standard properties: Standard properties like title, author, and keywords are set.

  • Save the file: The modified PDF is saved using SaveToFile().
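One pitfall with the attribute-style assignments above is that Python usually lets you assign to a name the object does not define, so a misspelled property name can pass silently. The following is a minimal sketch of a guard against that. The helper apply_standard_properties is hypothetical, and a SimpleNamespace stands in for pdf.DocumentInformation so the example runs without Spire.PDF or a PDF file.

```python
from types import SimpleNamespace

# Text properties set in the example above
TEXT_PROPERTIES = ("Title", "Author", "Subject", "Keywords", "Creator", "Producer")


def apply_standard_properties(info, values: dict) -> list:
    """Set known text properties on `info` and return the names applied."""
    unknown = set(values) - set(TEXT_PROPERTIES)
    if unknown:
        raise KeyError(f"unsupported property names: {sorted(unknown)}")
    applied = []
    for name in TEXT_PROPERTIES:
        if name in values:
            setattr(info, name, str(values[name]))
            applied.append(name)
    return applied


# Stand-in for pdf.DocumentInformation (assumption for illustration only)
info = SimpleNamespace()
applied = apply_standard_properties(
    info, {"Title": "Company Annual Financial Report 2022", "Author": "Li Hua"}
)
print(applied)  # ['Title', 'Author']
```

With Spire.PDF installed, you would pass pdf.DocumentInformation as the first argument instead of the stand-in object.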

4. Setting Custom PDF Document Properties Using Python

Now, let's demonstrate how to add custom properties to a PDF. These custom properties can store business-related information like order numbers, customer names, etc.

Example Code:

from spire.pdf import *
from spire.pdf.common import *

# Create a PdfDocument object and load an existing PDF file
pdf = PdfDocument()
pdf.LoadFromFile("example.pdf")

# Get the PDF document's properties object
properties = pdf.DocumentInformation

# Set custom properties
properties.SetCustomProperty("Order Number", "ORD-20230401")
properties.SetCustomProperty("Customer Name", "Zhang Tao")
properties.SetCustomProperty("Delivery Date", "2023-05-01")
properties.SetCustomProperty("Project Manager", "Li Feng")

# Save the modified PDF file
pdf.SaveToFile("output/Updated_Custom_Properties.pdf")
pdf.Close()

print("Custom document properties have been set!")

Explanation:

  • Load PDF file: The existing PDF file is loaded using LoadFromFile().

  • Get document properties: The metadata object is retrieved using DocumentInformation.

  • Set custom properties: Custom business-related properties like order numbers and customer names are set with SetCustomProperty().

  • Save the file: The modified PDF is saved with SaveToFile().
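When the custom values come from a dictionary rather than hard-coded strings, the repeated SetCustomProperty() calls can be wrapped in a small loop. This is a sketch under stated assumptions: apply_custom_properties and FakeInfo are hypothetical names, and FakeInfo only records calls so the example runs without Spire.PDF.

```python
from datetime import date


class FakeInfo:
    """Stand-in that records SetCustomProperty calls (not part of Spire.PDF)."""

    def __init__(self):
        self.custom = {}

    def SetCustomProperty(self, name, value):
        self.custom[name] = value


def apply_custom_properties(info, record: dict) -> None:
    """Write each non-empty entry of `record` as a text custom property."""
    for name, value in record.items():
        if value is None:
            continue  # skip missing values instead of storing the text "None"
        info.SetCustomProperty(str(name), str(value))


info = FakeInfo()
apply_custom_properties(info, {
    "Order Number": "ORD-20230401",
    "Delivery Date": date(2023, 5, 1),
    "Project Manager": None,
})
print(info.custom)  # {'Order Number': 'ORD-20230401', 'Delivery Date': '2023-05-01'}
```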

5. Common Usage Scenarios

  • Enterprise Document Management: By setting PDF standard and custom properties, document manageability is enhanced. For example, storing order numbers and customer information makes it easier to retrieve documents later.

  • Batch Processing: When you need to process many PDF files, you can read the property values from a database and set the standard or custom properties in a loop instead of editing each file by hand.

  • Version Control: Custom properties can be used to record the version number of a document, ensuring proper version management.

  • Document Archiving and Retrieval: Combining custom and standard properties can make archiving and retrieval more efficient. Users can quickly filter documents based on custom properties like order number or customer information.
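As an illustration of the batch processing scenario, the sketch below reads property values from a database and groups them per file. It uses an in-memory SQLite database so it runs as is. The table name documents, its columns, and the function load_jobs are all hypothetical.

```python
import sqlite3


def load_jobs(conn) -> dict:
    """Return {file_name: {property_name: value}} from the documents table."""
    jobs = {}
    rows = conn.execute(
        "SELECT file_name, order_number, customer_name "
        "FROM documents ORDER BY file_name"
    )
    for file_name, order_number, customer_name in rows:
        jobs[file_name] = {
            "Order Number": order_number,
            "Customer Name": customer_name,
        }
    return jobs


# Sample data for illustration only
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (file_name TEXT, order_number TEXT, customer_name TEXT)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [
        ("invoice_a.pdf", "ORD-20230401", "Zhang Tao"),
        ("invoice_b.pdf", "ORD-20230402", "Li Feng"),
    ],
)

for file_name, props in load_jobs(conn).items():
    # Each job would then be applied with the load, SetCustomProperty, save
    # pattern from section 4
    print(file_name, props)
```

Keeping the database query separate from the PDF step makes it possible to check the property values before any file is modified.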

6. Conclusion

This article showed how to set both standard and custom PDF document properties using Python. Well-chosen properties make documents easier to manage and search, whether you are running an enterprise document management system, processing files in batches, or tracking document versions.
