PDF files are ubiquitous in modern workplaces and are used for contracts, reports, eBooks, and much else. When working with them, it pays to look beyond the content to the document's properties (its metadata): well-chosen properties make documents easier to manage, archive, and search. This article shows how to use Python to set standard and custom PDF document properties.
1. Introduction to PDF Document Properties
PDF document properties fall into two categories: standard properties and custom properties. Standard properties are the predefined metadata fields that any PDF file can carry, while custom properties let users add their own named values as needed.
1.1 Standard Document Properties
Common standard properties include:
Title: The name or description of the document, helping to identify the file content.
Author: The creator of the document.
Subject: The subject or purpose of the document.
Keywords: Used for file retrieval and classification.
Creation Date: The date the document was created.
Modification Date: The date the document was last modified.
Creator: The application that created the original document (for example, a word processor) before it was converted to PDF.
Producer: The software that converted the document to PDF or generated the PDF file.
These standard properties help file management systems organize and retrieve documents effectively.
1.2 Custom Document Properties
Custom properties are user-added information, such as:
Order number
Customer information
Project number
Document version
Custom properties provide flexible storage for business data, making file management and retrieval easier.
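To make the two categories concrete, the sketch below splits a plain metadata mapping into standard and custom entries. It is only an illustration: `STANDARD_NAMES` and `split_properties` are hypothetical names, the labels are the ones listed in section 1.1 rather than identifiers from any library, and the sample values are made up.

```python
# Labels of the standard properties listed in section 1.1
# (illustrative labels, not identifiers from a particular library).
STANDARD_NAMES = {
    "Title", "Author", "Subject", "Keywords",
    "Creation Date", "Modification Date", "Creator", "Producer",
}


def split_properties(metadata):
    """Return (standard, custom) dictionaries built from one metadata mapping."""
    standard = {k: v for k, v in metadata.items() if k in STANDARD_NAMES}
    custom = {k: v for k, v in metadata.items() if k not in STANDARD_NAMES}
    return standard, custom


# Sample metadata mixing both categories (values are placeholders).
metadata = {
    "Title": "Company Annual Financial Report 2022",
    "Author": "Li Hua",
    "Order Number": "ORD-20230401",
    "Document Version": "1.2",
}
standard, custom = split_properties(metadata)
print("standard:", standard)
print("custom:", custom)
```

Anything that is not one of the eight standard names ends up in the custom group, which mirrors how the two kinds of properties are described above.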
2. Prerequisites
Before writing Python code, make sure you have the following dependencies installed:
Python 3.x: You can download and install it from python.org.
Spire.PDF: A Python library for manipulating PDF files.
To install the Spire.PDF library, use the following command:
pip install spire.pdf
3. Setting Standard PDF Document Properties Using Python
Next, we'll use the Spire.PDF library to set the standard document properties of a PDF. Assume we have a PDF file and want to modify its title, author, subject, and other basic information.
Example Code:
from spire.pdf import *
from spire.pdf.common import *
from datetime import datetime
# Create a PdfDocument object and load an existing PDF file
pdf = PdfDocument()
pdf.LoadFromFile("example.pdf")
# Get the PDF document's properties object
properties = pdf.DocumentInformation
# Set standard document properties
properties.Author = "Li Hua"
properties.Creator = "PDF Creation Tool"
properties.Keywords = "Annual Report; Company Growth; Finance"
properties.Subject = "2022 Financial Summary Report"
properties.Title = "Company Annual Financial Report 2022"
properties.Producer = "PDF Generator"
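# Set the timestamps (if your Spire.PDF version rejects a Python datetime here,
# it may expect the library's own DateTime type instead)
properties.CreationDate = datetime.now()
properties.ModDate = datetime.now()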
# Save the modified PDF file
pdf.SaveToFile("output/Updated_Standard_Properties.pdf")
pdf.Close()
print("Standard document properties have been set!")
Explanation:
Create PdfDocument object: A new PdfDocument object is created using PdfDocument().
Load PDF file: The existing PDF file is loaded with LoadFromFile().
Get document properties: The metadata object is retrieved using DocumentInformation.
Set standard properties: Standard properties like title, author, and keywords are set.
Save the file: The modified PDF is saved using SaveToFile().
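When updating an existing PDF, you may want to change only some standard properties and leave the rest untouched. The following is a minimal sketch of that idea, not part of Spire.PDF: `StandardProperties` and `apply_standard_properties` are hypothetical helpers whose field names mirror the attributes set in the example above, and a `SimpleNamespace` stands in for `pdf.DocumentInformation` so the snippet runs without a PDF file.

```python
from dataclasses import dataclass, fields
from types import SimpleNamespace
from typing import Optional


@dataclass
class StandardProperties:
    """Hypothetical container; field names mirror the attributes used above."""
    Title: Optional[str] = None
    Author: Optional[str] = None
    Subject: Optional[str] = None
    Keywords: Optional[str] = None
    Creator: Optional[str] = None
    Producer: Optional[str] = None


def apply_standard_properties(info, props):
    """Copy every field that is not None onto `info`; return the names set."""
    changed = []
    for field in fields(props):
        value = getattr(props, field.name)
        if value is not None:
            setattr(info, field.name, value)
            changed.append(field.name)
    return changed


# SimpleNamespace stands in for pdf.DocumentInformation in this sketch.
info = SimpleNamespace(Title="Old title", Author="Unknown")
changed = apply_standard_properties(
    info,
    StandardProperties(Title="Company Annual Financial Report 2022", Author="Li Hua"),
)
print(changed)
```

In a real script, the object returned by `pdf.DocumentInformation` would be passed in place of the stand-in, and the file would then be saved with `SaveToFile()` as shown above.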
4. Setting Custom PDF Document Properties Using Python
Now, let's demonstrate how to add custom properties to a PDF. These custom properties can store business-related information like order numbers, customer names, etc.
Example Code:
from spire.pdf import *
from spire.pdf.common import *
# Create a PdfDocument object and load an existing PDF file
pdf = PdfDocument()
pdf.LoadFromFile("example.pdf")
# Get the PDF document's properties object
properties = pdf.DocumentInformation
# Set custom properties
properties.SetCustomProperty("Order Number", "ORD-20230401")
properties.SetCustomProperty("Customer Name", "Zhang Tao")
properties.SetCustomProperty("Delivery Date", "2023-05-01")
properties.SetCustomProperty("Project Manager", "Li Feng")
# Save the modified PDF file
pdf.SaveToFile("output/Updated_Custom_Properties.pdf")
pdf.Close()
print("Custom document properties have been set!")
Explanation:
Load PDF file: The existing PDF file is loaded using LoadFromFile().
Get document properties: The metadata object is retrieved using DocumentInformation.
Set custom properties: Custom business-related properties like order numbers and customer names are set with SetCustomProperty().
Save the file: The modified PDF is saved with SaveToFile().
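Business data often arrives as a dictionary with mixed value types rather than as ready-made strings. The sketch below wraps the `SetCustomProperty()` call from the example in a hypothetical helper, `set_custom_properties`, that converts every value to a string first. It assumes string values are what the call expects, since that is what the example passes. `RecordingInfo` is a stand-in for `pdf.DocumentInformation` so the snippet runs without a PDF file.

```python
from datetime import date


class RecordingInfo:
    """Stand-in for pdf.DocumentInformation that records SetCustomProperty calls."""

    def __init__(self):
        self.custom = {}

    def SetCustomProperty(self, name, value):
        self.custom[name] = value


def set_custom_properties(info, values):
    """Write each entry as a custom property, converting values to strings."""
    for name, value in values.items():
        if not str(name).strip():
            raise ValueError("custom property names must not be empty")
        info.SetCustomProperty(str(name), str(value))


info = RecordingInfo()
set_custom_properties(info, {
    "Order Number": "ORD-20230401",
    "Delivery Date": date(2023, 5, 1),  # stored as its ISO string form
})
print(info.custom)
```

Passing the real properties object instead of `RecordingInfo` would apply the same values to the loaded document.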
5. Common Usage Scenarios
Enterprise Document Management: By setting PDF standard and custom properties, document manageability is enhanced. For example, storing order numbers and customer information makes it easier to retrieve documents later.
Batch Processing: If you need to process multiple PDF files, you can dynamically set the standard or custom properties by reading data from a database, improving efficiency.
Version Control: Custom properties can be used to record the version number of a document, ensuring proper version management.
Document Archiving and Retrieval: Combining custom and standard properties can make archiving and retrieval more efficient. Users can quickly filter documents based on custom properties like order number or customer information.
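The batch-processing and version-control scenarios can be sketched as a loop over records. This is an illustration under stated assumptions: the CSV text stands in for rows fetched from a database, the column names and sample values are hypothetical, and `write_properties` is a caller-supplied function. Here it is a demo that only records what would be written.

```python
import csv
import io

# Stand-in for rows fetched from a database; column names are hypothetical.
RECORDS = """file,title,order_number,version
invoice_001.pdf,Invoice 001,ORD-20230401,1.0
invoice_002.pdf,Invoice 002,ORD-20230402,1.1
"""


def process_batch(rows, write_properties):
    """Call write_properties(path, standard, custom) once per record."""
    processed = []
    for row in rows:
        standard = {"Title": row["title"]}
        custom = {
            "Order Number": row["order_number"],
            "Document Version": row["version"],
        }
        write_properties(row["file"], standard, custom)
        processed.append(row["file"])
    return processed


calls = []


def record_only(path, standard, custom):
    """Demo writer: records what would be written instead of touching a PDF."""
    calls.append((path, standard, custom))


rows = csv.DictReader(io.StringIO(RECORDS))
done = process_batch(rows, record_only)
print(done)
```

To process real files, `record_only` would be replaced by a function that performs the load, set, and save steps from sections 3 and 4 for each path.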
6. Conclusion
This article introduced how to set both standard and custom PDF document properties using Python. By setting these properties effectively, you can improve document manageability and searchability, enhancing efficiency in real-world applications. Whether for enterprise document management, batch processing, or version control, mastering these techniques can help you better manage and manipulate PDF files.
