What is Metadata?

Quick Definition

Metadata is data about data—information embedded in a PDF that describes the document itself. PDF metadata includes properties like title, author, subject, keywords, creation date, modification date, and the software used to create the file.

Types of PDF Metadata

PDFs can contain two main types of metadata: the document information dictionary (the older format) and XMP (Extensible Metadata Platform, the newer XML-based format). Both can coexist in a single PDF. PDF readers may take values from either source, so the two should be kept in sync; if they disagree, different tools can show different values for the same field.
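For a concrete picture of the older format: the document information dictionary is an ordinary PDF dictionary of name/value pairs, referenced from the file trailer's /Info entry, with each value written as a PDF text string. The Python sketch below shows one way such a dictionary could be serialized. It assumes ASCII values are written as literal strings (backslashes and parentheses escaped) and anything else as a UTF-16BE hex string with a byte-order mark; pdf_text_string and info_dict_to_pdf are hypothetical helper names, not part of any PDF library.

```python
def pdf_text_string(value: str) -> str:
    """Encode a Python str as a PDF text string token (sketch)."""
    if value.isascii():
        # Literal string: escape backslashes first, then parentheses.
        escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return f"({escaped})"
    # Non-ASCII text: UTF-16BE with a FE FF byte-order mark, as a hex string.
    return "<" + (b"\xfe\xff" + value.encode("utf-16-be")).hex().upper() + ">"


def info_dict_to_pdf(fields: dict[str, str]) -> str:
    """Serialize metadata fields as a PDF dictionary such as << /Title (...) >>."""
    entries = " ".join(f"/{key} {pdf_text_string(val)}" for key, val in fields.items())
    return f"<< {entries} >>"


print(info_dict_to_pdf({"Title": "Q3 Report (Draft)", "Author": "J. Smith"}))
# << /Title (Q3 Report \(Draft\)) /Author (J. Smith) >>
```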

Common metadata fields include:

  • Title: The document title (not necessarily the file name)
  • Author: The person or organization that created the document
  • Subject: A brief description of the document's content
  • Keywords: Search terms associated with the document
  • Creator: The application used to create the original document (e.g., Microsoft Word)
  • Producer: The software that generated the PDF (e.g., Adobe Acrobat)
  • Creation Date: When the PDF was first created
  • Modification Date: When the PDF was last modified
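The two date fields are not stored as free text. In the information dictionary they use the PDF date format D:YYYYMMDDHHmmSSOHH'mm', where O is +, - or Z (universal time) and every component after the year is optional. A minimal Python sketch of a parser follows. parse_pdf_date is a hypothetical name; it treats the trailing apostrophe as optional because it is often omitted in practice, and it returns a naive datetime when no offset is present, since the time zone is then unknown.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSSOHH'mm' where every field after the year is optional.
_PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([+\-Z])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$"
)


def parse_pdf_date(text: str) -> datetime:
    """Convert a PDF date string to a datetime (aware only if an offset is given)."""
    m = _PDF_DATE.match(text.strip())
    if m is None:
        raise ValueError(f"not a PDF date string: {text!r}")
    year, month, day, hour, minute, second, sign, off_h, off_m = m.groups()
    tz = None
    if sign == "Z":
        tz = timezone.utc
    elif sign in ("+", "-"):
        delta = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
        tz = timezone(delta if sign == "+" else -delta)
    # Missing month and day default to 1; missing time fields default to 0.
    return datetime(
        int(year), int(month or 1), int(day or 1),
        int(hour or 0), int(minute or 0), int(second or 0), tzinfo=tz,
    )


print(parse_pdf_date("D:20240115093000+01'00'"))
# 2024-01-15 09:30:00+01:00
```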

Why Metadata Matters

Metadata serves multiple purposes. It helps users identify and organize documents, enables search functionality in document management systems, and provides context about a document's origin and history. PDF viewers show metadata in their document properties dialogs, and some operating systems surface fields such as title and author in their file-properties panels, so you can often identify a document without reading through it.

However, metadata can also reveal information you may not want to share. A PDF might contain the author's name, company name, file paths from the original computer, or edit history showing when and how the document was modified.

Privacy Concerns

Metadata can inadvertently disclose sensitive information. A confidential document might reveal the author's identity, the organization that created it, or the software and computer used. For publicly distributed PDFs, this information may pose privacy or security risks.

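Before sharing PDFs externally, consider removing or sanitizing metadata. PDF editing software can strip metadata or replace it with generic values; make sure the tool clears both the document information dictionary and the XMP metadata, since removing only one leaves the other intact.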
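To see what a file actually carries before sanitizing it, a rough first pass is to scan its raw bytes. The Python sketch below (scan_metadata is a hypothetical helper) assumes the metadata is stored uncompressed and that information-dictionary values are literal strings; compressed object streams and hex-encoded strings will be missed, so a real tool should use a proper PDF parser. A useful side effect of scanning every byte: PDFs edited with incremental updates keep earlier revisions in the file, so an old Author value can still be present after the field was changed.

```python
import re

# Information-dictionary keys that often reveal who made a file and with what.
_INFO_KEYS = (b"Title", b"Author", b"Subject", b"Keywords", b"Creator", b"Producer")
# "/Key (value)": a literal string with backslash escapes, no nested parentheses.
_INFO_ENTRY = re.compile(
    rb"/(" + b"|".join(_INFO_KEYS) + rb")\s*\(((?:\\.|[^()\\])*)\)", re.DOTALL
)
# An XMP packet's root element runs from <x:xmpmeta to </x:xmpmeta>.
_XMP_PACKET = re.compile(rb"<x:xmpmeta.*?</x:xmpmeta>", re.DOTALL)


def scan_metadata(data: bytes) -> dict:
    """Return every uncompressed Info entry and XMP packet found in raw PDF bytes."""
    # Escape sequences such as \( are left exactly as written in the file.
    entries = [
        (key.decode("ascii"), value.decode("latin-1"))
        for key, value in _INFO_ENTRY.findall(data)
    ]
    packets = [p.decode("utf-8", "replace") for p in _XMP_PACKET.findall(data)]
    return {"info": entries, "xmp": packets}


sample = b"1 0 obj\n<< /Author (Jane Doe) /Producer (Example Writer) >>\nendobj"
print(scan_metadata(sample)["info"])
# [('Author', 'Jane Doe'), ('Producer', 'Example Writer')]
```

To scan a file on disk, pass it the bytes from Path("report.pdf").read_bytes(), where the file name is a placeholder.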

Metadata and PDF Standards

PDF/A requires document metadata to be embedded as XMP, and any entries in the document information dictionary must match their XMP equivalents. This keeps an archived file self-describing even if its file name changes.

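PDF/UA (the accessibility standard) requires a document title in the XMP metadata (dc:title) and requires viewers to show that title in the window title bar instead of the file name, by setting DisplayDocTitle to true in the ViewerPreferences dictionary. This helps assistive technologies identify documents correctly.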
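A simplified pre-flight check for these metadata rules might look like the Python sketch below. It compares each information-dictionary entry with the XMP property PDF/A maps it to, then checks the PDF/UA title requirements. It assumes both sources have already been extracted into plain dictionaries of strings (multi-valued XMP properties such as dc:creator reduced to one string) and skips the date fields, which need format conversion before comparison. check_title_metadata and INFO_TO_XMP are hypothetical names; real PDF/A and PDF/UA validators check far more than this.

```python
# Information-dictionary keys and the XMP properties PDF/A expects them to match.
INFO_TO_XMP = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Subject": "dc:description",
    "Keywords": "pdf:Keywords",
    "Creator": "xmp:CreatorTool",
    "Producer": "pdf:Producer",
}


def check_title_metadata(info: dict, xmp: dict, display_doc_title: bool) -> list[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    problems = []
    # PDF/A-style consistency: an Info entry must agree with its XMP counterpart.
    for info_key, xmp_key in INFO_TO_XMP.items():
        if info_key in info and info[info_key] != xmp.get(xmp_key):
            problems.append(f"Info /{info_key} does not match XMP {xmp_key}")
    # PDF/UA: a non-empty dc:title that viewers show via DisplayDocTitle.
    if not xmp.get("dc:title", "").strip():
        problems.append("XMP dc:title is missing or empty")
    if not display_doc_title:
        problems.append("ViewerPreferences /DisplayDocTitle is not true")
    return problems


print(check_title_metadata({"Title": "Draft"}, {"dc:title": "Annual Report"}, True))
# ['Info /Title does not match XMP dc:title']
```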

XMP Metadata

XMP (Extensible Metadata Platform) is an XML-based metadata format that allows for more complex and extensible metadata than the traditional document information dictionary. XMP can store custom metadata fields, rights management information, and detailed provenance data. Inside a PDF, the XMP packet is stored as a metadata stream referenced from the document catalog.

Modern PDF creation software typically writes both traditional metadata and XMP metadata to ensure compatibility with older and newer PDF viewers.
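Because XMP is RDF/XML, Python's standard XML parser is enough to read simple properties. The sketch below pulls dc:title, dc:creator and a few simple properties out of an XMP packet. It assumes the packet has already been extracted from the PDF's metadata stream and that properties are written as elements (XMP also allows simple properties as attributes on rdf:Description, which this sketch ignores). The namespace URIs are the standard XMP ones; read_xmp is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}


def read_xmp(packet: str) -> dict:
    """Extract a few common properties from an XMP packet (element form only)."""
    root = ET.fromstring(packet)
    result = {}
    for desc in root.iter(f"{{{NS['rdf']}}}Description"):
        # dc:title is a language alternative (rdf:Alt); take the first rdf:li.
        title = desc.find("dc:title/rdf:Alt/rdf:li", NS)
        if title is not None:
            result["dc:title"] = title.text or ""
        # dc:creator is an ordered list (rdf:Seq) of authors.
        creators = desc.findall("dc:creator/rdf:Seq/rdf:li", NS)
        if creators:
            result["dc:creator"] = [c.text or "" for c in creators]
        # Simple properties hold their value as element text.
        for name in ("pdf:Producer", "xmp:CreatorTool", "xmp:CreateDate"):
            elem = desc.find(name, NS)
            if elem is not None:
                result[name] = elem.text or ""
    return result


SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Annual Report</rdf:li></rdf:Alt></dc:title>
   <dc:creator><rdf:Seq><rdf:li>Jane Doe</rdf:li></rdf:Seq></dc:creator>
   <pdf:Producer>Example Writer 1.0</pdf:Producer>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

print(read_xmp(SAMPLE))
# {'dc:title': 'Annual Report', 'dc:creator': ['Jane Doe'], 'pdf:Producer': 'Example Writer 1.0'}
```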

Viewing and Editing Metadata

PDF viewers and editing software provide access to metadata through document properties dialogs. You can view existing metadata, edit fields, or remove metadata entirely. Some tools offer batch metadata editing for processing multiple PDFs at once.

Common Use Cases

  • Document management: Organizing and searching large PDF collections
  • Compliance: Meeting archival standards requiring specific metadata
  • Privacy protection: Removing sensitive information before sharing
  • Accessibility: Providing document titles for screen readers

Related Concepts

  • PDF/A — Archival standard with metadata requirements
  • PDF/UA — Accessibility standard requiring title metadata
  • PDF Corrupted — Metadata corruption issues
