Stripping metadata from PDF files

Posted on November 2, 2021

Sometimes, for example when sending a review of a paper, I do not want the PDF file to contain any metadata. Ideally, the editorial process should take care of this, but I do not want to take any chances. In this post¹, I explain a simple method to strip metadata from PDF files.

First, let's see what metadata a simple ConTeXt file generates:

$ exiftool --all input.pdf
ExifTool Version Number         : 12.30
File Name                       : input.pdf
Directory                       : .
File Size                       : 13 KiB
File Modification Date/Time     : 2021:11:02 00:56:42-04:00
File Access Date/Time           : 2021:10:24 02:49:40-04:00
File Inode Change Date/Time     : 2021:11:02 00:56:42-04:00
File Permissions                : -rw-r--r--
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
Linearized                      : No
Con Te Xt Jobname               : input
Con Te Xt Support               : contextgarden.net
Con Te Xt Time                  : 2021-11-02 00:56
Con Te Xt Url                   : www.pragma-ade.com
Con Te Xt Version               : 2021.09.17 10:01
Create Date                     : 2021:11:02 00:56:42-04:00
ID                              : input | 2021-11-02T00:56:42-04:00
Modify Date                     : 2021:11:02 00:56:42-04:00
Te X Support                    : tug.org
Language                        : en
Format                          : application/pdf
Creator                         : 
Description                     : 
Title                           : input
Id                              : input | 2021-11-02T00:56:42-04:00
Con Te Xt Jobname               : input
Con Te Xt Time                  : 2021:11:02 00:56
Con Te Xt Url                   : www.pragma-ade.com
Con Te Xt Support               : contextgarden.net
Con Te Xt Version               : 2021.09.17 10:01
Con Te Xt LMTX                  : 
Te X Support                    : tug.org
Lua Te X Version                : 2.09
Lua Te X Functionality          : 20210914
Lua Te X Lua Version            : 5.4
Lua Te X Platform               : linux-64
Creator Tool                    : LuaMetaTeX 2.09 20210914 + ConTeXt LMTX 2021.09.17 10:01
Metadata Date                   : 2021:11:02 00:56:42-04:00
Keywords                        : 
Producer                        : LuaMetaTeX-2.09
Trapped                         : False
Document ID                     : uuid:76dc9aee-4451-9b7c-3d0c-785458e06945
Instance ID                     : uuid:20f757a8-44a5-938b-d99a-56b431f3b794
Page Mode                       : UseNone
Page Count                      : 2
PDF Version                     : 1.7

The file literally contains a “Made by ConTeXt” badge, down to the exact ConTeXt and LuaMetaTeX versions. Given the number of ConTeXt users, this might be more than enough to identify me in my research community. I do not want this information in the PDF file.
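When checking a file, I mostly care about the fields that name the toolchain. Below is a minimal sketch of a filter for just those lines. It assumes exiftool is on the PATH and prints the default "Tag Name : value" layout shown above. The function name toolchain_fields and the list of tag-name prefixes are my own choices; other TeX engines may need more prefixes.

```shell
# Print only the exiftool lines whose tag name points at the toolchain.
# Assumption: exiftool is on the PATH and prints "Tag Name : value" lines.
# The prefix list is illustrative ("Creator" also matches "Creator Tool").
toolchain_fields() {
  exiftool "$1" |
    grep -E '^(Con Te Xt|Lua Te X|Te X|Creator|Producer)'
}

# Example: toolchain_fields input.pdf
```

Run on the output.pdf produced by the qpdf step, it should print nothing, because none of the remaining fields match these prefixes.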

Fortunately, stripping this information is easy with qpdf:

$ qpdf --empty --pages input.pdf -- output.pdf

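where input.pdf is the name of the input file and output.pdf is the desired name of the output file. The --empty option starts from a blank PDF, and --pages input.pdf -- copies every page of input.pdf into it. Because qpdf takes document-level data such as the metadata from the primary (here, empty) input, none of the original metadata is carried over. We check the metadata again: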

$ exiftool --all output.pdf
ExifTool Version Number         : 12.30
File Name                       : output.pdf
Directory                       : .
File Size                       : 12 KiB
File Modification Date/Time     : 2021:11:02 01:01:17-04:00
File Access Date/Time           : 2021:11:02 01:01:23-04:00
File Inode Change Date/Time     : 2021:11:02 01:01:17-04:00
File Permissions                : -rw-r--r--
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.3
Linearized                      : No
Page Count                      : 2

Ah! Now there are no hints about the producer in the metadata: only file-system information, the PDF version, and the page count remain.
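If you strip metadata often, the qpdf call can be wrapped in a small function that handles several files at once. This is a minimal sketch, assuming bash and qpdf on the PATH; the function name strip_pdf_metadata and the -stripped.pdf output suffix are my own choices, not qpdf conventions. Only document-level metadata is dropped; the pages themselves are copied as they are.

```shell
#!/usr/bin/env bash
# Sketch: strip document-level metadata from PDF files with qpdf.
# Assumes qpdf is on the PATH. The function name and the "-stripped.pdf"
# suffix are illustrative choices.

strip_pdf_metadata() {
  local src="$1"
  local dst="${src%.pdf}-stripped.pdf"
  # --empty: use a blank PDF as the primary input.
  # --pages "$src" --: copy all pages of "$src" into it.
  # Document-level data (including metadata) comes from the blank primary
  # input, so the metadata of "$src" is not carried over.
  qpdf --empty --pages "$src" -- "$dst" || return 1
  printf '%s\n' "$dst"   # report the name of the file that was written
}

# Strip every PDF named on the command line.
for f in "$@"; do
  strip_pdf_metadata "$f"
done
```

Saved as, say, strip-pdf-metadata.sh, running bash strip-pdf-metadata.sh review.pdf would write review-stripped.pdf, which can then be checked with exiftool as above.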


1. This is an updated version of an old blog post.


This entry was posted in CLI and tagged pdf, metadata, qpdf, exifinfo.