Stripping metadata from PDF files
Posted on November 2, 2021
Sometimes, for example when sending a review of a paper, I do not want the PDF file to contain any metadata. Ideally, the editorial process should take care of this, but I do not want to take any chances. In this post[1], I explain a simple method to strip metadata from PDF files.
First, let's see what metadata is generated by a simple ConTeXt file.
$ exiftool --all input.pdf
ExifTool Version Number : 12.30
File Name : input.pdf
Directory : .
File Size : 13 KiB
File Modification Date/Time : 2021:11:02 00:56:42-04:00
File Access Date/Time : 2021:10:24 02:49:40-04:00
File Inode Change Date/Time : 2021:11:02 00:56:42-04:00
File Permissions : -rw-r--r--
File Type : PDF
File Type Extension : pdf
MIME Type : application/pdf
Linearized : No
Con Te Xt Jobname : input
Con Te Xt Support : contextgarden.net
Con Te Xt Time : 2021-11-02 00:56
Con Te Xt Url : www.pragma-ade.com
Con Te Xt Version : 2021.09.17 10:01
Create Date : 2021:11:02 00:56:42-04:00
ID : input | 2021-11-02T00:56:42-04:00
Modify Date : 2021:11:02 00:56:42-04:00
Te X Support : tug.org
Language : en
Format : application/pdf
Creator :
Description :
Title : input
Id : input | 2021-11-02T00:56:42-04:00
Con Te Xt Jobname : input
Con Te Xt Time : 2021:11:02 00:56
Con Te Xt Url : www.pragma-ade.com
Con Te Xt Support : contextgarden.net
Con Te Xt Version : 2021.09.17 10:01
Con Te Xt LMTX :
Te X Support : tug.org
Lua Te X Version : 2.09
Lua Te X Functionality : 20210914
Lua Te X Lua Version : 5.4
Lua Te X Platform : linux-64
Creator Tool : LuaMetaTeX 2.09 20210914 + ConTeXt LMTX 2021.09.17 10:01
Metadata Date : 2021:11:02 00:56:42-04:00
Keywords :
Producer : LuaMetaTeX-2.09
Trapped : False
Document ID : uuid:76dc9aee-4451-9b7c-3d0c-785458e06945
Instance ID : uuid:20f757a8-44a5-938b-d99a-56b431f3b794
Page Mode : UseNone
Page Count : 2
PDF Version : 1.7
The file literally contains a “Made by ConTeXt” badge. Given how few ConTeXt users there are, this might be more than enough to identify me in my research community. I do not want this information in the PDF file.
Fortunately, stripping this information is easy using qpdf:
$ qpdf --empty --pages input.pdf -- output.pdf
where input.pdf is the name of the input file and output.pdf is the desired name of the output file. We check the metadata again:
$ exiftool --all output.pdf
ExifTool Version Number : 12.30
File Name : output.pdf
Directory : .
File Size : 12 KiB
File Modification Date/Time : 2021:11:02 01:01:17-04:00
File Access Date/Time : 2021:11:02 01:01:23-04:00
File Inode Change Date/Time : 2021:11:02 01:01:17-04:00
File Permissions : -rw-r--r--
File Type : PDF
File Type Extension : pdf
MIME Type : application/pdf
PDF Version : 1.3
Linearized : No
Page Count : 2
Ah! Now there are no hints about the producer in the metadata.
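To avoid retyping these two steps for every review, they can be wrapped in a pair of small shell functions. This is only a sketch: the function names strip_pdf_metadata and pdf_is_anonymous are made up for illustration, and the list of tags that I treat as identifying (Producer, Creator, CreatorTool, Title, Author) is my own assumption, so adjust it to your needs.

```shell
# Hypothetical helper: copy the pages of a PDF into a fresh, empty PDF,
# using the same qpdf invocation as above.
# Usage: strip_pdf_metadata input.pdf output.pdf
strip_pdf_metadata() {
    if [ "$#" -ne 2 ]; then
        echo "usage: strip_pdf_metadata INPUT.pdf OUTPUT.pdf" >&2
        return 2
    fi
    if [ "$1" = "$2" ]; then
        echo "error: input and output must be different files" >&2
        return 2
    fi
    qpdf --empty --pages "$1" -- "$2"
}

# Hypothetical helper: succeed only if exiftool reports none of the
# tags listed below (an assumed, non-exhaustive list) for the given PDF.
# Usage: pdf_is_anonymous output.pdf
pdf_is_anonymous() {
    # -s prints short tag names; tags that are absent produce no output.
    found=$(exiftool -s -Producer -Creator -CreatorTool -Title -Author "$1")
    if [ -n "$found" ]; then
        printf 'identifying tags still present in %s:\n%s\n' "$1" "$found" >&2
        return 1
    fi
}
```

After sourcing these definitions, running strip_pdf_metadata input.pdf output.pdf && pdf_is_anonymous output.pdf should exit silently with status 0 when the stripped file is clean. It is still worth looking at the full exiftool output once, since the check only covers the tags named in the function.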
[1] This is an updated version of an old blog post.