A Reminder from the NSA

Thanks to the NSA, we are reminded of the value of metadata. Metadata is generally defined as data about data. In the context of documents, metadata is information stored inside the document file that records details about the document, such as the date created, the date modified, the author's name, and the number of characters. In the case of MS-Word documents, it also includes details about the text itself, such as tracked changes.
Here is a screenshot of some of the metadata of the very document you are reading, which I am creating in Microsoft Word.
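To make this concrete: a modern Word file (.docx) is a ZIP archive, and fields like author and dates sit in a small XML part, docProps/core.xml. The following is a minimal Python sketch, assuming that standard .docx layout, that reads a few of those fields using only the standard library. The function name read_core_properties and the field labels are illustrative choices, not part of any product's API. It does not look at tracked changes, which live in the document body rather than in core.xml.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in Office Open XML (.docx) files
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical friendly labels mapped to the core-property elements that hold them
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(docx_bytes):
    """Return selected core properties from a .docx file supplied as bytes."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    result = {}
    for label, path in FIELDS.items():
        elem = root.find(path, NS)  # fields are direct children of coreProperties
        if elem is not None and elem.text:
            result[label] = elem.text
    return result
```

Usage would look like `read_core_properties(open("report.docx", "rb").read())`, where report.docx is a placeholder for any Word file you want to inspect.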

Metadata cleaning tools, which are common on the legal desktop, exist to remove unwanted metadata (yes, some metadata could be “wanted”) so that you can send a document in MS-Word format without the risk of sending its metadata along with it.

Metadata cleaning tools often offer the option of converting the document to PDF as a way to produce a ‘clean’ document. And this brings me to the topic of this article.

PDF files also contain metadata.    

Here is a screenshot of the metadata found in the PDF document I created from my MS-Word document. Notice that some of the metadata in the PDF is carried over directly from the MS-Word file into the corresponding PDF metadata field. Examples are Author and Comments.
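In a PDF, fields like these are typically stored in the document information (Info) dictionary, using keys such as /Title, /Author, /Subject, and /Keywords; newer files may also carry an XMP metadata stream. As a deliberately rough Python sketch, assuming the Info entries are uncompressed, unencrypted literal strings, you can spot them with a byte-level search. The function name naive_pdf_info is hypothetical, and this is no substitute for a real PDF parser or a proper cleaning tool: it misses compressed object streams, hex-encoded strings, and XMP entirely.

```python
import re

# A subset of the standard keys of the PDF document information dictionary
INFO_KEYS = ("Title", "Author", "Subject", "Keywords", "Creator", "Producer")


def naive_pdf_info(pdf_bytes):
    """Very rough scan for Info-dictionary entries stored as literal strings.

    Assumes entries are uncompressed, unencrypted, and free of nested or
    escaped parentheses; hex strings, object streams, and XMP are ignored.
    """
    found = {}
    for key in INFO_KEYS:
        # Match e.g. b"/Author (Jane Doe)"; the key must be followed by "("
        pattern = rb"/" + re.escape(key.encode("ascii")) + rb"\s*\(([^()]*)\)"
        match = re.search(pattern, pdf_bytes)
        if match:
            # latin-1 is only an approximation of PDF's own text encodings
            found[key] = match.group(1).decode("latin-1")
    return found
```

Running it over a simple PDF produced from a Word file is a quick way to see whether your name survived the conversion, with the caveat that an empty result does not prove the file is clean.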

In an ironic twist, here is an NSA document, dated July 27, 2008, that discusses the risks of metadata in PDF documents. See Paragraphs 4.1 and 4.2 for a good discussion of those risks.

And here are comments from Adobe about metadata:

This brings me back to metadata scrubbing tools. These tools have long been common in legal practice for cleaning outbound MS-Word files. To provide a truly comprehensive solution, the latest generation of metadata scrubbing tools must also clean PDF documents. This is especially important now that PDF has become the web-document format of choice.

Which brings us to the topic of a future post: mobile document cleaning.

