Danger Lurks in PDF Documents

At the 27th Chaos Communication Congress (27C3) in Berlin, security researcher Julia Wolf of US company FireEye pointed out numerous previously little-known security problems with Adobe’s PDF standard. For instance, a PDF can reportedly contain a database scanner that becomes active and scans a network when the document is printed on a network printer. Wolf said that the document format is also full of other surprises. For example, it is reportedly possible to write PDFs that display different content in different operating systems, browsers or PDF readers, or even depending on a computer’s language settings.

Many businesses and authorities use PDF as their standard file format for maintaining presentation consistency across heterogeneous computer environments. According to Wolf, however, the PDF standard has long had too many functions that can be exploited to launch attacks and wreak other havoc. These functions range from database connections without security features to options that can blindly trigger the execution of arbitrary programs in Acrobat Reader. The researcher said that other risks arise from support for inherently insecure scripting languages such as JavaScript, formats such as XML, RFID tags and digital rights management (DRM) technologies. According to Wolf, Adobe itself calls PDF a “container format” which may indeed hold a variety of things. For example, it is possible to integrate Flash files, which themselves offer many points of attack, as well as audio and video files.
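As a rough illustration of the kind of features described here, the following Python sketch counts name tokens that commonly accompany active content in a PDF, such as JavaScript actions, launch actions, embedded files and rich media. The token list, the function name count_risky_names and the sample document are assumptions chosen for illustration; they do not come from Wolf's talk and the list is not exhaustive.

```python
import re

# Name tokens that often accompany active content in PDF files.
# This selection is an assumption for illustration, not a complete list.
RISKY_NAMES = [
    b"/JavaScript", b"/JS", b"/OpenAction", b"/AA",
    b"/Launch", b"/EmbeddedFile", b"/RichMedia", b"/XFA",
]


def count_risky_names(data: bytes) -> dict:
    """Count occurrences of each risky name token in raw PDF bytes."""
    counts = {}
    for name in RISKY_NAMES:
        # The lookahead stops /JS or /AA from matching longer names.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode("ascii")] = len(re.findall(pattern, data))
    return counts


# A tiny hand-written fragment (hypothetical, not a complete valid PDF)
# whose catalog runs a JavaScript action when the document is opened.
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1);) >>\nendobj\n"
    b"%%EOF\n"
)

if __name__ == "__main__":
    for name, count in count_risky_names(SAMPLE).items():
        print(name, count)
```

A count of zero does not show that a file is harmless. PDF names can be written with hexadecimal escapes (for example /J#61vaScript), and objects can sit inside compressed streams, so a plain byte search of this kind is only a first pass.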

Wolf said that there are generally many places for hiding arbitrary data and code in a PDF. The researcher explained that, for instance, all document data and metadata can be read and edited via JavaScript. Even files compressed in formats such as ZIP, which allow further arbitrary objects to be embedded via comments, can reportedly be integrated. Wolf added that it is also possible to generate very small PDF files which only execute JavaScript, and that certain objects can be referenced multiple times to trigger different responses when a file is opened.

In the researcher’s experience, the security debacle is made worse because most anti-virus programs are incapable of detecting malicious software in PDFs. Wolf said that when she ran tests with various known exploits, more than half of the 40 scanners she tested didn’t respond, even in cases where the corresponding advisories were several months old. When malicious JavaScript code is compressed, the detection rate is apparently even lower.
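Why compression can lower detection rates may be easier to see in a small Python sketch. It assumes a scanner that simply searches for a byte signature, which is a simplification and not a description of how any tested product works. The function names inflate_streams and contains_signature and the sample document are hypothetical, and only zlib (Flate) compression is handled.

```python
import re
import zlib

# Matches the body between "stream" and "endstream"; the lookbehind
# keeps the word "endstream" itself from being read as a stream start.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(data: bytes) -> list:
    """Return the inflated body of every stream that zlib can decode."""
    bodies = []
    for match in STREAM_RE.finditer(data):
        decoder = zlib.decompressobj()
        try:
            # decompressobj ignores the end-of-line bytes after the data.
            body = decoder.decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-compressed, or uses another filter
        if body:
            bodies.append(body)
    return bodies


def contains_signature(data: bytes, signature: bytes) -> bool:
    """Search the raw bytes first, then every inflated stream."""
    if signature in data:
        return True
    return any(signature in body for body in inflate_streams(data))


# Hypothetical sample: a script stored in a Flate-compressed stream.
PAYLOAD = b"app.alert('demo');" * 4
COMPRESSED = zlib.compress(PAYLOAD)
SAMPLE = (
    b"%PDF-1.4\n1 0 obj\n<< /Length "
    + str(len(COMPRESSED)).encode("ascii")
    + b" /Filter /FlateDecode >>\nstream\n"
    + COMPRESSED
    + b"\nendstream\nendobj\n%%EOF\n"
)

if __name__ == "__main__":
    print("raw search finds script:", b"app.alert" in SAMPLE)
    print("after inflating:", contains_signature(SAMPLE, b"app.alert"))
```

In this sample the script text does not appear in the raw bytes of the file and becomes visible only once the stream is inflated. Real documents can chain several filters or nest compressed objects, which this sketch does not attempt to handle.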

Update – Adobe sees the sandbox introduced with Reader X (Reader version 10.0), which allows code to be executed in isolation in ‘protected mode’, as the remedy for these problems.

Other security experts recommend using special tools to remove metadata from PDFs or to check the file syntax for conformity issues before the files are opened.
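What a very basic syntax check might look like can be sketched in Python. The example below only looks at the outer framing of a file: the %PDF- header, the %%EOF marker and the startxref keyword. The function name conformity_issues and the two sample inputs are hypothetical, and the checks are a small assumed subset of what a real conformity tool would examine.

```python
def conformity_issues(data: bytes) -> list:
    """Return a list of simple structural problems found in raw PDF bytes."""
    issues = []

    # A strictly formed file starts with the header at offset 0.
    if not data.startswith(b"%PDF-"):
        if b"%PDF-" in data[:1024]:
            issues.append("data precedes the %PDF- header")
        else:
            issues.append("no %PDF- header in the first 1024 bytes")

    # Nothing but line endings should follow the last %%EOF marker.
    eof = data.rfind(b"%%EOF")
    if eof == -1:
        issues.append("no %%EOF marker")
    elif data[eof + len(b"%%EOF"):].strip(b"\r\n"):
        issues.append("data follows the final %%EOF marker")

    # The startxref keyword points to the cross-reference table.
    if b"startxref" not in data:
        issues.append("no startxref keyword")

    return issues


# Hypothetical inputs: a minimal well-framed skeleton and a padded copy.
CLEAN = b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\nstartxref\n0\n%%EOF\n"
PADDED = b"junk\n" + CLEAN + b"hidden"

if __name__ == "__main__":
    print(conformity_issues(CLEAN))
    print(conformity_issues(PADDED))
```

Bytes placed before the header or after the final marker are one simple place where extra data can be hidden, which is why the sketch reports both. Passing these checks says nothing about the objects inside the file.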