June 10, 2005

MD5 collision for two meaningful documents

Researchers at RUB and the University of Mannheim have a nice demonstration of how the recently discovered attack on the MD5 hash function can be used to fool someone into signing one document when they think it's another:

Recently, the world of cryptographic hash functions has turned into a mess. A lot of researchers announced algorithms ("attacks") to find collisions for common hash functions such as MD5 and SHA-1 (see [B+, WFLY, WY, WYY-a, WYY-b]). For cryptographers, these results are exciting - but many so-called "practitioners" turned them down as "practically irrelevant". The point is that while it is possible to find colliding messages M and M', these messages appear to be more or less random - or rather, contain a random string of some fixed length (e.g., 1024 bit in the case of MD5). If you cannot exercise control over colliding messages, these collisions are theoretically interesting but harmless, right? In the past few weeks, we have met quite a few people who thought so.

With this page, we want to demonstrate how badly wrong this kind of reasoning is! We hope to provide convincing evidence even for people without much technical or cryptographical background.

Their method is simple and clever. They use the newly discovered attack to generate two random strings that have the same hashed value (say R1 and R2). Then they put those at the start of a "high-level" document description language like PostScript and tack on something along the lines of "if the previous value was R1, print an innocuous message I can get signed, otherwise print the real message I want signed." A well-known weakness to the MD5 algorithm is that if R1 and R2 have the same hash values then R1+some text will have the same hash value as R2+the same text here, so depending on whether they use R1 or R2 as their preamble they get two very different messages with the same hash value.

Posted by bug to Security at June 10, 2005 4:07 PM | TrackBack
Comments

It seems to me that there is another big problem here: Postscript is a Turing-complete programming language. I don't know the details of the Postscript runtime, but it doesn't seem unreasonable to be able to create a document that would look different every time you display it if there is any non-determinism in the Postscript interpreter.

Posted by: Rawhide at June 10, 2005 7:47 PM

I agree PostScript is dangerous regardless. This kind of attack is more general though, since it could work even if you had the exact same interpreter in both cases and even if the language was completely deterministic (i.e. didn't include references to outside state or random numbers).

In fact, I'll bet it could be done with a far less high-level document description. For example, I bet with a bit of cleverness you could create a pair of image files that use the same-hash string pairs to either indicate the start of a comment field (which gets ignored) or a graphics block. That'd give you the same effect as what these guys did with PostScript.

Posted by: bug at June 12, 2005 12:33 PM
Post a comment












Anonymous posting is allowed, as are these HTML tags — a href, b, br, p, strong, em, ul, li, blockquote.
Email addresses are spam-protected.

You must have Javascript enabled to comment, due to the code I'm using to try to outwit spammers. Sorry for any inconvenience this may cause.

Remember Me