MD5 collision for two meaningful documents

Researchers at RUB and the University of Mannheim have a nice demonstration of how the recently discovered attack on the MD5 hash function can be used to fool someone into signing one document when they think it’s another:

Recently, the world of cryptographic hash functions has turned into a mess. A lot of researchers announced algorithms (“attacks”) to find collisions for common hash functions such as MD5 and SHA-1 (see [B+, WFLY, WY, WYY-a, WYY-b]). For cryptographers, these results are exciting – but many so-called “practitioners” turned them down as “practically irrelevant”. The point is that while it is possible to find colliding messages M and M’, these messages appear to be more or less random – or rather, contain a random string of some fixed length (e.g., 1024 bit in the case of MD5). If you cannot exercise control over colliding messages, these collisions are theoretically interesting but harmless, right? In the past few weeks, we have met quite a few people who thought so.

With this page, we want to demonstrate how badly wrong this kind of reasoning is! We hope to provide convincing evidence even for people without much technical or cryptographical background.

Their method is simple and clever. They use the newly discovered attack to generate two random strings that have the same hashed value (say R1 and R2). Then they put those at the start of a “high-level” document description language like PostScript and tack on something along the lines of “if the previous value was R1, print an innocuous message I can get signed, otherwise print the real message I want signed.” A well-known weakness to the MD5 algorithm is that if R1 and R2 have the same hash values then R1+some text will have the same hash value as R2+the same text here, so depending on whether they use R1 or R2 as their preamble they get two very different messages with the same hash value.