This article is rated Start-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: | ||||||||||||||
|
Hello,
this is one of my first visits on Wikipedia, and if my editing experience is very scarce yet, I duly apologize.
OK, now to the discussion: I have recently been editing the article "confusion and diffusion" because I became aware of a statement in the article which simply is not true in my opinion.
Unfortunately my changes have been un-done so I wanted to take a chance to argue in favour of what I think is correct.
The point is that confusion and diffusion are -- at least in some lecture notes and books -- explained just in a self-respective way: confusing.
But if you happen to have read Shannon's original paper, which should be taken as the origin of any citations, the terms are made rather clear:
There are two "inputs" which lead to a ciphertext. One is the plaintext, the other one is the key. Diffusion leads to an elimination of a statistical relation between changes in single plaintext bits and changes in single ciphertext bits. If you change a plaintext bits, a cipher with good diffusion should lead to a 50 pc chance for _any_ ciphertext bit to change.
Confusion, on the other hand, does the same for the key<->ciphertext relation.
As I stated, I did the changes the day before yesterday, but they have been un-done.
Please read the original paper to get the terms correct, and then please re-do my changes. The current explanation is simply incorrect.
A link:
http://www.cs.ucla.edu/~jkong/research/security/shannon1949.pdf
Now Shannon's paper is very lenghty indeed. If you are just interested in the issues of statistical methods reading especially Section 23 ("statistical methods") is sufficient.
Regards
OT
Shannon writes:
Two methods (other than recourse to ideal systems) suggest themselves for frustrating a statistical analysis. These we may call the methods of diffusion and confusion. In the method of diffusion the statistical structure of M which leads to its redundancy is "dissipated" into long range statistics—i.e., into statistical structure involving long combinations of letters in the cryptogram...
The method of confusion is to make the relation between the simple statistics of E and the simple description of K a very complex and involved one.
Schneier writes in Applied Cryptography:
Confusion serves to hide any relationship between the plaintext, the ciphertext and the key. Remember how linear and differential cryptanalysis can exploit even a slight relationship between these three things? Good confusion makes the relationship statistics so complicated that even these powerful cryptanalytic tools won't work.
Diffusion spreads the influence of individual plaintext or key bits over as much of the ciphertext as possible. This also hides statistical relationships and makes cryptanalysis more difficult.
— Matt 08:04, 11 Jun 2004 (UTC)
...and Menezes, van Oorschot, Vanstone write (Handbook of Applied Cryptography):
p.20 1.36: Confusion is intended to make the relationship between the key and ciphertext as complex as possible. Diffusion refers to rearranging or spreading out the bits in the message so that any redundancy in the plaintext is spread out over the ciphertext.
— OT Fri Jun 11 10:29:59 CEST 2004
William Stallings, Cryptography and Network Security (3rd ed. p66-67):
In diffusion, the statistical structure of the plaintext is dissipated into long range statistics of the ciphertext. this is achieved by having each plaintext digit affect the value of many ciphertext digits, which is equivalent to saying that each ciphertext digit is affected by many plaintext digits...
The mechanism of diffusion seeks to make the statistical relationship between the plaintext and ciphertext as complex as possible to thwart attempts to deduce the key. On the other hand, confusion seeks to make the relationship between the statistics of the ciphertext and the value of the encryption key as complex as possible, again to thwart attempts to recover the key. Thus, even if the attacker can get some handle on the statistics of the ciphertext, the way in which the key was used to produce that ciphertext is so complex as to make it difficult to deduce the key.
— Matt 09:04, 11 Jun 2004 (UTC)
OK, I try to offer an explanation for both terms in my own words, and try to bit more precise than in my first uttering:
"Diffusion" is the spreading the redundancy of a plaintext over the ciphertext. What does that mean? In the set of all clear text messages, "sensible" clear text messages have a certain correlation between two-, three- and more-letter combinations (e.g. a "q" is always followd by a "u"). A cipher with good diffusion eliminates this correlation. Changes in one bit or letter affect every bit in the ciphertext. No correlation then means: the chances for a change in any ciphertext bit is 50 per cent.
I think we both agree with this definition.
Now to "confusion": Shannon's view is to have the influence of the key bits on the ciperhtext such that a statistical analysis of the ciphertext does not help to cut out a simply-structured subset which the correct key most probably is situated in. Tying down the key closer and closer shall be practically impossible because the subset is too folded (in a geometric view) inside the set of all keys.
It should therefore be as difficult as possible to infer the key by a known-plaintext attack, for example.
It is generally stated that a complex polyalphabetic substitution rule does just that.
Now to the often-stated relation between diffusion/confusion and transposition/substitution: clearly, transposition adds to the diffusion of the cipher, but only if additional steps are taken. A simple transposition does not spread the influence of a bit change. It simply transposes it.
Similarly, a polyalphabetic cipher alone does not lead to confusion, because a simple key change only leads to a simple ciphertext change (Take Vigenere: if you just change the first key letter of a say ten-letter Vigenere key, then cipher text letters 1,11,21,31 etc are changed, none else.)
So even though transposition is the ingredient for diffusion, and substitution is the ingredient for confusion, it is an iterated combination of both, which really provides both confusion and diffusion.
I'd like to point out that transposition by itself dissipates the higher order n-gram statistics, but not the unigram statistics. User:Dvunkannon —Preceding comment was added at 20:35, 21 November 2007 (UTC)
I think that both of you, User:OT and User:Matt Crypto, are right below in some sense, even though you are disagreeing with each other. The relationship between the key and the ciphertext is complex exactly because the cryptosytem has the property that changing one bit of the key changes the ciphertext completely, and hence in order to change only one bit of the ciphertext you certainly would have to change the key completely. There are basically equivalent properties. I will try to clarify these things here now. --GaborPete (talk) 04:35, 5 April 2009 (UTC)
And I agree with User:Dvunkannon (if it was him at the end): claiming that S-boxes are for confusion and P-boxes are for diffusion is complete nonsense. In the substitution–permutation network article it had been stated the opposite way, which is also complete nonsense, I have just changed that there. Both diffusion and confusion are the results of using S-boxes and P-boxes together in the right way. If you introduce the key with XORs, as usual, then it's impossible to achieve only confusion or only diffusion, whatever your S-boxes and P-boxes are. (This is related to my previous comment, too.) --GaborPete (talk) 04:35, 5 April 2009 (UTC)
There are two more slightly confusing things in the present text, I will change them, too:
Shannon's 1949 paper (Communication Theory of Secrecy Systems) is a revised and declassified version of a 1945 paper (A Mathematical Theory of Cryptography - there's a copy here: https://www.iacr.org/museum/shannon45.html). They're pretty similar. I just checked and Confusion and Diffusion are actually defined in the earlier paper, so I'd like to change the article to reference that earlier piece unless someone has an objection. Shane Lin (talk) 10:08, 26 October 2015 (UTC)
The non-keyed hashes do not have confusion in its standard definition, so I have modified the lead accordingly. A good WP:RS for that assertion is remarkably hard to find, though. Dimawik (talk) 20:55, 25 April 2023 (UTC)
I changed it to: These concepts are also important in the design of cryptographic hash functions, pseudorandom number generators and for the hash functions of hash tables, where decorrelation of the generated values is the main feature.
It was changed to: These concepts are also important in the design of cryptographic hash functions and pseudorandom number generators, where decorrelation of the generated values is the main feature, diffusion is also applicable to non-keyed hash functions.
The Issue: Hash functions of hash tables do have keys. Consider javas HashMap<T,U> that has keys of type T and values of type U. T has a function like hashCode() or something. The same is true also with basically all other modern object oriented languages, for their implementations of hash tables. This function hashCode() takes some fields of the object T and generates a hash value from it. The hash value needs to be decorrelated from the fields it was generated from. It needs to have both properties; Confusion and Diffusion to generate the hash value, so that the hash table can be balanced. I think the usage in hash functions for hash tables is quite common, and should therefore be mentioned. Why was my text removed, and replaced with that of non-keyed hash functions? Thanks · · · Omnissiahs hierophant (talk) 06:09, 26 April 2023 (UTC)
References