ajb 8 hours ago [-]
I would argue that they are not the same, but there is a symmetry between them.
The central problem of cryptology is to prevent inference about either the key or the plaintext, despite the requirement to be able to reconstruct the plaintext from the ciphertext+key. So ciphers have to almost perfectly mix information.
Machine learning is possible because in the absence of perfect mixing, inference is possible (given many input output pairs), even if the information is many decibels down below the noise. So the information about what parameters need changing is present in the output despite many subsequent layers of processing. This means that a lot of mixing can be tolerated, and it's needed because you don't know in advance what the data flow should look like in detail, so the NN has to provide as many options as possible.
ogogmad 5 hours ago [-]
ChaCha20 got discovered using a computer search testing out resistance to certain attacks. Hence, the architecture came first and then the parameters came next. Any link with NN gradient descent? It would likely be an abstract one.
tptacek 5 hours ago [-]
I don't know how true this is? Salsa20 seems like pretty standard ARX design that builds a hash function in counter mode; there's a detailed paper explaining Bernstein's decisions.
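For readers unfamiliar with ARX (add, rotate, XOR), here is a minimal sketch of the quarter round from ChaCha, the close relative of Salsa20 mentioned upthread. The helper names (`rotl32`, `quarter_round`) are illustrative; the operation order and the rotation constants 16, 12, 8, 7 follow RFC 8439, and the input used is the test vector from section 2.1.1 of that RFC. This is only the innermost mixing step, not a usable cipher.

```python
MASK32 = 0xFFFFFFFF  # all arithmetic is on 32-bit words


def rotl32(value: int, count: int) -> int:
    """Rotate a 32-bit word left by `count` bits."""
    return ((value << count) & MASK32) | (value >> (32 - count))


def quarter_round(a: int, b: int, c: int, d: int) -> tuple[int, int, int, int]:
    """ChaCha quarter round: four add/xor/rotate steps on four words."""
    a = (a + b) & MASK32; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 7)
    return a, b, c, d


# Test vector from RFC 8439, section 2.1.1.
result = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
print([hex(word) for word in result])
```

Modular addition supplies the nonlinearity, while XOR and rotation spread bits across word positions cheaply; the full cipher simply repeats this step many times over a 16-word state.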
gobdovan 6 hours ago [-]
I think the underlying explanation is that both fields deal with very large state spaces, so the forms converge somewhat.
I think the contrast is more interesting: exact discrete trajectories in cryptography versus approximate continuous function approximation in neural networks.
In cryptography, you usually want a state space so large that nobody can accidentally find, reconstruct, or predict the same path you took.
In neural networks, you want an immense initial search space because NNs need to model the real world, which is highly complex and contains patterns that appear unpredictably. One aspect I think is often overlooked is that NNs are mostly deletive: they start with a very broad representational space and become progressively more specific by discarding what the NN perceives as irrelevant distinctions.
I think this puts the article's point about complexity and mixing in a clearer light. The same class of procedures achieves almost opposite effects. In neural networks, you want mixing so the model can approximate many possible paths at once. In cryptography, you want mixing so the path taken is unpredictable and hard to trace. The key difference is that, for NNs, an approximate path can be good enough. In cryptography, an approximate path is as useless as a very distant one.
bux93 8 hours ago [-]
Because both of them are optimized for hardware. Neural networks, despite the name, have very little similarity to biological neurons.
There's a lot of multiplication of numbers in parallel, so it makes sense to try to fit that to matrices.
Cryptography is built bottom-up, but likewise it makes sense to exploit data structures that already exist in silicon.
PxldLtd 7 hours ago [-]
While modern LLMs are a far cry from biological synapses, I do find it fascinating that if you take the highly reciprocal data of a biological connectome and unroll it into a DAG, you suddenly see motifs popping up that look similar to what we find in AI. I found this both when looking at the temporal unrolling of RNNs and when mapping the layer activation weights of a Transformer. Totally agree though, the current LLM architecture itself is driven by the need to shove all of this nicely into parallelized compute hardware.
krisoft 59 minutes ago [-]
> if you take the highly reciprocal data of a biological connectome and unroll it into a DAG, you suddenly see motifs popping up that look similar to what we find in AI
That sounds interesting. Where have you heard about that? Or is this your own research?
amoss 8 hours ago [-]
In addition, both have a property similar to dispersion. In crypto, each change to an input bit should cascade through as many output bits as possible. In ML, each output bit should depend on as many of the input bits (and hidden layers) as possible. So they both feature a similar maximization of entropy.
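The cascading described here is usually called the avalanche effect, and it is easy to observe. A minimal sketch, assuming SHA-256 from Python's `hashlib` as a stand-in for a cipher (it is a hash function, not a cipher, but it shows the same property); the helper names are illustrative:

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def avalanche(data: bytes, bit_index: int) -> int:
    """Number of SHA-256 output bits that change when one input bit flips."""
    before = hashlib.sha256(data).digest()
    after = hashlib.sha256(flip_bit(data, bit_index)).digest()
    return hamming_distance(before, after)


changed = avalanche(b"hello world", 0)
print(f"{changed} of 256 output bits changed")
```

For a well-mixed function the count should land near half of the 256 output bits, whichever input bit is flipped.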
winfieldchen 3 hours ago [-]
> dispersion [...] maximization of entropy
This is exactly the point. I was disappointed that I had to scroll so far down the page until I saw the word "entropy." There is a deep connection between machine learning and encryption and compression in information theory. As Shannon demonstrated, the one-time pad's encrypted output is maximum entropy, and so is data compressed to the Shannon limit. Such an optimal compressor learns the underlying probability distribution of the data to represent it with the fewest bits possible, which is exactly the goal of machine learning. A trained ML model can be seen as a lossy compression of the training data. Autoencoding models make the link between ML and compression (and thus encryption) explicit.
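The entropy claim can be checked empirically. A rough sketch, assuming byte-frequency entropy as the measure (an estimate from one sample, not the true source entropy) and an illustrative repetitive message; `byte_entropy` and `otp_xor` are made-up helper names:

```python
import math
import secrets
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Empirical Shannon entropy of the byte histogram, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def otp_xor(message: bytes, pad: bytes) -> bytes:
    """One-time pad: XOR with a pad of equal length (also decrypts)."""
    if len(pad) != len(message):
        raise ValueError("pad must be exactly as long as the message")
    return bytes(m ^ p for m, p in zip(message, pad))


plaintext = b"attack at dawn " * 512           # highly redundant input
pad = secrets.token_bytes(len(plaintext))      # uniformly random, used once
ciphertext = otp_xor(plaintext, pad)

print(f"plaintext:  {byte_entropy(plaintext):.2f} bits/byte")
print(f"ciphertext: {byte_entropy(ciphertext):.2f} bits/byte")
```

The plaintext scores well below 8 bits per byte because it is so repetitive, while the ciphertext should come out close to the 8-bit maximum, which is what a compressor working at the Shannon limit would also produce.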
tryauuum 9 hours ago [-]
Can anyone recommend any good content to learn cryptography? Like, even if I read the algorithm for AES I have zero understanding about why it works this way
I've finished the Cryptography I on Coursera already. Can't recommend it enough
coldstartops 9 hours ago [-]
I've been through Introduction to Modern Cryptography by Katz and Lindell.
Can recommend, as it starts with Caesar's cipher and one-time pads, and builds towards modern cryptography.
pet_the_bird 9 hours ago [-]
"Cryptography Made Simple" by Nigel Smart and "A Graduate Course in Applied Cryptography" by Dan Boneh and Victor Shoup are excellent resources for people who have an affinity for math and CS. The second resource can be a tough read, and I would strongly recommend not skipping the first few chapters.
yason 8 hours ago [-]
Back in the day, I read Applied Cryptography (by Schneier) and clarity rained upon many things.
tptacek 7 hours ago [-]
More damage has been done by that book than by any Herbert Schildt C language book.
LPisGood 6 hours ago [-]
Can you elaborate?
tptacek 5 hours ago [-]
It's a book that is much more interested in presenting an almanac-esque survey of everything that was happening in cryptography at the time it was written (also unhelpful: it was written at a particularly un-rigorous point in the evolution of cryptography) than it is in teaching readers how to accomplish anything safely.
esafak 6 hours ago [-]
This is news to me. Is it him in general or just that book?
tptacek 6 hours ago [-]
Just that book. The followup (Practical Cryptography, now called Cryptography Engineering, though it's the same book) is much, much better --- though it's also totally out of date at this point, and would also get you in trouble.
zOneLetter 8 hours ago [-]
I looked at the recommendations under your comment, but I don't think I'm capable of these either lol
Any recommendations for a technically competent person whose math knowledge trails off at Calc 2?
krupan 8 hours ago [-]
The math isn't that difficult once you grok mod math. It's like time, like doing addition and subtraction on a clock. What's 10 + 4 on a clock? 4 hours past 10 is 2.
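The clock example maps directly onto the modulo operator. A small sketch, where `clock_add` is an illustrative helper for a 12-hour dial (it shifts to 0..11 and back so that 12 o'clock stays 12 rather than becoming 0):

```python
def clock_add(hour: int, delta: int, dial: int = 12) -> int:
    """Add `delta` hours to `hour` on a dial numbered 1..dial."""
    return (hour - 1 + delta) % dial + 1


print(clock_add(10, 4))   # 4 hours past 10 -> 2
print((10 + 4) % 12)      # the same sum as plain modular arithmetic -> 2

# Cryptography mostly uses the same idea with huge moduli.
# Python's built-in three-argument pow does modular exponentiation:
print(pow(3, 4, 12))      # 3**4 = 81, and 81 mod 12 -> 9
```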
tptacek 6 hours ago [-]
The math stays difficult after basic discrete concepts and gets more difficult as you go. :)
It's straightforward to get yourself to a place where you can do cryptographic things and feel somewhat comfortable with what's happening. Truly understanding it to the point where you can reason safely about it is deceptively harder.
PaulStatezny 7 hours ago [-]
I would highly recommend the free book Crypto 101.
1) Understanding Cryptography by Christof Paar et al. I learnt cryptography from the 1st edition. It's very practical and highly recommended - https://www.cryptography-textbook.com/
3) For understanding how cryptography is used in networks, see the classic Network Security: Private Communication in a Public World by Radia Perlman et al. The 2nd edition is where I started my journey into the network security/cryptography needed for my then job. Highly recommended - https://www.amazon.com/Network-Security-Charlie-Kaufman/dp/0...
The first two books give you the "mechanisms" (and theory) of cryptography, i.e. the building blocks. The last book puts everything together to implement "policies" via practical applications (e.g. IPSec/SSL etc.) for the real world. They are complementary and hence should be studied together to get the full picture.
A large part of this book is aimed at the readers who want to know why we designed Rijndael in the way we did. For them, we explain the ideas and principles underlying the design of Rijndael, culminating in our wide trail design strategy.
jdw64 8 hours ago [-]
In evolutionary biology, there is a term called “carcinization”: the evolutionary tendency for different organisms to independently evolve crablike forms.
The condition for carcinization is usually described as a kind of “shared condition.”
After reading this article, that is what I felt.
In other words, from the perspective of shared conditions, isn’t it possible that systems receive similar pressures when they need to mix information?
1. There is a state space.
2. Each part of the input affects many parts of the output.
3. A simple rule is not enough, so nonlinearity becomes necessary.
4. But the hardware cannot be allowed to stall, so the system evolves toward a structure where simple transformations are repeated many times.
Ultimately, even across different fields, the core question is how to decompose complexity into atomic units. The choice of those units tends to converge under the pressures imposed by the underlying substrate. This seems to be the central thesis of the article.
This feels similar to how humans solve nonlinear differential equations.
If so, perhaps the structure of human cognition itself works in a similar way: when facing nonlinearity, we break it into smaller structures and design around those smaller parts.
Because my academic background is limited, I find it difficult to express this properly in language. But I think this kind of pressure can also be applied to programming and software theory.
When I think about software engineering, it also often starts from the smallest element that does not change easily, and then builds larger systems by composing those elements. In OOP, that unit is the conceptual object. In FP, it is the function. In DOP, it is data.
FP is mathematical.
DOP is aligned with the data that computers store and transmit.
OOP is connected to our abstract model of the world.
That may be why different people are good at different paradigms.
OOP compresses the world into objects and responsibilities.
FP compresses the world into functions and composition.
DOP compresses the world into data and transformable structures.
Ultimately, it is a question of how we cut complexity: what we choose as the minimal unit of decomposition.
Then what should this idea be called?
And if we apply this to AI coding, what would it imply?
I have thoughts, but because I did not study enough, I feel frustrated that I cannot express them more fully.
I wish I had learned more.
mftb 4 hours ago [-]
This is a good and useful breakdown. There are lots of ways to get an education and continue learning.
jdw64 3 hours ago [-]
Thank you. My goal is also to work hard, earn money, and eventually go to graduate school.
adampunk 56 minutes ago [-]
There are parts of this that I consider a reach, but the whole thing, despite that, feels sensible and looks like a useful way to hang together some things which are normally separated.
These are well-expressed thoughts on a really hard subject.
tptacek 6 hours ago [-]
I called this out to Thomas Pornin a few months ago (that the forward pass of a neural network and a block cipher rhymed in the sense of being an iterated complex linear function punctuated by a nonlinear function that keeps the whole system from collapsing into linearity) and he intimated that it was not the profound or useful observation that I thought it to be. I feel somewhat vindicated.
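The "collapsing into linearity" part can be shown in a few lines. A toy sketch with made-up 2x2 integer weights (`W1`, `W2` and the input `x` are arbitrary illustrative values, and ReLU stands in for whatever nonlinearity a real network or S-box provides):

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    columns = list(zip(*right))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns] for row in left]


def relu(vector):
    """Elementwise nonlinearity: clamp negatives to zero."""
    return [max(0, value) for value in vector]


W1 = [[1, -2], [3, 1]]
W2 = [[2, 1], [-1, 1]]
x = [1, 1]

stacked_linear = matvec(W2, matvec(W1, x))   # two linear layers in sequence
collapsed = matvec(matmul(W2, W1), x)        # one layer with the product matrix
with_relu = matvec(W2, relu(matvec(W1, x)))  # same layers, nonlinearity between

print(stacked_linear, collapsed, with_relu)  # [2, 5] [2, 5] [4, 4]
```

Without the nonlinear step, any number of layers (or rounds) is equivalent to a single matrix, which a learner has no use for and an attacker can solve with linear algebra.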
Ironic that Shannon and Turing laid the foundations for both cryptography and AI. I think it boils down to information, which is related to language and text.
postalrat 6 hours ago [-]
Maybe it's because they are practical solutions for a branch of mathematics we haven't been able to solve.
ghstinda 7 hours ago [-]
are they really? seems not accurate to me, the devil is in the details
unnouinceput 5 hours ago [-]
Of course the devil is in the details. This entire article reads like an alien visiting Earth and concluding "Humans and Dogs have 84% DNA in common, no wonder they make such a good pair."
https://www.crypto101.io
2) Cryptography: Theory and Practice by Douglas Stinson et al. This is a more mathematical treatment and hence a nice complement to the Paar book above - https://www.routledge.com/Cryptography-Theory-and-Practice/S...