Wowed by a new paper I just read and wish I had thought to write myself. Lukas Berglund and others, led by Owain Evans, asked a simple, powerful, elegant question: can LLMs trained on “A is B” automatically infer that “B is A”? The shocking (yet, in historical context, see below, unsurprising) answer is no.
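To make the shape of that question concrete, here is a minimal sketch of a forward-versus-reverse evaluation. Everything in it is an assumption for illustration: the facts are made up, and `PrefixMemorizer` is a hypothetical toy stand-in that is one-directional by construction, not the paper's setup and not a real LLM. It only shows what "trained on A is B, tested on B is A" means as a test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    name: str         # the "A" side (hypothetical, made-up names)
    description: str  # the "B" side


class PrefixMemorizer:
    """Toy stand-in for a model: it memorizes what follows a subject,
    only in the order the training sentence presented it."""

    def __init__(self):
        self.table = {}

    def train(self, sentence):
        # Split "A is B." into subject A and continuation B.
        subject, _, rest = sentence.partition(" is ")
        self.table[subject] = rest.rstrip(".")

    def complete(self, subject):
        # Returns None when the subject was never seen as a prefix.
        return self.table.get(subject)


def accuracy(model, pairs):
    """Fraction of (query, expected) pairs the model answers exactly."""
    return sum(model.complete(q) == a for q, a in pairs) / len(pairs)


facts = [
    Fact("Mira Voss", "the composer of Glass Harbour"),
    Fact("Tomas Ilvane", "the first person to map Lake Orrin"),
]

model = PrefixMemorizer()
for fact in facts:
    model.train(f"{fact.name} is {fact.description}.")  # only "A is B"

# Forward: ask about A, expect B. Reverse: ask about B, expect A.
forward_pairs = [(f.name, f.description) for f in facts]
reverse_pairs = [(f.description, f.name) for f in facts]

forward_acc = accuracy(model, forward_pairs)
reverse_acc = accuracy(model, reverse_pairs)
print(f"forward accuracy: {forward_acc:.2f}")
print(f"reverse accuracy: {reverse_acc:.2f}")
```

The toy gets every forward query and no reverse query, because nothing in its training ever stored B as a key. That says nothing about why a real LLM would behave this way; it only illustrates the test the question implies.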
To me, these findings are interesting for two reasons.
The first is that it seems utterly unsurprising that these inconsistencies exist. These are language models. People seem to fall easily into the trap of believing they have some kind of “programming” for logic.
The second is just how unscientific NN or ML research is. This is why it’s hard to study ML as a science. The original paper doesn’t really explain the issue or how to fix it, because there’s not much you can do to explain ML (see the second paragraph of their discussion). It’s not like the derivation of a formula, where you can point to one component and say “this is where you go wrong”.
It’s actually getting more scientific. Think of it like biology: we do a big study of an ML model, as we would of an organism, and confirm a property of it.
It used to be just maths: you could spot an error in your code and fix it. Then it was a bag of hacks, and you could keep patching your model with more and more tweaks that had no solid theoretical basis but improved performance.
Now it’s too big and too complex, and we have to do science to understand the model’s limitations.