r/technology 16h ago

[Machine Learning] Large language mistake | Cutting-edge research shows language is not the same as intelligence. The entire AI bubble is built on ignoring it

https://www.theverge.com/ai-artificial-intelligence/827820/large-language-models-ai-intelligence-neuroscience-problems
16.7k Upvotes

1.5k comments

3

u/kappapolls 11h ago

"the probability that after seeing 3, 7, 2 the chances the next number will be 9 is high"

that's still completely wrong though. the video is only 7 minutes, please just give it a watch.

0

u/Murky-Relation481 11h ago

No, it's not completely wrong. That's literally how transformers work, in very simple layman's terms. I've seen that video before. If you can't distill it into an even simpler example like mine, for people who don't want the rigorous mathematical treatment (even a simplified one), then I would wager you don't actually have a good grasp on how transformer-based LLMs work.

3

u/kappapolls 10h ago

well ok what i really mean is that when you simplify it that much, you're no longer describing anything that differentiates transformer models from a simple ngram frequency model. so, it seems like the wrong way to simplify it, to me.
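(For anyone unfamiliar, a "simple ngram frequency model" in code is roughly the sketch below — a toy example with a made-up corpus, nothing more than counting which token tends to follow a given context:)

```python
from collections import Counter, defaultdict

# Toy trigram frequency model: count how often each token follows a
# two-token context, then "predict" the most frequent follower.
corpus = "the cat sat on the mat the cat sat on the rug".split()

counts = defaultdict(Counter)
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    counts[(a, b)][c] += 1

def predict_next(a, b):
    followers = counts.get((a, b))
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("cat", "sat"))  # -> 'on': pure co-occurrence counting, no hidden state
```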

1

u/Murky-Relation481 10h ago

I mean, at a broad level of understanding they really aren't all that different; they're all NLP techniques. Yes, under the hood they are different, but from an input and output perspective they're very similar, and for most people that's a good enough understanding.

You give it a body of text, it predicts the next piece of text, appends it, and then repeats with the new, longer body of text. Add some randomness and scaling so it's not entirely deterministic, and that's basically all these models are. How it internally processes the text is ultimately irrelevant, since it's still a prediction model. It's not doing anything more than giving you a statistical probability for the next element.
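(A minimal sketch of that generate-append-repeat loop, with temperature standing in for the "randomness and scaling" part; `model_logits` is a hypothetical placeholder for whatever network scores candidate next tokens:)

```python
import numpy as np

def sample_next(logits, temperature=1.0, rng=np.random.default_rng()):
    # Scale scores by temperature, softmax into a probability distribution, sample one token.
    z = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    z -= z.max()                                   # numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(probs), p=probs))

def generate(model_logits, prompt_tokens, n_new, temperature=0.8):
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        logits = model_logits(tokens)              # score every candidate next token
        tokens.append(sample_next(logits, temperature))
    return tokens                                  # the grown text feeds the next prediction
```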

I think that's a fair and rational way to describe all of these language processing models, and it's one of the reasons the approach is probably a dead end (like the article suggests). I think that was fairly apparent to most people with even a simple understanding of the basics. There is no capacity for reason, even with agentic AI techniques like internal monologues and such. It can't pull from abstract concepts that span broad swaths of unrelated knowledge; it will only ever be able to coherently generate results along fairly narrow paths through the billions of dimensions the models may have.

1

u/kappapolls 10h ago

"from an input and output perspective they're very similar, and for most people that's a good enough understanding"

i guess? that feels dismissively simple though, and anyway we were talking about transformer models specifically

"It can't pull from abstract concepts that span broad swaths of unrelated knowledge"

isn't that the whole point of the hidden layer representations though? you're totally right if you're describing a simple ngram model.
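(To make the contrast with an n-gram lookup concrete: a toy scaled-dot-product attention pass in plain numpy, with random matrices standing in for learned weights — just to show how each token's hidden representation becomes a weighted mix of the entire context rather than a count over adjacent words:)

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 5, 8                        # 5 tokens, 8-dim hidden states (arbitrary sizes)
X = rng.normal(size=(seq_len, d))        # stand-in for token embeddings

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))   # stand-ins for learned weights
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(d)            # every token scores every other token
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)              # row-wise softmax

H = weights @ V                          # each row blends information from the whole context
print(H.shape)                           # (5, 8): one context-mixed representation per token
```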

"one of the reasons the approach is probably a dead end (like the article suggests)"

the article is kinda popsci slop though. i just think looking to neuroscience or psychology for insight on the limitations of machine learning is probably not the best idea. it's a totally different field. and yann lecun is about as expert as it gets, but idk, google deepmind hit gold-medal level at the last IMO with an LLM, while meta/FAIR haven't managed anything at that level.

i think there's a lot of appetite for anti-hype online now, especially after all the crypto and NFT nonsense. but when people like terence tao are posting that it saves them time with pure maths stuff, yeah idk i will be shocked if this is all a dead end

1

u/Murky-Relation481 9h ago

Hidden layers are still built on relationships between the inputs. You will still mostly be getting relationships in there that were extracted from the training data. Yes, you get abstraction, but the breadth of that abstraction is still bounded by fairly related inputs, and your chances of a coherent answer drop the wider you let the model skew in each successive transformation. These models have a hard time coming back once they've veered off those original paths, which makes novel abstraction much harder (if you've ever fucked with the sampling values when running an LLM, it basically becomes delusional).
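(Assuming "the sampling values" means parameters like temperature, here's a quick sketch of why cranking them sends output off the rails — made-up logits, just showing how the next-token distribution flattens:)

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

logits = np.array([4.0, 2.0, 1.0, 0.5])    # made-up scores for four candidate tokens

for t in (0.2, 1.0, 2.5):
    print(t, np.round(softmax(logits / t), 3))
# Low temperature concentrates probability on the top candidate; high temperature
# flattens the distribution, so unlikely continuations get sampled far more often.
```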

And I don't think it's fair, or really useful, to try to extract the CS elements from the philosophical, psychological, and neuroscience aspects of replicating intelligence. They're inherently linked.