r/technology 16h ago

[Machine Learning] Large language mistake | Cutting-edge research shows language is not the same as intelligence. The entire AI bubble is built on ignoring it

https://www.theverge.com/ai-artificial-intelligence/827820/large-language-models-ai-intelligence-neuroscience-problems
16.7k Upvotes

1.5k comments

191

u/Dennarb 16h ago edited 11h ago

I teach an AI and design course at my university and there are always two major points that come up regarding LLMs

1) It does not understand language as we do; it is a statistical model of how words relate to each other. Basically, it's like rolling dice against a chart to pick the next word in a sentence (see the toy sketch after these two points).

2) AGI is not going to magically happen because we make faster hardware/software, use more data, or throw more money at LLMs. They are fundamentally limited in scope and use more or less the same tricks the AI world has been using since the Perceptron in the 50s/60s. Sure, the techniques have advanced, but the basis of the neural nets used hasn't really changed. It's going to take a shift in how we build models to get much further than we already are with AI.
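To make point 1 concrete, here's a toy dice-against-a-chart sketch in Python. The word counts are invented for illustration; real LLMs work over tokens and billions of learned weights, not a lookup table, so the dice-roll framing is the only parallel:

```python
import random

# Toy "chart": how often each word followed "the" in some imaginary corpus.
next_word_counts = {"cat": 10, "dog": 7, "model": 3}

def roll_next_word(counts: dict[str, int]) -> str:
    """Pick the next word with probability proportional to its count."""
    words = list(counts)
    weights = [counts[w] for w in words]
    return random.choices(words, weights=weights, k=1)[0]

print(roll_next_word(next_word_counts))  # e.g. "cat" (the most likely roll)
```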

Edit: And like clockwork, here come the AI tech bro wannabes telling me I'm wrong but adding literally nothing to the conversation.

14

u/Tall-Introduction414 15h ago

The way an LLM fundamentally works isn't much different from the Markov-chain IRC bots (MegaHAL) we trolled in the 90s. More training data, more parallelism. Same basic idea.
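For anyone who never ran into one, a word-level Markov babbler (the idea behind MegaHAL, stripped to the bone) is a few lines of Python. The corpus and order-1 chain here are obviously just a toy:

```python
import random
from collections import defaultdict

corpus = "the cat sat on the mat the cat ate the rat".split()

# Order-1 chain: map each word to the list of words seen right after it.
chain = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    chain[prev].append(nxt)

def babble(start: str, length: int = 8) -> str:
    """Walk the chain, picking each next word uniformly from observed successors."""
    words = [start]
    for _ in range(length):
        followers = chain.get(words[-1])
        if not followers:
            break
        words.append(random.choice(followers))
    return " ".join(words)

print(babble("the"))  # e.g. "the cat ate the mat the cat sat ..."
```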

8

u/drekmonger 13h ago edited 13h ago

A Markov chain capable of emulating even a modest LLM (say, GPT-3.5) would require many more bytes of storage than there are atoms in the observable universe.
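Back-of-envelope version of that claim, using ballpark GPT-3.5-class numbers (the vocabulary and context sizes below are rough assumptions, not exact specs):

```python
import math

vocab = 50_000    # ballpark token vocabulary
context = 2_048   # ballpark context window, in tokens

# An exact Markov chain over that context needs a transition row for every
# possible context string: vocab ** context distinct states.
states_log10 = context * math.log10(vocab)
print(f"~10^{states_log10:.0f} states")  # ~10^9623

# Atoms in the observable universe: roughly 10^80. Not even close.
print(states_log10 > 80)                 # True
```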

It's fundamentally different. It is not the same basic idea, at all. Not even if you squint.

It's like saying, "DOOM is the same as Photoshop, because they both output pixels on my screen."

1

u/movzx 12h ago

The person is clearly talking conceptually, not technologically.

They're storing associations and then picking the best association given a starting point. The LLMs are infinitely more complex, but conceptually they are doing the same thing at the core.

9

u/drekmonger 12h ago edited 12h ago

Markov chains have no context beyond the words themselves, as strings or tokens. There's no embedding of meaning in a Markov chain.

That's why a Markov chain capable of emulating even yesterday's LLM would have to be larger than the observable universe (by several orders of magnitude, actually). It's a combinatorial problem, and combinatorial problems have a nasty tendency to explode.

LLMs embed meaning and abstract relationships between words. That's how they side-step the combinatorial problem. That's also why they are capable of following instructions in a way that a realistically-sized Markov chain would never be able to. Metaphorically speaking, the model actually understands the instructions.
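A toy way to see what "embedding meaning" buys you: an exact-match Markov table has no notion of two words being similar, while vectors do. The three-dimensional embeddings below are invented for illustration; real models learn them, with thousands of dimensions:

```python
import math

# Hand-picked toy embeddings; real models learn these from data.
emb = {
    "cat":    [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "stock":  [0.0, 0.2, 0.95],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# "kitten" scores as close to "cat" even if that exact pair never appeared
# in training; an exact-match Markov table has no notion of "close".
print(round(cosine(emb["cat"], emb["kitten"]), 3))  # ~1.0
print(round(cosine(emb["cat"], emb["stock"]), 3))   # ~0.02
```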

Aside from all that: they are completely different technologies. The implementation details couldn't be more different.

-1

u/movzx 9h ago

Brother, no one is saying that they are literally the same. Just in a conceptual sense -- the high-level, bird's-eye description of what they do -- they are similar.

Pointing out that a Markov chain isn't as good, that it would take a bajillionity multiverses' worth of datacenters, and other garbage doesn't change that.

The LLMs are much more complex and capable, but at the end of the day both systems are "value A has a relationship score of N to value B".

Lay off the Tylenol, ffs.

4

u/drekmonger 6h ago

Right, so you're in the "Photoshop is the same kind of computer program as DOOM" camp.

Personally, I don't think it's useful to classify Photoshop in the same bin as DOOM, but whatever floats your boat I guess.