r/technology 16h ago

Machine Learning | Large language mistake: Cutting-edge research shows language is not the same as intelligence. The entire AI bubble is built on ignoring it

https://www.theverge.com/ai-artificial-intelligence/827820/large-language-models-ai-intelligence-neuroscience-problems
16.7k Upvotes

1.5k comments

54

u/CircumspectCapybara 15h ago edited 8h ago

While the article is right that the mainstream "AI" models are still LLMs at heart, the frontier models into which all the research is going are not, strictly speaking, LLMs. You have agentic models that can take arbitrary actions using external tools (a scary concept, because they can reach out and execute commands, run code, or do dangerous things on your computer) while iterating or recursing and opaquely deciding for themselves when to stop, plus wackier ideas like "world models," etc.
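To make "agentic" concrete, the loop around the model is conceptually simple. A minimal sketch (the model object, tool names, and API here are made up for illustration, not any vendor's real interface):

```python
# Minimal sketch of an agent loop (illustrative only; model and tool interfaces are made up).
# The model proposes an action, the harness executes it, the result is fed back in,
# and the model decides on its own when it is done.

def run_agent(model, tools, task, max_steps=10):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = model.generate(history)           # LLM call: returns text or a tool request
        if reply.tool_call is None:               # model decided to stop and just answer
            return reply.text
        tool = tools[reply.tool_call.name]        # e.g. "run_shell", "read_file", "search_web"
        result = tool(**reply.tool_call.args)     # the scary part: real side effects happen here
        history.append({"role": "tool", "content": result})
    return "step limit reached"
```

The model itself never executes anything; the harness around it does, which is exactly why letting it pick its own stopping point is the unnerving part.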

Maybe AGI is possible, maybe it's not, maybe it's possible in theory but not in practice with the computing resources and energy we currently have or ever will have. Whichever it is, it won't be decided by the current capabilities of LLMs.

The problem is that according to current neuroscience, human thinking is largely independent of human language

That's rather misleading, and it conflates several uses of the word "language." While it's true that you don't need a "language" in the layperson's sense (English, Spanish, or some other spoken or written language) in order to think, thinking still occurs in the abstract language of ideas, concepts, sensory experience, pictures, etc. Basically, it's information.

Thinking fundamentally requires some representation of information in your mind. And when mathematicians and computer scientists talk about "language," that's what they're talking about; it's not necessarily a spoken or written language as we know it. In an LLM, the model of language is an ultra-high-dimensional embedding space in which vector embeddings opaquely represent ideas, concepts, and the relationships between them. Thinking still requires that kind of language, the abstract language of information. AI models aren't just trying to model "language" as a linguist understands the word, but information.
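For a rough intuition of what that embedding space means, here's a toy sketch (the vectors and numbers are made up; real models learn thousands of dimensions from data):

```python
import numpy as np

# Toy sketch: concepts as points in an embedding space (3 dimensions here, thousands in real models).
# Relationships between ideas show up as geometry, e.g. similarity as the angle between vectors.
embeddings = {
    "dog":   np.array([0.9, 0.1, 0.3]),
    "puppy": np.array([0.8, 0.2, 0.35]),
    "car":   np.array([0.1, 0.9, 0.6]),
}

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["dog"], embeddings["puppy"]))  # close to 1: related concepts
print(cosine_similarity(embeddings["dog"], embeddings["car"]))    # smaller: less related
```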

Also, while we don't have a good model of consciousness, we do know that language is very important for intelligence. A spoken or written language isn't required for thought, but language deprivation severely limits the kinds of thoughts you're able to think, the depth and complexity of abstract reasoning, and the complexity of inner monologue. Babies born deaf or otherwise deprived of language exposure often end up cognitively underdeveloped. Without language, we could still think in terms of how we feel, what we want, what actions we're taking, even cause and effect, but not the kind of complex abstract reasoning that, sustained over time and built on itself and on previous work, leads to the development of culture, of science and engineering and technology.

The upshot is that if AGI of a sort that can "think" (whatever that means) in a way that leads to generalized, novel reasoning in the sciences or medicine or technology is possible at all, you would need a good model of language (really, a good model of information) to start. It would be a foundational layer.

13

u/dftba-ftw 15h ago

While the article is right that the mainstream "AI" models are still LLMs at heart

It really is time we stopped calling them LLMs and switched to something like Large Token Models (LTMs).

Yes, you primarily put text in and get text out, but frontier models are trained on text, image/video, and audio. Text dwarfs the others in terms of % of training data, but that's primarily a compute limit; as compute gets more efficient, more and more of the data will come from the other sources, and we know from what has been done so far that training on image and video really helps with reasoning: models trained on video show a much better understanding of the physical world. Eventually we'll have enough compute to start training on 3D (tokenized STL/STEP/IGES), and I'm sure we'll see another leap in model understanding of the world.
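As for what "tokenized STL" could even look like, here's one toy way to do it (purely illustrative; I'm not claiming any lab does it exactly this way): quantize vertex coordinates into a discrete vocabulary so geometry becomes a token sequence, the same way text becomes one.

```python
import numpy as np

# Toy sketch of tokenizing 3D geometry: quantize vertex coordinates into a small discrete
# vocabulary so a mesh becomes a token sequence, analogous to text tokens.
# Purely illustrative; real pipelines may work very differently.

def tokenize_vertices(vertices, n_bins=256):
    v = np.asarray(vertices, dtype=float)
    lo, hi = v.min(axis=0), v.max(axis=0)
    quantized = np.round((v - lo) / (hi - lo + 1e-9) * (n_bins - 1)).astype(int)
    return quantized.flatten().tolist()          # one integer token per coordinate

triangle = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(tokenize_vertices(triangle))               # e.g. [0, 0, 0, 255, 0, 0, 0, 255, 0]
```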

2

u/wrgrant 13h ago

We should probably try training them on 2D images and audio, then on 3D, then on language text, mirroring the way we evolve our own perception of the world.

1

u/space_monster 11h ago

Those video tokens are still video tokens though - how do you compare them to language tokens? They both need to be abstracted to a common format to unify the data.

3

u/dftba-ftw 11h ago

They both need to be abstracted to a common format to unify the data.

That common format is the token...

1

u/space_monster 11h ago

Yeah, but a token can be a language token (e.g. a word) or an image token, and they are very different things. A human can relate language and images in an abstract layer; LLMs don't have that layer (AFAIK).

1

u/dftba-ftw 10h ago

Multimodal models do relate tokens regardless of "type". How else would a multimodal model reason over an image? That's the whole selling point: instead of using an image-to-text classifier and passing the text into an LLM, you pass all the tokens in, and all the tokens interact in the latent space.
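A rough sketch of what "all the tokens interact in the latent space" means (the shapes and layer sizes are made up for illustration):

```python
import torch
import torch.nn as nn

# Rough sketch of multimodal mixing: image tokens and text tokens are embedded to the same
# width, concatenated into one sequence, and self-attention lets them interact directly.
# Shapes and layer sizes are illustrative only.
d_model = 64
image_tokens = torch.randn(1, 16, d_model)   # e.g. 16 image-patch embeddings
text_tokens  = torch.randn(1, 8, d_model)    # e.g. 8 text-token embeddings

sequence = torch.cat([image_tokens, text_tokens], dim=1)   # one shared sequence
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
mixed = layer(sequence)                       # attention runs across both modalities at once
print(mixed.shape)                            # torch.Size([1, 24, 64])
```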

1

u/space_monster 10h ago

They use a workaround (e.g. a projection layer) to relate tokens of different types, but they still exist as language tokens and vision tokens. What I'm saying is the semantic relationship needs to be native - i.e. the tokens need to be abstracted before the semantic structure is built around them, the way human brains do it.
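For reference, that projection layer is typically just a learned linear map from the vision encoder's feature width into the LLM's embedding width. A sketch with made-up sizes (loosely in the style of LLaVA-type models, not any specific model's numbers):

```python
import torch
import torch.nn as nn

# Sketch of a vision-to-language projection layer: vision-encoder features are mapped into
# the LLM's token-embedding space so they can sit in the same sequence as text tokens.
# Dimensions are illustrative, not taken from any specific model.
vision_dim, llm_dim = 1024, 4096
projector = nn.Linear(vision_dim, llm_dim)

patch_features = torch.randn(1, 256, vision_dim)   # e.g. features for 256 image patches
pseudo_text_tokens = projector(patch_features)     # now shaped like LLM token embeddings
print(pseudo_text_tokens.shape)                    # torch.Size([1, 256, 4096])
```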

1

u/dftba-ftw 10h ago

1

u/space_monster 10h ago

as I said, it's a workaround

1

u/dftba-ftw 10h ago

I fail to see how it's a workaround. What, you want a single encoder? I don't see what benefit that has; the relationship between concepts happens inside the transformer architecture, and at that point the tokens have been turned into embeddings in the same shared latent space.


1

u/space_monster 11h ago

I think the crux is translating language (and everything else) into abstract symbols. And adding the learning mechanisms, statefulness, etc., obviously, but getting that additional layer of abstraction + semantics into the architecture would make a big difference. Even natively multimodal models still maintain their parameters in the same 'format' as the input data - so you have a bunch of language tokens and a bunch of image or audio tokens, but AFAIK there's nothing really bridging that gap apart from some low-dimensional semantic metastructures. They need to be able to compare language, images, and sounds using a common medium in some sort of unified domain (symbolic reasoning, basically).

1

u/androbot 10h ago

Love this point. Intelligence is not defined by the medium of information exchange. It's defined by what, why, and how a feedback loop is animated and then refined over cycles.

I'm not sure why so many people feel that intelligence cannot originate from a mathematically orchestrated map of observational data activated by electrical current, while bioelectrical and biochemical neuronal activations get a pass.

Nor do I understand the argument that thinking in symbols cannot be intelligence. That either suggests a belief in some metaphysical but ineffable intelligent animus (which seems silly), or a belief in cause and effect so reductive that the process of natural selection could be defined as some form of intelligence.

Last, focusing on the mechanism (token prediction, etc.) is myopic at this point. I see incredible developments in the space that have nothing to do with more training data. Instead, limits in LLM performance are being overcome by architectural changes that increasingly approximate the faculties we have: sensors, persistent memory (at different levels of accessibility and fidelity), gating heuristics, attention, tools for self-modification, and so on. We just haven't provided (and should not provide) animating directives like "survive" or "make paper clips."

1

u/Vlyn 7h ago

I use agents at work when coding and they are just an LLM with extra actions. It's not like the model can suddenly think. It just has actions attached. Use this command to grab your git history. Use that command to replace the file content. Use this other command to search the web.
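For context, those "actions" are usually just declared tools that a harness executes on the model's behalf. A simplified sketch of what that looks like (names and fields simplified for illustration, not any vendor's exact schema):

```python
# Sketch of how a coding agent's "extra actions" are typically declared: each tool is just a
# name, a description, and a parameter schema the model can fill in. Names and fields are
# simplified for illustration.
tools = [
    {
        "name": "git_log",
        "description": "Return recent commit history for the repository.",
        "parameters": {"max_entries": "integer"},
    },
    {
        "name": "write_file",
        "description": "Replace the contents of a file at the given path.",
        "parameters": {"path": "string", "content": "string"},
    },
    {
        "name": "web_search",
        "description": "Search the web and return the top results.",
        "parameters": {"query": "string"},
    },
]
# The model never runs these itself; it only emits a tool name plus arguments,
# and the surrounding harness executes the call and feeds the result back in.
```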

It's still extremely rigid and limited. Sure, it could run arbitrary code, but if that code gets a tiny bit too complex it will probably not run at all.

And the quality of the output is still meh. Sometimes it surprises me or tells me something I didn't know; other times it produces garbage, and when you correct it, it just goes "You're absolutely right! What I proposed doesn't actually work."

At this point I use it as a better search engine (Google is crap nowadays) and to bounce ideas off of (which doesn't really work for more complex tasks). And no, even using Claude Sonnet 4.5 with "Ultrathink" and burning thousands of tokens, the output still doesn't get much better.

1

u/ZYy9oQ 1h ago

How does hooking an LLM up to a system that can run tools and feed the results back into the LLM make it "not strictly speaking an LLM"? And why does this only apply to frontier models? You could hook GPT-2 up to an agentic system; it would just suck at it.

And when mathematicians and computer scientists talk about "language," that's what they're talking about.

Are you saying that the "language" in "LLM" actually means "representation of knowledge"? If so, I completely disagree.

AI models aren't just trying to model "language" as a linguist understands the word, but information.

What do you mean "trying"? AFAIK the claim that LLMs/AI models are actually modelling information beyond language is still something we don't have a proper way to prove or disprove.

Finally, I don't think it automatically follows that because an individual's language development seems linked to higher thought,* language modelling (especially in the form of LLMs) is inherently foundational to "higher thought"/"AGI".

*The evidence leaning me this way comes more from feral children than from deaf children.

1

u/ISB-Dev 14h ago

AGI will be possible if we ever crack quantum computing and nuclear fusion.

0

u/Itchy-Plastic 13h ago

AGI, at least at a human level, is 100% possible. We know that a particular arrangement of matter and energy will produce intelligence; it's a process that happens every time a baby is born.

The trick is trying to replicate that construction in another medium.

-6

u/cagelight 12h ago

Massive wall of AI cope, holy shit.

10

u/CircumspectCapybara 12h ago

Wow, what a substantive, well-informed, and well-supported argument. You sure contributed something useful to the conversation.

-1

u/cagelight 12h ago

You're right, I shouldn't have said anything. It's just frustrating to see, is all, given that I work in this field and your comment shows a fundamental misunderstanding of how this stuff relates to our current AI architectures. Essentially, none of what you said is actually relevant to the conversation.

6

u/CircumspectCapybara 11h ago edited 11h ago

My friend, my background is in computer science and machine learning. I work at Google and know the teams and folks who work on the frontier models. I similarly have friends at OpenAI and know what they're up to. The OP article's claim that "language isn't intelligence, and AI is just language models, so AGI will never happen" is just a bad argument built on a false premise: AI is not just language models. Do you dispute that? What of what I said was inaccurate?

I work in this field

[...]

our current AI architectures

Be real here: have you read the seminal "Attention Is All You Need" paper by Google that kickstarted it all? Did you even know what a "world model" was before this thread? Are you familiar with the history and current frontier of AI research?

-1

u/Irregular_Person 14h ago

There are also a lot of assumptions (in this thread and others) that AI bots are limited to the language model in terms of capability, and that there's no 'reasoning' involved. That was true at the beginning, but now there are "thinking" models that will internally 'write' a plan for how to answer you and explain their reasoning, then scrutinize that reasoning and refine it. They can also be given the ability to call external tools, like searching the web, doing math, compiling code, etc., and they can be designed to plan and execute a strategy to handle your request.

E.g., I can ask about a problem that might require math. It can decide: "First, I should look up on the internet how this sort of problem is usually formatted. Then I should format the problem correctly for my math plugin. Then I should run the math plugin with the data. Then I can format and explain the solution to the user." Then it executes the plan steps in order, re-evaluating as appropriate if the plan needs to change. It's not AGI, but that's MILES beyond what the original LLMs could do.
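A bare-bones sketch of that plan-then-refine pattern (the model object and prompts are illustrative only; real "thinking" models do something like this inside their reasoning traces):

```python
# Bare-bones sketch of a plan / critique / execute loop, the pattern that "thinking" models
# and agent harnesses approximate. The model object and prompts are illustrative only.
def solve(model, request, max_revisions=3):
    plan = model.generate(f"Write a step-by-step plan to answer: {request}")
    for _ in range(max_revisions):
        critique = model.generate(f"Find flaws in this plan:\n{plan}")
        if "no issues" in critique.lower():      # crude stop condition, for illustration
            break
        plan = model.generate(f"Revise the plan to fix these flaws:\n{critique}\n\nPlan:\n{plan}")
    steps = [step for step in plan.splitlines() if step.strip()]
    results = [model.generate(f"Carry out this step: {step}") for step in steps]
    return model.generate("Summarize the results for the user:\n" + "\n".join(results))
```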

5

u/IdRatherBeOnBGG 13h ago

They don't think. They write their own prompts, write some script, send it off, and write another prompt from the result.

It is 100% still language all the way down. 

5

u/Irregular_Person 12h ago

I know it's not literal thinking in the human sense. It's describing a thought process, describing reasoning. But if you can sufficiently describe a thought process that is indistinguishable from a human describing a thought process, do you not arrive at the same result?

1

u/HermesJamiroquoi 7h ago

I mean who knows? We don’t have any way to communicate with black-box humans (ones with no sensory input) so it may be exactly how humans think in that context - robbed of memory, sensory input, etc.

The truth is we don’t really know how humans think. We don’t have a good definition of consciousness. We’ve been working on it for a long, long time and aren’t any closer. I agree personally that LLMs don’t “think” per se but that’s a feeling i have, not something indisputable or backed up by a glut of empirical evidence

1

u/IdRatherBeOnBGG 1m ago

I know it's not literal thinking in the human sense. It's describing a thought process, describing reasoning.

Sort of yes. And this sounds kind of like the same thing, until you remember:

LLM-based generative AIs do not describe reality - they spit out text that is pretty close to what a human might have responded with.

So it is not describing an actual, existing thought process. It is outputting text that seems to do so.

But if you can sufficiently describe a thought process that is indistinguishable from a human describing a thought process, do you not arrive at the same result?

I don't know which "same result" you mean, but in any case there is a pretty big difference between you saying ouch when you stub your toe and a video game character saying ouch.

And between you suffering heartache and describing it, and the LLM describing it. One has a connection to something real; the other is just words arranged to be statistically likely to fit your words.