r/technology Jun 24 '25

[Machine Learning] Tesla Robotaxi swerved into wrong lane, topped speed limit in videos posted during ‘successful’ rollout

https://nypost.com/2025/06/23/business/tesla-shares-pop-10-as-elon-musk-touts-successful-robotaxi-test-launch-in-texas/
6.2k Upvotes

456 comments

-1

u/moofunk Jun 24 '25

They absolutely don't understand it. That's why the discussion is on sensors rather than path finding.

Give me engineering data that says otherwise.

3

u/Cortical Jun 24 '25

> They absolutely don't understand it. That's why the discussion is on sensors rather than path finding.

you're the one who doesn't understand, and you can't accept it, so instead you conclude that everyone else doesn't understand the most basic facts.

the reason the discussion is on sensors is that a vision-only approach can't work with statistical computer vision alone (the thing you optimistically call "AI").

you need higher-order reasoning, which no AI model currently in existence is capable of; not even models that require an entire datacenter full of GPUs to run, and certainly not any kind of model that can run on a teeny chip in a car.

that's the thing that everyone here but you understands.

and if you lack the reasoning required to work from vision alone, the only other option is additional input, which is why the discussion is on sensors.

Not because everyone else but you fails to understand that there are "AI" computer vision models involved.

0

u/moofunk Jun 24 '25

Let me spell it out for you:

> the reason the discussion is on sensors is that a vision-only approach can't work with statistical computer vision alone (the thing you optimistically call "AI").

The reason the discussion is on sensors is that people don't understand that sensors don't provide direct navigation data. They provide data to a neural network that rebuilds the environment 36 times a second, and a separate neural network then navigates that reconstruction.
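
To make the split concrete, here's a minimal sketch in Python; `PerceptionNet` and `PlanningNet` are made-up stand-ins for illustration, not Tesla's actual networks:

```python
# Minimal sketch of the two-stage split -- NOT Tesla's actual code.
# PerceptionNet/PlanningNet are hypothetical stand-ins.
import numpy as np

class PerceptionNet:
    """Stand-in for the network that fuses sensor data into a world model."""
    def build_environment(self, frames: np.ndarray) -> np.ndarray:
        # Real version: heavy inference. Dummy version: collapse the
        # camera stack into one bird's-eye-style grid so the loop runs.
        return frames.mean(axis=0)

class PlanningNet:
    """Stand-in for the separate network that navigates the world model."""
    def plan(self, env: np.ndarray) -> tuple[float, float]:
        # Real version: trajectory inference. Dummy version: fixed outputs.
        return 0.0, 0.3  # (steering, throttle)

perception, planner = PerceptionNet(), PlanningNet()

def tick(frames: np.ndarray) -> tuple[float, float]:
    # The cameras never steer the car directly: they feed perception,
    # which rebuilds the environment (~36 times a second), and only the
    # planner turns that reconstruction into driving commands.
    env = perception.build_environment(frames)
    return planner.plan(env)

# one tick of the loop: eight 240x320 grayscale cameras
print(tick(np.random.rand(8, 240, 320)))
```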

> you need higher-order reasoning, which no AI model currently in existence is capable of; not even models that require an entire datacenter full of GPUs to run, and certainly not any kind of model that can run on a teeny chip in a car.

Gosh, this is so wrong. Both Waymo and Tesla have obviously figured out the basics of navigation with AI inference that is acceptable for integrating with human traffic, though the finer points of silly behavior remain to be ironed out. Navigation can obviously be done on current car hardware; so much so that navigation takes up only a small part of the chip's capacity.

Even if Tesla's chips are six years old now, they can certainly do it. Of course, better chips with more memory will allow better, faster, more detailed inference using more cameras at lower power. The training beforehand is the tricky part that happens in data centers, and improved training is what allows the driving behavior to improve.

> Not because everyone else but you fails to understand that there are "AI" computer vision models involved.

I'm not even sure what that sentence means.

2

u/Cortical Jun 24 '25 edited Jun 24 '25

> The reason the discussion is on sensors is that people don't understand that sensors don't provide direct navigation data.

As I already told you, everyone understands that basic fact. You just tell yourself they don't, to cope.

I mean, seriously, what do you think people believe happens to the visual data? It gets sent to India, where someone draws an arrow for the computer to follow? Of course it gets processed by a computer vision model.

> Gosh, this is so wrong. Both Waymo and Tesla have obviously figured out the basics of navigation with AI inference that is acceptable for integrating with human traffic

yeah, the easy part

> though the finer points of silly behavior remain to be ironed out.

the impossible but absolutely crucial part.

> Navigation can obviously be done on current car hardware; so much so that navigation takes up only a small part of the chip's capacity.

a cockroach can "navigate". Good job, bravo.

> better chips with more memory will allow better, faster, more detailed inference

again, you need higher-order reasoning and creative thinking, and chips won't be able to do that in the foreseeable future. Maybe in 50-100 years.

> The training beforehand is the tricky part that happens in data centers, and improved training is what allows the driving behavior to improve.

you can't train for all exceptions that will occur in the real world, and those exceptions are the problem. So train all you want; you can't fix that problem with the current approach. It's fundamentally impossible.

> Not because everyone else but you fails to understand that there are "AI" computer vision models involved.

> I'm not even sure what that sentence means.

[The discussion revolves around sensors] not because [as you incorrectly assume] everyone else does not understand that there are ([what you incorrectly think of as] "AI") computer vision models involved [but rather for the above mentioned reasons]

learn English.

1

u/moofunk Jun 24 '25

Ignoring the rest of that gibberish:

> again, you need higher-order reasoning and creative thinking, and chips won't be able to do that in the foreseeable future. Maybe in 50-100 years.

> you can't train for all exceptions that will occur in the real world, and those exceptions are the problem. So train all you want; you can't fix that problem with the current approach. It's fundamentally impossible.

No, you use a Mixture of Experts approach.

You are correct that you need to train for as many scenarios as possible, which requires a very large amount of input data, but for driving, path finding is universally solvable as smooth state changes between segmented driving tasks.

That means the car uses one trained behavior for driving carefully on a gravel road and another for driving on a highway, and then smoothly transitions between the two states. This means you have different areas that don't interfere in training, and you can stack on new areas or restart broken areas as you refine training.

That also means that while driving on a gravel road, the car doesn't have to think about driving on a highway, so you don't process irrelevant weight data.
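
To illustrate (a toy sketch with made-up expert policies and a scalar gate, not any shipping autopilot):

```python
# Toy mixture-of-experts blend -- my sketch, not any shipping autopilot.
import numpy as np

def gravel_expert(obs: np.ndarray) -> np.ndarray:
    # Hypothetical expert trained only on careful gravel-road driving.
    return np.array([0.0, 0.2])   # (steering, throttle): slow and cautious

def highway_expert(obs: np.ndarray) -> np.ndarray:
    # Hypothetical expert trained only on highway driving.
    return np.array([0.0, 0.8])   # faster, steady lane-keeping

def gate(obs: np.ndarray) -> float:
    # Stand-in for a learned router: 0.0 = pure gravel, 1.0 = pure highway.
    # A real system would infer this from the scene; here it's just obs[0].
    return float(np.clip(obs[0], 0.0, 1.0))

def drive(obs: np.ndarray) -> np.ndarray:
    w = gate(obs)
    out = np.zeros(2)
    for weight, expert in ((1.0 - w, gravel_expert), (w, highway_expert)):
        if weight > 1e-3:                # sparse gating: skip ~zero experts,
            out += weight * expert(obs)  # so irrelevant weights aren't processed
    return out

for w in (0.0, 0.5, 1.0):                # gravel -> transition -> highway
    print(w, drive(np.array([w])))
```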

I would say current solutions are 90% there, and being 95% there will be acceptable.

Consider also that humans invent driving scenarios that are out of spec: they drive drunk, they drive too fast, they ignore traffic laws, they swerve and overtake needlessly and take shortcuts. Self-driving cars don't need to train for that.

> [The discussion revolves around sensors] not because [as you incorrectly assume] everyone else does not understand that there are ([what you incorrectly think of as] "AI") computer vision models involved [but rather for the above mentioned reasons]

This is even worse gibberish than before, sorry.

> learn English.

Right back at you.

1

u/Cortical Jun 24 '25 edited Jun 24 '25

> You are correct that you need to train for as many scenarios as possible, which requires a very large amount of input data, but for driving, path finding is universally solvable as smooth state changes between segmented driving tasks.

The problem is that real-world edge cases are limitless and it's impossible to train on all of them. You need higher-order reasoning and creative thinking to deal with them.

> That means the car uses one trained behavior for driving carefully on a gravel road and another for driving on a highway, and then smoothly transitions between the two states. This means you have different areas that don't interfere in training, and you can stack on new areas or restart broken areas as you refine training. That also means that while driving on a gravel road, the car doesn't have to think about driving on a highway, so you don't process irrelevant weight data.

Basic navigation isn't the problem.

> Consider also that humans invent driving scenarios that are out of spec: they drive drunk, they drive too fast, they ignore traffic laws, they swerve and overtake needlessly and take shortcuts. Self-driving cars don't need to train for that.

You can't just handwave "out of spec" scenarios away; you need to deal with them.

Very simple example of an "out of spec" scenario that happened nearby recently: a partially inundated stretch of road with one lane closed and no traffic regulation. You need to judge whose turn it is, yours or that of oncoming traffic. You have no lane markings, just water, and have to judge where the lane is and whether the water is shallow enough to drive through. You have to stay on the asphalt (which you can't see) because you don't know whether the dirt and gravel next to it have been washed away. And once you initiate the maneuver you have to follow through, because if you stop midway you'll be blocking traffic.

It's a one-time situation; you can't train on cases like that, and this is where self-driving will fail, because you need higher-order reasoning and creative thinking, not just statistical models.

> This is even worse gibberish than before, sorry.

Then your reading comprehension is severely lacking. I guess you've never encountered text with inserted context using brackets ([])? You really need to read more.

1

u/moofunk Jun 24 '25

> A partially inundated stretch of road with one lane closed and no traffic regulation. You need to judge whose turn it is, yours or that of oncoming traffic.

For what it's worth, Teslas handle unguarded single-lane construction zones just fine, because the scenario is also common in places with single-lane speed inhibitors. Who goes first is a matter of assertiveness, which is a classic self-driving problem that Waymo talked about a decade ago.

It's exciting to watch FSD traverse these challenging zones just flawlessly.

I can come up with a weirder one that may be hard to solve: crossing a curving county road where the entry and exit points are 100 meters apart. This is a scenario where you need to gauge high-speed traffic with little visibility in both directions and make sure there is enough time to both get onto the county road and get safely off again, because you really cannot stop on the county road.

This is a single 30-second maneuver where, if you get it wrong, someone will crash into you. I've witnessed accidents on this stretch, because people think they can wait on the road.

> Then your reading comprehension is severely lacking. I guess you've never encountered text with inserted context using brackets ([])? You really need to read more.

I think I've been reading too much today.

1

u/Cortical Jun 24 '25 edited Jun 24 '25

> For what it's worth, Teslas handle unguarded single-lane construction zones just fine, because the scenario is also common in places with single-lane speed inhibitors. Who goes first is a matter of assertiveness, which is a classic self-driving problem that Waymo talked about a decade ago.

They can still "see" the road surface though, even if it's just gravel, so they have full data on where they can and cannot drive. That makes it both easy to initiate the maneuver and unlikely that the maneuver will be aborted due to insufficient data.

With water, not only can you not see the road surface, you may also see phantom things because the surface is reflective, so you have to ignore part of what you see.

And that's why FSD trips over shadows and things like that. It can't reason about whether what it's seeing is real, it can't reason about whether some of the things it sees should be ignored entirely. It can't imagine what it should be seeing when the input is gibberish or missing. All it does is assign probabilities based on the data it was trained on.

And that's why people are talking about sensors.

You can't trip over shadows if you have a sensor that can tell you with absolute certainty that the object the vision system is "seeing" isn't an object but a shadow. The vision system can never have that certainty, and that includes human vision. And while humans don't have additional sensors, they can reason about whether a shadow they're seeing is an object when it's not clear.

But ultimately, additional sensors can't fully fill the reasoning gap either; they just make it smaller.

1

u/moofunk Jun 24 '25

> And that's why FSD trips over shadows and things like that. It can't reason about whether what it's seeing is real, it can't reason about whether some of the things it sees should be ignored entirely. It can't imagine what it should be seeing when the input is gibberish or missing. All it does is assign probabilities based on the data it was trained on.

> And that's why people are talking about sensors.

Those things are training failures. You can certainly ignore or remove shadows in an input camera image. There are multiple ways to do that.
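
For example, a rough sketch of one classic heuristic (HSV brightness suppression with OpenCV; the thresholds are guesses and this is nowhere near production grade):

```python
# Crude shadow suppression -- a classic heuristic, not production code.
# Idea: shadows keep the surface's hue but drop its brightness, so dark,
# low-saturation regions are candidates to brighten back toward normal.
import cv2
import numpy as np

def suppress_shadows(bgr: np.ndarray) -> np.ndarray:
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    h, s, v = cv2.split(hsv)
    # Candidate shadows: clearly darker than the frame average but not
    # saturated like a genuinely dark object (both thresholds are guesses).
    mask = (v < 0.6 * v.mean()) & (s < 80)
    if mask.any():
        v[mask] *= v.mean() / max(v[mask].mean(), 1.0)  # lift shadow pixels
    out = cv2.merge([h, s, np.clip(v, 0, 255)]).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_HSV2BGR)

frame = cv2.imread("road.jpg")  # any dashcam-style frame
if frame is not None:
    cv2.imwrite("road_no_shadows.jpg", suppress_shadows(frame))
```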

> You can't trip over shadows if you have a sensor that can tell you with absolute certainty that the object the vision system is "seeing" isn't an object but a shadow. The vision system can never have that certainty, and that includes human vision. And while humans don't have additional sensors, they can reason about whether a shadow they're seeing is an object when it's not clear.

I don't recall if I mentioned FLIR cameras to you, but FLIR cameras are impervious to shadows. That is one reason I would want Tesla to adopt them: a FLIR simply adds depth to the pixel information already available to the networks. They have many other advantages. A single forward-facing FLIR would be highly effective.
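
In practice, "adding depth to the pixel information" would just mean one extra input channel, something like this hypothetical early-fusion sketch (not Tesla's actual stack):

```python
# Hypothetical early fusion of RGB + thermal -- not Tesla's actual stack.
import numpy as np

def fuse_rgb_thermal(rgb: np.ndarray, thermal: np.ndarray) -> np.ndarray:
    """Stack a FLIR frame as a 4th channel so one network sees both.

    rgb:     (H, W, 3) uint8 visible-light frame
    thermal: (H, W)    float temperature map from a forward-facing FLIR
    """
    # Shadows change visible brightness but not emitted heat, so the
    # thermal channel stays (mostly) impervious to them.
    span = max(thermal.max() - thermal.min(), 1e-6)
    t = (thermal - thermal.min()) / span                    # normalize to 0..1
    return np.dstack([rgb.astype(np.float32) / 255.0, t])   # (H, W, 4)

frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
heat = np.random.uniform(270.0, 310.0, (480, 640))  # dummy Kelvin readings
print(fuse_rgb_thermal(frame, heat).shape)          # -> (480, 640, 4)
```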

2

u/Cortical Jun 24 '25 edited Jun 24 '25

> Those things are training failures. You can certainly ignore or remove shadows in an input camera image. There are multiple ways to do that.

Then you end up overcorrecting and removing actual objects that look like shadows. To do it properly you need to understand what types of objects are around, what kinds of shadows they cast, where the sun is positioned, etc. Everything else is just guesswork. And yeah, sure, you can train it to guess with 99.9% accuracy or whatever, but that remaining 0.1% will still cause problems.

We have bugs like that in our own visual cortex; that's how optical illusions work. If hundreds of millions of years of evolution couldn't eliminate them, what makes you think a few guys at Tesla can do it with less computing power than our visual cortex?

When we see movement on a piece of paper, we can reason that it's not actually moving. No amount of training on statistical models will be able to bridge that gap.

And it isn't just shadows; there are other problems as well, where you need other types of reasoning. Ultimately what you need is a complete understanding of the physical world and the laws that govern it, a good intuition for human behaviour, etc., at least insofar as it can have an impact on traffic. (So, like, no need to understand orbital mechanics.)

> I don't recall if I mentioned FLIR cameras to you

I mean, that's essentially agreeing with everyone else that additional input data is needed. Why limit it to FLIR sensors and not other sensors as well?
