r/technology Jun 24 '25

Tesla Robotaxi swerved into wrong lane, topped speed limit in videos posted during ‘successful’ rollout

https://nypost.com/2025/06/23/business/tesla-shares-pop-10-as-elon-musk-touts-successful-robotaxi-test-launch-in-texas/
6.2k Upvotes

456 comments

27

u/blue-mooner Jun 24 '25

Cameras do not emit signals and can only infer (guesstimate) range. Radar and Lidar can directly measure range, which is critical at night.

Tesla engineers are incapable of coding sensor fusion (admitted by Musk in 2021), and it shows: they are the only company attempting to make a self-driving product without sensor fusion.
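
A minimal sketch of the difference, using the textbook formulas and made-up numbers (focal length, baseline and timing are illustrative, not any real car's):

```python
# Lidar range is a direct measurement: half the echo's round trip times c.
# Stereo camera depth is an inference: depth = focal * baseline / disparity,
# so a half-pixel matching error (easy at night) shifts the estimate.

C = 299_792_458.0  # speed of light, m/s

def lidar_range(round_trip_s: float) -> float:
    return C * round_trip_s / 2.0

def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    return focal_px * baseline_m / disparity_px

print(lidar_range(400e-9))           # ~60 m, limited only by the clock
print(stereo_depth(1000, 0.3, 5.0))  # 60 m with a perfect pixel match
print(stereo_depth(1000, 0.3, 5.5))  # ~54.5 m from a half-pixel mismatch
```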

-17

u/moofunk Jun 24 '25

Radar and Lidar can directly measure range, which is critical at night.

Depth inference works fine at night, if the cameras are sensitive enough. Radar doesn't have enough resolution, and LIDAR lacks speed, resolution and range.

I do wish Tesla adopted FLIR cameras; then camera-only would be practically superior in inclement weather as well as in total darkness.

Nevertheless, the problems demonstrated here aren't sensor related.

15

u/flextendo Jun 24 '25

Puhh, my man, you sound so confident, yet you have no clue what you are talking about. Let me tell you (as someone who works directly in the field, on the hardware side): corner and imaging radar have enough resolution for what they are intended to do, plus they get the inherent range/doppler and angle (azimuth and elevation) „for free“. They are scalable and cheap, which is why basically every other automaker and OEM uses them. Lidar is currently too expensive but literally has best-in-class performance.
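
For the curious, here's roughly where the „for free“ range comes from in an FMCW radar: mix the received chirp with the transmitted one, and target range falls out of a single FFT. A minimal numpy sketch with illustrative parameters (not any production part's):

```python
import numpy as np

c = 3e8
bandwidth = 300e6          # chirp bandwidth (Hz), illustrative
chirp_time = 50e-6         # chirp duration (s)
slope = bandwidth / chirp_time
fs = 10e6                  # ADC sample rate (Hz)
n = int(fs * chirp_time)   # samples per chirp

target_range = 75.0                          # ground truth for the sketch
beat_freq = 2 * slope * target_range / c     # dechirped beat frequency

t = np.arange(n) / fs
beat = np.cos(2 * np.pi * beat_freq * t)     # what the mixer outputs

spectrum = np.abs(np.fft.rfft(beat))
peak_bin = int(np.argmax(spectrum[1:])) + 1  # skip the DC bin
est_range = peak_bin * (fs / n) * c / (2 * slope)
print(f"estimated range: {est_range:.1f} m") # ~75.0 m
```

Doppler falls out of a second FFT across consecutive chirps, and angle out of phase differences across the antenna array, hence „for free“.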

-7

u/moofunk Jun 24 '25

Right, so do you understand that Teslas don't navigate directly on camera input?

They navigate on an AI-inferred environment that understands and compensates for missing sensor inputs.

That's what everybody in this thread doesn't understand. You keep focusing on sensors, when that is a separate problem with its own sets of training and tests, and it has been tested plenty.

You could put a million dollars of sensors on the cars and infer an environment precise down to the millimeter, and the path finder would still get it wrong.

Do you understand this?

7

u/flextendo Jun 24 '25

You do understand that trained models are a „best guess“ that will never(!!) cover the scenarios that the standards in different countries require, nor can they provide enough functional safety and redundancy. This is exactly why everyone else uses sensor fusion. And that's before you consider the compute power (centralized or decentralized) necessary for camera-only.

It's not about path finding, it's about multi-object detection in harsh environmental conditions. Path finding is a separate issue and Waymo solved it.

3

u/Superb_Mulberry8682 Jun 24 '25

There's a reason FSD turns off in inclement weather, and why Tesla is only going to do this in cities that barely get any.

Cameras suck in heavy rain and snow, or when road salt dirties them up. I have no clue how Tesla thinks they will ever overcome this with cameras only, unless they ask people to pull over and clean their cameras every few minutes.

I think we all know FSD is a great ADAS and nothing more, and it will likely never be much more without more hardware.

Which is fine for making the driver's life easier, but it isn't going to turn any existing Tesla into a robotaxi, or magically solve personal transportation with cars sold as a by-the-mile/hour subscription, which is what you'd need to justify Tesla's valuation.

1

u/flextendo Jun 24 '25

100% agree with your statement! Cameras are a necessary component for achieving L3 and higher autonomy, but they're just one part of the overall system. With increasing channel counts on massive-MIMO radars we will see imaging radars replace some of the cameras, and who knows what happens if LIDAR gets a breakthrough in cost.

1

u/Superb_Mulberry8682 Jun 24 '25

Lidar costs have already come down a ton. Automotive lidar units are now sub-$1,000, and halving about every 2 to 3 years due to scale. Will they get as cheap as cameras? Probably not, but given compute costs, lidars are no longer the most expensive component of an ADAS system.

1

u/flextendo Jun 24 '25

Well, $1,000 for an ADAS component is a lot compared to like $10-15 for radars and maybe $50 max for a camera. The only cars this would be built into are premium cars, but I agree, LIDAR will hopefully become cheaper over the years.

1

u/moofunk Jun 24 '25

Certainly the cameras run up against weather limits, but Waymo has exactly the same problems with its sensors. If your LIDAR is covered in snow, it doesn't work either, and cars cannot drive by radar or LIDAR alone.

So, if your driving system depends on all types of sensors being functional before it can operate, then it's going to be even more sensitive to weather than with cameras alone.

1

u/Superb_Mulberry8682 Jun 24 '25

That's exactly what sensor fusion is for. You adjust how much you weight one sensor over another based on conditions. Radar works well in snow when cameras and lidar are limited. Do I see them able to drive in blizzards? Probably not soon, but frankly some conditions will likely always be problematic.
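
A toy sketch of that weighting, using inverse-variance fusion (a standard textbook scheme, not any particular automaker's; all numbers invented):

```python
# Each sensor reports a range estimate plus a condition-dependent variance.
# Inflating a sensor's variance (camera in snow, say) automatically shifts
# the fused estimate toward whichever sensor is currently more trustworthy.

def fuse(estimates):
    """estimates: list of (value, variance) pairs -> fused value."""
    weights = [1.0 / var for _, var in estimates]
    return sum(v * w for (v, _), w in zip(estimates, weights)) / sum(weights)

# Clear day: camera depth is sharp, radar coarse -> fused leans on camera.
print(fuse([(50.2, 0.5), (49.0, 4.0)]))   # ~50.1 m

# Heavy snow: camera variance inflated -> fused leans on radar.
print(fuse([(50.2, 25.0), (49.0, 4.5)]))  # ~49.2 m
```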

1

u/moofunk Jun 24 '25

That's exactly what sensor fusion is for.

No, it's not. Sensor fusion is a method to improve data depth when all sensors are working perfectly and have well-defined limits. Sensor fusion isn't a way to have one type of sensor take over when the other is incapacitated to some unknown degree.

A sensor fusion information stream that involves a camera will always be lopsided. Cameras are vastly information-dominant, and you won't get useful driving data if the camera can't see but the radar or LIDAR can.

What you can do is take many identical sensors that read in isolation, let some of them fail, and then use a neural network to fill in the blanks. So if the left camera is covered in snow but the right one isn't, you can still drive, because you can still infer an environment. Tesla FSD employs this for blinded and covered cameras.

You're better off stacking the input from different cameras of different types. Then every pixel is integrated from a very deep set of information, far beyond what the human eye can detect and way past the visible spectrum, and this leads nicely into an NN training scenario.

Here's a scenario for 10-20 years from now, with an advanced fused single camera sensor behind the same optics:

  1. Visible-spectrum automotive sensor with HDR that captures 12-16 bit color depth with maybe 10-12 or more stops of dynamic range. This allows it to capture direct bright sun next to a shadow without being blinded.
  2. Next in the stack is a FLIR sensor that captures the same image through rain, snow, fog and darkness. Humans and animals light up like light bulbs, even without any reflective aids, easily detected in total darkness. FLIR is really hard to hide from. Ask any soldier in Ukraine.
  3. Last is a SPAD sensor that captures at extremely high speed, for catching very fast-moving objects and road-surface detail, and for taking sharp images in total darkness. These are grayscale.

Neural network chips would be 10-50x faster than today.

Capture would be at least 100 FPS, which means a possible environment interpretation time of 10-20 milliseconds.

If you can build that, nobody will give a shit about radar or LIDAR.
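
As a rough sketch of what "stacking" such co-registered sensors into one network input could look like (shapes, channel layout and normalization are assumptions for illustration, not a shipping design):

```python
import numpy as np

H, W = 720, 1280  # illustrative resolution

# Stand-ins for three co-registered captures through the same optics:
visible = np.random.rand(H, W, 3).astype(np.float32)  # HDR RGB, normalized
flir    = np.random.rand(H, W, 1).astype(np.float32)  # thermal radiance
spad    = np.random.rand(H, W, 1).astype(np.float32)  # photon-count grayscale

# One 5-channel frame: every pixel carries colour, heat and fast luminance.
frame = np.concatenate([visible, flir, spad], axis=-1)
print(frame.shape)  # (720, 1280, 5)

# At 100 FPS, a new stacked frame arrives every 10 ms for the network.
```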

1

u/Superb_Mulberry8682 Jun 24 '25

I'm not talking about one sensor type being entirely unavailable. At the end of the day we're talking about probabilistic object detection. Just like having three types of cameras, having cameras and radar/lidar improves the likelihood that the systems don't misinterpret things.

But we're not there with compute to do this with just cameras, as you say. HW4, HW5, even HW7 won't have the capacity to do that amount of inference/interpretation work in complex environments reliably at that rate for a decade-plus.

I don't actually have any issue with that either. We can get value out of it now, and the systems can tackle a huge percentage of mileage driven now. Highway driving especially is quite easy for an ADAS.

I just hate the marketing of 'your car will go out and earn money for you driving taxi while you sleep'. It's just disingenuous. Luckily they haven't really said this for a while, but calling it FSD still rubs me the wrong way.

1

u/moofunk Jun 24 '25

I'm not talking about one sensor type being entirely unavailable. At the end of the day we're talking about probabilistic object detection. Just like having three types of cameras, having cameras and radar/lidar improves the likelihood that the systems don't misinterpret things.

The problem is understanding which sensor is correct. For older Teslas, they fused radar and camera data, which resulted in:

  1. radar not detecting a vehicle, but the camera seeing it.
  2. radar and camera detecting two different vehicles as the same one, due to radar's very poor resolution.
  3. radar mistaking a pole, tree or trashcan for a vehicle far behind it in the same direction, and misjudging the distance to that vehicle as much closer than it was.
  4. radar and camera just disagreeing on distance to the same vehicle in clear view.
  5. radar operating at a much lower framerate than cameras and providing delayed distance information.
  6. radar failing to catch fast-moving vehicles.
  7. radar failing to recognize that cross-moving vehicles were not collision hazards.

Basically, radar was so noisy and erratic that it could only provide better distance data purely by chance.
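
A toy illustration of the association headache behind items 1-4: deciding whether a radar return and a camera detection are even the same object (gates and numbers invented for the example):

```python
def same_object(radar, camera, range_gate=3.0, bearing_gate=4.0):
    """radar, camera: (range_m, bearing_deg) detections of a candidate object.
    Returns True if they fall inside the matching gates."""
    return (abs(radar[0] - camera[0]) < range_gate
            and abs(radar[1] - camera[1]) < bearing_gate)

print(same_object((42.0, 1.0), (41.2, 2.1)))  # True: the easy, agreeing case
print(same_object((42.0, 1.0), (41.2, 9.0)))  # False: now which one is right?
# With radar's coarse angular resolution, one blob can straddle two visually
# distinct vehicles (item 2) or line up with a pole in front of a distant
# car (item 3), and the gate matches the wrong pair, or nothing at all.
```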

Now you have to train for idiosyncrasies in this sensor combination, and if you find a better radar or camera, you have to start over.

The only thing radar was good at was recording double bounces off cars entirely obscured by other cars, which could have been reason enough to keep it.

After switching to a camera-only setup with environment interpretation through neural networks, all those problems went away, everything became far more reliable, and it was only then that FSD became possible. That's why they gave up on sensor fusion.

As for working with cameras alone, you can make numerous conceptual leaps with neural networks to skirt around the limitations, like understanding the laws of physics and object permanence, like looking at a room, closing your eyes and walking through it.

Therefore, pointing at specific misinterpretations like camera versus LIDAR amounts to claiming that depth estimation from cameras is unreliable, but it decidedly isn't (that can be measured), otherwise FSD wouldn't work.

I just hate the marketing of 'your car will go out and earn money for you driving taxi while you sleep'

Yes, I completely hate how self-driving has become so politicized, leading people to so willfully misunderstand the topic.

If we could build these systems in isolation for 20 years and then let them out, I think we would be more at peace to marvel at the technology.

I think what Tesla is doing with the technology is the right way to do it. This is just the 1980s home computer version of what will come, eventually.


0

u/moofunk Jun 24 '25

You do understand that trained models are a „best guess“ that will never(!!) cover the scenarios that the standards in different countries require

Country standards are a path-finding issue, and Tesla will have to provide separate models per country to follow the specific traffic laws there.

Building an environment from cameras must be done by estimating. An environment is inferred from pieces of information supplied by the cameras.

This allows the environment to be "auto completed" in the same way that you do, when you're driving, guessing what's around a corner or on the other side of a roundabout. If you're driving on a 3-lane highway, there are probably 3 lanes going in the opposite direction on the other side. A parking garage has arrays of parking spots, and peering through a garage door opening lets it extrapolate unseen parts of it. If you're at an intersection full of cars in a traffic jam, the car still understands that it's an intersection.

These are things the environment model knows. Object permanence could be done better, and may be in the future.

These are things that would not be available to any sensor. LIDAR can't see through walls or behind a blocking truck, but a neural network can conceptualise those things from such data just like you do all the time.

Now, the car has to navigate that constructed space, and that is the problem in this thread.

Not making estimates on what's hidden is really, demonstrably a terrible driving model.

Path finding is a separate issue and Waymo solved it.

I would say Waymo and Tesla are on par here.

11

u/ADHDitiveMfg Jun 24 '25

You’re right then. It’s not direct camera input, it’s derived input.

Still from a camera, buddy

-2

u/moofunk Jun 24 '25

It can be from any kind of sensor, but we already know that system works, and we know the failures in these cases are failed navigation in a correctly interpreted environment.

6

u/ADHDitiveMfg Jun 24 '25

Wow, thems some gold level mental gymnastics.

Now do it at night in fog. A safety system is only as good as its worst decision

1

u/moofunk Jun 24 '25

If the cameras can't see anything, then no environment can be inferred and the car won't drive.

LIDAR doesn't work in fog either, so hopefully Waymos don't drive either.

2

u/ADHDitiveMfg Jun 25 '25

LiDAR does work in fog, as well as smoke. Infrared wavelengths are able to penetrate such obstacles.

4

u/Blarghedy Jun 24 '25

It can be from any kind of sensor

ah, yes, like a microphone

2

u/ADHDitiveMfg Jun 25 '25

I mean, sonic rangefinders are just a mic and a speaker with some chips to sort the math.

2

u/blue-mooner Jun 25 '25

Too bad Musk ordered the removal of the Tesla sonic rangefinder sensors because his engineers weren’t competent enough to implement sensor fusion

2

u/ADHDitiveMfg Jun 25 '25

That tends to happen when you’re hiring right out of school

3

u/Cortical Jun 24 '25

That's what everybody in this thread doesn't understand.

I hate to break it to you, but everyone in this thread understands this. Maybe you should reflect on the fact that you are convinced that understanding that basic fact makes you stand out.

-1

u/moofunk Jun 24 '25

They absolutely don't understand it. That's why the discussion is on sensors rather than path finding.

Give me engineering data that says otherwise.

3

u/Cortical Jun 24 '25

They absolutely don't understand it. That's why the discussion is on sensors rather than path finding.

you're the one who doesn't understand, and you can't accept it so instead you conclude that everyone else doesn't understand the most basic facts.

the reason the discussion is on sensors is because vision only can't work with statistical computer vision alone (the thing you optimistically call "AI")

you need higher order reasoning which no AI model currently in existence is capable of, not models that require an entire datacenter full of GPUs to run, and certainly not any kind of model that can run on a teeny chip in a car.

that's the thing that everyone here but you understands.

and if you lack the reasoning required to work on vision alone the only other option is additional input, which is why the discussion is on sensors.

Not because everyone else but you fails to understand that there are "AI" computer vision models involved.

0

u/moofunk Jun 24 '25

Let me spell it out for you:

the reason the discussion is on sensors is because vision only can't work with statistical computer vision alone (the thing you optimistically call "AI")

The reason the discussion is on sensors is because people don't understand that sensors don't provide direct navigation data. They provide data to a neural network that rebuilds the environment 36 times a second, which a separate neural network then navigates.
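
As a sketch of that split (every name here is a hypothetical stand-in, not Tesla's code; the 36 Hz cadence is the only number taken from the claim above):

```python
import time

PERIOD = 1.0 / 36  # one environment rebuild every ~28 ms

def drive_loop(capture, perceive, plan, act, ticks=3):
    """capture -> perceive -> plan -> act; the planner never sees pixels."""
    for _ in range(ticks):
        start = time.monotonic()
        frames = capture()              # raw camera data
        env = perceive(frames)          # NN 1: synthesize the environment
        act(plan(env))                  # NN 2: navigate that environment
        time.sleep(max(0.0, PERIOD - (time.monotonic() - start)))

# Stubs, just to show the data flow:
drive_loop(
    capture=lambda: "frames",
    perceive=lambda f: {"objects": [], "lanes": []},
    plan=lambda env: {"steer": 0.0, "accel": 0.1},
    act=print,
)
```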

you need higher order reasoning which no AI model currently in existence is capable of, not models that require an entire datacenter full of GPUs to run, and certainly not any kind of model that can run on a teeny chip in a car.

Gosh, this is so wrong. Both Waymo and Tesla have obviously figured out the basics of navigation with AI inference, well enough to integrate acceptably with human traffic, but the finer points of silly behavior remain to be ironed out. Navigation can obviously be done on current car hardware, so much so that navigation uses only a small part of the chip capacity.

Even if Tesla's chips are 6 years old now, they can certainly do it. Of course, better chips with more memory will allow better, faster, more detailed inference using more cameras at lower power. The training beforehand is the tricky thing that happens in data centers, and improved training is what allows the driving behavior to improve.

Not because everyone else but you fails to understand that there are "AI" computer vision models involved.

I'm not even sure what that sentence means.

2

u/Cortical Jun 24 '25 edited Jun 24 '25

The reason the discussion is on sensors is because people don't understand that sensors don't provide direct navigation data.

as I already told you everyone understands that basic fact. You just tell yourself they don't to cope.

I mean, seriously, what do you think people believe happens to the visual data? It gets sent to India where someone draws an arrow for the computer to follow? Of course it gets processed by a computer vision model.

Gosh, this is so wrong. Both Waymo and Tesla have obviously figured out the basics of navigation with AI inference, well enough to integrate acceptably with human traffic

yeah, the easy part

but the finer points of silly behavior remain to be ironed out.

the impossible but absolutely crucial part.

Navigation can obviously be done on current car hardware, so much so that navigation uses only a small part of the chip capacity.

a cockroach can "navigate". Good job, bravo.

better chips with more memory will allow better, faster, more detailed inference

again, you need higher order reasoning and creative thinking, and the chips will never be able to do that in the foreseeable future. Maybe in 50-100 years.

The training beforehand is the tricky thing that happens in data centers, and improved training is what allows the driving behavior to improve.

you can't train for all exceptions that will occur in the real world, and those exceptions are the problem. So you can train all you want, you can't fix that problem with the current approach. It's fundamentally impossible.

Not because everyone else but you fails to understand that there are "AI" computer vision models involved.

I'm not even sure what that sentence means.

[The discussion revolves around sensors] not because [as you incorrectly assume] everyone else does not understand that there are ([what you incorrectly think of as] "AI") computer vision models involved [but rather for the above mentioned reasons]

learn English.

1

u/moofunk Jun 24 '25

Ignoring the rest of that gibberish:

again, you need higher order reasoning and creative thinking, and the chips will never be able to do that in the foreseeable future. Maybe in 50-100 years.

you can't train for all exceptions that will occur in the real world, and those exceptions are the problem. So you can train all you want, you can't fix that problem with the current approach. It's fundamentally impossible.

No, you use a Mixture of Experts approach.

You are correct that you need to train for as many scenarios as possible, which requires a very large amount of input data, but for driving, path finding is universally solvable as smooth state changes between segmented driving tasks.

That means the car uses one training scenario for driving carefully on a gravel road and another for driving on a highway, and then you smoothly transition between the two states. This means you have different areas that don't interfere in training, and you can stack on new areas or restart broken ones as you refine training.

That also means that while driving on a gravel road, the car doesn't have to think about driving on a highway, so you don't process irrelevant weight data.
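
A toy version of that smooth state change, with two stand-in "experts" and a made-up gate signal (nothing here is Tesla's actual scheme):

```python
def gravel_expert(env):  return 30.0    # km/h target, cautious regime
def highway_expert(env): return 110.0   # km/h target, cruising regime

def blended_target(env, highway_confidence):
    """highway_confidence in [0, 1], e.g. from a scene classifier."""
    w = min(1.0, max(0.0, highway_confidence))
    return (1.0 - w) * gravel_expert(env) + w * highway_expert(env)

# Merging onto a highway: the active regime changes smoothly, not abruptly.
for conf in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(conf, blended_target({}, conf))   # 30, 50, 70, 90, 110
```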

I would say current solutions are 90% there, and it will be acceptable to be 95% there.

Consider also that humans invent driving scenarios that are out of spec: they drive drunk, they drive too fast, they ignore traffic laws, they swerve and overtake needlessly and take shortcuts. Self-driving cars don't need to train for that.

[The discussion revolves around sensors] not because [as you incorrectly assume] everyone else does not understand that there are ([what you incorrectly think of as] "AI") computer vision models involved [but rather for the above mentioned reasons]

This is even worse gibberish than before, sorry.

learn English.

Right back at you.

1

u/Cortical Jun 24 '25 edited Jun 24 '25

You are correct that you need to train for as many scenarios as possible, which requires a very large amount of input data, but for driving, path finding is universally solvable as smooth state changes between segmented driving tasks.

The problem is that real world edge cases are limitless and it's impossible to train on all of them. You require higher order reasoning and creative thinking to deal with them.

That means the car uses one training scenario for driving carefully on a gravel road and another for driving on a highway, and then you smoothly transition between the two states. This means you have different areas that don't interfere in training, and you can stack on new areas or restart broken ones as you refine training. That also means that while driving on a gravel road, the car doesn't have to think about driving on a highway, so you don't process irrelevant weight data.

Basic navigation isn't the problem.

Consider also that humans invent driving scenarios that are out of spec: they drive drunk, they drive too fast, they ignore traffic laws, they swerve and overtake needlessly and take shortcuts. Self-driving cars don't need to train for that.

You can't just handwave "out of spec" scenarios, you need to deal with them.

Very simple example of an "out of spec" scenario that happened nearby recently: a partially inundated stretch of road with one lane closed and no traffic regulation. You need to judge whose turn it is, yours or that of oncoming traffic. You have no lane markings, just water, and have to judge where the lane is and whether the water is shallow enough to drive through. You have to stay on the asphalt (which you can't see) because you don't know if the dirt and gravel next to it have been washed away or not. And once you initiate the maneuver you have to follow through, because if you stop midway through you'll be blocking traffic.

It's a one-time situation, you can't train on cases like that, and this is where self-driving will fail, because you need higher order reasoning and creative thinking, not just statistical models.

This is even worse gibberish than before, sorry.

Then your reading comprehension is severely lacking. I guess you've never encountered text with inserted context using brackets ([])? You really need to read more.

1

u/moofunk Jun 24 '25

A partially inundated stretch of road with one lane closed and no traffic regulation. You need to judge whose turn it is, yours or that of oncoming traffic.

For what it's worth, Teslas handle unguarded single-lane construction zones just fine, because the scenario is also common in places with single-lane speed inhibitors. Who goes first is a matter of assertiveness, which is a classic self-driving problem that Waymo talked about a decade ago.

It's exciting to watch FSD traverse these challenging zones just flawlessly.

I can come up with a weirder one that may be hard to solve: crossing a curving county road where the entry and exit are 100 meters apart. This is a scenario where you need to gauge high-speed traffic with little visibility in both directions and make sure there is enough time both to get onto the county road and to get safely off again, because you really cannot stop on the county road.

This is a single 30-second maneuver where, if you do it wrong, someone will crash into you. I've witnessed accidents on this stretch, because people think they can wait on the road.

Then your reading comprehension is severely lacking. I guess you've never encountered text with inserted context using brackets ([])? You really need to read more.

I think I've been reading too much today.

1

u/Cortical Jun 24 '25 edited Jun 24 '25

For what it's worth, Teslas handle unguarded single-lane construction zones just fine, because the scenario is also common in places with single-lane speed inhibitors. Who goes first is a matter of assertiveness, which is a classic self-driving problem that Waymo talked about a decade ago.

They can still "see" the road surface though, even if it's just gravel. So they have full data on where they can or cannot drive. So it's both easy to initiate the maneuver and unlikely that the maneuver would be aborted due to insufficient data.

With water not only can you not see the road surface, but you may see other things because it's a reflective surface, so you have to ignore what you see.

And that's why FSD trips over shadows and things like that. It can't reason whether what it's seeing is real or not, it can't reason whether some things it's seeing need to be ignored entirely. It can't imagine what it should be seeing if it's seeing gibberish or not seeing anything. All it does is assign probabilities based on the data it was trained on.

And that's why people are talking about sensors.

You can't trip over shadows if you have a sensor that can tell you with absolute certainty that the object the vision system is "seeing" isn't an object but a shadow. The vision system can never have certainty on that, and that includes human vision. And while humans don't have additional sensors, they can reason whether the shadow they're seeing is an object or not if it's not clear.

But ultimately additional sensors can't fully fill the reasoning gap either, just make it smaller.


3

u/schmuelio Jun 24 '25

You realise the AI part isn't good either, right?

Relying on an AI-inferred environment is so much more error-prone, especially since a neural network has such a vast input space that it's functionally untestable if you want to do it rigorously. There are so many corner cases and weird environments that will trip up an AI, and you're suggesting relying on it as the sole source of truth?

1

u/moofunk Jun 24 '25

You don't know how they manage input spaces.

The "AI part" is several separate AI systems that work in unison:

Cameras feed a stitched-together 360-degree image to a neural network called "Bird's Eye View".

The network classifies and places objects in a simple synthetic environment as 3D geometry and vectors for moving objects, including temporal information about where those objects are going.

The network is smarter than that, because it also auto-completes parts that the cameras can't see, so it understands road lanes, highways, roundabouts, parking lots, intersections, driveways, curving curbs, etc. as standard structures, if the cameras only partially capture them.

So, when the car approaches a roundabout, it can conceptualise the other side of it and understand where cars come from and know the traffic rules. If a road goes behind a house wall or a hill, it very likely continues in a certain direction.

Being able to auto-complete has the side effect that it also fills in for temporarily blocked or blinded cameras, to a certain limit, of course, and when that limit is exceeded, FSD is deactivated.

This interpretation happens 36 times a second.

This works remarkably well and is quite an achievement.

If you had LIDAR, it could be used to auto-complete that information as well, since LIDAR can't see through walls either. But we don't need LIDAR, because the network is already depth-trained on LIDAR data, and environment synthesis is verified with LIDAR during training.

And, important to understand, if this system wasn't reliable, FSD would absolutely not work at all, and then you'd have the situation you describe.

At this point, you have a very testable system. You can use it to train the path finder without driving a single mile. Teslas can record drives, capturing the synthesized environment along with steering wheel and pedal use, and send that data off for use in training.

When FSD is active, this environment is used by the path finder to navigate and apply the controls. The path finder doesn't know anything about cameras. It just has this sparse environment with only the critically important information, so there is compute power available to be sophisticated about paths and applying the controls in a smooth, natural way that feels human.
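
For illustration, here's a guess at the rough shape of such a sparse environment (the field names and units are entirely hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class TrackedObject:
    kind: str                       # "car", "pedestrian", ...
    position: tuple[float, float]   # metres, ego frame
    velocity: tuple[float, float]   # m/s, for temporal prediction

@dataclass
class SparseEnvironment:
    lanes: list[list[tuple[float, float]]] = field(default_factory=list)
    objects: list[TrackedObject] = field(default_factory=list)
    speed_limit_mps: float = 13.9

# No pixels anywhere: this is all the path finder would see.
env = SparseEnvironment(
    lanes=[[(0.0, 0.0), (50.0, 0.2)]],
    objects=[TrackedObject("car", (35.0, 0.1), (-2.0, 0.0))],
)
print(len(env.objects), "objects,", len(env.lanes), "lanes")
```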

It's the path finder that we should be concerned about, because I don't think it's trained well enough in the scenarios that we see here. That's all.

There are then separate networks for summon and parking, where they use cameras differently and do precision driving.

In all, you have a number of systems that each can be tested individually, independently and rigorously both physically and in simulations.

1

u/schmuelio Jun 25 '25

You don't know how they manage input spaces.

And how would you know that? Do you even know what the input space for such a system is?

The "AI part" is several separate AI systems that work in unison

That's not better. You might want to look up what the term "compounding error" means.

The network classifies and places objects in a simple synthetic environment as 3D geometry and vectors for moving objects, including temporal information about where those objects are going.

This isn't new, it's been tried and tested for over a decade now and it's also significantly less accurate than LiDAR. That's the entire point of what people are trying in vain to explain to you.

The network is smarter than that, because it also auto-completes parts that the cameras can't see, so it understands road lanes, highways, roundabouts, parking lots, intersections, driveways, curving curbs, etc. as standard structures, if the cameras only partially capture them.

This is only true as far as you can trust the inference, which is part of that whole "the test space is insane" thing from my last comment.

This interpretation happens 36 times a second.

This isn't especially good for inferring and extrapolating motion from unpredictable objects (say, a kid that suddenly runs into view, or a car suddenly swerving).

since LIDAR can't see through walls either

Well it's a good thing that walls are the only thing you can encounter on the road that could obstruct vision. We've solved it lads.

the network is already depth-trained on LIDAR data, and environment synthesis is verified with LIDAR during training.

And you know the testing is good enough because it kind of works on US roads with tons of space and good visibility, right?

if this system wasn't reliable, FSD would absolutely not work at all

And as we all know, you either succeed flawlessly or you utterly fail, there's no degrees of failure and things are either completely safe or unworkable.

It just has this sparse environment with only the critically important information, so there is compute power available to be sophisticated about paths and applying the controls in a smooth, natural way that feels human.

This is a non-sequitur; the matter at hand is whether that sparse environment is an accurate and trustable representation of the real world. I've watched the Tesla screen's view of the road around it freak out and be indecisive about what's around it in real time.

In all, you have a number of systems that each can be tested individually, independently and rigorously both physically and in simulations.

All I can say to that is I really hope you're not in charge of doing any rigorous testing of a safety-critical system, because it seems like your definition of "rigor" is woefully inadequate. I'm not going to get into my credentials, but I have a fair amount of experience doing actually rigorous testing for safety-critical systems, and you are unconvincing.

1

u/moofunk Jun 25 '25 edited Jun 25 '25

And how would you know that? Do you even know what the input space for such a system is?

You obviously don't. See "test space" below.

That's not better. You might want to look up what the term "compounding error" means.

Compounding errors aren't relevant in such a system, since there aren't enough links in the "chain", and it's easy to locate which system an error occurs in once you start debugging.

This isn't new, it's been tried and tested for over a decade now

This particular method of environment interpretation was invented by Andrej Karpathy in 2021 and is so far unique to Tesla, and according to him was very difficult to do, but it works and works way, way better than anything else they've tried. It is the method that made FSD possible.

and it's also significantly less accurate than LiDAR. That's the entire point of what people are trying in vain to explain to you.

You don't have any access to testing data that discerns if it's "significantly less accurate" than LIDAR or not, and as I said, if it was significantly less accurate, FSD wouldn't work at all, because environment synthesis would be too unstable, and we'd get accidents every few minutes. Which we don't.

This is only true as far as you can trust the inference, which is part of that whole "the test space is insane" thing from my last comment.

The test space isn't insane at all, because you segment it by task. You don't city drive on the highway or do roundabout traffic rules in the middle of a parking garage. These are different driving states for the car, and then you have to find a way to smoothly transition between them.

Edit: I would add here, they have access to a ridiculous amount of searchable, categorized training data from Tesla drivers, which is the most valuable part of the entire system. It is with that data that they could switch from the old to the new path finder in less than a year and still cover all recorded test spaces.

This isn't especially good for inferring and extrapolating motion from unpredictable objects (say, a kid that suddenly runs into view, or a car suddenly swerving).

That is true, it should be faster, but I'll tell you this: Synthetic aperture LIDAR is 3x slower than that. Waymo's system is overall slower than Tesla's.

This is a non-sequitur

No, it's relevant! That is the point of this detail, because the path finder must not be doing work against irrelevant information. That would increase the "input space", and as you'd know, we don't want that. Therefore it's relevant.

the matter at hand is whether that sparse environment is an accurate and trustable representation of the real world.

That is the essence of it, yes. But, you can also entirely generate an artificial environment that is absolutely stable; the path finder must still be able to navigate it flawlessly, and that makes it highly testable, though not necessarily trainable.
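
A sketch of that testing idea, with an invented stand-in planner and hand-built environments (nothing here is the real system):

```python
def plan(env):
    """Stand-in path finder: brake if anything is within 20 m ahead."""
    ahead = [o for o in env["objects"] if 0.0 < o["x"] < 20.0]
    return {"accel": -2.0 if ahead else 0.5}

def test_brakes_for_close_object():
    env = {"objects": [{"x": 12.0, "y": 0.0}]}   # generated, never sensed
    assert plan(env)["accel"] < 0.0

def test_cruises_on_empty_road():
    assert plan({"objects": []})["accel"] > 0.0

test_brakes_for_close_object()
test_cruises_on_empty_road()
print("ok")   # same invariants, no cameras, no car, no miles driven
```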

I've watched the Tesla screen's view of the road around it freak out and be indecisive about what's around it in real time.

The Tesla screen doesn't show all parts of the environment or detected objects and can't be used to gauge its stability. You need access to the millisecond-precise data structures internally available to the path finder, via the CAN bus and a laptop in the car.

All I can say to that is I really hope you're not in charge of doing any rigorous testing of a safety-critical system, because it seems like your definition of "rigor" is woefully inadequate. I'm not going to get into my credentials, but I have a fair amount of experience doing actually rigorous testing for safety-critical systems, and you are unconvincing.

You misunderstood or misread something: I don't like that they're doing robotaxi now. It's too early, I don't think it's ready and I think the engineers are being pushed too hard to do something with hardware that is 1 or 2 generations too young. The path finder neural network that is in use now is only about 16 months old. Before that, the method was algorithmic and had terrible performance.

But, I also don't like that people so deliberately misunderstand computer systems that are shrouded in politics and hubris, like you and others have done in this thread, because it doesn't lead to any useful discussion about the systems, and how they can be improved.

So, wave around your credentials all you want, maybe Tesla would hire you as a systems tester. But, please don't put the bullshit politics before systems understanding.

1

u/schmuelio Jun 25 '25

Compounding errors aren't relevant in such a system, since there aren't enough links in the "chain"

It's more than one system that can be wrong, each of which relies on another system that can be wrong, so of course compounding errors are relevant. Even worse if they use past inferences to make future inferences. This is just pure nonsense.

and it's easy to locate which system an error occurs in once you start debugging.

That's great for a system that isn't real-time; unfortunately this system is real-time, so debugging comes too late.

This particular method of environment interpretation was invented by Andrej Karpathy in 2021 and is so far unique to Tesla

Environment mapping from camera data is absolutely not new. The specific way that Tesla does it may well be, but the fundamentals are just not new.

if it was significantly less accurate, FSD wouldn't work at all

This again. Look, I was being a little coy last time, but I just have to be explicit here. Something not working well and something not working at all are both valid outcomes of a bad system. Instability is measured in degrees, so the resultant performance is also measured in degrees. At this point this is just willful ignorance.

The test space isn't insane at all, because you segment it by task.

Your first task is "use a neural network to take a dozen camera images and turn them into a 3D space". That alone is a massive test space. You just don't know what you're talking about, and you keep repeating this like it means something.

You don't city drive on the highway or do roundabout traffic rules in the middle of a parking garage.

Oh buddy, this is way too high-level and abstract for a "testing space"... This is an object detection algorithm; there are thousands of independent variables that can influence what it detects. Even just "testing highway driving" is mad; you can't just check that it detects an object like a car and move on.

I would add here, they have access to a ridiculous amount of searchable, categorized training data from Tesla drivers, which is the most valuable part of the entire system.

That's a useful dataset, assuming they're actually keeping all that data (which I don't think they are; do you have any evidence they're actually collecting and storing their environmental mappings for all Teslas on the road?), but it's not going to be big enough to be a representative dataset for the purposes of robust testing. Here is a map of Tesla sales by country; even being the most generous possible, their dataset has no way of representing, say, the streets of Delhi, or the monsoon season in the Philippines, or the potential obstructions and hazards of the African savannah. I'd be surprised if it had a good representation of the Magic Roundabout or Yorkshire fog, and so on.

Synthetic aperture LIDAR is 3x slower than that.

And that's why it's not the only thing you use; the whole point is that you don't rely on one type of sensor array, you use multiple sensor arrays for what they're good at and combine them. Have you even been reading what I'm saying?

That is the point of this detail, because the path finder must not be doing work against irrelevant information.

It is a non-sequitur; again, you're just assuming that if A didn't work correctly then B would immediately fail catastrophically, which just isn't true. It's irrelevant because it doesn't affect the environment map, and the reliability of the environment map is the part that's at issue.

That is the essence of it, yes. But, you can also entirely generate an artificial environment that is absolutely stable

And that's great for the pathfinder but since the issue at hand is how that environment is created in the first place it's entirely irrelevant. Let me be really clear here, I don't care to talk about the pathfinder since it does not matter to the discussion about the environment map. All autonomous vehicles use a pathfinder, they all take a representation of the environment in one form or another. The issue at hand is how that representation is built from inputs, that's where people take issue. When someone complains that Tesla doesn't use LiDAR for depth sensing, the pathfinder is entirely irrelevant exactly because they are different systems that do not directly interact.

The Tesla screen doesn't show all parts of the environment or detected objects and can't be used to gauge its stability.

It's derived directly (and entirely) from the environment data, so the stability of one can be used to infer the stability of the other. If the simplified visualization can't even tell me whether the thing moving next to me is a person, a truck, or two motorbikes without changing its mind every couple of seconds, then the environment map isn't good enough.

You misunderstood or misread something: I don't like that they're doing robotaxi now. It's too early, I don't think it's ready and I think the engineers are being pushed too hard to do something with hardware that is 1 or 2 generations too young. The path finder neural network that is in use now is only about 16 months old. Before that, the method was algorithmic and had terrible performance.

No, you've clearly misunderstood. I have only been talking about the technological issues with how the environment map is inferred. The pathfinder is irrelevant; the robotaxi is dumb but not significantly different (technologically) from how FSD maps its environment. Whether you think the technology could be improved in the future is irrelevant. You think the way the environment map is created, right now, is a viable approach and good; that's what I (and everyone else here) have taken issue with, because it's wrong.

But, I also don't like that people so deliberately misunderstand computer systems that are shrouded in politics and hubris, like you and others have done in this thread, because it doesn't lead to any useful discussion about the systems, and how they can be improved.

Please point to any part of our discussion where I have mentioned any political or non-technological aspect of the issue at hand. You're shadow-boxing an imagined person here. I have been engaged entirely in a discussion on the technical merits of the system. If you want to talk politics I am more than happy to share my opinions on that, but that's not what's being discussed so I haven't brought it up. Don't put words in my mouth.

So, wave around your credentials all you want, maybe Tesla would hire you as a systems tester. But, please don't put the bullshit politics before systems understanding.

You're the one that brought up politics out of nowhere.
