Yikes!
In the recent AI Day talks, in response to a question about whether they train FSD for country-specific conditions, Elon commented that the basics of driving are the same everywhere in the world: first and foremost, don't hit anything. OK, this isn't a hit, but the pedestrian clearly acted as though he thought it might be! The guy was already a few steps onto the crossing before the visualisation even seemed to register that he was there.
https://youtu.be/xbw2ZB0Lbk8?t=350 Also, more generally, the representation of pedestrians in FSD 10 is alarming at the best of times. It doesn't even seem able to determine which direction people are facing, which is fairly basic stuff if you want to predict their likely behaviour. I seem to recall Waymo videos from several years ago where Waymo, even back then, were considering the behaviour of the entities they identified (in fact, here it is ...
https://youtu.be/tiwVMrTLUWg?t=621 )...
But at the same time they also demonstrated how their cars were still able to suitably respond to completely novel situations as well... like this example where their cars encountered a woman going round in circles in the middle of the road on a mobility scooter chasing a duck ...
https://youtu.be/tiwVMrTLUWg?t=663 .
Which is a good segue back to Tesla FSD 10...
This one I can't fathom at all... where on earth did the Tesla think it was going, irrespective of whether it could read the sign or not! It wasn't recognising any obstruction on the visualisation as far as I can see...
https://youtu.be/po6nG5vY_ec?t=316
It might not be a woman on a mobility scooter going round in circles chasing a duck in the road, but surely the Tesla can see that the road is obstructed with something that it shouldn't hit?! And even if it can't read the sign, it just seems to have completely drawn a blank, and doesn't seem to have a suitable 'behaviour' for how to deal with this.
I mean, OK, it isn't going to recognise everything out there, but the Waymo team have made a point that they always categorise every pixel in every frame. Now that categorisation might be 'unknown', but they can tie the image in with the lidar to establish that it is 'something', and at the very least not hit it - and most likely recognise that it is probably an obstruction.
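Just to make that idea concrete, here's a toy sketch in Python of that kind of fallback logic. To be clear, this is my own illustration of the principle, not Waymo's actual pipeline - the class names, threshold and structure are all made up:

[code]
from dataclasses import dataclass

@dataclass
class Region:
    label: str          # camera classifier output, e.g. "pedestrian", "unknown"
    confidence: float   # classifier confidence, 0..1
    lidar_hit: bool     # did the lidar get a solid return at this region?

def planner_label(r: Region, min_conf: float = 0.7) -> str:
    # Confident, known classification: trust it.
    if r.label != "unknown" and r.confidence >= min_conf:
        return r.label
    # Not sure what it is, but lidar says *something* solid is there:
    # treat it as an obstruction rather than drivable surface.
    if r.lidar_hit:
        return "obstruction"
    return "free_space"

# A tent the classifier has never seen still gets flagged as an obstruction:
print(planner_label(Region("unknown", 0.2, lidar_hit=True)))   # -> obstruction
# A shadow with no lidar return stays drivable:
print(planner_label(Region("unknown", 0.2, lidar_hit=False)))  # -> free_space
[/code]

The point being: even a completely unrecognised blob gets promoted to "don't hit it" as long as another sensor confirms it's physically there. Camera-only, you lose that second opinion.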
I get the impression from the Tesla videos that Tesla FSD has only really been trained to recognise pre-determined things, and the things it can't recognise are still quite a gap - I mean, in that video above it looks like it was just going to drive straight into that tent.
It's almost as though the neural nets didn't classify it as a known 'thing', so it looks like it might have just assumed that the image of the tent was shadows or a pattern on the road surface - something it thought it could drive over!
And just going back to that second video: it's quite disappointing that before it takes the corner into the closed road, it's waiting for the pedestrians on the crossing... but just watch how it keeps changing its mind about whether there's one or two pedestrians there! I can't find it now, but I seem to recall a video from Waymo showing how they were able to detect and stably track each individual pedestrian in quite a crowd at a pedestrian crossing. While I couldn't find that video this time, I did come across this from Waymo...
https://blog.waymo.com/2021/08/MostExpe ... river.html ... just look at the detail that the Waymo system is able to 'see' in the pedestrians...
Waymo seems to be going much further than just having the AI say "pedestrian", which is what Tesla appears to be doing. And I say that's all Tesla seem to be doing because half the time the Tesla isn't even detecting that there's a pedestrian there (see the videos above), and even when it is, it often isn't correctly inferring which way the pedestrian is facing - and they can disappear and reappear even when the car has an unobstructed view of them. Contrast that with the detail of the arms and limbs of the pedestrians detected in the Waymo videos on that page.
The internal model of the world around the vehicle that Waymo is able to fuse from its various sensors (yes, that includes lidar) just seems to be light years ahead of Tesla's.
Just look at this ...
https://youtu.be/xbw2ZB0Lbk8?t=300 ... the narrator of the video comments how the Tesla didn't seem fazed by the motorbike... ahem, cough, what!? ... look at the visualisation... as the Tesla starts to creep forwards, it doesn't recognise that there's a motorcycle - or anything else - there at all... that's why it's not fazed by it... it isn't actually seeing it at all! Pretty shocking! And rather disappointing that the driver seems oblivious to this and thinks the Tesla is handling it well!!
I know I've said many times before now (yawn) that I'm not convinced Tesla's architecture is pitched to support the height of 'skyscraper' that they need to be building... but that was mainly about whether they could do it without lidar, and whether the software architecture is pitched suitably (I have doubts about whether it's fusing into a robust enough 'top level' planner, or that sort of thing).
But the more I've seen the Tesla struggle with the stability of things around it, with the stability of the road markings, and really struggle with pedestrians... the more I'm wondering whether even the cameras are adequate as cameras.
I know some people have made a point about there being eight cameras! Wow! But bear in mind, that isn't 8 cameras all looking in the same direction giving redundancy. From what I saw in the AI Day videos, those cameras only have partial overlaps with each other, and that number of cameras is about giving 360-degree coverage.
And did someone say each camera is only 1280x960?
It's getting late and I'll probably regret thinking through this at this time of night, but 1280 x 8 / 360 = only about 28 pixels per (horizontal) degree. And that's assuming no overlap at all between the cameras - yet we know from the AI Day talks that they need overlap to be able to identify the same item in different cameras for establishing depth, etc.
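Here's that back-of-the-envelope sum as a quick Python sketch, just so my tired arithmetic can be checked. The 8 cameras and 1280 horizontal pixels are the figures quoted above; the overlap fractions are pure guesses on my part, just to show the number only gets worse:

[code]
# Rough angular resolution of the camera suite, using the figures quoted above.
# Assumptions (mine): 8 cameras x 1280 horizontal pixels covering 360 degrees.
num_cameras = 8
h_pixels = 1280
coverage_deg = 360.0

for overlap in (0.0, 0.1, 0.2):  # fraction of each image overlapping a neighbour
    effective_px = num_cameras * h_pixels * (1.0 - overlap)
    print(f"overlap {overlap:.0%}: {effective_px / coverage_deg:.1f} px/degree")

# overlap 0%:  28.4 px/degree
# overlap 10%: 25.6 px/degree
# overlap 20%: 22.8 px/degree
[/code]

For a sense of scale: at 28 px/degree, a pedestrian roughly 0.5 m wide at 100 m subtends about 0.3 degrees - under 10 pixels across. Not a lot to classify from, let alone infer which way they're facing.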
I'm now beginning to wonder if that might be why it's really struggling to recognise how many bins and how many people there are, and why it keeps changing its mind. I wonder if the resolution is simply too low for it to make reliable inferences as to what it's seeing.
If so, then this could be yet another area where the existing hardware - which Tesla have been selling as upgradable to FSD via software alone - might fall short of achieving that end.