Simultaneous Localization and Mapping (SLAM) for AI Self-Driving Cars

By Lance Eliot, the AI Trends Insider

Where am I?

I know that I am standing in a dark room. It is so dark that I cannot see anything at all. I was led into the room while wearing a blindfold. Once inside the room, I was walked around in a circuitous manner, making numerous loops, and turned around and around. This was intended to disorient me so that I would have no sense of north or south, nor any sense of how large the room was. The person who led me into the room was able to sneak out without revealing which direction they went. I had no idea where the door into the room was.

At this point, all I had was my sense of touch. My eyes could not see anything because of the utter darkness. My ears would not do me much good, since there wasn't anything in here making any noise. I could speak or yell, not to call for help, but in hopes of using my own voice as a kind of echolocator. Perhaps my sounds would bounce off of objects in the room. Well, after trying this, I decided that, apparently, I am not a bat. I don't seem to have much of an ability to echolocate objects.

I stretched out my arms. I gingerly began to move my feet. I decided that I should shuffle forward, very slowly, an inch at a time. There could be small objects lying on the floor. There could be large objects sitting in the room. If I had just started walking forward with abandon, I likely would have collided with an object, possibly hurting myself or damaging it. As I proceeded forward, I wondered how I would know if I had doubled back on my own steps. In other words, I might think I am walking straight ahead, but perhaps start to curve, and ultimately make a circle, which would bring me back to where I had started. It was so dark that I wouldn't know that I had done so.

I know that this all sounds kind of crazy. Though, if you are familiar with Escape Rooms, you might not think it is very unusual. Escape Rooms have become popular recently. They are specially prepared rooms that you are placed into, for fun, and you need to solve various riddles and puzzles to find your way out. These Escape Rooms are usually done on a timed basis, and often with a group of people. It can be a fun, party-like experience. There is often a theme, such as being trapped in a bank vault and needing to get out, or being in a locked-room murder mystery where you need to be like Sherlock Holmes to find clues, solve the murder, and escape from the room.

But that’s not what I was doing.

Instead, I was pretending to be a robot.

Huh? You might wonder what Lance is saying. Has he become a robot? No, I said that I was pretending to be a robot.

There has been a great deal of study of how robots can make sense of their environment. If a robot is placed into the middle of a room, and it has not been previously programmed with the particulars of the room and the objects in it, it needs some means to figure out the layout of the room and the objects in the room. It is similar to a human being placed into a dark room with no prior indication of what is in the room. Furthermore, the human doesn't even know the particulars of the room, such as how large it is, how many walls there are, and so on. That's the situation a robot might be in, since it might not have any previous indication of what the room is about.

I guess you could say it is a version of Escape Room for robots.  Say, do you think the robot has fun while it tries to figure out the room?

Anyway, the robot would need to navigate in the room and gradually figure out as much as it can about the room and the objects in the room. This is a well-known and much studied problem in AI, robotics, and computer science, and it is more commonly referred to as SLAM, which is an acronym for Simultaneous Localization And Mapping.

Some key aspects are that the robot does not know where it is. It doesn't know if it is in the middle of the room, or in a corner, or wherever. It does not know what objects are in the room. It must navigate around, trying to figure out the room as it moves. Unlike my example of being in the dark, we can allow the robot various sensory capabilities including vision, sonar, radar, LIDAR, and the like, and the room can be lit (there are various scenarios: in some, the room is lit and vision can see things; in others, vision is not usable but other sensors are).

The robot is welcome to use its sensory devices however it wishes. We aren't going to put any constraints on the sensors it has, though it only has whatever sensors have been provided. The conditions in the room might limit the efficacy of any given sensor at any given moment. If the room is dark, the vision sensor might not provide much help. If the objects in the room tend to absorb sound waves, then the sonar won't do much good. And so on.

The robot must also have some kind of ability to move, which might be done by rollers rather than walking around the room. It doesn't matter much how it moves; the crucial aspect is that it can and does move. By moving, it explores the environment. Without moving around, it would only be able to "see" whatever it can detect from its present location. By moving around, it can go around and behind objects, and get a better sense of where things are in the room. Imagine if you were in a crowded room, had to stand stationary, and were asked to tell what else is in the room. It might be very hard to see over a large bookcase and know what is on the other side without actually walking over to look.

What we want to do is have the robot be able to use its sensors, move around, ascertain the nature and shape of the room, and the nature and location of the objects in the room. Since it does not know where in the room it is, the robot must also figure out where it is. In other words, beyond just mapping what is around it, ultimately the robot needs to determine where it is. Figuring this out might not be very immediate. The robot might need to wander around for a while, and eventually after it has crafted a map of the environment, it could ascertain where it is in the environment.

Now that you hopefully comprehend the nature of the problem, I ask you to provide me with an algorithm that would allow the robot to ultimately map out its environment and know where it is.

Many have tried to craft such an algorithm. SLAM is one of the most studied aspects of robotics, and particularly of autonomous vehicles. If we were to send an autonomous vehicle up to the planet Mars, we would want it to be able to do SLAM. On a more day-to-day basis, suppose we want to send a robot into a house where a gunman is holed up. The robot needs to use SLAM to figure out the layout of the house and get to the gunman. You can imagine the look on the gunman's face when they see the robot navigating around the room. Think of this as an advanced version of the Roomba, the popular robotic vacuum cleaner.

I suppose you are already sketching on a napkin the algorithm you would devise for the SLAM problem. Well, I hate to tell you this, but it turns out there are lots of possible algorithms. The thing is, we don't want just any algorithm; we want one that is reasonably tractable in computational time and reasonably efficient in what it does. If your algorithm would take a computer twenty years of nonstop processing to figure out the layout of the room, it probably is not an algorithm we would find very useful.

To deal with the computational aspects, we'll allow that the SLAM can be an approximator. This means that rather than seeking true perfection in the determined layout and locations, we'll accept that there will be probabilities involved. So, rather than saying that I am standing at an exact latitude and longitude down to the millimeter, if you can say that I am likely within a specific square area within the room, that might be sufficient for purposes of navigating throughout the room. We'll allow uncertainty for the sake of tractability and speed.

Consider that with SLAM you have some set of sensory observations, which we'll call O. These are sensory readings collected at discrete time steps, which we'll call T. You are to create a map of the environment, which we'll call M, and be able to indicate the location of the robot, which we'll call X.
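
To make that notation concrete, the full SLAM problem is often stated probabilistically as estimating the joint posterior over the robot's trajectory and the map, given all of the observations. One standard factorization (assuming a Markov motion model, folding any control inputs into it, and taking an uninformative prior on the map) looks like this:

    p(X_{1:T}, M \mid O_{1:T}) \;\propto\; p(X_1) \prod_{t=2}^{T} p(X_t \mid X_{t-1}) \prod_{t=1}^{T} p(O_t \mid X_t, M)

The first product is the motion model (where the robot plausibly went from one step to the next), and the second is the observation model (what the robot would likely sense, given a particular pose and map).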

One of the more popular ways to solve this problem involves the use of Grid Maps. Imagine a large grid, consisting of cells, and into each cell we are going to mark some aspect about the environment, such as whether there is an object in that cell position or not. The grid becomes a kind of topological map of the environment. We can update the grid as the robot moves around, improving the grid's indications of what it has found so far. Particular objects are considered to be landmarks. If we don't have any readily available landmarks, the problem becomes somewhat harder and we can only work with raw sensor data. The sensors are crucial: the fewer we have, and the less each can detect about the room, the less we have to go on in making the grid.
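
Here is a minimal sketch of that idea in Python: an occupancy grid that stores log-odds, so that repeated sensor updates are just additions. The values for p_hit and p_miss are illustrative choices for the sketch, not anything canonical:

    import numpy as np

    class OccupancyGrid:
        # Minimal log-odds occupancy grid. Zero log-odds means "unknown" (p = 0.5).
        def __init__(self, rows, cols, p_hit=0.7, p_miss=0.4):
            self.logodds = np.zeros((rows, cols))
            self.l_hit = np.log(p_hit / (1.0 - p_hit))     # evidence the cell is occupied
            self.l_miss = np.log(p_miss / (1.0 - p_miss))  # evidence the cell is free

        def update(self, row, col, occupied):
            # Fuse one sensor reading for one cell by adding its log-odds evidence.
            self.logodds[row, col] += self.l_hit if occupied else self.l_miss

        def probability(self, row, col):
            # Convert log-odds back into a probability of occupancy.
            return 1.0 / (1.0 + np.exp(-self.logodds[row, col]))

    grid = OccupancyGrid(100, 100)
    grid.update(10, 12, occupied=True)  # the robot sensed an object in cell (10, 12)
    print(grid.probability(10, 12))     # 0.7 after one positive reading

The appeal of log-odds is that each new reading costs a single addition per cell, which keeps updates cheap as the robot roams.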

Here's some of the processing involved in SLAM (a skeletal loop tying these steps together appears after the list):

Landmark detection and extraction

Data association with other found objects

State estimation of the robot

State update of the robot position

Landmark updates as we proceed
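
As a rough structural sketch, and with every helper function here being a hypothetical no-op stand-in for what would be a substantial module in a real system, the per-timestep loop pairs up with the five steps above like this:

    def extract_landmarks(observation):
        return observation           # 1. landmark detection and extraction

    def associate(detections, landmarks):
        return {}                    # 2. data association with known landmarks

    def predict(state):
        return state                 # 3. state estimation (motion prediction)

    def correct(state, matches):
        return state                 # 4. state update of the robot position

    def add_landmarks(landmarks, detections, matches):
        return landmarks             # 5. landmark updates as we proceed

    def slam_loop(observations, state, landmarks):
        for obs in observations:
            detections = extract_landmarks(obs)
            matches = associate(detections, landmarks)
            state = predict(state)
            state = correct(state, matches)
            landmarks = add_landmarks(landmarks, detections, matches)
        return state, landmarks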

The simpler version of SLAM involves only considering 2D motion. Thus, this would be a robot confined to moving around on the ground and unable to jump up or fly; its pose is often just a position and heading, such as (x, y, theta). The harder version of SLAM involves dealing with 3D motion, such as a drone that can fly, which adds not only a third position coordinate but also roll and pitch to the pose that must be estimated.

One of the most popular algorithms used in SLAM is the Extended Kalman Filter (EKF). It is a mathematical way to keep track of the uncertainties of the robot position and the uncertainties of the environmental layout of objects.
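
In EKF-SLAM, the state vector stacks the robot's pose together with the positions of all landmarks found so far, and the covariance matrix tracks the uncertainty in all of them jointly. Here is a sketch of the generic EKF predict/correct cycle in Python; in a real system, f and h would be the motion and measurement models, and F and H their Jacobians evaluated at the current estimate:

    import numpy as np

    def ekf_predict(x, P, f, F, Q):
        # Push the state through the motion model and grow the uncertainty
        # by the process noise Q.
        x = f(x)
        P = F @ P @ F.T + Q
        return x, P

    def ekf_update(x, P, z, h, H, R):
        # Correct the estimate using measurement z with noise covariance R.
        y = z - h(x)                    # innovation: measured minus predicted
        S = H @ P @ H.T + R             # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
        x = x + K @ y
        P = (np.eye(len(x)) - K @ H) @ P
        return x, P

Each landmark sighting runs one update, pulling both the robot's pose estimate and the landmark's position estimate toward mutual consistency.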

For much of the existing research on SLAM, there are some heavy constraints used. For example, one such constraint is that individual landmarks or objects must be distinguishable from each other. This prevents the robot from getting confused into thinking that object A is over here and over there at the same time, when in actuality object A is in location Q and it is object B over in location R. There is often a constraint that the landmarks or objects must be stationary and cannot move. Even with these constraints, the algorithm must also try to figure out if it has associated a landmark or object with the wrong thing; in other words, it must try to prevent itself from confusing object A with object B.
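
One common safeguard against such mix-ups is to gate each candidate observation-to-landmark pairing on its Mahalanobis distance, accepting the pairing only if it falls inside a chi-square threshold. A sketch, reusing the innovation y and its covariance S from the EKF update above (the 9.21 threshold is the usual 99% value for a 2-D measurement):

    import numpy as np

    def passes_gate(y, S, gate=9.21):
        # Squared Mahalanobis distance of the innovation; small means the
        # observation plausibly belongs to this landmark.
        d2 = y @ np.linalg.inv(S) @ y
        return d2 < gate

A nearest-neighbor association would then pick, among the pairings that pass the gate, the one with the smallest distance.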

A breakthrough in figuring out SLAM occurred when there was a realization that the correlations between the various identified landmarks or objects were helpful, rather than unimportant or hurtful to the process. Many researchers trace this realization to a 1995 mobile robotics research paper that showed convergence was possible precisely because of those correlations.

For robots that primarily use vision, there are variants of SLAM known as V-SLAM, where the V stands for Visual. Along with this, there is Visual Odometry (VO). A robot might normally keep track of its odometry or position by the trajectory of its wheels and the distance traveled. Some instead prefer to use a camera that takes frames, and a comparison between the frames helps determine the movement of the robot. There's an interesting article on semantic segmentation-aided visual odometry for urban autonomous driving in the September-October 2017 issue of the International Journal of Advanced Robotic Systems, which you should take a look at if you want to know more about VO and V-SLAM.
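
The frame-to-frame core of monocular VO can be sketched with OpenCV. The feature detector (ORB) and the robust estimator (RANSAC) are illustrative choices here, not the only ones, and note that a single camera can only recover translation up to an unknown scale:

    import cv2
    import numpy as np

    def vo_step(prev_gray, curr_gray, K):
        # Detect and describe features in two consecutive grayscale frames.
        orb = cv2.ORB_create(2000)
        kp1, des1 = orb.detectAndCompute(prev_gray, None)
        kp2, des2 = orb.detectAndCompute(curr_gray, None)

        # Match descriptors between the two frames.
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(des1, des2)
        pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
        pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

        # Estimate the essential matrix with RANSAC, given intrinsics K, then
        # decompose it into rotation R and unit-scale translation t.
        E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
        _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
        return R, t

Chaining these relative motions across frames yields the camera's trajectory, which serves as the odometry input to V-SLAM.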

Which brings us to this – what does SLAM have to do with self-driving cars?

At the Cybernetic Self-Driving Car Institute, we are making use of SLAM to aid the AI of self-driving cars.

You might at first glance think that it is obvious that SLAM should be used for self-driving cars. Well, not exactly so.

Remember that SLAM is predicated on the notion that the robot does not know where it is and does not know the nature of the environment that it is in.

Most self-driving cars are normally going to have GPS and an IMU (Inertial Measurement Unit) in order to figure out where they are positioned. They don't need to guess. They can use their instruments to find out where they are.

In terms of the driving environment, self-driving cars are usually going to have an extensive, detailed map provided to them. For example, the Waymo self-driving cars in Arizona are being situated in a locale that has been mapped over and over by Google. The engineers have meticulously mapped out the streets and objects, such that the self-driving car has the map already handed to it. They are also geo-fencing the self-driving cars, preventing them from going outside the boundaries of the already mapped surroundings.

Yikes! You are probably wondering why in the world I dragged you through this whole SLAM thing. If the self-driving car will already know its position, due to GPS and IMU, and if the self-driving car will know its environment in terms of layout and objects by an extensive pre-mapping effort, it would seem that SLAM has no purpose here. Case closed. No SLAM, none of the time.

Not so fast!

We are aiming to have true Level 5 self-driving cars. These are self-driving cars that can drive in whatever manner a human could drive. I ask you, does a human driver always know in advance the surroundings of where they are driving? No, of course not. Does the human driver always know where the car is? Not always, such as when you get lost in a neighborhood that you've not driven in before.

Now, I realize that you could counter-argue that with GPS, there should never be a circumstance in which a human driver doesn't know where they are. This is somewhat true and somewhat untrue. Suppose the GPS goes down or is unavailable? I find that when I am in downtown San Francisco, for example, the GPS often loses signal and does not know where I am on the map. This can be due to blockage of the GPS signals by tall buildings or to atmospheric conditions.

One might also argue that we will eventually have all places mapped a priori. Admittedly, there is a lot of mapping taking place by Google and others. Furthermore, once self-driving cars become more prevalent, if they share their collected info, then we'll gradually have more and more maps available through the collective effort of all those self-driving cars.

But, still, there are going to be places where you'd like to go with your self-driving car that are not yet mapped and that might not have reliable GPS. SLAM can come to the rescue in those instances. And where SLAM will likely be especially helpful is with autonomous bicycles. Yes, imagine a bicycle that does what a self-driving car does, but in bicycle form. A bicycle can go to a lot of places that a car cannot. Bicycles can slip through narrow spaces and travel roads that a car cannot go onto. A great exploration of the use of SLAM for bicycles is described in an article on simultaneous localization and mapping for autonomous bicycles that appeared in the May-June 2017 issue of the International Journal of Advanced Robotic Systems.

Simply put, any AI self-driving car developer should know about and be versed in SLAM.

Even if your specialty related to AI self-driving cars is some other aspect of the system, knowing the capabilities and limits of SLAM will be helpful to your endeavors. We will continue to need SLAM, especially when self-driving cars try to go outside of carefully orchestrated, geo-fenced, populated areas. In some respects, driving in a small town, such as the town of Pinole, is about the same as driving on Mars as far as pre-mapped data goes. Once we get to Mars, I suppose we can just ask the Martians for directions, but in the meantime, let's continue pushing forward on SLAM.

This content is originally posted on AI Trends.