BB8, Dum-E, Wall-E, C-3PO, R2-D2, Optimus Prime, you name it: I don’t think there is anyone who hasn’t been captivated by these robots. After all, who doesn’t want a friend that can drive us around while we lie back, or lift our boxes when we move out? Ever since I started learning machine learning at McGill University, I have been captivated by the prophecy of autonomous robots that will free us from mundane chores like driving and housekeeping. The success of Neural Network (NN)-based end-to-end learning techniques in NLP and computer vision further whetted my fascination with them. However, even after governments, individuals, and private entities have together pumped in billions, we still seem to be falling short of turning this prophecy into reality, i.e. pixel-to-torque deep learning systems don’t work (yet) in the real world, which has sometimes made me wonder whether I have deluded myself. This is primarily because we have not been able to build autonomous robots good enough to entrust with our lives and property. And let’s face it, the failure of a few research works to report, with full transparency, the controlled conditions under which they generated their results has not helped a few others who are deluding themselves either. However, a tiny ray of hope in my head still believes that my delusion only has to do with not carefully identifying the exact source of the problems plaguing the robotic task at hand. So by asking the right question and crafting a machine learning pipeline that adequately addresses it, we might still be able to develop NN-based autonomous robots. Hence, the purpose of this blog post is to list the intricacies of the problems around building NN-based software systems for such robots.
Robot Automation through NN-based software systems doesn’t work (yet):
For a long period in the 20th century, rule-based algorithms such as physics-based models were perceived to be the best, and possibly the only, way to build automation software. While they did succeed to a degree, at some point everyone realized that it is next to impossible to specify all the rules for an autonomous robot, which then led to an AI winter. Then end-to-end, gradient-based NNs came into the spotlight, as they can learn the rules by themselves when provided with enough data. More commentary on rule-based vs. data-driven approaches will come later. For now, let us just stick to the fact that NN-based techniques still seem to be the way to go about fulfilling the prophesied robotic automation. As a result, robotics has already seen some adoption of end-to-end trained NNs for each of its parts in isolation, such as perception, learning, and control. However, this brings about problems that are specific to robotics. In the next section, I will walk you through the different problems that NN-based robotic software faces.
- Incomplete and uncertain knowledge: a robot system builds a coherent model from the inputs of various sensors and bases its learning, decision-making, and control on that model. However, the inputs are sometimes inadequate because of occlusion, partial observability, sensor noise, etc. This impairs the robot’s ability to operate effectively in the real world.
- Training-deployment data inconsistency: robots operate in a wild, uncontrolled, interactive real-world environment that can have a huge state space. As real-world data/experience is expensive to gather, training data usually doesn’t span the whole state space that can appear post-deployment. Also, some SOTA deep learning techniques, especially in computer vision, don’t fit a robot’s needs well. For example, class-based object recognition is flawed for robots because, unlike object types in curated datasets, object types in the real world follow a long-tailed distribution.
- Data-inefficiency: Reinforcement Learning (RL) is a learning paradigm that a robot can use to learn in the real world through trial and error. However, it is data-inefficient, and it’s expensive to generate enough high-quality data/experience.
- Exploration: One of the promising ways to train robots is RL, which needs a robot to explore. Exploration is a random process that can run into dangerous states. While such exploration is feasible in simulators, real-world robots don’t have that luxury.
- Reward specification: specifying which behavior a robot should reinforce is a hard task. Multiple experiments have shown robots hacking the reward specification, attaining their goal in ways that don’t align with the spirit in which it was specified in the first place.
- Evaluation: The standard way to train and evaluate NNs is with summary statistics over uncorrelated samples. However, this does not hold for real robots, as future observations depend on past actions. As a result, an NN system that does well on a curated dataset can underperform post-deployment and even make catastrophic mistakes.
- Interpretability: NNs’ ability to figure out features by themselves has tremendously reduced the need for domain-specific engineering. However, they do so in a black-box fashion, which sometimes makes it hard to figure out why a network did what it did.
- Catastrophic forgetting: a robot must keep updating its knowledge with the new experiences it gathers in the world. However, NNs tend to forget what they previously learned when trained on new data.
- Merging rule-based and data-driven learning: Traditionally, model-based learning and pure learning were seen as two distinct types of robot learning. While rule-based systems have had some success, such as Boston Dynamics Atlas, defining all the rules is hard. Data-driven methods, on the other hand, can end up fitting irregularities and noise in the data. Hence a middle way is warranted. A promising direction is to specify all that you can and fit only a few parameters. The known structure, constraints, and physical laws can be modeled explicitly, while the unknowns can be left for the robot to learn by itself. That way a sweet spot can be found somewhere on the spectrum between purely model-driven and purely data-driven techniques.
- Simulation: while learning, a robot has to experience a wide variety of situations, including ones that are potentially dangerous to both itself and the world around it, not to forget the huge data needs of some of the most potent algorithms today. Simulation can help get around this by letting the robot enter the real world with a warm start. With powerful CPUs and GPUs, simulators can generate years of experience in minutes.
- Transfer learning: initial training conditions, whether in simulation (there is always a gap between simulation and reality) or from expert demonstrations, usually differ from deployment conditions. Hence a robot must be equipped to carry over as much useful knowledge as possible to novel situations. Transfer learning techniques like Domain Adaptation, Sim2Real, One-shot learning, etc. are the way to go about it.
- Uncertainty awareness: Capturing uncertainty is of high value for building trustworthy autonomous robots, where any mistake can be catastrophic. Specifically, using a Bayesian framework to capture uncertainty has multiple benefits, such as fusing predictions with prior knowledge, prior models, heuristics, and other sensors, and mitigating the effect of noise in the dataset. A well-captured epistemic uncertainty can also help calibrate confidence, which can be used further, for example in a sensor fusion framework.
- Lifelong/Continual/Incremental learning: A robot has to keep evolving after being deployed, with limited onboard compute and without forgetting what it already knows. This means the robot should be able to identify the useful bits of information, replace the non-useful ones with new incoming data and, in some cases, identify new classes of data as well. However, this is hard, and NN-based robots suffer from a phenomenon called catastrophic forgetting.
- Active Learning: Selecting the most informative samples when the compute resources at one’s disposal are limited is what inspires active learning. This further applies to active vision, where the robot has to identify the best location to position itself in order to better uncover semantic regularities and geometric structure and to exploit the semantics and geometry of a scene. This in turn improves data efficiency.
- Novel Evaluation Metrics: unlike in other research areas, robots need to be evaluated on the worst case, not the average or the best one. Such a metric should also be able to account. One interesting idea I heard from Pieter Abbeel here is that we have yet to figure out the checkpoints against which to evaluate robots. The driving license example explains the essence of it: to be granted a license, a human is only scrutinized for his/her ability to stop at signs, to park, etc., and not at all times.
- Reward design: reward systems for real robots should be designed for the worst case. Additionally, concepts such as inverse reinforcement learning, preference elicitation, and behavioral cloning are handy techniques for shaping these reward functions.
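To make the "specify all that you can and fit only a few parameters" idea from the list above a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption I made up for illustration: a 1D cart whose mass is known, whose dynamics follow m * a = u - b * v, and whose only unknown is the viscous friction coefficient b. The function names (simulate_samples, fit_friction, predict_acceleration) and all numeric values are hypothetical, and the data is synthetic rather than from a real robot.

```python
import random

MASS = 2.0  # assumed known from the spec sheet (kg); hypothetical value


def simulate_samples(true_friction, n=200, noise_std=0.05, seed=0):
    """Generate synthetic (velocity, force, acceleration) samples.

    The samples follow the known structure m * a = u - b * v, with
    Gaussian noise added to the measured acceleration.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        v = rng.uniform(-2.0, 2.0)   # velocity
        u = rng.uniform(-5.0, 5.0)   # applied force
        a = (u - true_friction * v) / MASS + rng.gauss(0.0, noise_std)
        samples.append((v, u, a))
    return samples


def fit_friction(samples, mass=MASS):
    """Closed-form least-squares estimate of b in  u - m * a = b * v.

    Only this single parameter is learned from data; the rest of the
    model is specified by hand.
    """
    numerator = sum(v * (u - mass * a) for v, u, a in samples)
    denominator = sum(v * v for v, _, _ in samples)
    return numerator / denominator


def predict_acceleration(v, u, friction, mass=MASS):
    """Use the hand-specified physics with the learned friction term."""
    return (u - friction * v) / mass


samples = simulate_samples(true_friction=0.7)
b_hat = fit_friction(samples)
print(f"estimated friction coefficient: {b_hat:.3f}")
```

The physics carries most of the load here, so a couple of hundred noisy samples are enough to pin down the one unknown. A real robot would have many more unknowns and messier noise, and the learned part could just as well be a small NN instead of a single scalar.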
Asking the right questions has always been the hardest part of solving any problem. Our existing machine learning techniques can be effective for real robots if we ask the pertinent questions about what plagues the problem at hand. For example, Wayve used reinforcement learning to teach a real self-driving car to follow lanes using 11 episodes of training data (there wasn’t a huge exploration space, and offline training was done using expert demonstrations). In this blog post, I have tried to enumerate the problems an automated robotic system might face while it is being built, so that one can be mindful of the pitfalls while building an autonomous robot system. Also, the software of such a system cannot exist without its hardware. Hardware has to keep up with the rich sensory data and the interactive world. In the next blog post, I will talk about the problems around hardware for robots, especially when it is fused with NNs that can leverage our current machine learning algorithms.