Installing Experience into Artificially Intelligent Agents

Hi everyone. As a result of my desperate attempts to kick off my life as an Artificial Intelligence (AI) researcher, I present to you my most recent findings and endeavors toward creating better AI. All views are my own and subjective. Please let me know if there is room for improvement in any way; I need a lot of that feedback.

Reinforcement Learning (RL) and Deep Learning (DL) are two of the many successful endeavors to create smarter machines. RL allows an AI agent to make sound decisions based on the mistakes it made in its previous attempts at a task. DL is among the most capable ways of estimating the right output given some inputs, and it additionally saves us the bother of explicitly telling AI agents what to look at and what to ignore. Recent developments in AI, especially since the successful integration of RL and DL, have made our machines smarter and more versatile than they ever were. Together, these techniques have achieved amazing feats on tasks like speech recognition, vision, and drug discovery, a few of whose mechanisms are elusive even to humans.

Phew... so can I finally have bots at my disposal to work for me while I sleep the entire day? A big NO. But why? Let us examine what is, in my subjective view, perhaps the most challenging technical problem among the many that prevent AI from being pervasive.

Even though our most capable AI algorithms have given our AI agents unprecedented learning capability, and have thereby defeated humans on many fronts, they are still confined to simulators and are far from being easily adapted to real world tasks. The primary reasons are that these algorithms are task-specific and need millions of samples to learn anything substantial, which makes them infeasible for real world problems. As a result, training an AI agent takes forever, and the agent fails when there is even a minor change in the environment. Imagine a robotic cook that makes a mess of your kitchen a million times before finally being able to cook as well as a human chef, or a self-driving car that crashes and runs people over a million times before finally being able to drive as well as you. In that case, we might as well devise ways to put ourselves into cryo-sleep for a thousand years before finally seeing such bots work efficiently (this was a joke, by the way). So, does that mean we should flush the entire dream and settle for an unchanging, unexciting, monotonous life? Fortunately for me, and particularly for my thesis, there seems to be light at the end of the tunnel.

[Figure: Still Waiting. Caption: Our superhuman AI algorithms are data hungry]
One solution to this problem is to devise learning that is both quicker and able to cover a broad range of tasks. This can be done by creating models that not only capture similarities across different tasks but also use those similarities to elicit actions that they know work well. For example (see figure), Pong and Breakout are two different tasks, but they share the goal of not letting the ball get past the paddle.

Such similarities, no matter how small, exist across a wide range of tasks, from a robot loading a truck to a robot cooking food for you.
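To make the Pong and Breakout example concrete, here is a toy sketch in Python. It is only an illustration under my own assumptions: the observation dictionaries and the `track_ball` helper are hypothetical and do not come from any real game API. The point is that one small piece of behaviour, moving the paddle toward the ball, can be reused in both tasks even though the paddle moves along a different axis in each.

```python
def track_ball(ball, paddle, tolerance=1.0):
    """Return -1, 0 or +1: the direction that moves the paddle toward the ball."""
    gap = ball - paddle
    if abs(gap) <= tolerance:
        return 0  # close enough, stay put
    return 1 if gap > 0 else -1


# Hypothetical observations: Pong's paddle moves vertically, Breakout's horizontally.
pong_obs = {"ball_y": 40.0, "paddle_y": 10.0}
breakout_obs = {"ball_x": 15.0, "paddle_x": 60.0}

# The same skill is reused on the axis that matters in each task.
pong_move = track_ball(pong_obs["ball_y"], pong_obs["paddle_y"])
breakout_move = track_ball(breakout_obs["ball_x"], breakout_obs["paddle_x"])
print(pong_move, breakout_move)
```

Here the shared skill is written by hand. The whole challenge discussed in this post is getting an agent to discover and reuse such similarities on its own.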

Two critical aspects of this solution are the ability to figure out the similarity between different problems and the ability to evoke the appropriate learned mechanism based on that similarity. One computational architecture, called LSTM (Long Short Term Memory), has yielded encouraging results on simple tasks when trained in a special way. An LSTM can retain crucial pieces of structural information about its incoming inputs. Drawing on the knowledge it has already learned, it can also elicit appropriate responses to a given task based on the similarity it finds in that task. An easier way to understand this approach is to think of it as two sub-systems, much like the way human beings use their experience to act effectively in situations they have not seen before. The lower-level system finds similarities between the current task and what has already been learned, just as humans draw on experience, and the upper-level system fine-tunes the entire system so that it adapts quickly.
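For readers who like to see the moving parts, below is a minimal NumPy sketch of a single LSTM cell, using the standard gate equations. It is not the trained model from the work described above: the weights are random and untrained, and the class and function names (`LSTMCell`, `run_episode`) are my own. It only illustrates the mechanism the paragraph relies on, namely that the cell state carries information from earlier inputs forward, so the same final input can produce a different response depending on what came before.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """A single LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the input, forget, output and candidate gates.
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden_size, input_size + hidden_size))
        self.b = np.zeros(4 * hidden_size)
        self.hidden_size = hidden_size

    def step(self, x, h, c):
        n = self.hidden_size
        z = self.W @ np.concatenate([x, h]) + self.b
        i = sigmoid(z[0:n])            # input gate: how much new information to write
        f = sigmoid(z[n:2 * n])        # forget gate: how much old memory to keep
        o = sigmoid(z[2 * n:3 * n])    # output gate: how much memory to expose
        g = np.tanh(z[3 * n:4 * n])    # candidate values to write into memory
        c_new = f * c + i * g          # updated cell state (the memory)
        h_new = o * np.tanh(c_new)     # updated hidden state (the response)
        return h_new, c_new


def run_episode(cell, inputs):
    """Feed a sequence of inputs through the cell, starting from an empty memory."""
    h = np.zeros(cell.hidden_size)
    c = np.zeros(cell.hidden_size)
    for x in inputs:
        h, c = cell.step(x, h, c)
    return h, c


cell = LSTMCell(input_size=3, hidden_size=8)
last_input = np.array([0.5, -0.5, 0.5])

# Two histories that end with the same input but start differently.
h_a, _ = run_episode(cell, [np.ones(3), last_input])
h_b, _ = run_episode(cell, [-np.ones(3), last_input])

# The final responses differ because the memory remembers the earlier input.
print(np.allclose(h_a, h_b))
```

In the approach discussed here, training is what shapes these weights so that the remembered information is actually useful across tasks. This sketch leaves that part out entirely.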

This work is still at a nascent stage, and scaling it to complex tasks will be challenging. The lack of a generalized framework around this approach restricts quick experimentation on complex tasks whose inputs vary in unexpected ways. Complementary techniques like Attention Models and Neural Turing Machines might be used in tandem to scale this approach to such tasks. The approach has also given impetus to other promising techniques like imitation learning [One-shot Imitation Learning], where the AI agent learns expert moves from demonstrations of the best actions. The results of the minor tweaks made to fit these pieces together have also been encouraging so far. Finally, there is a need for a formal notion of ‘confidence’ in this approach, meaning a quantitative measure of how much trust can be placed in it to solve a given task. The lack of such a formalism will limit its deployment on real world tasks, especially where failure cannot be tolerated and can be catastrophic.

The ability to adapt rapidly to new tasks, which is the aim of this research, will allow us to take these smart algorithms out of mere simulators and apply them to genuinely complicated real world tasks, such as housekeeping bots. We'll be able to delegate many of our complex yet mundane day-to-day tasks, saving invaluable time and money that we can in turn put into solving pressing issues and driving innovation, or into sleeping off the entire day like I do.
