Versatility and Sample-Efficiency in Data-Driven AI

The problem:

Researchers and practitioners in the era before deep neural networks (NNs) struggled to find ways to capture high-dimensional, non-linear relationships. Then came models that combined universally approximating NNs with the ability to store huge amounts of data (big data) and to process them on massively parallel hardware (GPUs). This combination proved to be a Sputnik moment for researchers near and far alike. What no one immediately saw coming was these models' ability to overfit even to the noise in the data they see, and to extrapolate unwarrantedly into unseen regions. These downsides raised questions about whether NN-based models learn anything at all or merely memorize the data they have seen [1]. For real applications, this means there is a significant chance that an NN-based AI agent will break down when it encounters something it has not seen before, and that is scary.

The intuitive solution:

A naive solution would be to collect data everywhere, in every possible setting, but demonstrations are expensive. This problem therefore calls for ways to make NN-based models work across multiple settings (versatility) and adapt quickly from few samples (sample-efficiency). I am a researcher who likes to view the problems and opportunities in AI through the lens of robotics. Because today's data-driven AI is almost certain to encounter unseen situations, I find robotics test-beds ideal for gauging whether what I develop holds up as embodied intelligence. As a result, all my examples and perspectives have a robotics flavour.
Multiple ways of achieving versatility and sample-efficiency have been suggested. In my view, the solution to the problems above lies in learning good, generalizable representations in settings where training is feasible, and then reliably transferring them to the settings where the model will be deployed. Broad methods for finding good representations include learning modular hierarchies [2], [3], [4], meta-learning [5], learning domain-invariant representations [6], and multi-task learning [7], among others.

Past experiences:

My recent projects have involved sample-efficient learning on a stream of tasks within the Learning from Demonstrations framework, using Bayesian Neural Networks (BNNs). I show that BNNs can be trained so that their predictive uncertainty reflects how well the training done so far generalizes to the situation at hand. By leveraging this uncertainty, the agent can make educated decisions about when to seek demonstrations and when not to.
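To make the idea of uncertainty-gated demonstration requests concrete, here is a minimal sketch. It is not the BNN training procedure described above: as a simplifying assumption, it swaps in closed-form Bayesian linear regression on features [1, x], whose posterior predictive variance grows away from the demonstrated inputs in the way a well-calibrated BNN's should. The class `BayesianLinearRegression`, the helper `should_request_demo`, the prior and noise precisions, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math


class BayesianLinearRegression:
    """Bayesian linear regression on features phi(x) = [1, x].

    A simple stand-in for a BNN (assumption, for illustration only).
    Prior: w ~ N(0, alpha^-1 I). Likelihood: y ~ N(w^T phi(x), beta^-1).
    """

    def __init__(self, alpha=1.0, beta=25.0):
        self.alpha = alpha  # prior precision (hypothetical value)
        self.beta = beta    # observation-noise precision (hypothetical value)
        self.xs, self.ys = [], []

    def add_demonstration(self, x, y):
        """Store one demonstrated (input, target) pair."""
        self.xs.append(x)
        self.ys.append(y)

    def _posterior(self):
        """Return posterior mean m and covariance S (as s11, s12, s22)."""
        n = len(self.xs)
        sx = sum(self.xs)
        sxx = sum(x * x for x in self.xs)
        # Posterior precision A = alpha * I + beta * Phi^T Phi
        a11 = self.alpha + self.beta * n
        a12 = self.beta * sx
        a22 = self.alpha + self.beta * sxx
        det = a11 * a22 - a12 * a12
        # Covariance S = A^-1 via the closed-form 2x2 inverse
        s11, s12, s22 = a22 / det, -a12 / det, a11 / det
        # Mean m = beta * S * Phi^T y
        b1 = self.beta * sum(self.ys)
        b2 = self.beta * sum(x * y for x, y in zip(self.xs, self.ys))
        m1 = s11 * b1 + s12 * b2
        m2 = s12 * b1 + s22 * b2
        return (m1, m2), (s11, s12, s22)

    def predict(self, x):
        """Return the predictive mean and standard deviation at input x."""
        (m1, m2), (s11, s12, s22) = self._posterior()
        mean = m1 + m2 * x
        # Predictive variance = noise variance + phi(x)^T S phi(x)
        var = 1.0 / self.beta + s11 + 2.0 * s12 * x + s22 * x * x
        return mean, math.sqrt(var)


def should_request_demo(model, x, std_threshold=0.5):
    """Hypothetical gating rule: ask the teacher when predictive std is too high."""
    _, std = model.predict(x)
    return std > std_threshold


if __name__ == "__main__":
    model = BayesianLinearRegression()
    for x in [0.0, 0.25, 0.5, 0.75, 1.0]:
        model.add_demonstration(x, 2.0 * x + 1.0)
    for query in [0.5, 10.0]:
        mean, std = model.predict(query)
        print(query, round(mean, 2), round(std, 2), should_request_demo(model, query))
```

With demonstrations of y = 2x + 1 at five points in [0, 1], the predictive standard deviation is roughly 0.22 at x = 0.5, so no demonstration is requested, but roughly 2.3 at x = 10, so the agent would ask for one. A real BNN would replace the closed-form posterior with approximate inference (for example, variational inference or MC dropout), while the gating logic would stay the same.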

Future directions:

In the future, I look forward to continuing to probe for transferable representations that are viable for AI-powered robotics. I would like to investigate hierarchies, domain-invariant representations, and inverse reinforcement learning, applied to the Learning from Demonstrations and Reinforcement Learning frameworks. For example, I am interested in how modular sub-policies corresponding to sub-tasks in a hierarchy can be learned and then reused, plug-and-play, in another task that shares some of those sub-tasks.

Conclusion:

In conclusion, data-driven AI has the potential to transform the world, but it faces an inherent problem: exhaustive data covering every possible input in the wild is simply not available. This shows up as corner cases that are almost impossible to eliminate. It is therefore paramount that generalizable features are learnt and safely transferred. This is especially indispensable for AI-powered robots, which are usually deployed in the unstructured world around humans. This conviction has guided, and will continue to guide, my research.

[1] Zhang C, Bengio S, Hardt M, Recht B, Vinyals O. Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530. 2016 Nov 10.

[2] Bacon PL, Harb J, Precup D. The Option-Critic Architecture. In AAAI 2017 Feb 4 (pp. 1726-1734).

[3] Eysenbach B, Gupta A, Ibarz J, Levine S. Diversity is All You Need: Learning Skills without a Reward Function. arXiv preprint arXiv:1802.06070. 2018 Feb 16.

[4] Andreas J, Klein D, Levine S. Modular multitask reinforcement learning with policy sketches. arXiv preprint arXiv:1611.01796. 2016 Nov 6.

[5] Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400. 2017 Mar 9.

[6] Tzeng E, Hoffman J, Saenko K, Darrell T. Adversarial discriminative domain adaptation. In Computer Vision and Pattern Recognition (CVPR) 2017 Jul 1 (Vol. 1, No. 2, p. 4).

[7] Pan SJ, Yang Q. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering. 2010 Oct 1;22(10):1345-59.