Transferability and Sample-Efficiency in Data-Driven AI

The problem:

Researchers and practitioners in the era before deep neural networks (NNs) struggled to find ways to capture high-dimensional, non-linear relationships. Then came models born of the amalgamation of universally approximating NNs, the ability to store huge amounts of data (big data), and the processing of that data on massively parallelized units (GPUs), which turned out to be a Sputnik moment for researchers far and near. What no one immediately saw coming was these models' ability to overfit even to the noise in the data they see, and to extrapolate without warrant into unseen regions. These downsides of NNs raised questions about whether NN-based models are learning anything at all or merely memorizing the data they have seen [1]. For real applications, this means a significant chance that NN-based AI agents break down when they encounter something they have never seen before. In other words, they do not transfer naturally. Somehow, we humans have figured this out and do it smoothly. My research motivations emanate from these problems.

The intuitive solution:

A naive solution would be to generate data in every possible setting, but data is expensive. Many domains either have very limited data, as in medical applications, or an infinite number of possible settings, as with a self-driving car, which makes exhaustive data generation impossible. So this problem calls for finding ways to make NN-based models adapt across multiple scenarios (transferability) while using as few training samples as possible (sample-efficiency).

Sample-efficiency means learning from less data. Transferability means leveraging knowledge already learned in one task or domain and applying it to another. In many cases, improving transferability also improves sample-efficiency. A non-exhaustive set of approaches includes finding better priors that work across a multitude of tasks (meta-learning [5]; transfer in image classification using VGG [9], Inception [10], etc.), active learning, especially when demonstrations are involved, learning modular hierarchies [2], [3], [4], learning domain-invariant representations [6], and multi-task learning [7]. Although a lot of progress has been made for particular kinds of inputs (language, images, videos, etc.), these methods still fall short of being effective across the full spectrum of possible problems.
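To make the "better priors" idea concrete, here is a toy numpy sketch in the spirit of MAML [5]: a first-order approximation on an illustrative family of linear-regression tasks (the task family, learning rates, and one-parameter model are all assumptions for illustration, not the actual algorithm configuration from the paper).

```python
import numpy as np

rng = np.random.default_rng(0)

def task_batch(n_tasks=8, n_pts=10):
    """Each task: fit y = a*x for a different slope a (toy task family)."""
    tasks = []
    for _ in range(n_tasks):
        a = rng.uniform(0.5, 2.0)
        x = rng.uniform(-1, 1, n_pts)
        tasks.append((x, a * x))
    return tasks

def loss_grad(w, x, y):
    """Gradient of the mean squared error for the linear model w*x."""
    return 2.0 * np.mean((w * x - y) * x)

# Meta-training: update the meta-parameter w so that ONE inner gradient
# step on a fresh task already yields low post-adaptation loss.
w, inner_lr, outer_lr = 0.0, 0.1, 0.05
for _ in range(200):
    meta_grad = 0.0
    for x, y in task_batch():
        w_adapted = w - inner_lr * loss_grad(w, x, y)   # inner step
        # first-order MAML: evaluate the gradient at the adapted point
        meta_grad += loss_grad(w_adapted, x, y)
    w -= outer_lr * meta_grad

# w ends up centred on the task family, so a single gradient step
# adapts it quickly to any new slope: a sample-efficient prior.
```

The point of the sketch is the two-level loop: the inner step is ordinary task-specific learning, while the outer step shapes the initialization itself, which is exactly the "prior that works across a multitude of tasks" framing above.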

Past experiences:

My recent work has centred on attaining sample-efficiency through active learning: identifying satisfactorily transferable and non-transferable tasks with probabilistic methods, in an end-to-end fashion, within the Learning from Demonstrations framework. This has great utility in areas where demonstrations are painstaking to obtain and hence must be sought judiciously while ensuring that performance is not compromised. For example, a self-driving car trained on a wet road might transfer its knowledge positively to a dry road but fail on a slippery, icy one.
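As a generic illustration of the uncertainty-driven query selection underlying such active learning (not my actual method; the bootstrap ensemble, linear model, and data pools here are all hypothetical), one can ask for a new demonstration where an ensemble of models disagrees the most:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_linear(x, y):
    """Least-squares fit of y = w*x + b."""
    A = np.stack([x, np.ones_like(x)], axis=1)
    w, b = np.linalg.lstsq(A, y, rcond=None)[0]
    return w, b

# A small pool of labeled demonstrations and a pool of candidate states.
x_lab = rng.uniform(-1, 1, 5)
y_lab = 3.0 * x_lab + rng.normal(0, 0.1, 5)
x_pool = np.linspace(-3, 3, 50)

# Bootstrap ensemble: each member fits a resampled copy of the data,
# so the members agree near the data and diverge away from it.
preds = []
for _ in range(20):
    idx = rng.integers(0, len(x_lab), len(x_lab))
    w, b = fit_linear(x_lab[idx], y_lab[idx])
    preds.append(w * x_pool + b)
variance = np.var(preds, axis=0)

# Query the demonstration where the ensemble disagrees the most --
# here, a point far outside the region covered by existing labels.
query = x_pool[np.argmax(variance)]
```

Selecting queries this way is what lets demonstrations be "sought judiciously": costly expert effort is spent only where the current model is genuinely uncertain.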

Future directions:

I am a researcher who likes to see the problems and opportunities in AI through the lens of robotics. Since data is usually scarce in robotics, which brings about plenty of unseen situations, I find robotics test-beds ideal for gauging the applicability of whatever I develop as embodied intelligence. As a result, all my examples and perspectives have a robotics flavour. In the near future, I look forward to finding better ways to transfer knowledge from simulators to the real world; this is important because generating data of any kind in a simulator is cheap. I also look forward to applying these techniques through temporal hierarchies, and later extending them to language and vision problems. Temporal hierarchies allow learning over longer time horizons. This aids even faster learning and decision-making and provides room to introduce modularity that can possibly be transferred to a related task efficiently [8]. For example, modular sub-policies corresponding to sub-tasks in a hierarchy can be learned and reused, plug-and-play, in another task that shares some of those sub-tasks.
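A minimal sketch of that plug-and-play idea, loosely in the spirit of policy sketches [8]: sub-policies are named modules, and a task is just a sequence ("sketch") of those names. The environment, sub-policy names, and state transitions below are purely illustrative assumptions.

```python
# Toy sub-policies: each maps a state to (action, next_state).
def go_to_door(state):   return "move", state + 1
def open_door(state):    return "open", state
def go_to_goal(state):   return "move", state + 2

SUBPOLICIES = {"go_to_door": go_to_door,
               "open_door": open_door,
               "go_to_goal": go_to_goal}

def run_task(sketch, state=0):
    """Execute a task by chaining the named sub-policies in order."""
    trace = []
    for name in sketch:
        action, state = SUBPOLICIES[name](state)
        trace.append((name, action))
    return state, trace

# Two tasks share their first two sub-tasks; the learned modules are
# simply plugged into the new sketch instead of being retrained.
task_a = ["go_to_door", "open_door", "go_to_goal"]
task_b = ["go_to_door", "open_door"]
```

The transfer benefit is structural: once `go_to_door` and `open_door` are learned for one task, a new task reuses them for free, and only genuinely new sub-tasks demand new samples.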


In conclusion, data-driven AI has the potential to transform the world today, but it suffers from an inherent lack of exhaustive data over every possible input out in the wild. This shows up as corner cases that are almost impossible to eliminate. It is therefore paramount that generalizable features be learnt quickly and transferred safely. Hopefully this will one day help us build more viable, safer, and friendlier robots operating in the wild, unstructured world around humans.

[1] Zhang C, Bengio S, Hardt M, Recht B, Vinyals O. Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530. 2016 Nov 10.

[2] Bacon PL, Harb J, Precup D. The Option-Critic Architecture. In AAAI 2017 Feb 4 (pp. 1726-1734).

[3] Eysenbach B, Gupta A, Ibarz J, Levine S. Diversity is All You Need: Learning Skills without a Reward Function. arXiv preprint arXiv:1802.06070. 2018 Feb 16.

[4] Andreas J, Klein D, Levine S. Modular multitask reinforcement learning with policy sketches. arXiv preprint arXiv:1611.01796. 2016 Nov 6.

[5] Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400. 2017 Mar 9.

[6] Tzeng E, Hoffman J, Saenko K, Darrell T. Adversarial discriminative domain adaptation. In Computer Vision and Pattern Recognition (CVPR) 2017 Jul 1 (Vol. 1, No. 2, p. 4).

[7] Pan SJ, Yang Q. A survey on transfer learning. IEEE Transactions on knowledge and data engineering. 2010 Oct 1;22(10):1345-59.

[8] Andreas, J., Klein, D. and Levine, S., 2017, August. Modular multitask reinforcement learning with policy sketches. In Proceedings of the 34th International Conference on Machine Learning - Volume 70 (pp. 166-175). JMLR.org.

[9] Simonyan, K. and Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

[10] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V. and Rabinovich, A., 2015. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1-9).