A robot that learns how to physically interact with people by watching them interact with each other.

Guide: Dr. Faruk Kazi  Build started: May 2015 (Beginning of 4th year/Senior Year)    Time required: 2 Months.

This work was part of my research internship at the VJTI Centre of Excellence and was published at the IEEE SCEECS 2016 conference.

Robotics and artificial intelligence systems are increasingly being incorporated into our everyday lives. Because these systems interact with humans, they have to be robust enough to deal with a variety of uncertain and stochastic situations. In particular, if we want robots to someday be our companions and help us with everyday tasks, they need to understand all our instructions and gestures, even the implicit ones, as another human would. Explicitly programming every task into the robot is therefore next to impossible. This calls for robots that can learn from the environment with little to no human intervention. It inspired us to create a system that can observe humans and learn to perform physical tasks, especially ones that involve coordinating with human beings.


We chose a humanoid robot for our project because it is closest to the human body in terms of degrees of freedom and anatomy. The robot uses a Microsoft Kinect V1 sensor to perceive its environment.

Bioloid Robot with Kinect

The Bioloid Premium Robot kit made by Trossen Robotics with Microsoft Kinect.


The robot learns by observing two humans doing Karate

The robot learns by observing two humans who are interacting physically; in this case, one person attacks while the other defends.

In this project we considered instructions given only through physical gestures, not any other modality. The aim is to give the robot the ability to learn how to react physically to a physical gesture, such as a punch or an object being handed to it. It therefore has to learn physical stimulus-response pairs by watching humans perform them, and then respond appropriately when subjected to a similar stimulus.

During the learning phase, two humans interact in front of the robot, as shown in the figure above. The robot learns these physical stimulus-response pairs from the 3-D data coming from the Kinect camera. The 3-D data consists of the (X, Y, Z) coordinates of 22 joints of the two interacting humans.
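To make the data layout concrete, here is a minimal sketch of how one frame of skeleton data could be turned into a feature vector. It assumes 22 joints per person and a NumPy array per skeleton; the function names `frame_to_vector` and `split_frame` are hypothetical and are not taken from the project's code.

```python
import numpy as np

NUM_JOINTS = 22  # assumption: 22 tracked joints per person
COORDS = 3       # (X, Y, Z)

def frame_to_vector(joints):
    """Flatten one person's (22, 3) joint array into a 66-element vector.

    The result is ordered x0, y0, z0, x1, y1, z1, ...
    """
    joints = np.asarray(joints, dtype=float)
    if joints.shape != (NUM_JOINTS, COORDS):
        raise ValueError(f"expected shape {(NUM_JOINTS, COORDS)}, got {joints.shape}")
    return joints.reshape(-1)

def split_frame(attacker_joints, responder_joints):
    """Return the (stimulus, response) vectors for one captured frame."""
    return frame_to_vector(attacker_joints), frame_to_vector(responder_joints)

# Placeholder skeletons standing in for real Kinect output.
attacker = np.zeros((NUM_JOINTS, COORDS))
responder = np.ones((NUM_JOINTS, COORDS))
stimulus_vec, response_vec = split_frame(attacker, responder)
```

Stacking these vectors over all captured frames would give one stimulus matrix and one response matrix with a row per frame.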

The body data of the attacker is labeled as the stimulus and that of the responder as the response; both are then compressed using Principal Component Analysis. The stimulus and response data are then fed to an artificial neural network as the input and the desired output, respectively. The neural network is trained using supervised learning techniques suitable for its architecture.
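The compression step can be sketched as follows. This is an illustration only: it computes PCA through NumPy's singular value decomposition, uses synthetic data in place of recorded skeletons, and assumes 66 features per person and 5 retained components. None of these numbers or function names come from the paper.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return (mean, components) for data X of shape (n_frames, n_features)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def compress(X, mean, components):
    """Project frames onto the retained principal directions."""
    return (X - mean) @ components.T

def reconstruct(Z, mean, components):
    """Map compressed vectors back to the original joint space."""
    return Z @ components + mean

rng = np.random.default_rng(0)
n_frames = 200
# Synthetic stand-in data: poses that vary along 5 latent directions.
latent = rng.normal(size=(n_frames, 5))
stimulus = latent @ rng.normal(size=(5, 66))
response = latent @ rng.normal(size=(5, 66))

s_mean, s_comp = fit_pca(stimulus, 5)
r_mean, r_comp = fit_pca(response, 5)
net_inputs = compress(stimulus, s_mean, s_comp)   # network input
net_targets = compress(response, r_mean, r_comp)  # desired network output
```

At execution time, a network output in the compressed space would be passed through `reconstruct` with the response mean and components to recover joint positions.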

We used two types of neural networks: Radial Basis Function networks and Long Short-Term Memory networks. The following actions were taught to the robot:

  1. Namaste: The human initiates the Indian namaste gesture and the robot reciprocates by performing namaste in real time.
  2. Helping: The human hands over a bottle and the robot takes it.
  3. Boxing: The human initiates an upper head punch and the robot defends itself.
  4. High-five: The human expects a high five from the robot, and the robot initiates the high-five gesture.
  5. Idle position: The human stands in the idle position and the robot waits for a gesture.

All of the above gestures are learned by watching humans perform them. Once training is done, the robot can be subjected to a stimulus such as a punch and respond with enough generality to handle movements similar to, but not identical to, those it observed.
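As a rough illustration of the train-then-respond cycle, the sketch below fits a small Radial Basis Function network that maps stimulus vectors to response vectors. It makes several assumptions that are not from the paper: Gaussian hidden units, centers taken from the first training samples, a fixed width, and output weights solved by linear least squares (one common way to train such networks). The data is synthetic and low-dimensional, standing in for compressed stimulus and response vectors.

```python
import numpy as np

class RBFNetwork:
    """Gaussian RBF hidden layer followed by a linear output layer."""

    def __init__(self, centers, width):
        self.centers = np.asarray(centers, dtype=float)
        self.width = float(width)
        self.weights = None

    def _hidden(self, X):
        # Squared distance from every sample to every center.
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        phi = np.exp(-d2 / (2.0 * self.width ** 2))
        # Append a constant column so the output layer has a bias term.
        return np.hstack([phi, np.ones((X.shape[0], 1))])

    def fit(self, X, Y):
        H = self._hidden(np.asarray(X, dtype=float))
        # Least-squares solution for the output weights.
        self.weights, *_ = np.linalg.lstsq(H, np.asarray(Y, dtype=float), rcond=None)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.weights

rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(100, 2))  # stand-in stimulus vectors
Y_train = np.sin(3.0 * X_train)                  # stand-in response vectors

net = RBFNetwork(centers=X_train[:30], width=0.5).fit(X_train, Y_train)
train_mse = np.mean((net.predict(X_train) - Y_train) ** 2)
baseline_mse = np.mean((Y_train - Y_train.mean(axis=0)) ** 2)

# Execution: respond to a stimulus that was not in the training set.
new_response = net.predict(np.array([[0.2, -0.4]]))
```

Because each hidden unit responds smoothly to inputs near its center, a stimulus close to a training example produces a response close to the one observed for that example, which is one way to read the generality mentioned above.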

The following images encapsulate the entire procedure of training/learning and execution. The complete analysis of the performance of this technique can be found in the paper.

The flow of information during learning


The flow of information during execution, i.e., during human-robot interaction after training is done.


A video of the simulation of this project in Unity 3D can be seen below:

My friend and colleague Abhishek Sawarkar worked hard on this project and contributed equally to its creation.