Generalizing Spatial Relations

Human-centered environments are rich with a wide variety of spatial relations between everyday objects. For autonomous robots to operate effectively in such environments, they should be able to reason about these relations and generalize them to objects with different shapes and sizes. For example, having learned to place a toy inside a basket, a robot should be able to generalize this concept using a spoon and a cup. This requires a robot to have the flexibility to learn arbitrary relations in a lifelong manner, making it challenging for an expert to pre-program it with sufficient knowledge to do so beforehand. To equip the robots with the ability to handle arbitrary spatial relations and generalize them to new objects, we propose a novel approach based on distance metric learning. To learn more, see the Technical Approach section.

Our approach uses demonstrations of a relation given by non-expert teachers (top row) in order to generalize this relation to new objects (bottom row).

Technical Approach

Metric Learning for Generalizing Spatial Relations to New Objects

We address the problem of learning spatial relations by introducing a novel method from the perspective of distance metric learning. Leveraging metric learning methods allows the robot to reason about how similar two arbitrary relations are to each other. This lets us formulate the problem of reproducing a relation with two new objects as one of minimizing the distance between the reproduced relation and the teacher demonstrations. More importantly, our approach enables the robot to use a few teacher demonstrations as queries for retrieving similar relations it has seen before, thereby leveraging prior knowledge to bootstrap imitating the new relation and paving the way for a lifelong learning scenario. Therefore, rather than learning a finite set of individual relation models, our method enables reasoning on a continuous spectrum of relations. The approach is being developed by members of the Autonomous Intelligent Systems group at the University of Freiburg, Germany.
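The retrieval step described above can be illustrated with a minimal sketch. It assumes that each relation is encoded as a fixed-length feature vector and that the metric is given by a linear map `L`, so that the distance is the Euclidean norm of `L (x - y)`. The descriptors, the relation names, and the matrix `L` below are hypothetical values chosen for illustration, not the ones used in the paper.

```python
import math

def metric_distance(L, x, y):
    """Distance ||L (x - y)||_2 under a linear map L given as a list of rows."""
    diff = [a - b for a, b in zip(x, y)]
    projected = [sum(w * d for w, d in zip(row, diff)) for row in L]
    return math.sqrt(sum(p * p for p in projected))

def retrieve_similar(L, queries, database, k=3):
    """Return the names of the k stored relations closest to any query."""
    scored = []
    for name, descriptor in database:
        # Score each stored relation by its distance to the nearest query.
        best = min(metric_distance(L, q, descriptor) for q in queries)
        scored.append((best, name))
    scored.sort()
    return [name for _, name in scored[:k]]

# Hypothetical 2-D descriptors, purely for illustration.
L = [[1.0, 0.0],
     [0.0, 0.1]]  # this metric down-weights the second feature
database = [
    ("inside", [0.0, 0.0]),
    ("on_top", [1.0, 0.0]),
    ("next_to", [0.2, 3.0]),
]
queries = [[0.1, 0.0]]  # descriptor of one teacher demonstration
nearest = retrieve_similar(L, queries, database, k=2)
print(nearest)
```

Under the plain Euclidean distance, "on_top" would rank ahead of "next_to" for this query; under the metric `L` the order is reversed, which is the kind of effect a learned metric is meant to capture.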

Overview of our interactive approach for learning to generalize a new spatial relation

Given a small number of demonstrations by a teacher and two objects in the test scene, we aim to compute a pose transformation that imitates the demonstrated relation and generalizes the intention of the teacher. Our approach uses a novel descriptor to encode pairwise spatial relations based only on the object geometries and their relative pose. Leveraging previous relations and a prior distance metric enables the robot to judge whether its previous knowledge is sufficient to reproduce the new relation. Finally, we use sample-based pose optimization to solve a multi-objective optimization problem in order to imitate the demonstrated relation. In our IROS17 paper, we present an extensive evaluation of our approach based on real-world data we gathered from different user demonstrations.
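As a rough illustration of sample-based pose optimization, the sketch below draws random candidate poses and keeps the one with the lowest cost. It makes several simplifying assumptions that are not taken from the paper: the relation is represented directly by a relative pose `(x, y, z, yaw)` instead of the geometric descriptor, the cost is a single objective (mean squared difference to the demonstrations) instead of a multi-objective one, and the sampling bounds are arbitrary.

```python
import math
import random

def pose_cost(candidate, demonstrations):
    """Mean squared difference between a candidate relative pose (x, y, z, yaw)
    and the demonstrated ones. A stand-in for a distance in descriptor space."""
    total = 0.0
    for demo in demonstrations:
        dx, dy, dz = (candidate[i] - demo[i] for i in range(3))
        # Wrap the yaw difference into [-pi, pi] before squaring.
        dyaw = math.atan2(math.sin(candidate[3] - demo[3]),
                          math.cos(candidate[3] - demo[3]))
        total += dx * dx + dy * dy + dz * dz + dyaw * dyaw
    return total / len(demonstrations)

def sample_pose(rng):
    """Draw one candidate pose uniformly from a hypothetical workspace."""
    return (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0),
            rng.uniform(0.0, 1.0), rng.uniform(-math.pi, math.pi))

def optimize_pose(demonstrations, n_samples=2000, seed=0):
    """Keep the lowest-cost pose among n_samples random candidates."""
    rng = random.Random(seed)
    best_pose, best_cost = None, float("inf")
    for _ in range(n_samples):
        candidate = sample_pose(rng)
        cost = pose_cost(candidate, demonstrations)
        if cost < best_cost:
            best_pose, best_cost = candidate, cost
    return best_pose, best_cost

# Two hypothetical demonstrations of the same relation.
demonstrations = [(0.0, 0.0, 0.10, 0.0), (0.05, -0.02, 0.12, 0.1)]
best_pose, best_cost = optimize_pose(demonstrations)
print(best_pose, best_cost)
```

Sampling needs no gradient of the cost, which is convenient when the cost is computed from a hand-designed descriptor that is not differentiable with respect to the pose.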

Demo video of our approach running on a PR-2 mobile robot.

Optimization Beyond the Convolution: Generalizing Spatial Relations with End-to-End Metric Learning

In this work, we propose a novel, neural network-based approach to generalize spatial relations from the perspective of distance metric learning. We use a variation of the siamese architecture to train a convolutional neural network as a function that maps an input point cloud of a scene consisting of two objects to the feature space such that the Euclidean distance between points in that space captures the similarity between the spatial relations in the corresponding scenes.
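To make the training objective more concrete, the following sketch shows a contrastive loss, a common choice for siamese networks. It is an assumption made for illustration: the text above only states that a variation of the siamese architecture is used, and the exact loss may differ. The embeddings are plain lists of floats standing in for network outputs.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb_a, emb_b, same_relation, margin=1.0):
    """Pull embeddings of scenes with the same relation together and push
    embeddings of scenes with different relations at least `margin` apart."""
    d = euclidean(emb_a, emb_b)
    if same_relation:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Hypothetical embeddings of two scenes.
loss_similar = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_relation=True)
loss_far_apart = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_relation=False)
loss_collapsed = contrastive_loss([0.0, 0.0], [0.0, 0.0], same_relation=False)
print(loss_similar, loss_far_apart, loss_collapsed)
```

Both branches of a siamese network share the same weights, so a loss of this form shapes a single embedding function in which Euclidean distance reflects relation similarity.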
Furthermore, to generalize spatial relations in an end-to-end manner, we introduce a novel, gradient descent-based approach that leverages the learned distance metric to optimize the 3D poses of two objects in a scene in order to imitate an arbitrary relation between two other objects in a reference scene. For this, we minimize the metric distance at test time by leveraging the gradient of the metric function. That is, we backpropagate beyond the first convolution layer to optimize the translation and rotation of the point clouds of objects.
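The idea of descending the metric distance with respect to an object pose can be sketched in a few lines. Everything below is a toy under stated assumptions: the scene is planar, the pose is `(tx, ty, theta)`, the function `embed` is a hypothetical hand-written stand-in for the learned network, and the gradient is estimated with central finite differences where the actual approach backpropagates through the network. The reference scene reuses the same object shapes so that the optimum is known.

```python
import math

def transform(points, pose):
    """Apply a planar pose (tx, ty, theta) to a list of 2-D points."""
    tx, ty, theta = pose
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def centroid(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def embed(obj_a, obj_b):
    """Hypothetical stand-in for the learned network: centroid offset of B
    relative to A, plus two second moments of B about its own centroid."""
    ax, ay = centroid(obj_a)
    bx, by = centroid(obj_b)
    n = len(obj_b)
    sxx = sum((x - bx) ** 2 for x, _ in obj_b) / n
    sxy = sum((x - bx) * (y - by) for x, y in obj_b) / n
    return [bx - ax, by - ay, sxx, sxy]

def metric_loss(pose, obj_a, obj_b, reference_embedding):
    """Squared Euclidean distance between scene and reference embeddings."""
    e = embed(obj_a, transform(obj_b, pose))
    return sum((u - v) ** 2 for u, v in zip(e, reference_embedding))

def numeric_gradient(pose, loss_fn, eps=1e-5):
    """Central finite differences; a real network would use backpropagation."""
    grad = []
    for i in range(len(pose)):
        plus = list(pose)
        minus = list(pose)
        plus[i] += eps
        minus[i] -= eps
        grad.append((loss_fn(plus) - loss_fn(minus)) / (2 * eps))
    return grad

def optimize_pose(obj_a, obj_b, reference_embedding, steps=200, lr=0.1):
    """Plain gradient descent on the pose of object B, starting at identity."""
    def loss_fn(p):
        return metric_loss(p, obj_a, obj_b, reference_embedding)
    pose = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = numeric_gradient(pose, loss_fn)
        pose = [p - lr * g for p, g in zip(pose, grad)]
    return pose, loss_fn(pose)

obj_a = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
obj_b = [(-1.0, -0.2), (1.0, -0.2), (1.0, 0.2), (-1.0, 0.2)]
reference_embedding = embed(obj_a, transform(obj_b, (0.3, 0.5, 0.6)))
initial_loss = metric_loss([0.0, 0.0, 0.0], obj_a, obj_b, reference_embedding)
pose, final_loss = optimize_pose(obj_a, obj_b, reference_embedding)
print(pose, initial_loss, final_loss)
```

With an automatic differentiation framework, `numeric_gradient` would be replaced by the gradient of the network output with respect to the pose parameters, which is what backpropagating past the first convolution layer provides.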

Overview of our approach. During training, the network learns a metric function for point clouds that captures the similarity of spatial relations. Optimizing the distance between two scenes with respect to the transforms of the objects allows the reference relation to be generalized to different objects.

Video explaining the key concept and showing real-world experiments.

Freiburg Spatial Relations Dataset

Overview

The Freiburg Spatial Relations dataset features 546 scenes, each containing two out of 25 household objects. The depicted spatial relations can roughly be described as on top, on top on the corner, inside, inside and inclined, next to, and inclined.
The dataset contains the 25 object models as textured .obj and .dae files, a low-resolution .dae version for visualization in rviz, a scene description file containing the translation and rotation of the objects for each scene, a file with labels for each scene, the 15 splits used for cross-validation, and a bash script to convert the models to point clouds.
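To place an object's point cloud in a scene, its translation and rotation from the scene description have to be applied to the model points. The sketch below shows one way to do this, assuming the rotation is available as a unit quaternion in `(w, x, y, z)` order. The layout of the scene description file and its rotation convention are not specified here, so the function names, the quaternion order, and the example values are assumptions that should be checked against the dataset.

```python
import math

def rotate(q, v):
    """Rotate 3-D vector v by the unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    # t = 2 * (u x v), with u = (x, y, z) the vector part of q.
    tx = 2.0 * (y * v[2] - z * v[1])
    ty = 2.0 * (z * v[0] - x * v[2])
    tz = 2.0 * (x * v[1] - y * v[0])
    # v' = v + w * t + u x t
    return (v[0] + w * tx + (y * tz - z * ty),
            v[1] + w * ty + (z * tx - x * tz),
            v[2] + w * tz + (x * ty - y * tx))

def apply_pose(points, translation, quaternion):
    """Rotate every point, then translate it into the scene frame."""
    placed = []
    for p in points:
        r = rotate(quaternion, p)
        placed.append((r[0] + translation[0],
                       r[1] + translation[1],
                       r[2] + translation[2]))
    return placed

# Hypothetical pose: a 90 degree rotation about the z axis plus a translation.
quaternion = (math.sqrt(0.5), 0.0, 0.0, math.sqrt(0.5))
translation = (0.5, 0.0, 2.0)
placed = apply_pose([(1.0, 0.0, 0.0)], translation, quaternion)
print(placed)
```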

BibTeX

Please cite our work if you use the Freiburg Spatial Relations Dataset or report results based on it.

@InProceedings{mees17iros,
  author = {Oier Mees and Nichola Abdo and Mladen Mazuran and Wolfram Burgard},
  title = {Metric Learning for Generalizing Spatial Relations to New Objects},
  booktitle = {Proceedings of the International Conference on Intelligent Robots and Systems (IROS)},
  year = 2017,
  address = {Vancouver, Canada},
  url = {http://ais.informatik.uni-freiburg.de/publications/papers/mees17iros.pdf},
}

License Agreement

This dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License and provided for research purposes only. Any commercial use is prohibited. If you use the dataset in an academic context, please consider citing our paper.

Download

Code

The software implementations of the two papers of this project are available in two GitHub repositories for academic use and are released under the GPLv3 license. For any commercial purpose, please contact the authors.