Human-centered environments are rich with a wide variety of spatial relations between everyday objects. For autonomous robots to operate effectively in such environments, they should be able to reason about these relations and generalize them to objects with different shapes and sizes. For example, having learned to place a toy inside a basket, a robot should be able to generalize this concept to a spoon and a cup. This requires the flexibility to learn arbitrary relations in a lifelong manner, making it challenging for an expert to pre-program the robot with sufficient knowledge beforehand. To equip robots with the ability to handle arbitrary spatial relations and generalize them to new objects, we propose a novel approach based on distance metric learning.

Our approach uses demonstrations of a relation given by non-expert teachers (top row) in order to generalize this relation to new objects (bottom row).

Metric Learning for Generalizing Spatial Relations to New Objects

We address the problem of learning
spatial relations by introducing a novel method from the
perspective of distance metric learning. Leveraging metric learning methods allows the robot to reason about how
similar two arbitrary relations are to each other. By doing so, we formulate the problem
of reproducing a relation using two new objects as one of minimizing the
distance between the reproduced relation and the teacher demonstrations. More
importantly, our approach enables the robot to use a few teacher
demonstrations as queries for retrieving *similar* relations it has seen before,
thereby leveraging prior knowledge to bootstrap imitating the new relation and paving the way for a lifelong learning scenario.
Therefore, rather than learning a finite set of individual relation models,
our method enables reasoning on a continuous spectrum of relations.
The approach is being developed by members of the Autonomous Intelligent Systems group at the University of Freiburg, Germany.
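The retrieval idea above can be sketched in a few lines. This is a minimal illustration, not the method from the paper: the descriptors, the 3-dimensional feature space, and the diagonal weight matrix `M` are all hypothetical stand-ins for the learned relation representation and metric.

```python
import numpy as np

# Hypothetical relation descriptors: each row is a feature vector
# summarizing an object pair's geometry and relative pose.
# Values and dimensions are illustrative, not from the paper.
known_relations = np.array([
    [0.9, 0.1, 0.0],   # e.g. "on top"
    [0.1, 0.8, 0.1],   # e.g. "inside"
    [0.0, 0.1, 0.9],   # e.g. "next to"
])

# A learned Mahalanobis-style metric d(x, y) = sqrt((x - y)^T M (x - y)),
# here with an assumed positive-definite weight matrix M.
M = np.diag([2.0, 1.0, 0.5])

def metric_distance(x, y, M):
    d = x - y
    return float(np.sqrt(d @ M @ d))

# A new demonstration acts as a query: retrieve the most similar
# previously seen relation to bootstrap imitating the new one.
query = np.array([0.85, 0.15, 0.05])
nearest = min(range(len(known_relations)),
              key=lambda i: metric_distance(query, known_relations[i], M))
print(nearest)  # index of the most similar known relation
```

Because distances are computed under one shared metric rather than per-relation classifiers, any new demonstration can be compared against everything seen before, which is what places relations on a continuous spectrum.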

Overview of our interactive approach for learning to generalize a new spatial relation

Given a small number of demonstrations by a teacher and two objects in a test scene, we aim to compute a pose transformation that imitates the demonstrated relation and generalizes the intention of the teacher. Our approach uses a novel descriptor that encodes pairwise spatial relations based only on the object geometries and their relative pose. Leveraging previously learned relations and a prior distance metric enables the robot to assess whether its existing knowledge is sufficient to reproduce the new relation. Finally, we use sample-based pose optimization to solve a multi-objective optimization problem and imitate the demonstrated relation. In our IROS 2017 paper, we present an extensive evaluation of our approach based on real-world data gathered from demonstrations by different users.
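The sample-based step can be sketched as follows. This is a toy version under loud assumptions: the `descriptor` here is simply the relative offset between object centers, whereas the actual descriptor in the paper is richer and geometry-aware, and the real optimization is multi-objective rather than a single distance term.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the relation descriptor: the relative
# offset between two object poses (the paper's descriptor also
# accounts for object geometry).
def descriptor(pose_a, pose_b):
    return pose_b - pose_a

# Descriptor of the demonstrated relation: one object 0.1 m above the other.
demo = descriptor(np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 0.1]))

# Sample-based optimization: draw candidate poses for the object being
# placed and keep the one whose descriptor is closest to the demo.
fixed_pose = np.array([0.5, 0.5, 0.0])
candidates = fixed_pose + rng.uniform(-0.3, 0.3, size=(500, 3))
scores = np.linalg.norm(descriptor(fixed_pose, candidates) - demo, axis=1)
best_pose = candidates[np.argmin(scores)]
```

Sampling sidesteps the need for gradients of the descriptor, which is convenient when the descriptor is not differentiable; the follow-up work below removes that restriction.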

Demo video of our approach running on a PR2 mobile robot.

Optimization Beyond the Convolution: Generalizing Spatial Relations with End-to-End Metric Learning

In this work, we propose a novel, neural network-based approach to generalize spatial relations from
the perspective of distance metric learning. We use a variation of the siamese architecture
to train a convolutional neural network as a function that maps an input point cloud
of a scene consisting of two objects to the feature space such that the Euclidean distance between
points in that space captures the similarity between the spatial relations in the corresponding scenes.
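The siamese objective can be illustrated with a standard contrastive loss. This is a minimal sketch, assuming a toy linear embedding `embed` in place of the convolutional network; the margin value and the two-dimensional inputs are illustrative only.

```python
import numpy as np

# Toy embedding standing in for the convolutional network: a linear
# map applied to a flattened scene feature vector (purely illustrative).
W = np.array([[1.0, 0.5],
              [-0.5, 1.0]])

def embed(x):
    return W @ x

# Contrastive loss for a siamese pair: pull scenes with similar spatial
# relations together, push dissimilar ones at least `margin` apart in
# the embedding space.
def contrastive_loss(x1, x2, same, margin=1.0):
    d = np.linalg.norm(embed(x1) - embed(x2))
    if same:
        return 0.5 * d**2
    return 0.5 * max(0.0, margin - d) ** 2
```

Both branches of the siamese network share the weights `W`, so after training a single forward pass maps any scene into the space where Euclidean distance measures relation similarity.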

Furthermore, to generalize spatial relations in an end-to-end manner, we introduce a novel, gradient
descent-based approach that leverages the learned distance metric to optimize the 3D poses of two
objects in a scene in order to imitate an arbitrary relation between two other objects in a reference
scene. For this, we minimize the metric distance at test time by leveraging the gradient of the metric function.
That is, we backpropagate beyond the first convolution layer to optimize the translation and rotation of
the point clouds of objects.
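The pose optimization can be sketched with an explicit gradient. As a loud assumption, the "network" here is a linear map of the point-cloud centroid so the gradient is easy to write by hand, and only translation is optimized for brevity; the actual method backpropagates through the convolutional network and also optimizes rotation.

```python
import numpy as np

# Stand-in differentiable "network": embeds a point cloud via a linear
# map of its centroid. Illustrative only; the real model is a CNN.
W = np.array([[1.0, 0.2, 0.0],
              [0.0, 1.0, 0.3],
              [0.1, 0.0, 1.0]])

def embed(cloud):
    return W @ cloud.mean(axis=0)

# Point cloud of the object to be moved, and the embedding of the
# reference scene we want to imitate.
cloud = np.array([[0.0, 0.0, 0.0],
                  [0.1, 0.0, 0.0],
                  [0.0, 0.1, 0.0]])
target = embed(np.array([[0.0, 0.0, 0.5]]))

# Gradient descent on the translation t of the point cloud: descend
# the squared metric distance, i.e. "backpropagate" past the first
# layer down to the object pose itself.
t = np.zeros(3)
lr = 0.1
for _ in range(200):
    residual = embed(cloud + t) - target
    grad = 2.0 * W.T @ residual  # d/dt ||W(centroid + t) - target||^2
    t -= lr * grad
```

The key point is that the network weights stay frozen at test time; the only free variables are the object transforms, which the loss gradient pulls toward a pose that reproduces the reference relation.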

Overview of our approach. During training, the network learns a metric function for point clouds
that captures the similarity of spatial relations. Optimizing the distance between two scenes with
respect to the transforms of the objects allows us to generalize a reference relation to different objects.

Video explaining the key concept and showing real-world experiments.

The Freiburg Spatial Relations dataset features 546 scenes, each containing two of 25 household objects. The
depicted spatial relations can roughly be described as *on top*, *on top on the corner*,
*inside*, *inside and inclined*, *next to*, and *inclined*.

The dataset contains the 25 object models as textured .obj and .dae files, a low-resolution .dae version for
visualization in rviz, a scene description file containing the translation and rotation of the objects for each scene, a file with labels for each scene, the 15 splits used for cross validation, and a bash script to convert the models to point clouds.

Please cite our work if you use the Freiburg Spatial Relations Dataset or report results based on it.

```
@InProceedings{mees17iros,
author = {Oier Mees and Nichola Abdo and Mladen Mazuran and Wolfram Burgard},
title = {Metric Learning for Generalizing Spatial Relations to New Objects},
booktitle = {Proceedings of the International Conference on Intelligent Robots and Systems (IROS)},
year = 2017,
address = {Vancouver, Canada},
url = {http://ais.informatik.uni-freiburg.de/publications/papers/mees17iros.pdf},
}
```

This dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License and is provided for research purposes only. Any commercial use is prohibited. If you use the dataset in an academic context, please consider citing our paper.