Human-centered environments are rich with a wide variety of spatial relations between everyday objects. For autonomous robots to operate effectively in such environments, they should be able to reason about these relations and generalize them to objects with different shapes and sizes. For example, having learned to place a toy inside a basket, a robot should be able to transfer this concept to placing a spoon inside a cup. This requires the flexibility to learn arbitrary relations in a lifelong manner, which makes it challenging for an expert to pre-program the robot with sufficient knowledge beforehand. To equip robots with the ability to handle arbitrary spatial relations and generalize them to new objects, we propose a novel approach based on distance metric learning. To learn more, see the Technical Approach section.
Our approach uses demonstrations of a relation given by non-expert teachers (top row) in order to generalize this relation to new objects (bottom row).
Metric Learning for Generalizing Spatial Relations to New Objects
We address the problem of learning spatial relations by introducing a novel method from the perspective of distance metric learning. Leveraging metric learning allows the robot to reason about how similar two arbitrary relations are to each other. Accordingly, we formulate the problem of reproducing a relation with two new objects as one of minimizing the distance between the reproduced relation and the teacher demonstrations. More importantly, our approach enables the robot to use a few teacher demonstrations as queries for retrieving similar relations it has seen before, thereby leveraging prior knowledge to bootstrap imitating the new relation and paving the way for a lifelong learning scenario. Rather than learning a finite set of individual relation models, our method therefore enables reasoning on a continuous spectrum of relations. The approach is being developed by members of the Autonomous Intelligent Systems group at the University of Freiburg, Germany.
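To make the retrieval step concrete, the following is a minimal sketch, assuming relation descriptors stored as fixed-length vectors and a learned Mahalanobis-style weight matrix. The names metric_distance and retrieve_similar, the descriptor dimensionality, and the weight matrix are illustrative assumptions, not our actual implementation.

import numpy as np

def metric_distance(x, y, W):
    """Learned distance d(x, y) = sqrt((x - y)^T W (x - y)), W positive semi-definite."""
    d = x - y
    return float(np.sqrt(d @ W @ d))

def retrieve_similar(query, memory, W, k=3):
    """Return indices of the k stored relations closest to the demonstrated one."""
    dists = [metric_distance(query, m, W) for m in memory]
    return np.argsort(dists)[:k]

# Toy usage: 20 stored relation descriptors of dimension 8, identity metric.
memory = np.random.randn(20, 8)
query = np.random.randn(8)
print(retrieve_similar(query, memory, W=np.eye(8)))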
Given a small number of demonstrations by a teacher and two objects in the test scene, we aim to compute a pose transformation that imitates the demonstrated relation and generalizes the intention of the teacher. Our approach uses a novel descriptor that encodes pairwise spatial relations based only on the object geometries and their relative pose. Leveraging previous relations and a prior distance metric enables the robot to assess whether its previous knowledge is sufficient to reproduce the new relation. Finally, we use sample-based pose optimization to solve a multi-objective optimization problem and imitate the demonstrated relation, as sketched below. In our IROS 2017 paper, we present an extensive evaluation of our approach based on real-world data gathered from different user demonstrations.
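The sample-based optimization can be sketched as follows, where relation_distance (the learned metric distance to the demonstrations) and feasibility_penalty (e.g., penalizing collisions) are hypothetical stand-ins for the actual objectives.

import numpy as np

def sample_based_pose_search(relation_distance, feasibility_penalty,
                             n_samples=1000, trans_scale=0.3):
    """Draw random candidate poses, score each against the demonstrated
    relation plus a feasibility term, and keep the best one."""
    best_pose, best_cost = None, np.inf
    for _ in range(n_samples):
        t = np.random.uniform(-trans_scale, trans_scale, size=3)    # candidate translation
        yaw = np.random.uniform(-np.pi, np.pi)                      # candidate rotation about z
        pose = np.concatenate([t, [yaw]])
        cost = relation_distance(pose) + feasibility_penalty(pose)  # multi-objective cost
        if cost < best_cost:
            best_pose, best_cost = pose, cost
    return best_pose, best_cost

# Toy usage with dummy objectives:
pose, cost = sample_based_pose_search(
    relation_distance=lambda p: np.linalg.norm(p[:3] - np.array([0.0, 0.0, 0.1])),
    feasibility_penalty=lambda p: 0.0)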
Optimization Beyond the Convolution: Generalizing Spatial Relations with End-to-End Metric Learning
In this work, we propose a novel, neural network-based approach to generalize spatial relations from
the perspective of distance metric learning. We use a variation of the siamese architecture
to train a convolutional neural network as a function that maps an input point cloud
of a scene consisting of two objects to the feature space such that the Euclidean distance between
points in that space captures the similarity between the spatial relations in the corresponding scenes.
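The following is a minimal PyTorch sketch of this idea, assuming a voxelized two-object scene as input and a contrastive loss; the architecture, input representation, and hyperparameters are illustrative and do not reproduce our exact network.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationEmbedding(nn.Module):
    def __init__(self, embed_dim=64):
        super().__init__()
        self.conv = nn.Sequential(                  # 3D convolutions over a voxel grid
            nn.Conv3d(1, 16, 3, stride=2), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2), nn.ReLU(),
        )
        self.fc = nn.LazyLinear(embed_dim)          # flattened features -> metric space

    def forward(self, voxels):                      # voxels: (B, 1, D, H, W)
        return self.fc(self.conv(voxels).flatten(1))

def contrastive_loss(z1, z2, same_label, margin=1.0):
    """Pull embeddings of similar relations together; push dissimilar ones apart."""
    d = F.pairwise_distance(z1, z2)
    return (same_label * d.pow(2) + (1 - same_label) * F.relu(margin - d).pow(2)).mean()

# Toy usage: both branches share one network (siamese weight sharing).
net = RelationEmbedding()
scene_a = torch.randn(4, 1, 32, 32, 32)
scene_b = torch.randn(4, 1, 32, 32, 32)
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])        # 1 = same relation, 0 = different
loss = contrastive_loss(net(scene_a), net(scene_b), labels)
loss.backward()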
Furthermore, to generalize spatial relations in an end-to-end manner, we introduce a novel, gradient
descent-based approach that leverages the learned distance metric to optimize the 3D poses of two
objects in a scene in order to imitate an arbitrary relation between two other objects in a reference
scene. For this, we minimize the metric distance at test time by leveraging the gradient of the metric function.
That is, we backpropagate beyond the first convolution layer to optimize the translation and rotation of
the point clouds of objects.
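A minimal sketch of this test-time optimization follows, assuming a frozen, differentiable embedding network embed_points that consumes raw point clouds so gradients can reach the pose parameters; the axis-angle parameterization and all names are illustrative assumptions, not our exact pipeline.

import torch

def axis_angle_to_matrix(r):
    """Rodrigues' formula: axis-angle vector r (3,) -> rotation matrix (3, 3)."""
    theta = r.norm() + 1e-8
    k = r / theta
    K = torch.zeros(3, 3)
    K[0, 1], K[0, 2] = -k[2], k[1]
    K[1, 0], K[1, 2] = k[2], -k[0]
    K[2, 0], K[2, 1] = -k[1], k[0]
    return torch.eye(3) + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)

def optimize_pose(embed_points, fixed_pts, moving_pts, ref_embedding, steps=200, lr=1e-2):
    """Gradient-descend the pose of moving_pts so the scene embedding
    approaches ref_embedding under the frozen metric network."""
    t = torch.zeros(3, requires_grad=True)            # translation to optimize
    r = (0.01 * torch.randn(3)).requires_grad_()      # rotation (axis-angle) to optimize
    opt = torch.optim.Adam([t, r], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        R = axis_angle_to_matrix(r)
        moved = moving_pts @ R.T + t                  # rigidly transform the object
        scene = torch.cat([fixed_pts, moved], dim=0)
        loss = torch.norm(embed_points(scene) - ref_embedding)  # metric distance
        loss.backward()                               # gradients flow back to t and r
        opt.step()
    return t.detach(), r.detach()

# Toy usage with a stand-in embedding (a linear map averaged over points):
lin = torch.nn.Linear(3, 16)
embed = lambda pts: lin(pts).mean(dim=0)
t, r = optimize_pose(embed, torch.randn(100, 3), torch.randn(100, 3),
                     ref_embedding=torch.zeros(16))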
The Freiburg Spatial Relations dataset features 546 scenes, each containing two out of 25 household objects. The depicted spatial relations can roughly be described as "on top", "on top on the corner", "inside", "inside and inclined", "next to", and "inclined".
The dataset contains the 25 object models as textured .obj and .dae files, a low-resolution .dae version for visualization in rviz, a scene description file containing the translation and rotation of the objects for each scene, a file with labels for each scene, the 15 splits used for cross-validation, and a bash script to convert the models to point clouds.
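As a hypothetical example of working with the splits, the sketch below assumes one scene id per line in each split file; the file names and format are assumptions, not the dataset's documented layout.

from pathlib import Path

def load_split(split_file):
    """Read one cross-validation split, assumed to list one scene id per line."""
    return [line.strip() for line in Path(split_file).read_text().splitlines() if line.strip()]

# Hypothetical layout: splits/split_00.txt ... splits/split_14.txt
splits = [load_split(f"splits/split_{i:02d}.txt") for i in range(15)]
print(len(splits), "splits,", sum(len(s) for s in splits), "scene entries")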
Please cite our work if you use the Freiburg Spatial Relations Dataset or report results based on it.
@InProceedings{mees17iros,
author = {Oier Mees and Nichola Abdo and Mladen Mazuran and Wolfram Burgard},
title = {Metric Learning for Generalizing Spatial Relations to New Objects},
booktitle = {Proceedings of the International Conference on Intelligent Robots and Systems (IROS)},
year = 2017,
address = {Vancouver, Canada},
url = {http://ais.informatik.uni-freiburg.de/publications/papers/mees17iros.pdf},
}
This dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License and is provided for research purposes only. Any commercial use is prohibited. If you use the dataset in an academic context, please consider citing our paper.