Mid-level Vision

The classical vision literature distinguishes between low-level, mid-level, and high-level vision processes. The ideas on mid-level vision go back to the Gestalt psychologists, who suggested that the underlying processes are grouping mechanisms, essential for separating figure from ground. Principles such as closure, symmetry, and similarity guide how pieces of the image are grouped and boundaries are located. While current learning approaches pay little attention to these intermediate representations, our idea is that using them explicitly can facilitate many tasks. Furthermore, they can serve as the glue between low-level and high-level processes and be modulated by either. We have implemented grouping principles as 2D image operators, developed 3D grouping processes, and applied them to tasks of attention, segmentation, and recognition.

2D Images

Mid-Vision Feedback

Michael Maynord, Eadom Dessalene, Cornelia Fermüller, and Daniel Dementhon,
International Conference on Learning Representations (ICLR), 2023

The paper introduces a novel mechanism that modulates visual object classification based on high-level categorical expectations.

Paper Abstract
Feedback plays a prominent role in biological vision, where perception is modulated based on agents’ evolving expectations and world model. We introduce a novel mechanism which modulates perception based on high level categorical expectations: Mid-Vision Feedback (MVF). MVF associates high level contexts with linear transformations. When a context is “expected” its associated linear transformation is applied over feature vectors in a mid level of a network. The result is that mid-level network representations are biased towards conformance with high level expectations, improving overall accuracy and contextual consistency. Additionally, during training mid-level feature vectors are biased through introduction of a loss term which increases the distance between feature vectors associated with different contexts. MVF is agnostic as to the source of contextual expectations, and can serve as a mechanism for top down integration of symbolic systems with deep vision architectures. We show the superior performance of MVF to post-hoc filtering for incorporation of contextual knowledge, and show superior performance of configurations using predicted context (when no context is known a priori) over configurations with no context awareness.
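The core mechanism can be sketched in a few lines of numpy. This is a minimal illustration with made-up dimensions and context names; the actual method applies learned transformations inside a deep network, and the loss shown here is only a hinge-style stand-in for the paper's separation term:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: mid-level features of dimension D, and two contexts,
# each associated with its own learned linear transformation (here random
# perturbations of the identity stand in for the learned matrices).
D = 8
contexts = {
    "indoor": rng.standard_normal((D, D)) * 0.1 + np.eye(D),
    "outdoor": rng.standard_normal((D, D)) * 0.1 + np.eye(D),
}

def apply_feedback(features, expected_context):
    """Bias mid-level features toward an expected high-level context
    by applying that context's linear transformation (the MVF step)."""
    A = contexts[expected_context]
    return features @ A.T

def separation_loss(feats_a, feats_b, margin=1.0):
    """Training-time term (a hinge-style stand-in, an assumption of this
    sketch): penalize context clusters whose means are closer than `margin`."""
    gap = np.linalg.norm(feats_a.mean(axis=0) - feats_b.mean(axis=0))
    return max(0.0, margin - gap)

feats = rng.standard_normal((4, D))          # a batch of mid-level vectors
biased = apply_feedback(feats, "indoor")     # same shape, context-biased
```

The transformation leaves the feature dimensionality unchanged, so the rest of the network runs unmodified downstream of the feedback point.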

The image torque operator: A new tool for mid-level vision

Morimichi Nishigaki, Cornelia Fermüller, and Daniel Dementhon,
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012

The Torque is an image-processing operator that implements the Gestaltist principle of closure. This paper demonstrates the Torque in applications of attention, boundary detection, and segmentation.

Paper Abstract Code
Contours are a powerful cue for semantic image understanding. Objects and parts of objects in the image are delineated from their surrounding by closed contours which make up their boundary. In this paper we introduce a new bottom-up visual operator to capture the concept of closed contours, which we call the ’Torque’ operator. Its computation is inspired by the mechanical definition of torque or moment of force, and applied to image edges. The torque operator takes as input edges and computes over regions of different size a measure of how well the edges are aligned to form a closed, convex contour. We explore fundamental properties of this measure and demonstrate that it can be made a useful tool for visual attention, segmentation, and boundary edge detection by verifying its benefits on these applications.
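The computation behind the operator can be sketched in a few lines of numpy. This is a simplified illustration, not the paper's exact discretization: each edge point contributes the 2D cross product of its displacement from the patch center with its unit tangent (the analogue of a moment of force), and the normalization constant used here is an assumption:

```python
import numpy as np

def patch_torque(center, edge_points, edge_tangents, patch_area):
    """Simplified torque of a set of edge points about a patch center.

    Edges circulating consistently around the center (a closed contour)
    produce a large-magnitude sum; scattered, incoherent edges cancel out.
    """
    d = edge_points - center                        # displacement vectors
    t = edge_tangents / np.linalg.norm(edge_tangents, axis=1, keepdims=True)
    cross = d[:, 0] * t[:, 1] - d[:, 1] * t[:, 0]   # 2D cross products
    return cross.sum() / (2.0 * patch_area)

# Four edge points on a contour around the origin, with tangents
# circulating counter-clockwise: every point contributes positively.
pts = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
tan = np.array([[0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [1.0, 0.0]])
tau = patch_torque(np.zeros(2), pts, tan, patch_area=4.0)
```

Reversing all tangent directions flips the sign of the torque, which is why the sign carries boundary-ownership information (inside versus outside of the contour).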

Contour-based recognition

Yong Xu, Yuhui Quan, Zhuming Zhang, Hui Ji, Cornelia Fermüller, Morimichi Nishigaki, and Daniel Dementhon.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012. 

This paper demonstrates that the Torque is a useful tool for object recognition. Because it uses contours, it is complementary to commonly used feature descriptors such as SIFT.

Paper Abstract
Contour is an important cue for object recognition. In this paper, built upon the concept of torque in image space, we propose a new contour-related feature to detect and describe local contour information in images. There are two components for our proposed feature: One is a contour patch detector for detecting image patches with interesting information of object contour, which we call the Maximal/Minimal Torque Patch (MTP) detector. The other is a contour patch descriptor for characterizing a contour patch by sampling the torque values, which we call the Multi-scale Torque (MST) descriptor. Experiments for object recognition on the Caltech-101 dataset showed that the proposed contour feature outperforms other contour-related features and is on a par with many other types of features. When combining our descriptor with the complementary SIFT descriptor, impressive recognition results are observed.

A Gestaltist approach to contour-based object recognition: Combining bottom-up and top-down cues

Ching L Teo, Cornelia Fermüller, Yiannis Aloimonos
The International Journal of Robotics Research, 2015.

The Torque is used first in a bottom-up way to detect possible objects. Then task-driven, high-level processes modulate the Torque to recognize specific objects.

Paper Abstract Project page
This paper proposes a method for detecting generic classes of objects from their representative contours that can be used by a robot with vision to find objects in cluttered environments. The approach uses a mid-level image operator to group edges into contours which likely correspond to object boundaries. This mid-level operator is used in two ways, bottom-up on simple edges and top-down incorporating object shape information, thus acting as the intermediary between low-level and high-level information. First, the mid-level operator, called the image torque, is applied to simple edges to extract likely fixation locations of objects. Using the operator’s output, a novel contour-based descriptor is created that extends the shape context descriptor to include boundary ownership information and accounts for rotation. This descriptor is then used in a multi-scale matching approach to modulate the torque operator towards the target, so it indicates its location and size. Unlike other approaches that use edges directly to guide the independent edge grouping and matching processes for recognition, both of these steps are effectively combined using the proposed method. We evaluate the performance of our approach using four diverse datasets containing a variety of object categories in clutter, occlusion and viewpoint changes. Compared with current state-of-the-art approaches, our approach is able to detect the target with fewer false alarms in most object categories. The performance is further improved when we exploit depth information available from the Kinect RGB-Depth sensor by imposing depth consistency when applying the image torque.

Fast 2D border ownership assignment

Ching L. Teo, Cornelia Fermüller, Yiannis Aloimonos.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. 

Local and global features, inspired by psychological studies, are used to learn boundary detection together with the assignment of which neighboring region is foreground and which is background.

Paper Abstract Project page
A method for efficient border ownership assignment in 2D images is proposed. Leveraging on recent advances using Structured Random Forests (SRF) for boundary detection, we impose a novel border ownership structure that detects both boundaries and border ownership at the same time. Key to this work are features that predict ownership cues from 2D images. To this end, we use several different local cues: shape, spectral properties of boundary patches, and semi-global grouping cues that are indicative of perceived depth. For shape, we use HoG-like descriptors that encode local curvature (convexity and concavity). For spectral properties, such as extremal edges, we first learn an orthonormal basis spanned by the top K eigenvectors via PCA over common types of contour tokens. For grouping, we introduce a novel mid-level descriptor that captures patterns near edges and indicates ownership information of the boundary. Experimental results over a subset of the Berkeley Segmentation Dataset (BSDS) and the NYU Depth V2 dataset show that our method’s performance exceeds current state-of-the-art multi-stage approaches that use more complex features.
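The basis-learning step of the spectral cue can be sketched as a small PCA over flattened contour patches. The data and dimensions here are made up; the paper learns the basis over common types of contour tokens:

```python
import numpy as np

def contour_patch_basis(patches, k):
    """Learn an orthonormal basis of the top-k principal components over a
    set of flattened contour patches, as in the spectral (extremal-edge)
    cue.  `patches` is (N, P): one flattened patch per row."""
    mean = patches.mean(axis=0)
    # SVD of the centered data: rows of vt are the covariance eigenvectors,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(patches - mean, full_matrices=False)
    return vt[:k], mean

# Hypothetical data standing in for patches cropped around boundary edges.
rng = np.random.default_rng(1)
patches = rng.standard_normal((50, 16))
basis, mean = contour_patch_basis(patches, k=3)
coeffs = basis @ (patches[0] - mean)    # spectral feature of one patch
```

The projection coefficients of a patch onto this basis then serve as a compact feature fed to the structured random forest alongside the shape and grouping cues.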

Detection and segmentation of 2D curved reflection symmetric structures

Ching L. Teo, Cornelia Fermüller, Yiannis Aloimonos.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. 

Curved reflectional symmetries of objects are detected and then used to segment these objects.

Paper Abstract Project page
Symmetry, as one of the key components of Gestalt theory, provides an important mid-level cue that serves as input to higher visual processes such as segmentation. In this work, we propose a complete approach that links the detection of curved reflection symmetries to produce symmetry-constrained segments of structures/regions in real images with clutter. For curved reflection symmetry detection, we leverage on patch-based symmetric features to train a Structured Random Forest classifier that detects multi-scaled curved symmetries in 2D images. Next, using these curved symmetries, we modulate a novel symmetry-constrained foreground-background segmentation by their symmetry scores so that we enforce global symmetrical consistency in the final segmentation. This is achieved by imposing a pairwise symmetry prior that encourages symmetric pixels to have the same labels over a MRF-based representation of the input image edges, and the final segmentation is obtained via graph-cuts. Experimental results over four publicly available datasets containing annotated symmetric structures: 1) SYMMAX-300, 2) BSD-Parts, 3) Weizmann Horse (both from) and 4) NY-roads demonstrate the approach’s applicability to different environments with state-of-the-art performance.
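The shape of the energy with the pairwise symmetry prior can be sketched as follows. This is an illustrative evaluation only, with made-up unary costs; the paper minimizes such an energy with graph cuts:

```python
import numpy as np

def segmentation_energy(labels, unary, sym_pairs, lam=1.0):
    """MRF-style energy with a pairwise symmetry prior.  Besides the usual
    per-pixel unary costs, each pair of symmetric pixels pays a penalty
    `lam` when the two receive different labels, encouraging globally
    symmetric foreground/background segments."""
    e = sum(unary[i][lab] for i, lab in enumerate(labels))
    e += lam * sum(1 for i, j in sym_pairs if labels[i] != labels[j])
    return e

# Three pixels, two labels (0 = background, 1 = foreground); pixels 0 and 2
# are mirror images of each other under a detected symmetry.
unary = np.array([[0.0, 2.0], [0.0, 2.0], [2.0, 0.0]])
pairs = [(0, 2)]
e_broken = segmentation_energy([0, 0, 1], unary, pairs)  # pays the prior
e_sym = segmentation_energy([1, 0, 1], unary, pairs)     # pays a unary cost
```

Raising `lam` shifts the trade-off toward symmetric labelings even when the per-pixel evidence disagrees, which is how symmetry acts as a global grouping principle here.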

Shadow-Free Segmentation in Still Images Using Local Density Measure

Aleksandrs Ecins, Cornelia Fermüller, Yiannis Aloimonos.
International Conference on Computational Photography (ICCP), 2014.

A figure-ground segmentation algorithm that is not affected by shadows is proposed; it has been designed specifically for textured regions.

Paper Abstract Project page
Over the last decades several approaches were introduced to deal with cast shadows in background subtraction applications. However, very few algorithms exist that address the same problem for still images. In this paper we propose a figure ground segmentation algorithm to segment objects in still images affected by shadows. Instead of modeling the shadow directly in the segmentation process our approach works actively by first segmenting an object and then testing the resulting boundary for the presence of shadows and resegmenting again with modified segmentation parameters. In order to get better shadow boundary detection results we introduce a novel image preprocessing technique based on the notion of the image density map. This map improves the illumination invariance of classical filterbank based texture description methods. We demonstrate that this texture feature improves shadow detection results. The resulting segmentation algorithm achieves good results on a new figure ground segmentation dataset with challenging illumination conditions.

3D Point Clouds

Affordance Detection of Tool Parts from Geometric Features

Austin Myers, Ching L. Teo, Cornelia Fermüller, Yiannis Aloimonos.
IEEE International Conference on Robotics and Automation, ICRA. 2015.

An essential cue for object affordances is shape. We created a 3D database of household tools with the affordances of their parts annotated, and provided methods to learn patch-based affordances from shape. 

Paper Abstract Project page
As robots begin to collaborate with humans in everyday workspaces, they will need to understand the functions of tools and their parts. To cut an apple or hammer a nail, robots need to not just know the tool’s name, but they must localize the tool’s parts and identify their functions. Intuitively, the geometry of a part is closely related to its possible functions, or its affordances. Therefore, we propose two approaches for learning affordances from local shape and geometry primitives: 1) superpixel based hierarchical matching pursuit (S-HMP); and 2) structured random forests (SRF). Moreover, since a part can be used in many ways, we introduce a large RGB-Depth dataset where tool parts are labeled with multiple affordances and their relative rankings. With ranked affordances, we evaluate the proposed methods on 3 cluttered scenes and over 105 kitchen, workshop and garden tools, using ranked correlation and a weighted F-measure score [26]. Experimental results over sequences containing clutter, occlusions, and viewpoint changes show that the approaches return precise predictions that could be used by a robot. S-HMP achieves high accuracy but at a significant computational cost, while SRF provides slightly less accurate predictions but in real-time. Finally, we validate the effectiveness of our approaches on the Cornell Grasping Dataset for detecting graspable regions, and achieve state-of-the-art performance.

Cluttered scene segmentation using the symmetry constraint

Aleksandrs Ecins, Cornelia Fermüller, Yiannis Aloimonos.
IEEE International Conference on Robotics and Automation (ICRA), 2015.

Reflectional symmetries, detected from contour information in point cloud data, are used for segmenting 3D objects.

Paper Abstract Project page
Although modern object segmentation algorithms can deal with isolated objects in simple scenes, segmenting nonconvex objects in cluttered environments remains a challenging task. We introduce a novel approach for segmenting unknown objects in partial 3D pointclouds that utilizes the powerful concept of symmetry. First, 3D bilateral symmetries in the scene are detected efficiently by extracting and matching surface normal edge curves in the pointcloud. Symmetry hypotheses are then used to initialize a segmentation process that finds points of the scene that are consistent with each of the detected symmetries. We evaluate our approach on a dataset of 3D pointcloud scans of tabletop scenes. We demonstrate that the use of the symmetry constraint enables our approach to correctly segment objects in challenging configurations and to outperform current state-of-the-art approaches.
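The idea of using a detected symmetry to decide which points belong to an object can be sketched with a reflect-and-match test. This is a simplified stand-in for the paper's segmentation process; the tolerance value and the brute-force neighbor search are assumptions of the sketch:

```python
import numpy as np

def reflect(points, plane_point, plane_normal):
    """Reflect 3D points across a plane given by a point on it and its normal."""
    n = plane_normal / np.linalg.norm(plane_normal)
    d = (points - plane_point) @ n               # signed distances to plane
    return points - 2.0 * d[:, None] * n

def symmetry_consistent(points, plane_point, plane_normal, tol=0.05):
    """Mark points whose reflection lands near some point of the cloud --
    a simplified version of checking scene points for consistency with a
    detected bilateral symmetry.  (A k-d tree would replace the brute-force
    nearest-neighbor search in practice.)"""
    mirrored = reflect(points, plane_point, plane_normal)
    dists = np.linalg.norm(points[None, :, :] - mirrored[:, None, :], axis=2)
    return dists.min(axis=1) < tol

# A symmetric pair about the x = 0 plane, plus an unrelated outlier point.
cloud = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [3.0, 2.0, 1.0]])
mask = symmetry_consistent(cloud, np.zeros(3), np.array([1.0, 0.0, 0.0]))
```

Points passing the test form the support of a symmetry hypothesis, which can then seed a foreground segmentation; the outlier is left to the background.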

Seeing Behind The Scene: Using Symmetry To Reason About Objects in Cluttered Environments

Aleksandrs Ecins, Cornelia Fermüller, Yiannis Aloimonos.
International Conference on Intelligent Robots and Systems (IROS), October 2018

Rotational and reflectional symmetries are detected by fitting symmetry axes/planes to smooth surfaces extracted from pointclouds, and they are then used for object segmentation.

Paper Abstract Project page
Symmetry is a common property shared by the majority of man-made objects. This paper presents a novel bottom-up approach for segmenting symmetric objects and recovering their symmetries from 3D pointclouds of natural scenes. Candidate rotational and reflectional symmetries are detected by fitting symmetry axes/planes to the geometry of the smooth surfaces extracted from the scene. Individual symmetries are used as constraints for the foreground segmentation problem that uses symmetry as a global grouping principle. Evaluation on a challenging dataset shows that our approach can reliably segment objects and extract their symmetries from incomplete 3D reconstructions of highly cluttered scenes, outperforming state-of-the-art methods by a wide margin.

cilantro: a lean, versatile, and efficient library for point cloud data processing

Konstantinos Zampogiannis, Cornelia Fermüller, Yiannis Aloimonos.
ACM Multimedia 2018 Open Source Software Competition, October 2018. 

cilantro is an open-source C++ library for point cloud processing. It covers low-level point cloud operations, spatial reasoning, segmentation and clustering, robust geometric alignment, model fitting, and visualization.

Paper Abstract Code
We introduce cilantro, an open-source C++ library for geometric and general-purpose point cloud data processing. The library provides functionality that covers low-level point cloud operations, spatial reasoning, various methods for point cloud segmentation and generic data clustering, flexible algorithms for robust or local geometric alignment, model fitting, as well as powerful visualization tools. To accommodate all kinds of workflows, cilantro is almost fully templated, and most of its generic algorithms operate in arbitrary data dimension. At the same time, the library is easy to use and highly expressive, promoting a clean and concise coding style. cilantro is highly optimized, has a minimal set of external dependencies, and supports rapid development of performant point cloud processing software in a wide variety of contexts.

Topology-Aware Non-Rigid Point Cloud Registration

Konstantinos Zampogiannis, Cornelia Fermüller, Yiannis Aloimonos.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.

The paper introduces a non-rigid point cloud registration pipeline, with special attention to topological changes.

Paper Abstract Project page
In this paper, we introduce a non-rigid registration pipeline for pairs of unorganized point clouds that may be topologically different. Standard warp field estimation algorithms, even under robust, discontinuity-preserving regularization, tend to produce erratic motion estimates on boundaries associated with ‘close-to-open’ topology changes. We overcome this limitation by exploiting backward motion: in the opposite motion direction, a ‘close-to-open’ event becomes ‘open-to-close’, which is by default handled correctly. At the core of our approach lies a general, topology-agnostic warp field estimation algorithm, similar to those employed in recently introduced dynamic reconstruction systems from RGB-D input. We improve motion estimation on boundaries associated with topology changes in an efficient post-processing phase. Based on both forward and (inverted) backward warp hypotheses, we explicitly detect regions of the deformed geometry that undergo topological changes by means of local deformation criteria and broadly classify them as ‘contacts’ or ‘separations’. Subsequently, the two motion hypotheses are seamlessly blended on a local basis, according to the type and proximity of detected events. Our method achieves state-of-the-art motion estimation accuracy on the MPI Sintel dataset. Experiments on a custom dataset with topological event annotations demonstrate the effectiveness of our pipeline in estimating motion on event boundaries, as well as promising performance in explicit topological event detection.
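The local blending of the two warp hypotheses can be sketched as a per-point convex combination. The displacement values and weights below are made up, and how the weight is derived from the local deformation criteria is abstracted away:

```python
import numpy as np

def blend_warps(forward, backward_inv, weight):
    """Blend a forward warp field with an inverted backward warp field on a
    per-point basis.  `weight` in [0, 1] favours the inverted backward
    hypothesis near detected 'close-to-open' events, where forward
    estimation is unreliable; elsewhere the forward estimate dominates."""
    w = weight[:, None]                      # broadcast over 3D components
    return (1.0 - w) * forward + w * backward_inv

# Two points with per-point 3D displacement hypotheses (illustrative values).
fw = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # forward warp
bw = np.array([[0.8, 0.0, 0.0], [0.0, 0.0, 1.0]])   # inverted backward warp
w = np.array([0.0, 1.0])  # point 0 trusts forward, point 1 sits on an event
blended = blend_warps(fw, bw, w)
```

Because the weight varies smoothly with distance from detected contact/separation events, the blended field stays seamless away from the event boundaries.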