Embedding High-Level Information into Low Level Vision: Efficient Object Search in Clutter
Ching L. Teo, Austin Myers, Cornelia Fermüller, Yiannis Aloimonos
We present a novel visual search method that exploits the mid-level grouping capabilities of the image-torque operator [1] to recognize objects under clutter. The input is an RGB-Depth image and a set of target shape models that represent the object. We tune the image-torque operator via a shape-conforming distance metric so that the target object exhibits the largest torque value. The maxima (and minima) of this "object-tuned" modulated torque map then serve as potential centroid locations of the target objects in the scene. Owing to its mid-level nature, the operator is inherently robust to clutter. We demonstrate experimentally that the proposed method detects different objects on a table under increasing clutter, and compare it against other state-of-the-art approaches: the bottom-up saliency detectors of Itti et al. [2] and Harel et al. (GBVS) [3], and the kernel-based object detectors of Bo et al. [4].
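To give a rough feel for the underlying operator (this is an illustrative sketch, not the authors' implementation), the torque at a patch center sums, over every edge pixel in the patch, the 2D cross product between the displacement from the center and the unit edge tangent, normalized by patch area. The square patch and single fixed radius below are simplifying assumptions; the operator of [1] aggregates over patches of multiple sizes.

```python
import numpy as np

def torque_map(edges, theta, radius=8):
    """Illustrative sketch of an image-torque-style operator (not the
    authors' code).  For each patch center p, sum the scalar 2D cross
    product between the displacement r = q - p and the unit edge
    tangent t at every edge pixel q inside the square patch, then
    normalize by the patch area."""
    h, w = edges.shape
    # Unit tangent components at edge pixels; zero elsewhere.
    tx, ty = np.cos(theta) * edges, np.sin(theta) * edges
    out = np.zeros((h, w))
    area = (2 * radius + 1) ** 2
    ys, xs = np.nonzero(edges)
    for cy in range(h):
        for cx in range(w):
            # Edge pixels falling within the patch around (cx, cy).
            sel = (np.abs(ys - cy) <= radius) & (np.abs(xs - cx) <= radius)
            qy, qx = ys[sel], xs[sel]
            ry, rx = qy - cy, qx - cx                   # displacement r
            cross = rx * ty[qy, qx] - ry * tx[qy, qx]   # r x t (scalar in 2D)
            out[cy, cx] = cross.sum() / area
    return out
```

A closed contour whose tangents circulate consistently (e.g. a counter-clockwise circle) yields a strong torque extremum near its center, which is why the extrema of the map are good candidate object centroids.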
Example detection results comparing the original torque (middle) and the modulated torque (right) for three objects: flashlight, cap and tissue-box. Notice that the modulated torque produces fewer false maxima/minima between objects, and the top detections fall on the correct targets.
Object retrieval accuracy measured in terms of Cumulative Matching Curves (CMC). A method is better if it achieves a higher hit rate with fewer fixations. (a) and (b): CMC scores for different objects using the proposed "Top-Down" approach on the UMD clutter dataset and the RGB-D Scenes dataset of [5], respectively. (c) and (d): mean CMC scores comparing the different approaches.
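The CMC score can be computed as sketched below; the helper `cmc` and its input format are assumptions for illustration, not part of the released evaluation code. The value at rank k is the fraction of test images whose target was hit within the first k fixations, so the curve is non-decreasing in k.

```python
def cmc(first_hit_ranks, max_rank):
    """Cumulative Matching Curve (illustrative sketch).

    first_hit_ranks[i] is the 1-based fixation index at which the
    target in image i was first hit, or None if it was never hit.
    Returns the hit rate at each rank 1..max_rank."""
    n = len(first_hit_ranks)
    return [sum(1 for r in first_hit_ranks if r is not None and r <= k) / n
            for k in range(1, max_rank + 1)]

# Example: four images, targets first hit at fixations 1, 3, 2
# (and never, for the last image).
scores = cmc([1, 3, 2, None], max_rank=3)  # -> [0.25, 0.5, 0.75]
```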
Detailed results for each object category for different approaches are available here.
[1] M. Nishigaki, C. Fermüller, and D. Dementhon. "The Image Torque Operator: A New Tool for Mid-level Vision". Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 502–509, 2012.
[2] L. Itti, C. Koch, and E. Niebur. "A model of saliency-based visual attention for rapid scene analysis". IEEE Trans. Pattern Analysis and Machine Intelligence, 20(11):1254–1259, Nov. 1998.
[3] J. Harel, C. Koch, and P. Perona. "Graph-based visual saliency". Advances in Neural Information Processing Systems (NIPS), pp. 545–552, 2006.
[4] L. Bo, X. Ren, and D. Fox. "Kernel descriptors for visual recognition". Advances in Neural Information Processing Systems (NIPS), pp. 244–252, 2010.
[5] University of Washington RGB-D Object dataset.
The support of the European Union under the Cognitive Systems program (project POETICON++), the National Science Foundation under the Cyberphysical Systems Program, and the Qualcomm Innovation Fellowship (Ching L. Teo) is gratefully acknowledged.
Questions? Please contact cteo "at" umd dot edu