Event-based Vision

Event-based vision sensors, such as the DVS, are inspired in their design by biological vision. They record data in very compact form at high temporal resolution, with low latency and high dynamic range, and these properties make them ideally suited for real-time motion analysis. So far we have focused on the fundamental capabilities of visual navigation: estimation of image motion, 3D motion, and object segmentation, and studied how spatially global and local computations interact to solve these tasks.

Learning sensorimotor control with neuromorphic sensors: Toward hyperdimensional active perception

Anton Mitrokhin,   Peter Sutor,   Cornelia Fermüller,   Yiannis Aloimonos
Science Robotics 4 (30) 2019. 

The hallmark of modern robotics is the ability to directly fuse the platform's perception with its motoric ability - the concept often referred to as active perception. Nevertheless, we find that action and perception are often kept in separated spaces, which is a consequence of traditional vision being frame based and only existing in the moment and motion being a continuous entity. This bridge is crossed by the dynamic vision sensor (DVS), a neuromorphic camera that can see the motion. We propose a method of encoding actions and perceptions together into a single space that is meaningful, semantically informed, and consistent by using hyperdimensional binary vectors (HBVs). We used DVS for visual perception and showed that the visual component can be bound with the system velocity to enable dynamic world perception, which creates an opportunity for real-time navigation and obstacle avoidance. Actions performed by an agent are directly bound to the perceptions experienced to form its own memory. Furthermore, because HBVs can encode entire histories of actions and perceptions - from atomic to arbitrary sequences - as constant-sized vectors, autoassociative memory was combined with deep learning paradigms for controls. We demonstrate these properties on a quadcopter drone ego-motion inference task and the MVSEC (multivehicle stereo event camera) dataset.
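
The binding of a perception to an action can be illustrated with a minimal sketch. It assumes XOR as the binding operation and a per-bit majority vote as the bundling operation, which are common choices for hyperdimensional binary vectors; the abstract does not say which operations the paper uses. The vector length `D`, the helper names, and the random stand-in vectors are all hypothetical.

```python
import random

D = 8192  # vector length in bits (assumed; any large value behaves similarly)

def random_hbv(rng):
    """Draw a random D-bit vector, stored as a Python int."""
    return rng.getrandbits(D)

def bind(a, b):
    """Bind two vectors with XOR; XOR is its own inverse, so bind(bind(a, b), a) == b."""
    return a ^ b

def distance(a, b):
    """Normalized Hamming distance: 0.0 for identical vectors, about 0.5 for unrelated ones."""
    return bin(a ^ b).count("1") / D

def bundle(vectors):
    """Superimpose an odd number of vectors by per-bit majority vote."""
    out = 0
    for bit in range(D):
        ones = sum((v >> bit) & 1 for v in vectors)
        if 2 * ones > len(vectors):
            out |= 1 << bit
    return out

rng = random.Random(0)
percepts = [random_hbv(rng) for _ in range(3)]    # stand-ins for encoded DVS percepts
velocities = [random_hbv(rng) for _ in range(3)]  # stand-ins for encoded velocities

# One constant-sized memory vector holding three percept/velocity pairs.
memory = bundle([bind(p, v) for p, v in zip(percepts, velocities)])

# Querying the memory with a percept yields a noisy copy of its velocity.
recalled = bind(memory, percepts[0])
d_match = distance(recalled, velocities[0])
d_other = distance(recalled, velocities[1])
```

With three bundled pairs, the recalled vector should differ from the matching velocity in roughly a quarter of its bits and from an unrelated velocity in roughly half, so a nearest-neighbor lookup can recover the stored association.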

EVDodge: Embodied AI for High-Speed Dodging on a quadrotor using event cameras

Nitin Sanket,   Chethan Parameshwara,   Chahat Deep Singh,   Cornelia Fermüller,   Davide Scaramuzza,   Yiannis Aloimonos
arXiv 2019. 

The human fascination with understanding ultra-efficient, agile flying beings like birds and bees has propelled decades of research on obstacle avoidance for micro aerial robots. However, most of the prior research has focused on static obstacle avoidance. This is due to the lack of high-speed visual sensors and scalable visual algorithms. The last decade has seen exponential growth in neuromorphic sensors, which are inspired by nature and have the potential to be the de facto standard for visual motion estimation problems.

After re-imagining the navigation stack of a micro air vehicle as a series of hierarchical competences, we develop a purposive, artificial-intelligence-based formulation for the problem of general navigation. We call this AI framework "Embodied AI" - AI design based on knowledge of the agent's hardware limitations and timing/computation constraints. Following this design philosophy, we develop a complete AI navigation stack for dodging multiple dynamic obstacles on a quadrotor with a monocular event camera and onboard computation. We also present an approach to directly transfer the shallow neural networks trained in simulation to the real world by subsuming pre-processing using a neural network into the pipeline.

We evaluate and demonstrate the proposed approach in many real-world experiments with obstacles of different shapes and sizes, achieving an overall success rate of 70%, including objects of unknown shape and a low-light testing scenario. To our knowledge, this is the first deep-learning-based solution to the problem of dynamic obstacle avoidance using event cameras on a quadrotor. Finally, we also extend our work to the pursuit task by merely reversing the control policy, showing that our navigation stack can cater to different scenarios.

EV-IMO: Motion Segmentation Dataset and Learning Pipeline for Event Cameras

Anton Mitrokhin,   ChengXi Ye,   Cornelia Fermüller,   Yiannis Aloimonos,   Tobi Delbruck
arXiv 2019. 

We present the first event-based learning approach for motion segmentation in indoor scenes and the first event-based dataset, EV-IMO, which includes accurate pixel-wise motion masks, egomotion, and ground truth depth. Our approach is based on an efficient implementation of the SfM learning pipeline using a low-parameter neural network architecture on event data. In addition to camera egomotion and a dense depth map, the network estimates pixel-wise independently moving object segmentation and computes per-object 3D translational velocities for moving objects. We also train a shallow network with just 40k parameters, which is able to compute depth and egomotion.

Our EV-IMO dataset features 32 minutes of indoor recording with 1 to 3 fast-moving objects simultaneously in the camera frame. The objects and the camera are tracked by the VICON motion capture system. We use 3D scans of the room and objects to obtain accurate depth map ground truth and pixel-wise object masks, which are reliable even in poor lighting conditions and during fast motion. We then train and evaluate our learning pipeline on EV-IMO and demonstrate that our approach is well suited for scene-constrained robotics applications.

Unsupervised Learning of Dense Optical Flow and Depth from Sparse Event Data

ChengXi Ye,   Anton Mitrokhin,   Cornelia Fermüller,  James A. Yorke and Yiannis Aloimonos
arXiv 2019. 

In this work we present a lightweight, unsupervised learning pipeline for dense depth, optical flow, and egomotion estimation from the sparse event output of the Dynamic Vision Sensor (DVS). To tackle this low-level vision task, we use a novel encoder-decoder neural network architecture - ECN.

Our work is the first monocular pipeline that generates dense depth and optical flow from sparse event data only. The network works in self-supervised mode and has just 150k parameters. We evaluate our pipeline on the MVSEC self-driving dataset and present results for depth, optical flow, and egomotion estimation. Due to the lightweight design, the inference part of the network runs at 250 FPS on a single GPU, making the pipeline ready for real-time robotics applications. Our experiments demonstrate significant improvements upon previous works that used deep learning on event data, as well as the ability of our pipeline to perform well during both day and night.

Event-based Moving Object Detection and Tracking

Anton Mitrokhin,  Cornelia Fermüller,  Chethan M Parameshwara and Yiannis Aloimonos
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2018. 

Event-based vision sensors, such as the Dynamic Vision Sensor (DVS), are ideally suited for real-time motion analysis. The readings of such sensors offer high temporal resolution, superior sensitivity to light, and low latency. These properties provide the grounds to estimate motion extremely reliably in the most sophisticated scenarios, but they come at a price: modern event-based vision sensors have extremely low resolution and produce a lot of noise. Moreover, the asynchronous nature of the event stream calls for novel algorithms. This paper presents a new, efficient approach to object tracking with asynchronous cameras. We present a novel event stream representation which enables us to utilize information about the dynamic (temporal) component of the event stream, and not only the spatial component, at every moment of time. This is done by approximating the 3D geometry of the event stream with a parametric model; as a result, the algorithm is capable of producing the motion-compensated event stream (effectively approximating egomotion) without any external sensors, feature tracking, or explicit optical flow computation, even in extremely low-light and noisy conditions. We demonstrate our framework on the task of independent motion detection and tracking, where we use the temporal model inconsistencies to locate differently moving objects in challenging situations of very fast motion.
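
The idea of a motion-compensated event stream can be illustrated with a deliberately simplified sketch. It is not the paper's parametric model: it assumes a single constant image velocity for all events and scores a candidate velocity by how sharply the warped events pile up on the same pixels. The function names, the synthetic moving edge, and the candidate grid are hypothetical.

```python
from collections import Counter

def warp_events(events, vx, vy, t_ref=0.0):
    """Move each event (x, y, t) back to time t_ref along a constant
    image velocity (vx, vy), given in pixels per second."""
    return [(round(x - vx * (t - t_ref)), round(y - vy * (t - t_ref)))
            for x, y, t in events]

def sharpness(pixels):
    """Sum of squared per-pixel event counts; larger when events align."""
    counts = Counter(pixels)
    return sum(c * c for c in counts.values())

# Synthetic stream: a vertical edge, 5 pixels tall, moving right at 100 px/s.
# It triggers one event per row every 10 ms.
events = [(10 + k, y, k * 0.01) for k in range(10) for y in range(5)]

raw_score = sharpness(warp_events(events, 0.0, 0.0))          # no compensation
compensated_score = sharpness(warp_events(events, 100.0, 0.0))

# A coarse search over horizontal velocities recovers the true motion.
candidates = [0.0, 50.0, 100.0, 150.0, 200.0]
best_vx = max(candidates, key=lambda vx: sharpness(warp_events(events, vx, 0.0)))
```

Under the correct velocity the ten events of each row collapse onto one pixel, so the compensated score is much higher than the raw one. Events that stay blurred after compensating for the dominant motion are candidates for independently moving objects.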

Real-time clustering and multi-target tracking using event-based sensors

Francisco Barranco,  Cornelia Fermüller,  and Eduardo Ros
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2018. 

Clustering is crucial for many computer vision applications such as robust tracking, object detection, and segmentation. This work presents a real-time clustering technique that takes advantage of the unique properties of event-based vision sensors. Since event-based sensors trigger events only when the intensity changes, the data is sparse, with low redundancy. Thus, our approach redefines the well-known mean-shift clustering method using asynchronous events instead of conventional frames. The potential of our approach is demonstrated in a multi-target tracking application using Kalman filters to smooth the trajectories. We evaluated our method on an existing dataset with patterns of different shapes and speeds, and on a new dataset that we collected. The sensor was attached to the Baxter robot in an eye-in-hand setup monitoring real-world objects in an action manipulation task. Clustering achieved an F-measure of 0.95 while reducing the computational cost by 88% compared to the frame-based method. The average tracking error was 2.5 pixels, and the clustering produced a consistent number of clusters over time.
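
For readers unfamiliar with mean-shift, a minimal sketch of the plain batch version applied to event coordinates follows. It is not the event-wise variant the paper proposes: it assumes a flat kernel, a fixed bandwidth, and a small batch of (x, y) event positions, and all names and values are hypothetical.

```python
import math

def shift_to_mode(point, events, bandwidth, max_iters=50, tol=1e-3):
    """Repeatedly move a point to the mean of the events within `bandwidth`
    of it (flat kernel) until it stops moving."""
    px, py = point
    for _ in range(max_iters):
        near = [(x, y) for x, y in events
                if math.hypot(x - px, y - py) <= bandwidth]
        if not near:
            break
        nx = sum(x for x, _ in near) / len(near)
        ny = sum(y for _, y in near) / len(near)
        moved = math.hypot(nx - px, ny - py)
        px, py = nx, ny
        if moved < tol:
            break
    return px, py

def cluster_events(events, bandwidth):
    """Label each event by the density mode it converges to."""
    modes, labels = [], []
    for ev in events:
        m = shift_to_mode(ev, events, bandwidth)
        for i, c in enumerate(modes):
            if math.hypot(m[0] - c[0], m[1] - c[1]) < bandwidth / 2:
                labels.append(i)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels

# Two small groups of events, as two separate moving targets might produce.
events = [(10, 10), (11, 10), (10, 11), (11, 11),
          (50, 50), (51, 50), (50, 51), (51, 51)]
modes, labels = cluster_events(events, bandwidth=5.0)
```

Each resulting mode can serve as the measurement for a per-target tracker, for example the Kalman filters mentioned in the abstract.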

A dataset for visual navigation with neuromorphic methods

Francisco Barranco, Cornelia Fermüller, Yiannis Aloimonos, and Tobi Delbruck
Frontiers in Neuroscience, 10, 49, 2016. 

Standardized benchmarks in Computer Vision have greatly contributed to the advance of approaches to many problems in the field. If we want to enhance the visibility of event-driven vision and increase its impact, we will need benchmarks that allow comparison among different neuromorphic methods as well as comparison to conventional Computer Vision approaches. We present datasets to evaluate the accuracy of frame-free and frame-based approaches for tasks of visual navigation. Similar to conventional Computer Vision datasets, we provide synthetic and real scenes, with the synthetic data created with graphics packages, and the real data recorded using a mobile robotic platform carrying a dynamic and active pixel vision sensor (DAVIS) and an RGB+Depth sensor. For both datasets the cameras move with a rigid motion in a static scene, and the data includes the images, events, optic flow, 3D camera motion, and the depth of the scene, along with calibration procedures. Finally, we also provide simulated event data generated synthetically from well-known frame-based optical flow datasets.

Contour Detection and Characterization for Asynchronous Event Sensors

Francisco Barranco, Ching L. Teo, Cornelia Fermüller, Yiannis Aloimonos
IEEE International Conference on Computer Vision (ICCV), 2015. 

The bio-inspired, asynchronous event-based dynamic vision sensor records temporal changes in the luminance of the scene at high temporal resolution. Since events are only triggered at significant luminance changes, most events occur at the boundary of objects and their parts. The detection of these contours is an essential step for further interpretation of the scene. This paper presents an approach to learn the location of contours and their border ownership using Structured Random Forests on event-based features that encode motion, timing, texture, and spatial orientations. The classifier elegantly integrates information over time by utilizing the classification results previously computed. Finally, the contour detection and boundary assignment are demonstrated in a layer-segmentation of the scene. Experimental results demonstrate good performance in boundary detection and segmentation.

Bio-inspired Motion Estimation with Event-Driven Sensors

Francisco Barranco, Cornelia Fermüller, and Yiannis Aloimonos
Advances in Computational Intelligence, 309-321, Springer International Publishing, 2015. 

This paper presents a method for image motion estimation for event-based sensors. Accurate and fast image flow estimation still challenges Computer Vision. A new paradigm based on asynchronous event-based data provides an interesting alternative and has been shown to provide good estimation at high-contrast contours by estimating motion based on very accurate timing. However, these techniques still fail in regions of high-frequency texture. This work presents a simple method for locating those regions, and a novel phase-based method for event sensors that estimates motion in these regions more accurately. Finally, we evaluate and compare our results with other state-of-the-art techniques.

Contour Motion Estimation for Asynchronous Event-Driven Cameras

Francisco Barranco, Cornelia Fermüller, Yiannis Aloimonos
Proceedings of the IEEE, 102, 10, 1537-1556, 2014. 

This paper compares image motion estimation with asynchronous event-based cameras to Computer Vision approaches that use frame-based video sequences as input. Since dynamic events are triggered at significant intensity changes, which often are at the border of objects, we refer to the event-based image motion as "contour motion." Algorithms are presented for the estimation of accurate contour motion from local spatio-temporal information for two camera models: the dynamic vision sensor (DVS), which asynchronously records temporal changes of the luminance, and a family of new sensors which combine DVS data with intensity signals. These algorithms take advantage of the high temporal resolution of the DVS and achieve robustness using a multiresolution scheme in time. It is shown that, because of the coupling of velocity and luminance information in the event distribution, the image motion estimation problem becomes much easier with the new sensors, which provide both events and image intensity, than with the DVS alone. Experiments on synthesized data from computer vision benchmarks show that our algorithm on combined data outperforms computer vision methods in accuracy and can achieve real-time performance, and experiments on real data confirm the feasibility of the approach. Given that current image motion (or so-called optic flow) methods cannot estimate well at object boundaries, the approach presented here could complement optic flow techniques and provide new avenues for computer vision motion research.