EROAM: Event-based Camera Rotational Odometry and Mapping in Real-time
- Wanli Xing¹,²
- Shijie Lin¹,²
- Linhan Yang¹,²
- Zeqing Zhang¹,²
- Yanjun Du³
- Maolin Lei⁴
- Yipeng Pan¹,²
- Chen Wang¹,²
- Jia Pan¹,²†
- ¹Department of Computer Science, The University of Hong Kong
- ²Centre for Transformative Garment Production
- ³The Chinese University of Hong Kong
- ⁴Istituto Italiano di Tecnologia (IIT)
- †Corresponding author
Abstract
This paper presents EROAM, a novel event-based rotational odometry and mapping system that achieves real-time, accurate camera rotation estimation. Unlike existing approaches that rely on event generation models or contrast maximization, EROAM employs a spherical event representation by projecting events onto a unit sphere and introduces Event Spherical Iterative Closest Point (ES-ICP), a novel geometric optimization framework designed specifically for event camera data. The spherical representation simplifies rotational motion formulation while operating in a continuous spherical domain, enabling enhanced spatial resolution. Our system features an efficient map management approach using incremental k-d tree structures and intelligent regional density control, keeping computational cost bounded during long-term operation. Combined with parallel point-to-line optimization, EROAM achieves efficient computation without compromising accuracy. Extensive experiments on both synthetic and real-world datasets show that EROAM significantly outperforms state-of-the-art methods in terms of accuracy, robustness, and computational efficiency. Our method maintains consistent performance under challenging conditions, including high angular velocities and extended sequences, where other methods often fail or show significant drift. Additionally, EROAM produces high-quality panoramic reconstructions with preserved fine structural details.
Real-time Demonstration of EROAM
Problem Definition
Event cameras represent a revolutionary advancement in visual sensing technology. Unlike conventional cameras that capture images at fixed intervals, event cameras operate asynchronously, detecting and reporting per-pixel brightness changes with microsecond precision. Each event is encoded as a tuple $e_k = (\mathbf{x}_k, t_k, p_k)$, where $\mathbf{x}_k = (u_k, v_k)^\top$ is the pixel location, $t_k$ the timestamp, and $p_k$ the polarity of the brightness change. Accurate rotational motion estimation remains a fundamental challenge in computer vision and robotics. Traditional frame-based cameras struggle with rapid rotations due to inherent limitations such as motion blur, limited dynamic range, and large inter-frame displacements. Event cameras, with their high temporal resolution (>1 MHz), high dynamic range (>140 dB), and freedom from motion blur, offer unique advantages for capturing and tracking rapid rotational movements with unprecedented precision.
Event camera rotation
Event camera sequence
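To make the data model above concrete, the sketch below shows one minimal way such an event stream could be represented in code; the class and field names are illustrative, not EROAM's actual data structures.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """A single event e_k = (x_k, t_k, p_k)."""
    u: int      # pixel column of x_k
    v: int      # pixel row of x_k
    t: float    # timestamp t_k in seconds (microsecond resolution in hardware)
    p: int      # polarity p_k: +1 for a brightness increase, -1 for a decrease

# A stream is simply a time-ordered sequence of such tuples:
stream = [Event(120, 80, 0.000001, +1), Event(121, 80, 0.000004, -1)]
```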
While traditional frame-based cameras struggle with rapid rotations due to motion blur and limited exposure control, event cameras offer unique advantages through their high temporal resolution and freedom from motion blur, enabling accurate capture of rapid rotational movements.
Methodology
Overview
EROAM introduces a novel approach to event-based rotational motion estimation by combining spherical event representation with an efficient ICP-based optimization framework. Our system processes incoming event streams through two main components: a tracking module that estimates camera rotation in real-time using Event Spherical ICP (ES-ICP), and a mapping module that maintains a continuous spherical event map while enabling high-quality panoramic reconstruction.
Odometry and Mapping Overview
Event Spherical Representation
We propose a novel spherical representation of events that offers two significant advantages: (1) a simplified formulation of rotational motion, and (2) enhanced spatial resolution through continuous mapping.
Spherical projection process showing the transformation from pixel coordinates to unit sphere
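To illustrate the projection itself, the sketch below back-projects a pixel through a calibrated pinhole model and normalizes the resulting ray onto the unit sphere. The intrinsics are hypothetical values, and lens-distortion correction (which a real pipeline would apply first) is omitted.

```python
import numpy as np

def project_to_sphere(u, v, fx, fy, cx, cy):
    """Back-project pixel (u, v) through a pinhole model, then normalize
    the ray to obtain a point on the unit sphere."""
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    return ray / np.linalg.norm(ray)

# Hypothetical intrinsics for a 640x480 event sensor:
p = project_to_sphere(320, 240, fx=550.0, fy=550.0, cx=319.5, cy=239.5)
```

Because the resulting sphere point is a continuous quantity, downstream processing is not tied to the sensor's discrete pixel grid.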
Event Spherical ICP (ES-ICP)
The core of our approach lies in the Event Spherical Iterative Closest Point (ES-ICP) algorithm. Unlike existing approaches that rely on event generation models or contrast maximization, our method efficiently matches and aligns event projections in continuous spherical space.
ES-ICP optimization process visualization showing point-to-line error minimization on the unit sphere
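The following sketch captures the geometric core of such an alignment under stated simplifications: each event point is assumed to be already associated with a line through two map points (the nearest-neighbor search that provides these correspondences is omitted), and the rotation is refined by Gauss-Newton with a finite-difference Jacobian rather than the analytic, parallelized Jacobians a real-time implementation would use.

```python
import numpy as np

def rodrigues(theta):
    """Rotation matrix from an axis-angle vector (Rodrigues' formula)."""
    angle = np.linalg.norm(theta)
    if angle < 1e-12:
        return np.eye(3)
    k = theta / angle
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def point_to_line_residuals(R, pts, line_a, line_b):
    """Distance from each rotated event point R @ p to the 3D line through
    its two associated map points (a, b)."""
    p = pts @ R.T
    d = line_b - line_a
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    diff = p - line_a
    # Subtract the component along the line; the remainder is perpendicular.
    perp = diff - np.sum(diff * d, axis=1, keepdims=True) * d
    return np.linalg.norm(perp, axis=1)

def es_icp_step(R, pts, line_a, line_b, eps=1e-6):
    """One Gauss-Newton update of the rotation, using a finite-difference
    Jacobian over a left-multiplied axis-angle perturbation."""
    r0 = point_to_line_residuals(R, pts, line_a, line_b)
    J = np.zeros((len(r0), 3))
    for i in range(3):
        dtheta = np.zeros(3)
        dtheta[i] = eps
        r1 = point_to_line_residuals(rodrigues(dtheta) @ R, pts, line_a, line_b)
        J[:, i] = (r1 - r0) / eps
    delta, *_ = np.linalg.lstsq(J, -r0, rcond=None)
    return rodrigues(delta) @ R
```

Iterating `es_icp_step` while re-associating each event point with its nearest map line yields the familiar ICP loop, here posed entirely on the unit sphere.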
Event Spherical Map Maintenance
To maintain an accurate and efficient representation of the environment while ensuring predictable computational costs for real-time operation, EROAM employs an intelligent map maintenance approach that combines incremental updates with regional density control. We selectively update the spherical event map based on significant camera motion, using an incremental k-d tree (ikd-Tree) data structure that supports efficient insertions and deletions without rebuilding the entire tree. This ensures that map update operations maintain bounded and predictable execution times regardless of map size.
A significant challenge in event-based mapping arises when the camera repeatedly observes the same area, generating redundant points. Our Regional Density Management (RDM) approach addresses this by partitioning the unit sphere into grid cells based on latitude-longitude lines, each with a maximum point capacity proportionally scaled according to surface area. When a new keyframe is added, events are only inserted into regions that haven't reached their capacity limits, maintaining uniform spatial density and preventing over-representation of frequently observed regions.
Illustration of Regional Density Management (RDM). The unit sphere is divided into grid cells based on latitude-longitude lines. Pink cells have reached their capacity, so new points falling within these saturated regions are not added to the map.
Comparison of computational time distribution across EROAM components on the LD-80 sequence with and without Regional Density Management (RDM). The pie charts demonstrate how RDM improves processing efficiency by significantly reducing the ikd-Tree operation overhead.
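A minimal sketch of the capacity check behind RDM is shown below; the grid resolution and base capacity are illustrative choices, and the real system couples this test with ikd-Tree insertions rather than a bare counter.

```python
import numpy as np

class RegionalDensityManager:
    """Partition the unit sphere into a latitude-longitude grid and cap the
    number of map points per cell in proportion to cell surface area."""

    def __init__(self, n_lat=64, n_lon=128, base_capacity=50):
        self.n_lat, self.n_lon = n_lat, n_lon
        lat_edges = np.linspace(-np.pi / 2, np.pi / 2, n_lat + 1)
        # Cell area between two latitude edges: (2*pi/n_lon) * (sin(lat2) - sin(lat1)).
        band_area = np.diff(np.sin(lat_edges)) * (2 * np.pi / n_lon)
        self.capacity = np.maximum(
            1, np.round(base_capacity * band_area / band_area.max())
        ).astype(int)                       # per-latitude-band capacity
        self.counts = np.zeros((n_lat, n_lon), dtype=int)

    def try_insert(self, p):
        """Return True if unit vector p may be added; False if its cell is full."""
        lat = np.arcsin(np.clip(p[2], -1.0, 1.0))
        lon = np.arctan2(p[1], p[0])
        i = min(int((lat + np.pi / 2) / np.pi * self.n_lat), self.n_lat - 1)
        j = min(int((lon + np.pi) / (2 * np.pi) * self.n_lon), self.n_lon - 1)
        if self.counts[i, j] >= self.capacity[i]:
            return False                    # saturated cell: skip this point
        self.counts[i, j] += 1
        return True
```

Scaling each cell's capacity by its surface area keeps point density uniform, since latitude-longitude cells shrink toward the poles.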
Panorama Generation
A key advantage of our spherical event representation is its ability to generate high-quality panoramic images at arbitrary resolutions. Unlike traditional methods that operate in discretized pixel space, our continuous spherical mapping approach maintains complete independence from resolution constraints. The panorama generation process involves projecting events from the unit sphere onto a surrounding cylinder, which is then unwrapped to form a detailed 2D panoramic representation. This approach preserves fine structural details while ensuring consistent quality across the entire field of view.
Projection from spherical event map to panoramic image
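The sketch below implements one plausible version of this sphere-to-cylinder unwrapping, splatting per-pixel event counts into the panorama; the exact projection, vertical extent, output resolution, and colorization scheme used by EROAM are assumptions here.

```python
import numpy as np

def sphere_to_panorama(points, width=2048, height=1024, v_max=1.5):
    """Project unit-sphere points radially onto a surrounding cylinder,
    unwrap it, and accumulate per-pixel event counts."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    lon = np.arctan2(y, x)                           # angle around the cylinder axis
    v = z / np.maximum(np.sqrt(x**2 + y**2), 1e-12)  # radial projection height
    u_px = ((lon + np.pi) / (2 * np.pi) * (width - 1)).astype(int)
    v_px = (v + v_max) / (2 * v_max) * (height - 1)
    keep = (v_px >= 0) & (v_px <= height - 1)        # clip points near the poles
    pano = np.zeros((height, width))
    np.add.at(pano, (height - 1 - v_px[keep].astype(int), u_px[keep]), 1)
    return pano
```

Because `width` and `height` are free parameters of the projection rather than properties of the sensor, the same spherical map can be rendered at any desired panorama resolution.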
Experiments
Evaluation on ECRot Dataset
We conduct comprehensive experiments on both synthetic and real-world datasets. On the ECRot dataset, EROAM significantly outperforms state-of-the-art methods in terms of accuracy and robustness.
Qualitative comparison of panoramic mapping results on the town sequence from the ECRot dataset. (a,c,d) show panoramas generated using our unified pipeline (7617 × 2000 resolution) with different methods' estimated trajectories, where identical event window size (0.2 ms) and colorization scheme are applied. (b) shows the original panorama output from CMax-SLAM's implementation for reference. The zoomed-in regions highlight that our method produces sharper edges and clearer structural details compared to other approaches.
Rotation estimation results on the town sequence from the ECRot dataset. The top row shows the full trajectories of roll, pitch, and yaw angles, while the bottom row shows zoomed-in views from 3.00 s to 3.10 s. Our method (EROAM) closely follows the ground truth, demonstrating robust rotation estimation performance. The trajectories of SMT and RTPT are truncated after significant deviation from ground truth for better visualization clarity.
Extended Simulation Analysis
To evaluate the robustness of our method, we conducted extended simulation experiments under more challenging conditions, including high angular velocities (up to 393.21 °/s) and long sequences (up to 80 s).
Quantitative evaluation results. (a) Angular velocity (49.83 °/s to 393.21 °/s) and acceleration (70.41 °/s² to 1121.32 °/s²) characteristics of the DM-1 to DM-8 sequences (each 5 s long). (b) APE and RPE metrics across the DM-1 to DM-8 sequences with increasing angular velocities. (c) APE and RPE metrics for the LD-10 to LD-80 sequences of increasing duration (10 s to 80 s).
Qualitative comparison of panoramic mapping results on the DM-4 sequence with high angular velocity. CMax-SLAM's result exhibits significant ghosting artifacts and double edges in structures, indicating severe rotation estimation errors under high-dynamic motion. These artifacts are particularly visible in the zoomed-in regions, where single structures appear multiple times due to inconsistent motion estimates. In contrast, EROAM produces a panorama with sharp details and precisely aligned structures, demonstrating accurate rotation estimation throughout the sequence.
Comparison of rotational state estimation on the LD-80 sequence. The top row shows the complete 80 s trajectories for roll, pitch, and yaw angles, where CMax-SLAM exhibits significant consistency issues and deviates from ground truth starting from 40 s. The bottom row presents detailed views between 10 s and 15 s, revealing that CMax-SLAM's estimation already shows noticeable errors even in this early stage. In contrast, EROAM maintains consistent accuracy throughout the entire sequence, closely aligning with ground truth in both global and local perspectives.
Real-world Results
We validate our method's performance in real-world scenarios using our EROAM-campus dataset, collected with an iniVation DVXplorer event camera and a Livox Avia LiDAR for ground truth.
Qualitative comparison of panoramic mapping results on real-world sequences. The zoomed-in regions in (a,b) highlight the superior structural clarity achieved by our method compared to CMax-SLAM’s results with ghosting artifacts. The full 360° panoramas in (c,d) demonstrate EROAM’s ability to maintain consistent reconstruction quality across extended views.
Additional visualization results from EROAM on different sequences. Each panorama demonstrates high-quality reconstruction with precise structure alignment and sharp details, as highlighted in the zoomed-in regions. These results showcase EROAM’s consistent performance across various challenging scenarios with different motion patterns and scene structures.
Comparison of rotational state estimation on the window-building sequence. The top row shows the complete trajectories for roll, pitch, and yaw angles, while the bottom row presents detailed views between 15 s and 20 s. Our method (EROAM) maintains accurate tracking throughout the sequence, closely following the ground truth trajectory, while CM-GAE and CMax-SLAM show noticeable deviations, particularly evident in the zoomed-in views.
Runtime Analysis
EROAM demonstrates superior computational efficiency across both synthetic and real-world datasets. Our method achieves real-time performance while requiring only CPU resources, processing most sequences at their natural speed. This efficiency stems from two key design choices: our spherical map representation, which simplifies map maintenance, and our ES-ICP algorithm, which enables parallel processing of point-to-line distances. Comparative analysis shows that EROAM provides significant speedups over existing methods: up to 17.6× faster than CMax-SLAM and 6.3× faster than CM-GAE on challenging sequences.
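As a rough illustration of that parallelism, the sketch below splits residual evaluation across worker threads, reusing the `point_to_line_residuals` helper from the ES-ICP sketch above; the chunking and threading backend are arbitrary choices for illustration, not EROAM's implementation, which is a compiled CPU pipeline.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def residuals_parallel(R, pts, line_a, line_b, workers=4):
    """Evaluate point-to-line residuals over independent chunks in parallel.
    Each event depends only on its own correspondence, so the work splits
    cleanly (NumPy releases the GIL inside large array operations)."""
    idx_chunks = np.array_split(np.arange(len(pts)), workers)
    with ThreadPoolExecutor(max_workers=workers) as ex:
        parts = ex.map(
            lambda idx: point_to_line_residuals(R, pts[idx], line_a[idx], line_b[idx]),
            idx_chunks,
        )
    return np.concatenate(list(parts))
```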
Robustness to Unmodeled Translational Motion
While EROAM is designed for pure rotational motion, some degree of translational motion is inevitable in practice. To evaluate robustness when this assumption is violated, we assess EROAM on the Event Camera Dataset (ECD), which features unconstrained handheld 6-DOF motion with translation typically within ~0.3 m. We process four sequences (boxes, dynamic, poster, shapes), using the first 30 seconds of each.
The rightmost column below shows panoramas generated using only the rotational component of the 6-DOF ground truth. These are severely blurred and fail to register scene edges, confirming that when parallax from translation is present, even the ground truth rotation cannot properly align events. Both EROAM and CMax-SLAM produce coherent panoramic reconstructions despite the presence of unmodeled translation.
Robustness evaluation on ECD dataset with unmodeled translation. Left: EROAM. Middle: CMax-SLAM. Right: GT rotation-only panorama. The severe blurring in GT panoramas confirms that parallax from translation makes quantitative rotation metrics misleading. Both methods produce coherent results despite minor translation.
Limitations
While EROAM demonstrates robustness to moderate translational motion as shown above, its performance degrades when translation becomes substantially larger. This limitation stems from the fundamental pure rotational motion assumption underlying our geometric formulation.
To illustrate this boundary, we evaluated EROAM on the env2_backflip1 sequence from the CEAR dataset, which features a Mini-Cheetah quadruped robot performing a backflip maneuver. This scenario combines three challenging factors: (1) large translation (0.7 m), (2) simultaneous 360° rotation, and (3) extreme close-range scenes (0.3-0.5 m) when facing the floor. Under these conditions, parallax effects become severe and the motion cannot be approximated as pure rotation.
(Left) Mini-Cheetah quadruped robot performing a backflip from the CEAR dataset. (Right) Panoramic image reconstructed from the backflip sequence. The estimated trajectory diverges from the actual motion, causing misalignment that manifests as blurring and ghosting in the panorama's central region.
This failure case, combined with the successful ECD results, empirically delineates EROAM's operational envelope: the method remains reliable under moderate translation (up to ~0.3 m, as in the ECD sequences) but degrades when translation is substantially larger and combined with close-range parallax. For applications requiring 6-DOF estimation in such scenarios, full SLAM systems would be more appropriate.
Citation
Open Source
The code and datasets are available at https://github.com/wlxing1901/eroam.
The website template was borrowed from Jon Barron.