Ground Truth Format
The ground truth is available in the following formats:
- NPZ: a collection of .npy files, supported by NumPy
  - EVIMO2v2: each sequence is a collection of .npz and .npy files
  - EVIMO, EVIMO2v1: each sequence is compressed into a single .npz file
- TXT: ground truth and recordings in the form of .png and .txt files
The high-level differences between the formats are described in the Format Comparison section.
NPZ (EVIMO2v2)
There is one folder per sequence. A sequence's folder is located at the following path:
<camera>/<category>/<subcategory>/<sequence name>
A sequence's folder contains the following files:
| File | Description |
|---|---|
| | Dictionary of (RES_Y, RES_X) arrays with keys |
| | Dictionary of (RES_Y, RES_X) arrays with keys |
| | Dictionary of (RES_Y, RES_X) arrays with keys |
| | Array of (NUM_EVENTS, 1) containing event polarities |
| | Array of (NUM_EVENTS, 1) containing event times |
| | Array of (NUM_EVENTS, 2) containing event pixel locations |
| | Dictionary of arrays |
Ground truth depth and masks are not evenly spaced in time because the Vicon system sometimes loses track due to occlusion. The meta field provides all the timestamping information needed to handle the irregular sampling.
In EVIMO2v2, the classical camera can have a different number of classical frames than depth/mask frames because classical frames are kept even when the depth and mask are unavailable.
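Because the ground truth frames are irregularly spaced, events are usually selected by timestamp rather than by a fixed stride. The following is a minimal sketch of one way to do this with `np.searchsorted`. It assumes the event time array is sorted and shares a time base (in seconds) with the frame timestamps; the helper name `events_for_frame`, the window size, and the synthetic data are illustrative and not part of the dataset tooling.

```python
import numpy as np

def events_for_frame(event_t, frame_ts, half_window=0.005):
    """Return (start, stop) event indices within +/- half_window seconds of frame_ts.

    Assumes event_t holds sorted event times in seconds; a (NUM_EVENTS, 1)
    array is flattened first.
    """
    event_t = np.asarray(event_t).reshape(-1)
    start = np.searchsorted(event_t, frame_ts - half_window, side="left")
    stop = np.searchsorted(event_t, frame_ts + half_window, side="right")
    return int(start), int(stop)

# Synthetic stand-ins: real values would come from the event time array
# and from the per-frame 'ts' entries of the meta dictionary.
event_t = np.array([0.000, 0.010, 0.020, 0.030, 0.031, 0.033, 0.050, 0.120])
frames = [{"id": 0, "ts": 0.031794}, {"id": 1, "ts": 0.120000}]  # irregular spacing

slices = {f["id"]: events_for_frame(event_t, f["ts"]) for f in frames}
```

Each `(start, stop)` pair can then be used to slice the polarity and pixel location arrays, provided they are stored in the same event order.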
NPZ (EVIMO and EVIMO2v1)
There is one compressed .npz file per sequence. A sequence's file is located at the following path: <camera>/<category>/<subcategory>/<sequence name>.npz
A sequence's .npz file contains the following .npy files:
| File | Description |
|---|---|
| | Array of shape (NUM_FRAMES, RES_Y, RES_X) |
| | Array of shape (NUM_FRAMES, RES_Y, RES_X) |
| | Array of shape (NUM_FRAMES, RES_Y, RES_X) |
| | Array of shape (NUM_EVENTS, 4) |
| | Python dictionary containing intrinsics, timestamps, poses, and IMU data |
| | Intrinsic and distortion parameters, also available in |
| | A helper lookup table for fast timestamp-to-event index computation |
| | The time between events corresponding to the indices in |
Ground truth depth and masks are not evenly spaced in time because the Vicon system sometimes loses track due to occlusion. The meta field provides all the timestamping information needed to handle the irregular sampling.
In EVIMO and EVIMO2v1, the classical camera frames are only available when depth and mask frames are available.
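To illustrate the idea behind a timestamp-to-event lookup table, here is a minimal sketch. It is an assumption about how such a table could work, not a description of the dataset's actual layout: entry `k` of the table holds the index of the first event at or after time `k * dt`, so a timestamp can be mapped to an event index with one division instead of a search. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def build_index(event_t, dt):
    """index[k] is the first event whose time is >= k * dt (event_t must be sorted)."""
    n_bins = int(np.ceil(event_t[-1] / dt)) + 1
    return np.searchsorted(event_t, np.arange(n_bins) * dt, side="left")

def first_event_at(index, dt, t):
    """Index of the first event at or after the start of the dt-wide bin containing t."""
    k = min(max(int(t / dt), 0), len(index) - 1)
    return int(index[k])

# Synthetic, sorted event times in seconds.
event_t = np.array([0.000, 0.004, 0.011, 0.012, 0.027, 0.031])
dt = 0.01

index = build_index(event_t, dt)
start = first_event_at(index, dt, 0.025)
```

The lookup is only accurate to within one bin of width `dt`; a short linear scan or `np.searchsorted` over that bin can refine the result if exact boundaries are needed.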
TXT
There is one folder per sequence. A sequence's folder is located at the following path:
<camera>/<category>/<subcategory>/<sequence name>
We strongly suggest using the pydvs sample as a prototype for manipulating the TXT data. This is the script that converts the TXT format (the output of the C++ pipeline) to NPZ. It also generates the trajectory plot distributed with the TXT format, event slices, and frames for visualization videos.
| Item | Description |
|---|---|
| | Conventional frames from a classical camera, when available |
| | 16-bit PNGs with depth and masks in different channels |
| | File with events (if available) |
| | Camera parameters; see here for details |
| | String containing intrinsics, timestamps, poses, and IMU data |
| | A plot of camera/object trajectories for visualization only |
Ground truth depth and masks are not evenly spaced in time because the Vicon system sometimes loses track due to occlusion. The meta field provides all the timestamping information needed to handle the irregular sampling.
In EVIMO and EVIMO2v1, the classical camera frames are only available when depth and mask frames are available.
In EVIMO2v2, the classical camera can have a different number of classical frames than depth/mask frames because classical frames are kept even when the depth and mask are unavailable.
Meta’s Contents
| Key | Description |
|---|---|
| | Array of dictionaries with one entry per ground truth sampling period |
| | Array of dictionaries with one entry per Vicon pose measurement (200 Hz) |
| | Dictionary of arrays of IMU samples |
| | Camera intrinsics and time offset from the corresponding ROS bag file |
Note: The 'vel' key is available in older versions of the dataset; it should be ignored.
An example of the contents of meta.npy or meta.txt is given below. In either case, the structure is a Python dictionary, and the text version can be read with Python’s eval() function as shown here.
For brevity, only a single element of each of the 'frames', 'full_trajectory', and IMU arrays is shown.
{'frames': [{'11': {'pos': {'q': {'w': 0.27729, 'x': 0.145438, 'y': -0.40157, 'z': 0.860639},
'rpy': {'p': -0.49274, 'r': -0.765635, 'y': 2.720059},
't': {'x': -0.005627, 'y': 0.338652, 'z': 1.391626}},
'ts': 0.031794},
'14': {'pos': {'q': {'w': 0.507373, 'x': 0.528164, 'y': 0.658874, 'z': 0.171756},
'rpy': {'p': 0.508834, 'r': 2.080551, 'y': 1.487373},
't': {'x': -0.204377, 'y': 0.395282, 'z': 1.558496}},
'ts': 0.031794},
'15': {'pos': {'q': {'w': 0.094314, 'x': -0.133898, 'y': 0.853899, 'z': 0.493997},
'rpy': {'p': 0.29774, 'r': 2.114005, 'y': -2.999389},
't': {'x': -0.016263, 'y': 0.39068, 'z': 1.34646}},
'ts': 0.031794},
'22': {'pos': {'q': {'w': 0.390027, 'x': 0.751193, 'y': -0.092049, 'z': -0.524514},
'rpy': {'p': 0.79837, 'r': 1.780868, 'y': -0.901796},
't': {'x': -0.039562, 'y': 0.17172, 'z': 1.283998}},
'ts': 0.031794},
'23': {'pos': {'q': {'w': -0.315782, 'x': 0.541274, 'y': 0.711081, 'z': 0.318855},
'rpy': {'p': -0.917802, 'r': 2.956836, 'y': 1.931814},
't': {'x': -0.128985, 'y': -0.108972, 'z': 0.90992}},
'ts': 0.031794},
'24': {'pos': {'q': {'w': -0.302069, 'x': -0.559534, 'y': 0.146147, 'z': 0.757837},
'rpy': {'p': 0.862972, 'r': 1.036446, 'y': -1.869526},
't': {'x': 0.232299, 'y': -0.171992, 'z': 0.765111}},
'ts': 0.031794},
'5': {'pos': {'q': {'w': 0.897532, 'x': -0.232753, 'y': -0.370827, 'z': -0.052436},
'rpy': {'p': -0.761582, 'r': -0.551012, 'y': 0.108671},
't': {'x': 0.24287, 'y': -0.042829, 'z': 0.686142}},
'ts': 0.031794},
'6': {'pos': {'q': {'w': 0.629238, 'x': -0.435374, 'y': -0.09361, 'z': -0.636982},
'rpy': {'p': -0.737523, 'r': -0.61769, 'y': -1.337676},
't': {'x': 0.091899, 'y': -0.008143, 'z': 1.013714}},
'ts': 0.031794},
'cam': {'pos': {'q': {'w': 1.0, 'x': 3.6e-05, 'y': 0.000342, 'z': 0.000158},
'rpy': {'p': 0.000683, 'r': 7.2e-05, 'y': 0.000316},
't': {'x': -0.000103, 'y': -0.000202, 'z': 2.9e-05}},
'ts': 0.031794},
'classical_frame': 'img_0.png',
'gt_frame': 'depth_mask_0.png',
'id': 0,
'ts': 0.031794}],
'full_trajectory': [{'11': {'pos': {'q': {'w': 0.27728, 'x': 0.145302, 'y': -0.401452, 'z': 0.86072},
'rpy': {'p': -0.492419, 'r': -0.765375, 'y': 2.719923},
't': {'x': -0.005162, 'y': 0.338606, 'z': 1.391609}},
'ts': 0.017647},
'14': {'pos': {'q': {'w': 0.507253, 'x': 0.527763, 'y': 0.659383, 'z': 0.17139},
'rpy': {'p': 0.509845, 'r': 2.081517, 'y': 1.488579},
't': {'x': -0.203892, 'y': 0.395231, 'z': 1.558574}},
'ts': 0.017647},
'15': {'pos': {'q': {'w': 0.093911, 'x': -0.133816, 'y': 0.853955, 'z': 0.493998},
'rpy': {'p': 0.296946, 'r': 2.11404, 'y': -2.999136},
't': {'x': -0.015858, 'y': 0.390641, 'z': 1.346457}},
'ts': 0.017647},
'22': {'pos': {'q': {'w': 0.390079, 'x': 0.751092, 'y': -0.092001, 'z': -0.524628},
'rpy': {'p': 0.798503, 'r': 1.780428, 'y': -0.90209},
't': {'x': -0.039156, 'y': 0.171642, 'z': 1.284023}},
'ts': 0.017647},
'23': {'pos': {'q': {'w': -0.31792, 'x': 0.540927, 'y': 0.710404, 'z': 0.318828},
'rpy': {'p': -0.921696, 'r': 2.960192, 'y': 1.930286},
't': {'x': -0.128655, 'y': -0.10911, 'z': 0.90998}},
'ts': 0.017647},
'24': {'pos': {'q': {'w': -0.302236, 'x': -0.559342, 'y': 0.146205, 'z': 0.757901},
'rpy': {'p': 0.862507, 'r': 1.036064, 'y': -1.869722},
't': {'x': 0.232548, 'y': -0.172056, 'z': 0.765072}},
'ts': 0.017647},
'5': {'pos': {'q': {'w': 0.897617, 'x': -0.232821, 'y': -0.370598, 'z': -0.052295},
'rpy': {'p': -0.761021, 'r': -0.551155, 'y': 0.108874},
't': {'x': 0.243082, 'y': -0.04289, 'z': 0.68607}},
'ts': 0.017647},
'6': {'pos': {'q': {'w': 0.62919, 'x': -0.435517, 'y': -0.093582, 'z': -0.636936},
'rpy': {'p': -0.737654, 'r': -0.618076, 'y': -1.33747},
't': {'x': 0.092225, 'y': -0.008196, 'z': 1.013705}},
'ts': 0.017647},
'cam': {'pos': {'q': {'w': 1.0, 'x': 6.1e-05, 'y': 0.000162, 'z': 0.000107},
'rpy': {'p': 0.000324, 'r': 0.000122, 'y': 0.000214},
't': {'x': -6.6e-05, 'y': -9e-05, 'z': 8e-06}},
'ts': 0.017647},
'gt_frame': 'depth_mask_18446744073709551615.png',
'id': 18446744073709551615,
'ts': 0.017647}],
'imu': {'/prophesee/left/imu': [{'angular_velocity': {'x': -0.001065, 'y': 0.011984, 'z': 0.020639},
'linear_acceleration': {'x': 8.293953, 'y': 0.349074, 'z': -5.420528},
'ts': 0.017303}],
'/prophesee/right/imu': [{'angular_velocity': {'x': -0.036885, 'y': 0.021438, 'z': -0.006924},
'linear_acceleration': {'x': -8.361612, 'y': 0.078437, 'z': -4.883445},
'ts': 0.017319}]},
'meta': {'cx': 1053.709961,
'cy': 788.531982,
'dist_model': 'radtan',
'fx': 2066.48999,
'fy': 2066.469971,
'k1': -0.117189,
'k2': 0.069793,
'k3': 0.0,
'k4': 0.0,
'p1': -0.000327,
'p2': 0.005909,
'res_x': 2080,
'res_y': 1552,
'ros_time_offset': 1606265321.445115}}
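As a minimal sketch of reading the text version, the snippet below parses a shortened stand-in for meta.txt. Note the substitution: instead of `eval()`, it uses `ast.literal_eval`, which only accepts Python literals and should suffice as long as the file contains only dictionaries, lists, strings, and numbers, as in the example above. Treating every non-bookkeeping key of a frame as an object ID is an assumption based on that example.

```python
import ast

# A shortened stand-in for the contents of meta.txt (one frame, one object).
meta_text = """
{'frames': [{'5': {'pos': {'t': {'x': 0.24287, 'y': -0.042829, 'z': 0.686142}},
                   'ts': 0.031794},
             'cam': {'pos': {'t': {'x': -0.000103, 'y': -0.000202, 'z': 2.9e-05}},
                     'ts': 0.031794},
             'classical_frame': 'img_0.png',
             'gt_frame': 'depth_mask_0.png',
             'id': 0,
             'ts': 0.031794}],
 'meta': {'fx': 2066.48999, 'fy': 2066.469971, 'cx': 1053.709961, 'cy': 788.531982}}
"""

# literal_eval parses literals only, so arbitrary code in the file is rejected.
meta = ast.literal_eval(meta_text.strip())

frame = meta["frames"][0]
# Keys that are not bookkeeping fields are treated here as object IDs (assumption).
bookkeeping = {"cam", "classical_frame", "gt_frame", "id", "ts"}
object_ids = sorted(k for k in frame if k not in bookkeeping)

# Pinhole intrinsic matrix assembled from the 'meta' entry.
m = meta["meta"]
K = [[m["fx"], 0.0, m["cx"]],
     [0.0, m["fy"], m["cy"]],
     [0.0, 0.0, 1.0]]
```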
Transform Convention
Object poses represent transforms from the object frame to the camera frame.
Camera poses represent transforms from the camera frame to the world frame.
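Under this convention, an object's pose in the world frame can be obtained by chaining the two transforms. The sketch below builds 4x4 homogeneous matrices from the 'q' and 't' entries of two poses taken from the example above (object '5' and 'cam') and composes them. It assumes the quaternion is a Hamilton unit quaternion in (w, x, y, z) form and that 't' is the translation part of the same transform; the helper name `pose_to_matrix` is hypothetical.

```python
import numpy as np

def pose_to_matrix(pos):
    """Build a 4x4 homogeneous transform from a pose dict with 'q' and 't' entries."""
    q, t = pos["q"], pos["t"]
    w, x, y, z = q["w"], q["x"], q["y"], q["z"]
    # Normalize to guard against rounding in the stored values.
    n = np.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    R = np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = [t["x"], t["y"], t["z"]]
    return T

obj_pos = {"q": {"w": 0.897532, "x": -0.232753, "y": -0.370827, "z": -0.052436},
           "t": {"x": 0.24287, "y": -0.042829, "z": 0.686142}}
cam_pos = {"q": {"w": 1.0, "x": 3.6e-05, "y": 0.000342, "z": 0.000158},
           "t": {"x": -0.000103, "y": -0.000202, "z": 2.9e-05}}

T_cam_obj = pose_to_matrix(obj_pos)    # object frame -> camera frame
T_world_cam = pose_to_matrix(cam_pos)  # camera frame -> world frame
T_world_obj = T_world_cam @ T_cam_obj  # object frame -> world frame

# The object's origin expressed in the camera frame is the translation part.
origin_in_cam = T_cam_obj @ np.array([0.0, 0.0, 0.0, 1.0])
```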
Format Comparison
EVIMO2v1 vs EVIMO2v2
Many important improvements to the data generation pipeline that transforms raw recordings to released data were made in order to release EVIMO2v2. These changes include:
- All cameras' data timestamps are synchronized to a common source (Vicon)
- Event camera ground truths are synchronized and jitter is eliminated up to numerical precision
- Classical camera ground truth jitter is eliminated up to numerical precision
- Classical camera image data is kept when there is no ground truth due to Vicon occlusion
- A redesigned NPZ format greatly reduces loading time without requiring excessive disk usage
- Events are not filtered when copied out of the raw recordings
- Several sequences with no usable data were deleted
- Sequences were split into several sub-sequences satisfying:
  - Gaps in ground truth depth/mask have a duration of at most 1 second
  - All sub-sequences are at least 0.4 seconds long
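The two sub-sequence criteria above can be illustrated with a short sketch. This is not the pipeline's actual code; it is one plausible reading in which a sequence is cut wherever consecutive ground truth timestamps are more than 1 second apart, and a piece is kept only if the span between its first and last timestamp is at least 0.4 seconds. The function name and the synthetic timestamps are hypothetical.

```python
import numpy as np

def split_on_gaps(ts, max_gap=1.0, min_duration=0.4):
    """Split sorted timestamps into (start, stop) index ranges.

    A new piece starts wherever the gap to the previous timestamp exceeds
    max_gap; pieces shorter than min_duration are dropped.
    """
    ts = np.asarray(ts, dtype=float)
    if ts.size == 0:
        return []
    breaks = np.flatnonzero(np.diff(ts) > max_gap) + 1
    pieces = np.split(np.arange(ts.size), breaks)
    return [(int(p[0]), int(p[-1]) + 1) for p in pieces
            if ts[p[-1]] - ts[p[0]] >= min_duration]

# Synthetic ground truth timestamps with gaps of 1.4 s and 2.9 s.
ts = [0.0, 0.3, 0.6, 2.0, 2.1, 5.0, 5.5, 6.0]
segments = split_on_gaps(ts)
```

Here the middle piece (2.0 to 2.1 seconds) is discarded because it is shorter than 0.4 seconds.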
EVIMO2v2 TXT vs EVIMO2v2 NPZ
Decompressed, the EVIMO2v2 NPZ format requires 525 GB of space of which 271 GB is for the RGB camera and 255 GB is for the three DVS cameras.
Decompressed, the TXT format requires 842 GB of space of which 212 GB is for the RGB camera and 631 GB is for the three DVS cameras.
The EVIMO2v2 NPZ format supports memory mapping the event arrays from disk.
The TXT format requires parsing the events.txt file, which can take tens of seconds per sequence.
The TXT format decompresses to about 471,000 files. The NPZ format decompresses to 4,700 files.
EVIMO, EVIMO2v1 NPZ vs EVIMO2v2 NPZ
EVIMO and EVIMO2v1 compressed each sequence into a single .npz file. Inside this .npz there are archives for the depth, masks, events, and conventional images. To access any of this data, the entire depth, mask, events, or conventional archive must be decompressed into RAM (by NumPy) or onto disk (manually by the user). This can take several minutes for each sequence and requires the user to have either upwards of 64 GB of RAM (to hold a decompressed sequence) or multiple TB of disk space (to store all sequences decompressed at once).
EVIMO2v2 introduces a new NPZ format with two major changes. These changes eliminate long sequence load times while preventing excessive disk usage. First, depth, masks, and conventional images are compressed frame by frame so that only the frame currently in use needs to be decompressed and stored in RAM. Second, events are stored uncompressed with the minimum-width data type. Because the events are stored uncompressed, the arrays can be memory mapped so that only the portions of the events currently in use need to be held in RAM.
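Both ideas can be tried out with plain NumPy. The sketch below writes synthetic data to a temporary folder, then memory maps an uncompressed event array with `np.load(..., mmap_mode="r")` and reads a single frame from a compressed .npz that stores one array per key. The file names, key names, and array sizes are hypothetical and chosen only for illustration; they are not the names used in the dataset.

```python
import os
import tempfile
import numpy as np

tmp = tempfile.mkdtemp()

# Events stored uncompressed in a .npy file with a narrow dtype.
events_xy = np.random.randint(0, 640, size=(100_000, 2)).astype(np.uint16)
xy_path = os.path.join(tmp, "events_xy.npy")
np.save(xy_path, events_xy)

# mmap_mode="r" maps the file instead of reading it all into RAM;
# only the part touched by the slice is actually loaded.
xy_mm = np.load(xy_path, mmap_mode="r")
chunk = np.asarray(xy_mm[5000:5010])

# Frames stored one array per key in a compressed .npz.
frames = {f"depth_{i}": np.full((48, 64), i, dtype=np.uint16) for i in range(5)}
depth_path = os.path.join(tmp, "depth.npz")
np.savez_compressed(depth_path, **frames)

# Loading an .npz is lazy: a member is decompressed only when it is indexed.
with np.load(depth_path) as depth:
    frame3 = depth["depth_3"]
```

Memory mapping only works for uncompressed .npy files, which is why the events are kept outside the compressed archives in this sketch.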