Ground Truth Format

The ground truth is available in the following formats:

NPZ - A collection of .npy files, supported by NumPy
- EVIMO2v2 - Each sequence is a collection of .npz and .npy files
- EVIMO, EVIMO2v1 - Each sequence is compressed into a single .npz file
TXT - Ground truth and recordings in the form of .png and .txt files

The high level differences between the formats are described here.

NPZ (EVIMO2v2)

There is one folder per sequence. A sequence's folder can be found at the following path:
<camera>/<category>/<subcategory>/<sequence name>

Inside a sequence's folder are the following files:

File	Description
`dataset_classical.npz`	Dictionary of (RES_Y, RES_X) arrays with keys `classical_<frame id>`
`dataset_depth.npz`	Dictionary of (RES_Y, RES_X) arrays with keys `depth_<frame id>`
`dataset_mask.npz`	Dictionary of (RES_Y, RES_X) arrays with keys `mask_<frame id>`. Masks contain object ids multiplied by `1000`.
`dataset_events_p.npy`	Array of shape (NUM_EVENTS, 1) containing event polarities. Can be memory mapped*. Samsung event polarity is inverted compared to Prophesee.
`dataset_events_t.npy`	Array of shape (NUM_EVENTS, 1) containing event times. Can be memory mapped*.
`dataset_events_xy.npy`	Array of shape (NUM_EVENTS, 2) containing event pixel locations. Can be memory mapped*.
`dataset_info.npz`	Dictionary of arrays `D`, `discretization`, `index`, `K`, `meta`. Contents are identical to the EVIMO, EVIMO2v1 NPZ format.

Ground truth depth and masks are not evenly spaced in time because the Vicon system sometimes loses track due to occlusion. The meta field contains all the timestamping information required to handle the irregular sampling.

In EVIMO2v2 the classical camera will have a different number of depth/mask frames and classical frames because classical frames are kept even if the depth and mask are unavailable.
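
The layout above can be opened lazily with NumPy. The following is a minimal sketch, not the dataset's official loader: the function names `open_evimo2v2_sequence` and `sorted_frame_keys` are hypothetical, and the code assumes that each `<frame id>` in a key is an integer.

```python
from pathlib import Path

import numpy as np


def open_evimo2v2_sequence(seq_dir):
    """Open one EVIMO2v2 NPZ sequence folder without reading it all into RAM."""
    seq_dir = Path(seq_dir)
    # The event arrays are plain .npy files, so NumPy can memory map them.
    events = {
        name: np.load(seq_dir / f"dataset_events_{name}.npy", mmap_mode="r")
        for name in ("t", "xy", "p")
    }
    # np.load on a .npz returns a lazy NpzFile: an array is only
    # decompressed when its key is accessed.
    depth = np.load(seq_dir / "dataset_depth.npz")
    mask = np.load(seq_dir / "dataset_mask.npz")
    return events, depth, mask


def sorted_frame_keys(npz_file, prefix):
    """List keys such as 'depth_<frame id>' ordered by frame id.

    Assumption: the frame id part of each key parses as an integer.
    """
    keys = [k for k in npz_file.files if k.startswith(prefix + "_")]
    return sorted(keys, key=lambda k: int(k[len(prefix) + 1:]))
```

A single frame is then read with, for example, `depth[sorted_frame_keys(depth, "depth")[0]]`, which decompresses only that frame.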

NPZ (EVIMO and EVIMO2v1)

There is one compressed .npz file per sequence. A sequence's file can be found at the following path: <camera>/<category>/<subcategory>/<sequence name>.npz

A sequence's .npz file contains the following .npy files:

File	Description
`classical.npy`	Array of shape (NUM_FRAMES, RES_Y, RES_X). Contains classical frames if available.
`depth.npy`	Array of shape (NUM_FRAMES, RES_Y, RES_X). Depth is in mm.
`mask.npy`	Array of shape (NUM_FRAMES, RES_Y, RES_X). Masks contain object ids multiplied by `1000`.
`events.npy`	Array of shape (NUM_EVENTS, 4). Each row contains an event's timestamp, x/y pixel coordinates, and polarity. EVIMO2: Samsung event polarity is inverted compared to Prophesee.
`meta.npy`	Python dictionary containing intrinsics, timestamps, poses, and IMU data. The full description is here.
`K.npy/D.npy`	Intrinsic and distortion parameters, also available in `meta.npy`
`index.npy`	A helper lookup table for fast timestamp-to-event index computation. Contains indices of events every `discretization.npy` seconds.
`discretization.npy`	The time between events corresponding to the indices in `index.npy`
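
Since masks store object ids multiplied by `1000` and depth is in mm, a small amount of decoding is usually needed before use. The helpers below are an illustrative sketch; their names are hypothetical and not part of any EVIMO tooling.

```python
import numpy as np


def decode_object_ids(mask):
    """Recover per-pixel object ids from a mask that stores id * 1000."""
    return np.asarray(mask) // 1000


def depth_mm_to_m(depth_mm):
    """Convert a depth frame from millimeters to meters."""
    return np.asarray(depth_mm, dtype=np.float64) / 1000.0


def object_pixels(mask, object_id):
    """Boolean image that is True where the mask belongs to object_id."""
    return decode_object_ids(mask) == object_id
```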

Ground truth depth and masks are not evenly spaced in time because the Vicon system sometimes loses track due to occlusion. The meta field contains all the timestamping information required to handle the irregular sampling.

In EVIMO and EVIMO2v1 the classical camera frames are only available when depth and mask frames are available.
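
As a rough sketch of working with this format, the code below reads the `meta` dictionary and selects the events in a time window. It rests on assumptions that should be checked against real files: that `meta.npy` is stored as a pickled 0-d object array (the usual result of saving a Python dictionary with NumPy), and that the rows of `events.npy` are sorted by timestamp. It uses a plain binary search rather than the `index.npy` lookup table, and the function names are hypothetical.

```python
import numpy as np


def load_meta(npz_path):
    """Return the meta dictionary stored in a sequence's .npz file.

    Assumption: 'meta' is a pickled 0-d object array wrapping a dict,
    so allow_pickle=True and .item() are needed to unwrap it.
    """
    with np.load(npz_path, allow_pickle=True) as data:
        return data["meta"].item()


def events_between(events, t_start, t_end):
    """Select events with t_start <= timestamp < t_end.

    events: (NUM_EVENTS, 4) array whose first column is the timestamp.
    Assumption: rows are sorted by timestamp.
    """
    ts = events[:, 0]
    lo = np.searchsorted(ts, t_start, side="left")
    hi = np.searchsorted(ts, t_end, side="left")
    return events[lo:hi]
```

Only load files from a trusted source with `allow_pickle=True`, since unpickling can execute arbitrary code.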

TXT

There is one folder per sequence. A sequence's folder can be found at the following path:
<camera>/<category>/<subcategory>/<sequence name>

We strongly suggest using the pydvs sample as a prototype for manipulating the TXT data. This script converts the TXT format (the output of the C++ pipeline) to NPZ. It also generates the trajectory plot distributed with the TXT format, event slices, and frames for visualization videos.

Item	Description
`img/img_<frame id>.png`	Conventional frames from a classical camera, when available
`img/depth_mask_<frame id>.png`	16-bit PNGs with depth and masks in different channels. Depth is in mm. Masks contain per-pixel object ids multiplied by `1000`. An example of reading the image is available here. The relevant C++ code is here.
`events.txt`	File with events (if available). One event per line in the format `<timestamp px py polarity>`.
`calib.txt` `extrinsics.txt` `params.txt` or `config.txt`	Camera parameters; see here for details
`meta.txt`	String containing intrinsics, timestamps, poses, and IMU data. The full description is here.
`position_plots.pdf`	A plot of camera/object trajectories for visualization only

Ground truth depth and masks are not evenly spaced in time because the Vicon system sometimes loses track due to occlusion. The meta field contains all the timestamping information required to handle the irregular sampling.

In EVIMO and EVIMO2v1 the classical camera frames are only available when depth and mask frames are available.

In EVIMO2v2 the classical camera will have a different number of depth/mask frames and classical frames because classical frames are kept even if the depth and mask are unavailable.
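
For quick experiments, `events.txt` can be parsed directly with NumPy. This is a minimal sketch, not a replacement for the pydvs sample: it assumes each line holds four whitespace-separated numeric values in the order timestamp, px, py, polarity, and the function name `read_events_txt` is hypothetical.

```python
import numpy as np


def read_events_txt(path_or_file):
    """Parse lines of 'timestamp px py polarity' into separate arrays.

    Assumption: four whitespace-separated numeric columns per line.
    """
    raw = np.loadtxt(path_or_file, dtype=np.float64, ndmin=2)
    t = raw[:, 0]                      # event timestamps
    xy = raw[:, 1:3].astype(np.int64)  # pixel coordinates (px, py)
    p = raw[:, 3].astype(np.int64)     # polarity
    return t, xy, p
```

Parsing text this way reads the whole file into memory, so it is best suited to short sequences.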

Meta’s Contents

Key	Description
`'frames'`	Array of dictionaries with one entry per ground truth sampling period. Each dictionary contains the pose of each object and the camera. See here for the transform conventions. A special object id `'cam'` is used for the camera. `'gt_frame'`/`'classical_frame'` name the ground truth/classical .png files for the pose. `'id'` is the index of the ground truth/classical frame in the NPZ format.
`'full_trajectory'`	Array of dictionaries with one entry per Vicon pose measurement (200 Hz)
`'imu'`	Dictionary of arrays of IMU samples, with one array per IMU on the EVIMO2 camera rig. No IMU data is available for EVIMO.
`'meta'`	Camera intrinsics and time offset from the corresponding ROS bag file. See here for details.

Note: The 'vel' key is available in older versions of the dataset and should be ignored.

An example of the contents of meta.npy or meta.txt is given below. In either case, the structure is a Python dictionary, and the text version can be read with Python's eval() function as shown here.

For brevity, only a single element of each of the 'frames', 'full_trajectory', and IMU arrays is shown.

{'frames': [{'11': {'pos': {'q': {'w': 0.27729, 'x': 0.145438, 'y': -0.40157, 'z': 0.860639},
                            'rpy': {'p': -0.49274, 'r': -0.765635, 'y': 2.720059},
                            't': {'x': -0.005627, 'y': 0.338652, 'z': 1.391626}},
                    'ts': 0.031794},
             '14': {'pos': {'q': {'w': 0.507373, 'x': 0.528164, 'y': 0.658874, 'z': 0.171756},
                            'rpy': {'p': 0.508834, 'r': 2.080551, 'y': 1.487373},
                            't': {'x': -0.204377, 'y': 0.395282, 'z': 1.558496}},
                    'ts': 0.031794},
             '15': {'pos': {'q': {'w': 0.094314, 'x': -0.133898, 'y': 0.853899, 'z': 0.493997},
                            'rpy': {'p': 0.29774, 'r': 2.114005, 'y': -2.999389},
                            't': {'x': -0.016263, 'y': 0.39068, 'z': 1.34646}},
                    'ts': 0.031794},
             '22': {'pos': {'q': {'w': 0.390027, 'x': 0.751193, 'y': -0.092049, 'z': -0.524514},
                            'rpy': {'p': 0.79837, 'r': 1.780868, 'y': -0.901796},
                            't': {'x': -0.039562, 'y': 0.17172, 'z': 1.283998}},
                    'ts': 0.031794},
             '23': {'pos': {'q': {'w': -0.315782, 'x': 0.541274, 'y': 0.711081, 'z': 0.318855},
                            'rpy': {'p': -0.917802, 'r': 2.956836, 'y': 1.931814},
                            't': {'x': -0.128985, 'y': -0.108972, 'z': 0.90992}},
                    'ts': 0.031794},
             '24': {'pos': {'q': {'w': -0.302069, 'x': -0.559534, 'y': 0.146147, 'z': 0.757837},
                            'rpy': {'p': 0.862972, 'r': 1.036446, 'y': -1.869526},
                            't': {'x': 0.232299, 'y': -0.171992, 'z': 0.765111}},
                    'ts': 0.031794},
             '5': {'pos': {'q': {'w': 0.897532, 'x': -0.232753, 'y': -0.370827, 'z': -0.052436},
                           'rpy': {'p': -0.761582, 'r': -0.551012, 'y': 0.108671},
                           't': {'x': 0.24287, 'y': -0.042829, 'z': 0.686142}},
                   'ts': 0.031794},
             '6': {'pos': {'q': {'w': 0.629238, 'x': -0.435374, 'y': -0.09361, 'z': -0.636982},
                           'rpy': {'p': -0.737523, 'r': -0.61769, 'y': -1.337676},
                           't': {'x': 0.091899, 'y': -0.008143, 'z': 1.013714}},
                   'ts': 0.031794},
             'cam': {'pos': {'q': {'w': 1.0, 'x': 3.6e-05, 'y': 0.000342, 'z': 0.000158},
                             'rpy': {'p': 0.000683, 'r': 7.2e-05, 'y': 0.000316},
                             't': {'x': -0.000103, 'y': -0.000202, 'z': 2.9e-05}},
                     'ts': 0.031794},
             'classical_frame': 'img_0.png',
             'gt_frame': 'depth_mask_0.png',
             'id': 0,
             'ts': 0.031794}],
 'full_trajectory': [{'11': {'pos': {'q': {'w': 0.27728, 'x': 0.145302, 'y': -0.401452, 'z': 0.86072},
                                     'rpy': {'p': -0.492419, 'r': -0.765375, 'y': 2.719923},
                                     't': {'x': -0.005162, 'y': 0.338606, 'z': 1.391609}},
                             'ts': 0.017647},
                      '14': {'pos': {'q': {'w': 0.507253, 'x': 0.527763, 'y': 0.659383, 'z': 0.17139},
                                     'rpy': {'p': 0.509845, 'r': 2.081517, 'y': 1.488579},
                                     't': {'x': -0.203892, 'y': 0.395231, 'z': 1.558574}},
                             'ts': 0.017647},
                      '15': {'pos': {'q': {'w': 0.093911, 'x': -0.133816, 'y': 0.853955, 'z': 0.493998},
                                     'rpy': {'p': 0.296946, 'r': 2.11404, 'y': -2.999136},
                                     't': {'x': -0.015858, 'y': 0.390641, 'z': 1.346457}},
                             'ts': 0.017647},
                      '22': {'pos': {'q': {'w': 0.390079, 'x': 0.751092, 'y': -0.092001, 'z': -0.524628},
                                     'rpy': {'p': 0.798503, 'r': 1.780428, 'y': -0.90209},
                                     't': {'x': -0.039156, 'y': 0.171642, 'z': 1.284023}},
                             'ts': 0.017647},
                      '23': {'pos': {'q': {'w': -0.31792, 'x': 0.540927, 'y': 0.710404, 'z': 0.318828},
                                     'rpy': {'p': -0.921696, 'r': 2.960192, 'y': 1.930286},
                                     't': {'x': -0.128655, 'y': -0.10911, 'z': 0.90998}},
                             'ts': 0.017647},
                      '24': {'pos': {'q': {'w': -0.302236, 'x': -0.559342, 'y': 0.146205, 'z': 0.757901},
                                     'rpy': {'p': 0.862507, 'r': 1.036064, 'y': -1.869722},
                                     't': {'x': 0.232548, 'y': -0.172056, 'z': 0.765072}},
                             'ts': 0.017647},
                      '5': {'pos': {'q': {'w': 0.897617, 'x': -0.232821, 'y': -0.370598, 'z': -0.052295},
                                    'rpy': {'p': -0.761021, 'r': -0.551155, 'y': 0.108874},
                                    't': {'x': 0.243082, 'y': -0.04289, 'z': 0.68607}},
                            'ts': 0.017647},
                      '6': {'pos': {'q': {'w': 0.62919, 'x': -0.435517, 'y': -0.093582, 'z': -0.636936},
                                    'rpy': {'p': -0.737654, 'r': -0.618076, 'y': -1.33747},
                                    't': {'x': 0.092225, 'y': -0.008196, 'z': 1.013705}},
                            'ts': 0.017647},
                      'cam': {'pos': {'q': {'w': 1.0, 'x': 6.1e-05, 'y': 0.000162, 'z': 0.000107},
                                      'rpy': {'p': 0.000324, 'r': 0.000122, 'y': 0.000214},
                                      't': {'x': -6.6e-05, 'y': -9e-05, 'z': 8e-06}},
                              'ts': 0.017647},
                      'gt_frame': 'depth_mask_18446744073709551615.png',
                      'id': 18446744073709551615,
                      'ts': 0.017647}],
 'imu': {'/prophesee/left/imu': [{'angular_velocity': {'x': -0.001065, 'y': 0.011984, 'z': 0.020639},
                                  'linear_acceleration': {'x': 8.293953, 'y': 0.349074, 'z': -5.420528},
                                  'ts': 0.017303}],
         '/prophesee/right/imu': [{'angular_velocity': {'x': -0.036885, 'y': 0.021438, 'z': -0.006924},
                                   'linear_acceleration': {'x': -8.361612, 'y': 0.078437, 'z': -4.883445},
                                   'ts': 0.017319}]},
 'meta': {'cx': 1053.709961,
          'cy': 788.531982,
          'dist_model': 'radtan',
          'fx': 2066.48999,
          'fy': 2066.469971,
          'k1': -0.117189,
          'k2': 0.069793,
          'k3': 0.0,
          'k4': 0.0,
          'p1': -0.000327,
          'p2': 0.005909,
          'res_x': 2080,
          'res_y': 1552,
          'ros_time_offset': 1606265321.445115}}
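
A structure like the one above can be queried with a few lines of Python. The sketch below uses `ast.literal_eval` as a stricter stand-in for `eval()`; this assumes the text contains only Python literals, as in the example. The helper names are hypothetical.

```python
import ast

import numpy as np


def parse_meta_text(text):
    """Parse the contents of meta.txt into a dictionary.

    Assumption: the text holds only Python literals, so ast.literal_eval
    can be used in place of eval().
    """
    return ast.literal_eval(text)


def frame_timestamps(meta):
    """Timestamps of the (irregularly spaced) ground truth frames."""
    return np.array([frame["ts"] for frame in meta["frames"]], dtype=np.float64)


def nearest_frame(meta, t):
    """Return the 'frames' entry whose timestamp is closest to t."""
    ts = frame_timestamps(meta)
    return meta["frames"][int(np.argmin(np.abs(ts - t)))]


def posed_keys(frame):
    """Keys of a frame entry that carry a pose (object ids and 'cam')."""
    return [k for k, v in frame.items() if isinstance(v, dict) and "pos" in v]
```

Computing `np.diff(frame_timestamps(meta))` is a quick way to see where the ground truth sampling is irregular.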

Transform Convention

Object poses represent transforms from the object frame to the camera frame.
Camera poses represent transforms from the camera frame to the world frame.
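
Under this convention, an object's pose in the world frame is the camera-to-world transform composed with the object-to-camera transform. The sketch below illustrates the composition with 4x4 homogeneous matrices. It assumes that `'q'` is a Hamilton quaternion with components `w, x, y, z` and that `'t'` is the translation part of the same transform; the function names are hypothetical.

```python
import numpy as np


def pose_to_matrix(pos):
    """Build a 4x4 homogeneous transform from a 'pos' entry ('q' and 't').

    Assumption: 'q' is a Hamilton quaternion (w, x, y, z).
    """
    q = pos["q"]
    w, x, y, z = q["w"], q["x"], q["y"], q["z"]
    n = np.sqrt(w * w + x * x + y * y + z * z)  # normalize against rounding
    w, x, y, z = w / n, x / n, y / n, z / n
    T = np.eye(4)
    T[:3, :3] = [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]
    T[:3, 3] = [pos["t"]["x"], pos["t"]["y"], pos["t"]["z"]]
    return T


def object_to_world(frame, object_id):
    """Compose object -> camera with camera -> world for one frame entry."""
    T_cam_obj = pose_to_matrix(frame[object_id]["pos"])  # object -> camera
    T_world_cam = pose_to_matrix(frame["cam"]["pos"])    # camera -> world
    return T_world_cam @ T_cam_obj
```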

Format Comparison

EVIMO2v1 vs EVIMO2v2

Many important improvements were made to the data generation pipeline, which transforms raw recordings into released data, in order to release EVIMO2v2. These changes include:

Timestamps of all cameras' data are synchronized to a common source (Vicon)
Event camera ground truths are synchronized and jitter is eliminated up to numerical precision
Classical camera ground truth jitter is eliminated up to numerical precision
Classical camera image data is kept when there is no ground truth due to Vicon occlusion
A redesigned NPZ format greatly reduces loading time without requiring excessive disk usage
Events are not filtered when copied out of the raw recordings
Several sequences with no usable data were deleted
Sequences were split into several sub-sequences satisfying:
- Gaps in ground truth depth/mask have a duration of at most 1 second
- All sub-sequences are at least 0.4 seconds long
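
To make the two sub-sequence conditions concrete, the sketch below splits a list of ground truth timestamps wherever the gap exceeds a threshold and drops segments that are too short. It illustrates the stated criteria only and is not the pipeline's actual implementation; the function name and parameters are hypothetical.

```python
import numpy as np


def split_on_gaps(ts, max_gap=1.0, min_duration=0.4):
    """Split sorted timestamps into (start, end) segments.

    A new segment starts wherever consecutive timestamps are more than
    max_gap seconds apart; segments shorter than min_duration are dropped.
    """
    ts = np.asarray(ts, dtype=np.float64)
    if ts.size == 0:
        return []
    # Index of the first timestamp after each oversized gap.
    cuts = np.flatnonzero(np.diff(ts) > max_gap) + 1
    segments = np.split(ts, cuts)
    return [(s[0], s[-1]) for s in segments if s[-1] - s[0] >= min_duration]
```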

EVIMO2v2 TXT vs EVIMO2v2 NPZ

Decompressed, the EVIMO2v2 NPZ format requires 525 GB of space of which 271 GB is for the RGB camera and 255 GB is for the three DVS cameras.

Decompressed, the TXT format requires 842 GB of space of which 212 GB is for the RGB camera and 631 GB is for the three DVS cameras.

The EVIMO2v2 NPZ format supports memory mapping the event arrays from disk.

The TXT format requires parsing the events.txt file, which can take tens of seconds per sequence.

The TXT format decompresses to about 471,000 files. The NPZ format decompresses to 4,700 files.

EVIMO, EVIMO2v1 NPZ vs EVIMO2v2 NPZ

EVIMO and EVIMO2v1 compressed each entire sequence into a single .npz file. Inside this .npz there are archives for the depth, masks, events, and conventional images. To access any of this data, the entire depth, mask, events, or conventional image archive must be decompressed into RAM (by NumPy) or onto disk (manually by the user). This can take several minutes for each sequence and requires the user to have either upwards of 64 GB of RAM (to hold a decompressed sequence) or multiple TB of disk space (to store all sequences decompressed at once).

EVIMO2v2 introduces a new NPZ format with two major changes. These changes eliminate long sequence load times while preventing excessive disk usage. First, depth, masks, and conventional images are compressed frame by frame so that only the frame currently in use needs to be decompressed and stored in RAM. Second, events are stored uncompressed with the minimum width data type. Because the events are stored uncompressed, the arrays can be memory mapped so that only the portions of the events currently in use need to be stored in RAM.