BUsy Street Scenes (BUSS) is a challenging dataset of video sequences taken from a handheld mobile phone (an OPPO A5 2020 smartphone, rear camera) in crowded city streets with synchronized inertial measurement unit (IMU) data. The goal of the dataset is to evaluate the robustness of camera rotation estimation algorithms in dense and dynamic scenes with many moving objects and complex camera motion. The dataset comprises 17 video sequences of about 10 seconds each at 30 FPS in full HD resolution (1920x1080) RGB. We used the Android Open-Camera Sensor app to synchronously record video and angular rate from the phone’s MEMS gyroscopes (at 400 Hz) and then generated the rotation ground truth from these measurements.
The ground truth rotation at frame f_t represents the forward rotation from the video frame f_t to the immediate next frame f_{t+1}. To get the rotation between two frames, we numerically integrate the angular rate measurements. The ground truths are represented as quaternions.
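The integration step can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the `[w, x, y, z]` quaternion order, the zero-order-hold assumption (each gyroscope sample is held until the next one), and the right-multiplication of body-frame increments are all assumptions made here for illustration. Check the dataset files for the actual conventions.

```python
import numpy as np

def quat_multiply(q, r):
    """Hamilton product of two quaternions in [w, x, y, z] order (assumed order)."""
    w0, x0, y0, z0 = q
    w1, x1, y1, z1 = r
    return np.array([
        w0 * w1 - x0 * x1 - y0 * y1 - z0 * z1,
        w0 * x1 + x0 * w1 + y0 * z1 - z0 * y1,
        w0 * y1 - x0 * z1 + y0 * w1 + z0 * x1,
        w0 * z1 + x0 * y1 - y0 * x1 + z0 * w1,
    ])

def delta_quat(omega, dt):
    """Rotation produced by a constant angular rate omega (rad/s) held for dt seconds."""
    omega = np.asarray(omega, dtype=float)
    speed = np.linalg.norm(omega)
    angle = speed * dt
    if angle < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    axis = omega / speed
    return np.concatenate(([np.cos(angle / 2)], np.sin(angle / 2) * axis))

def integrate_gyro(timestamps, rates, t_start, t_end):
    """Integrate gyroscope samples over [t_start, t_end] with a zero-order hold.

    timestamps: sorted sample times in seconds, shape (N,)
    rates:      angular rates in rad/s, shape (N, 3)
    """
    q = np.array([1.0, 0.0, 0.0, 0.0])
    n = len(timestamps)
    for i in range(n):
        # Sample i is assumed valid from its own timestamp until the next one.
        seg_start = max(timestamps[i], t_start)
        next_time = timestamps[i + 1] if i + 1 < n else t_end
        seg_end = min(next_time, t_end)
        dt = seg_end - seg_start
        if dt > 0:
            q = quat_multiply(q, delta_quat(rates[i], dt))
    return q / np.linalg.norm(q)

# Example: 400 Hz samples of a constant 1 rad/s rotation about z,
# integrated over one 30 FPS frame interval.
ts = np.arange(0.0, 0.1, 1 / 400)
rates = np.tile([0.0, 0.0, 1.0], (len(ts), 1))
q_frame = integrate_gyro(ts, rates, 0.0, 1 / 30)
```

With a 400 Hz gyroscope and 30 FPS video, roughly 13 samples fall inside each frame interval, so the frame boundaries generally land between samples; the clipping of `seg_start` and `seg_end` handles those partial intervals.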
Our videos are recorded using the OPPO's rear camera at full HD (1920x1080) @30Hz in RGB.
Resolution | 12.0 MP (1920x1080) |
Aperture | f/1.8 |
ISO | 90-1600 |
Shutter type | Rolling shutter |
We used an 8x6 checkerboard (25 mm squares) and the MATLAB calibration tool to calibrate the intrinsic parameters of the OPPO rear camera:
Focal length (pixels) | Focal length in x and y, represented as a two-element vector [fx fy] in pixels. | [1.5980e+03, 1.5764e+03] |
Principal point (pixels) | Coordinates of the optical center of the camera, represented as a two-element vector [cx cy] in pixels. | [969.1056, 514.1442] |
Image size | Image size produced by the camera, represented as a two-element vector [mrows ncols]. | [1080, 1920] |
Radial distortion | Radial distortion coefficients of the lens, represented as a three-element vector [k1 k2 k3]. | [0.2679, -1.1773, 1.6158] |
Tangential distortion | Tangential distortion coefficients of the lens, represented as a two-element vector [p1 p2]. | [-0.0084, 0.0019] |
Skew | Skew of the camera axes, a scalar value. The skew is 0 when the | 1.0089 |
For a more detailed description of the camera intrinsic properties, please refer to the MATLAB documentation.
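As a rough illustration of how the calibrated values above might be used, the sketch below assembles a 3x3 intrinsic matrix and back-projects a pixel to a unit bearing vector. It assumes the common column-vector convention K = [[fx, s, cx], [0, fy, cy], [0, 0, 1]] and ignores the radial and tangential distortion coefficients; `pixel_to_ray` is a hypothetical helper name, not part of the dataset. Note that MATLAB uses 1-based pixel coordinates, so a one-pixel offset may be needed in 0-based environments.

```python
import numpy as np

# Calibrated values copied from the table above.
fx, fy = 1.5980e+03, 1.5764e+03
cx, cy = 969.1056, 514.1442
skew = 1.0089

# Intrinsic matrix in the column-vector convention (an assumption of this sketch).
K = np.array([
    [fx, skew, cx],
    [0.0, fy, cy],
    [0.0, 0.0, 1.0],
])

def pixel_to_ray(u, v, K=K):
    """Back-project pixel (u, v) to a unit bearing vector, ignoring lens distortion."""
    ray = np.linalg.solve(K, np.array([u, v, 1.0]))
    return ray / np.linalg.norm(ray)

# The principal point maps to the optical axis.
center_ray = pixel_to_ray(cx, cy)
```

For accurate results near the image borders, the distortion coefficients would need to be applied before back-projection; that step is omitted here to keep the example short.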
The sensor's coordinate system is defined relative to the phone's screen and follows the right-hand convention. The coordinate system is fixed and does not change with screen display orientation. Only the gyroscope was utilized.
Model name | lsm6ds3c |
Average update rate | 400Hz |
Our experiments show strong agreement between the angular velocities measured by two different gyroscopes. It is highly unlikely that the two sensors would agree if their measurements were incorrect. This strongly suggests that the gyroscope measurements are a good ground truth for frame-to-frame rotation estimation. For more details, please refer to the supplemental material.
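One common way to quantify how closely two rotations agree (for example, a ground-truth quaternion and an algorithm's estimate, or rotations integrated from two gyroscopes) is the geodesic angle between them. The sketch below is a generic metric and not necessarily the one used by the authors; the function name and the `[w, x, y, z]` order are assumptions.

```python
import numpy as np

def rotation_angle_deg(q_a, q_b):
    """Geodesic angle in degrees between two rotations given as quaternions."""
    q_a = np.asarray(q_a, dtype=float)
    q_b = np.asarray(q_b, dtype=float)
    q_a = q_a / np.linalg.norm(q_a)
    q_b = q_b / np.linalg.norm(q_b)
    # q and -q encode the same rotation, so take the absolute dot product.
    dot = min(1.0, abs(float(np.dot(q_a, q_b))))
    return float(np.degrees(2.0 * np.arccos(dot)))

identity = [1.0, 0.0, 0.0, 0.0]
quarter_turn_z = [np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)]
error_deg = rotation_angle_deg(identity, quarter_turn_z)
```

Because the dot product of two unit quaternions is the same in any component order, the metric works whether the files store `[w, x, y, z]` or `[x, y, z, w]`, as long as both inputs use the same order.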
To meet strict privacy standards, videos are captured only in public places, and faces and other personally identifiable information (PII) are blurred.
When using this dataset in your research, please cite:
@article{Delattre2023RobustRotation,
author = {Fabien Delattre and David Dirnfeld and Phat Nguyen and Stephen Scarano and Michael J. Jones and Pedro Miraldo and Erik Learned-Miller},
title = {Robust frame-to-frame camera rotation estimation in crowded scenes},
journal = {2023 IEEE/CVF International Conference on Computer Vision (ICCV)},
year = {2023}
}