| Bibliography | Schmid, Jakob: Multi-scale Scene Flow Estimation. University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 135 (2024). 81 pages, English. |
| Abstract | Scene flow estimation is a fundamental task in computer vision. It consists of extracting depth and 3D motion information from a sequence of stereoscopic images. Multi-scale concepts have a long history in motion estimation. For optical flow, i.e. the estimation of 2D motion from monocular images, multi-scale approaches based on the RAFT method (Teed and Deng, ECCV 2020) yield good results. To reduce memory and computational cost, RAFT operates only at 1/8 of the input image resolution. MS-RAFT (Jahedi et al., ICIP 2022) is a multi-scale method based on RAFT that uses a coarse-to-fine scheme for motion estimation: the resolution of the flow estimate is initially 1/16 of the input image and is gradually increased to 1/4. Because of the higher resolution at the finest scale, the method achieves a higher level of detail than RAFT. RAFT-3D (Teed and Deng, CVPR 2021) is a RAFT-based scene flow method that also estimates flow at 1/8 of the input image resolution. By applying the coarse-to-fine estimation scheme from MS-RAFT to RAFT-3D, we build the multi-scale scene flow method MS-RAFT-3D. The estimation process is supported by a multi-scale feature encoder that extracts features from the images at different resolutions. We mainly focus on a 3-scale variant, which operates at 1/16, 1/8 and 1/4 of the image resolution, but we also build a 4-scale variant, which additionally operates at 1/2 of the image resolution and thus offers potential for even more detailed motion estimation. In addition to the multi-scale concept, we replace the stereo depth estimation method with a better-performing alternative to make use of recent advances in that area. Our results confirm the usefulness of multi-scale concepts for modern scene flow methods. In our experiments, the 3-scale and 4-scale MS-RAFT-3D methods achieve higher accuracy than the single-scale RAFT-3D method. On the KITTI and Spring benchmarks, we achieve state-of-the-art results, outperforming all other published approaches. |
| Department(s) | University of Stuttgart, Institute of Visualisation and Interactive Systems, Visualisation and Interactive Systems |
| Supervisor(s) | Bruhn, Prof. Andrés; Jahedi, Azin |
| Entry date | December 19, 2025 |
|---|
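
The resolution schedule described in the abstract can be illustrated with a little arithmetic. The following is a minimal sketch, not code from the thesis or from RAFT, MS-RAFT or RAFT-3D: the function names, the ceiling rounding of feature-map sizes and the 384 x 1280 input size are assumptions made for illustration only. It lists the feature-map size at each scale and the number of pixels relative to the single-scale 1/8 setting, which is only a rough proxy for memory and computational cost.

```python
from fractions import Fraction

# Scale factors named in the abstract, ordered coarse to fine.
SCALES_3 = [Fraction(1, 16), Fraction(1, 8), Fraction(1, 4)]
SCALES_4 = SCALES_3 + [Fraction(1, 2)]


def scale_resolutions(height, width, scales):
    """Return (scale, h, w) for each scale.

    Sizes are rounded up with ceiling division; this rounding rule is an
    assumption of this sketch, not something stated in the abstract.
    """
    result = []
    for s in scales:
        d = s.denominator  # every scale here has the form 1/d
        result.append((s, -(-height // d), -(-width // d)))
    return result


def relative_pixel_count(scale, reference=Fraction(1, 8)):
    """Pixel count at `scale` divided by the pixel count at `reference`."""
    return (scale / reference) ** 2


if __name__ == "__main__":
    # Hypothetical input size chosen for illustration.
    for s, h, w in scale_resolutions(384, 1280, SCALES_4):
        print(f"scale {s}: {h} x {w}, "
              f"{float(relative_pixel_count(s)):.2f}x the pixels of 1/8")
```

Under these assumptions, the finest scale of the 3-scale variant (1/4) covers four times as many pixels as the 1/8 resolution used by RAFT and RAFT-3D, and the additional 1/2 scale of the 4-scale variant covers sixteen times as many.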