PhD defence
Data association for building world models in agriculture
Summary (in English)
The agricultural sector faces growing pressure to produce more food with less labor. Robots and automation can help, but agricultural environments are complex, cluttered, and highly dynamic. To work well in such environments, a robot needs to understand its surroundings by building an internal representation of the world, called a world model. This thesis focuses on advancing symbolic and neuro-symbolic world models, which offer greater interpretability and verifiability than purely sub-symbolic approaches. The main challenge addressed is Multi-Object Tracking (MOT), and in particular the data association problem: matching new observations to objects already in the world model, so that object identities persist over time despite significant occlusion, appearance variation, and sensor noise.
This research presents a progression of novel MOT methodologies designed for agricultural applications. It begins by establishing a baseline symbolic tracker using Kalman filters and geometric features, demonstrating the inherent limitations of such approaches in cluttered conditions (Chapter 2). To overcome these limitations, a neuro-symbolic method, MinkSORT, is introduced, which leverages sparse 3D convolutional networks to learn discriminative feature embeddings from point cloud data. Combining these learned 3D features with positional information significantly improves data association and makes tracking more robust to occlusion (Chapter 3). Advancing the state of the art further, the thesis presents MOT-DETR, an integrated, transformer-based architecture that performs simultaneous 3D object detection and tracking. By fusing multi-modal (RGB and 3D) data, MOT-DETR outperforms two-stage methods in scenarios with challenging camera motion and severe occlusions (Chapters 4 and 5). The principles of robust data association are then scaled up to large-scale environmental mapping with Tree-SLAM. This symbolic system integrates a novel data association algorithm with factor graph optimization to create accurate, geo-referenced orchard maps using tree trunks as landmarks, achieving high precision even with unreliable GPS data (Chapter 6).
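As a rough illustration of what combining learned features with positional information can look like in data association, the sketch below blends a positional distance with an appearance (embedding) distance into one cost and picks the lowest-cost matching between existing tracks and new detections. It is not the thesis implementation: the function names, the dictionary layout, the `weight`, `max_dist` and `gate` parameters, and the example values are all assumptions made for illustration, and the brute-force search stands in for the assignment solver a real tracker would use.

```python
import math
from itertools import permutations


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat a zero-length embedding as maximally dissimilar (assumption).
    return 1.0 if norm == 0 else 1.0 - dot / norm


def association_cost(track, detection, weight=0.5, max_dist=1.0):
    """Blend a normalised positional distance with an appearance distance.

    `weight` and `max_dist` are hypothetical tuning parameters.
    """
    pos = math.dist(track["position"], detection["position"]) / max_dist
    app = cosine_distance(track["embedding"], detection["embedding"])
    return weight * min(pos, 1.0) + (1.0 - weight) * app


def associate(tracks, detections, gate=0.6):
    """Return {track_id: detection_index} with the lowest total cost.

    Brute force over permutations, so only usable for a handful of objects.
    Matched pairs whose cost exceeds `gate` are dropped afterwards.
    """
    if not tracks or not detections:
        return {}
    cost = [[association_cost(t, d) for d in detections] for t in tracks]
    n_t, n_d = len(tracks), len(detections)
    best_pairs, best_total = [], math.inf
    if n_t <= n_d:
        # Every track gets a distinct detection.
        candidates = (list(enumerate(p)) for p in permutations(range(n_d), n_t))
    else:
        # Every detection gets a distinct track.
        candidates = ([(i, j) for j, i in enumerate(p)]
                      for p in permutations(range(n_t), n_d))
    for pairs in candidates:
        total = sum(cost[i][j] for i, j in pairs)
        if total < best_total:
            best_pairs, best_total = pairs, total
    return {tracks[i]["id"]: j for i, j in best_pairs if cost[i][j] <= gate}


# Made-up example: two tracked objects, detections arrive in swapped order.
tracks = [
    {"id": 1, "position": (0.0, 0.0, 1.0), "embedding": (1.0, 0.0)},
    {"id": 2, "position": (0.5, 0.0, 1.0), "embedding": (0.0, 1.0)},
]
detections = [
    {"position": (0.52, 0.0, 1.0), "embedding": (0.1, 0.9)},
    {"position": (0.03, 0.0, 1.0), "embedding": (0.9, 0.1)},
]
matches = associate(tracks, detections)  # {1: 1, 2: 0}
```

In this example track 1 is matched to the second detection and track 2 to the first, because both position and embedding agree. A detection that lies far away and looks different would exceed the gate and stay unmatched, which a tracker would typically treat as a new object.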
In conclusion, this thesis makes significant contributions to robust perception for agricultural robotics by developing a series of state-of-the-art MOT systems. The findings underscore the important role of robust data association strategies and the efficacy of learned 3D and multi-modal features in building accurate world models. This work advances the development of more capable and autonomous agricultural systems, addressing the main challenges in modern food production.