Project Overview
- Problem Definition: Recognising similar-looking manufacturing parts using only 3D CAD models, without real annotated image datasets.
- Simulation Environment: Developed a Unity-based setup replicating a real-world multi-camera rig to render synthetic images from 3D models.
- Data Augmentation: Applied domain randomisation (randomised background textures, lighting variation, geometric transformations, and motion blur) to improve generalisation to real-world images.
- One-Stage Approach (Baseline): Adapted the MVCNN architecture, using three ResNet18 branches for multi-view feature extraction, to classify parts from synthetic data.
- Two-Stage Architecture: Introduced a scalable two-stage network:
  - Stage 1: Predicts cluster ID from pseudo-labelled groups of parts (e.g. 20 clusters).
  - Stage 2: A specialised classifier per cluster refines the part classification.
- Training Details: All networks were trained with the Adam optimiser for 40 epochs, using weight decay and learning-rate scheduling. All branches use pretrained ResNet18 backbones.
- Qualitative Results: Visual tests show the two-stage model is more robust to occlusion, lighting variation, and minor part differences.
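
The two-stage routing described above can be illustrated with a minimal, framework-agnostic sketch. It is not the project's implementation: the class `TwoStageClassifier`, the helper `argmax`, the part names, and the toy scoring functions are all hypothetical stand-ins for the trained networks. It only shows how a Stage 1 cluster prediction selects which Stage 2 classifier makes the final part decision.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A "scorer" stands in for a trained network: it maps a feature
# vector to one score per class. (Hypothetical interface.)
Features = Sequence[float]
Scorer = Callable[[Features], List[float]]


def argmax(scores: List[float]) -> int:
    """Index of the highest score (first index wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


class TwoStageClassifier:
    """Routes an input through a cluster predictor, then a per-cluster classifier."""

    def __init__(
        self,
        cluster_scorer: Scorer,
        part_scorers: Dict[int, Scorer],
        cluster_parts: Dict[int, List[str]],
    ) -> None:
        self.cluster_scorer = cluster_scorer  # Stage 1: scores over cluster IDs
        self.part_scorers = part_scorers      # Stage 2: one scorer per cluster
        self.cluster_parts = cluster_parts    # part names within each cluster

    def predict(self, features: Features) -> Tuple[int, str]:
        # Stage 1: pick the most likely cluster.
        cluster_id = argmax(self.cluster_scorer(features))
        # Stage 2: only the classifier for that cluster is evaluated.
        local_scores = self.part_scorers[cluster_id](features)
        part = self.cluster_parts[cluster_id][argmax(local_scores)]
        return cluster_id, part


# Toy stand-ins for trained models (assumptions, for illustration only).
model = TwoStageClassifier(
    cluster_scorer=lambda f: [f[0], -f[0]],
    part_scorers={
        0: lambda f: [f[1], -f[1]],
        1: lambda f: [f[1], -f[1]],
    },
    cluster_parts={
        0: ["bolt_m6", "bolt_m8"],
        1: ["washer_a", "washer_b"],
    },
)

result_a = model.predict([1.0, -2.0])
result_b = model.predict([-0.5, 3.0])
print(result_a, result_b)
```

One consequence of this layout is that adding new parts would only require retraining the affected cluster's Stage 2 classifier (and possibly Stage 1), which is one plausible reading of why the architecture is described as scalable.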