Improving LMM Visual Reasoning Through Iterative Self-Synthesis and Expert-Guided Feature Selection
This work introduces a method that lets multimodal foundation models self-synthesize training data to enhance both their cognitive capabilities and their explainability. The core technique involves generating synthetic data through recursive self-…