Semantic-enhanced panoptic scene graph generation through hybrid and axial attentions

The generation of panoramic scene graphs represents a cutting-edge challenge in image scene understanding, necessitating sophisticated predictions of both intra-object relationships and interactions between objects and their backgrounds. This complexity tests the limits of current predictive models&...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Complex & intelligent systems 2025, Vol.11 (1), p.110
Hauptverfasser: Kuang, Xinhe, Che, Yuxin, Han, Huiyan, Liu, Yimin
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The generation of panoramic scene graphs represents a cutting-edge challenge in image scene understanding, necessitating sophisticated predictions of both intra-object relationships and interactions between objects and their backgrounds. This complexity tests the limits of current predictive models' ability to discern nuanced relationships within images. Conventional approaches often fail to effectively combine visual and semantic data, leading to predictions that are semantically impoverished. To address these issues, we propose a novel method of semantic-enhanced panoramic scene graph generation through hybrid and axial attentions (PSGAtten). Specifically, a series of hybrid attention networks are stacked within both the object context encoding and relationship context encoding modules, enhancing the refinement and fusion of visual and semantic information. Within the hybrid attention networks, self-attention mechanisms facilitate feature refinement within modalities, while cross-attention mechanisms promote feature fusion across modalities. The axial attention model is further applied to enhance the integration ability of global information. Experimental validation on the PSG dataset confirms that our approach not only surpasses existing methods in generating detailed panoramic scene graphs but also significantly improves recall rates, thereby enhancing the ability to predict relationships in scene graph generation.
ISSN:2199-4536
2198-6053
DOI:10.1007/s40747-024-01746-z