Revisiting The Vertex Cache: Understanding and Optimizing Vertex Processing on the modern GPU

In this paper, we question the premise that graphics hardware uses a post-transform cache to avoid redundant vertex shader invocations. A large body of existing work on optimizing indexed triangle sets for rendering speed is based upon this widely-accepted assumption. We conclusively show that this...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Proceedings of the ACM on computer graphics and interactive techniques 2018-08, Vol.1 (2), p.1-16
Hauptverfasser: Kerbl, Bernhard, Kenzel, Michael, Ivanchenko, Elena, Schmalstieg, Dieter, Steinberger, Markus
Format: Artikel
Sprache:eng
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:In this paper, we question the premise that graphics hardware uses a post-transform cache to avoid redundant vertex shader invocations. A large body of existing work on optimizing indexed triangle sets for rendering speed is based upon this widely-accepted assumption. We conclusively show that this assumption does not hold up on modern graphics hardware. We design and conduct experiments that demonstrate the behavior of current hardware of all major vendors to be inconsistent with the presence of a common post-transform cache. Our results strongly suggest that modern hardware rather relies on a batch-based approach, most likely for reasons of scalability. A more thorough investigation based on these initial experiments allows us to partially uncover the actual strategies implemented on graphics processors today. We reevaluate existing mesh optimization algorithms in light of these new findings and present a new mesh optimization algorithm designed from the ground up to target architectures that rely on batch-based vertex reuse. In an extensive evaluation, we measure and compare the real-world performance of various optimization algorithms on modern hardware. Our results show that some established algorithms still perform well. However, if the batching strategy of the target architecture is known, our approach can significantly outperform these previous state-of-the-art methods.
ISSN:2577-6193
2577-6193
DOI:10.1145/3233302