An efficient asymmetric distributed lock for embedded multiprocessor systems

Efficient synchronization is a key concern in an embedded many-core system-on-chip (SoC). The use of atomic read-modify-write instructions combined with cache coherency as synchronization primitive is not always an option for shared-memory SoCs due to the lack of suitable IP. Furthermore, there are...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Rutgers, J. H., Bekooij, M. J. G., Smit, G. J. M.
Format:	Tagungsbericht
Sprache:	eng
Schlagworte:	Bandwidth Hardware SDRAM Servers Synchronization System-on-a-chip Tiles
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Efficient synchronization is a key concern in an embedded many-core system-on-chip (SoC). The use of atomic read-modify-write instructions combined with cache coherency as synchronization primitive is not always an option for shared-memory SoCs due to the lack of suitable IP. Furthermore, there are doubts about the scalability of hardware cache coherency protocols. Existing distributed locks for NUMA multiprocessor systems do not rely on cache coherency and are more scalable, but exchange many messages per lock. This paper introduces an asymmetric distributed lock algorithm for shared-memory embedded multiprocessor systems without hardware cache coherency. Messages are exchanged via a low-cost inter-processor communication ring in combination with a small local memory per processor. Typically, a mutex is used over and over again by the same process, which is exploited by our algorithm. As a result, the number of messages exchanged per lock is significantly reduced. Experiments with our 32-core system show that when having locks in SDRAM, 35% of the memory traffic is lock related. In comparison, our solution eliminates all of this traffic and reduces the execution time by up to 89%.
DOI:	10.1109/SAMOS.2012.6404172