A Practical Approach to Hardware Performance Monitoring Based Dynamic Optimizations in a Production JVM
While the concept of online profile directed dynamic optimizations using hardware performance monitoring unit (PMU) data is not new, it has seen fairly limited or no use in commercial JVMs. The main reason behind this fact is the set of significant challenges involved in (1) obtaining low overhead a...
Gespeichert in:
Hauptverfasser: | , , , , |
---|---|
Format: | Tagungsbericht |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | While the concept of online profile directed dynamic optimizations using hardware performance monitoring unit (PMU) data is not new, it has seen fairly limited or no use in commercial JVMs. The main reason behind this fact is the set of significant challenges involved in (1) obtaining low overhead and usable profiling support from the underlying platform (2) the complexity of filtering, interpreting and using precise PMU events online in a JVM environment (3) demonstrating the total runtime benefit of PMU data based optimizations above and beyond regular online profile based optimizations. In this paper we address all three challenges by presenting a practical framework for PMU data collection and use within a high performance product JVM on a highly scalable server platform. Our experiments with JavaTM workloads using the SunTM HotspotTM JDK 1.6 JVM on the Intel ® Itanium ® platform indicate that the hardware data collection overhead (less than 0.5%) is not as significant as the challenge of extracting the precise information for optimization purposes. We demonstrate the feasibility of mapping the instruction IP address based hardware event information to the runtime components as well as the JIT server compiler internal data structures for use in optimizations within a dynamic environment. We also evaluated the additional performance potential of optimizations such as object co-location during garbage collection and global instruction scheduling in the JIT compiler with the use of PMU generated load latency information. Experimental results show performance improvements of up to 14% with an average of 2.2% across select Java server benchmarks such as SPECjbb2005[16], SPECjvm2008[17] and Dacapo[18]. These benefits were observed over and above those provided by profile guided server JVM optimizations in the absence of hardware PMU data. |
---|---|
DOI: | 10.1109/CGO.2009.13 |