
Re: JIT pluggability (ABI Issues)

From: Sascha Brawer
Subject: Re: JIT pluggability (ABI Issues)
Date: Tue, 13 Jan 2004 12:59:40 +0100

CG = Chris Gray <address@hidden> wrote:
PR = Patrik Reali <address@hidden> wrote:

PR> * class layout (do we really need this? I guess the fields are
PR> allocated in the row as they are declared)

CG> Some VMs may group all reference fields together, or try to
CG> "pack" fields smaller than 16 bits (boolean, byte, short, char).
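
(To illustrate Chris's point with a toy sketch: a VM's layout pass might assign
offsets size-descending, so that pointer-sized reference fields group at the
front, convenient for GC scanning, and sub-word fields pack densely at the end
instead of each taking a full word. All names below are invented.)

```java
import java.util.*;

// Toy sketch of a layout pass: sort fields by size, descending.
// Pointer-sized reference fields end up grouped at the front, and
// boolean/byte/short/char fields pack densely at the end.
class LayoutSketch {
    record Field(String name, int size) {}  // size in bytes

    static Map<String, Integer> layout(List<Field> declared) {
        List<Field> sorted = new ArrayList<>(declared);
        sorted.sort((a, b) -> b.size() - a.size());  // stable sort
        Map<String, Integer> offsets = new LinkedHashMap<>();
        int off = 0;
        for (Field f : sorted) {
            offsets.put(f.name(), off);
            off += f.size();
        }
        return offsets;
    }
}
```

E.g. a declaration order of {boolean flag; Object next; short len;} would be
laid out as next@0, len@8, flag@10 on a 64-bit VM, with no internal padding.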

My apologies for this somewhat off-topic post, but in case anyone is
interested in the subject, please find below a few references for
background reading. There surely exist many more papers in this area.


[Chilimbi et al., 1999] Trishul M. Chilimbi, Bob Davidson, and James R.
Larus. Cache-Conscious Structure Definition. In Proceedings of the ACM
SIGPLAN Conference on Programming Language Design and Implementation
(PLDI '99), May 1999, pp. 13--24.

(a) Class Splitting separates the fields of eligible classes into a
frequently and a rarely accessed part, based on instrumentation data. The
optimization is applicable to classes whose size roughly equals a cache
block (common for Java programs), provided the field access frequencies
vary enough. Thanks to fewer cache misses, the performance of five
Java programs improved by 6--18%. -- (b) Field Reordering further
improved the performance of Microsoft SQL Server by 3%, despite
previous cache-conscious C programming. This illustrates that structure
layout is better left to the compiler.
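
(For readers unfamiliar with the technique, a hand-written caricature of
class splitting might look as follows; the class and field names are
invented, and the real transformation is of course applied automatically
by the compiler, guided by profile data.)

```java
// Caricature of class splitting: hot fields stay in the main object,
// cold fields move behind one extra indirection on rare paths.
class Connection {
    // "hot": touched on every request; together they fit a cache block
    int fd;
    int state;
    long bytesSent;
    // "cold": touched only during setup, teardown, or error handling
    ConnectionCold cold = new ConnectionCold();
}

class ConnectionCold {
    String peerName;
    long createdAtMillis;
}
```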


[Dolby and Chien, 2000] Julian Dolby and Andrew A. Chien. An Automatic
Object Inlining Optimization and its Evaluation. In Proceedings of the
ACM SIGPLAN Conference on Programming Language Design and Implementation
(PLDI 2000), Vancouver BC, Canada, June 2000, pp. 345--357.

Object Inlining transforms heap data structures by fusing parent and
child objects together. It can speed up a program by reducing pointer
dereferencing and by improving memory access behavior. In the benchmark
programs, 30% of objects could be inlined, leading to 12% fewer loads,
25% fewer L1 cache misses, and 25% fewer read stalls.
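
(A hand-written before/after caricature, with invented names; the paper
performs this transformation automatically and must also prove it safe.)

```java
// Before inlining: Line reaches each Point through a pointer.
class Point {
    int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

class Line {
    Point start, end;
    Line(Point s, Point e) { start = s; end = e; }
    double length() {
        int dx = end.x - start.x, dy = end.y - start.y;
        return Math.sqrt((double) dx * dx + (double) dy * dy);
    }
}

// After inlining: the children's fields are fused into the parent,
// saving two object headers and the pointer dereferences.
class LineInlined {
    int startX, startY, endX, endY;
    LineInlined(int sx, int sy, int ex, int ey) {
        startX = sx; startY = sy; endX = ex; endY = ey;
    }
    double length() {
        int dx = endX - startX, dy = endY - startY;
        return Math.sqrt((double) dx * dx + (double) dy * dy);
    }
}
```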


[Kistler and Franz, 2000] Thomas Kistler and Michael Franz. Automated
Data-Member Layout of Heap Objects to Improve Memory-Hierarchy
Performance. ACM Transactions on Programming Languages and Systems
(TOPLAS) 22, No. 3, May 2000. pp. 490--505.

Instrumentation data is used to build a weighted graph whose edges
represent temporal dependencies between fields. In order to assign fields
to cache lines, the graph is partitioned; a second step orders the fields
within each partition. This field reordering improved the performance of
six Oberon programs by 3--50%.
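
(A heavily simplified sketch of the idea, with an invented greedy
heuristic standing in for the paper's graph partitioning: count how often
pairs of fields are accessed back to back in a trace, then chain
high-affinity fields so they end up adjacent, and thus likely in the same
cache line.)

```java
import java.util.*;

// Simplified affinity-based field reordering (greedy chaining stands
// in for the paper's graph partitioning step).
class FieldReorder {
    static List<String> reorder(List<String> trace) {
        if (trace.isEmpty()) return List.of();
        Map<String, Integer> freq = new LinkedHashMap<>();
        Map<String, Map<String, Integer>> affinity = new HashMap<>();
        for (int i = 0; i < trace.size(); i++) {
            freq.merge(trace.get(i), 1, Integer::sum);
            // count back-to-back accesses of two distinct fields
            if (i + 1 < trace.size() && !trace.get(i).equals(trace.get(i + 1))) {
                String a = trace.get(i), b = trace.get(i + 1);
                affinity.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
                affinity.computeIfAbsent(b, k -> new HashMap<>()).merge(a, 1, Integer::sum);
            }
        }
        List<String> order = new ArrayList<>();
        // seed with the hottest field, then repeatedly append the
        // unplaced field with the strongest tie to the previous one
        String cur = Collections.max(freq.entrySet(), Map.Entry.comparingByValue()).getKey();
        while (cur != null) {
            order.add(cur);
            String next = null;
            int best = 0;
            for (Map.Entry<String, Integer> e
                     : affinity.getOrDefault(cur, Map.of()).entrySet()) {
                if (!order.contains(e.getKey()) && e.getValue() > best) {
                    best = e.getValue();
                    next = e.getKey();
                }
            }
            cur = next;
        }
        for (String f : freq.keySet())  // fields never chained
            if (!order.contains(f)) order.add(f);
        return order;
    }
}
```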


[Shuf et al., 2001] Yefim Shuf, Mauricio J. Serrano, Manish Gupta, and
Jaswinder Pal Singh. Characterizing the Memory Behavior of Java
Workloads: A Structured View and Opportunities for Optimizations. In
Proceedings of SIGMETRICS 2001/Performance 2001, Cambridge MA, USA, June 2001.

Draws conclusions from detailed instrumentation of the SPEC JVM98 and
JBB2000 benchmark suites, running on a modified version of the Jalapeño
VM. The L1 data cache is less effective than for C/C++ desktop workloads
(4% misses, compared to 1%). Object Inlining could partially fix the
problem caused by "pointer chasing." Field re-ordering is unlikely to
increase L1-cache performance for Java, because most "hot" objects fit
into a 32-byte cache line. While Data Prefetching could mitigate the L1
cache situation, TLB misses are frequent as well, and current hardware
ignores prefetching instructions if the fetched address is not in the
TLB. To increase TLB hit rate, large VM pages should be used, and class
data should be co-located (because virtual method tables contribute
noticeably to TLB misses).

-- Sascha

Sascha Brawer, address@hidden,
