BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160726Z
LOCATION:D170
DTSTART;TZID=America/Chicago:20181111T115200
DTEND;TZID=America/Chicago:20181111T121400
UID:submissions.supercomputing.org_SC18_sess149_ws_mchpc114@linklings.com
SUMMARY:A Preliminary Study of Compiler Transformations for Graph Applicat
 ions on the Emu System
DESCRIPTION:Workshop\nMemory, NVRAM, Parallel Programming Languages, Libra
 ries, and Models, Workshop Reg Pass\n\nA Preliminary Study of Compiler Tra
 nsformations for Graph Applications on the Emu System\n\nChatarasi, Sarkar
 \n\nUnlike dense linear algebra applications, graph applications typically
  suffer from poor performance because of 1) inefficient utilization of mem
 ory systems through random memory accesses to graph data, and 2) overhead 
 of executing atomic operations. Hence, there has been rapid growth in
  efforts to improve both software and hardware platforms to address
  these challenges. One such hardware improvement is the Emu system, a
  thread-migratory, near-memory processor. In the Emu system, a t
 hread responsible for computation on a datum is automatically migrated ove
 r to a node where the data resides without any intervention from the progr
 ammer. Thread migration is well suited to graph applications because
  their memory accesses are irregular. However, thread migrations can
  hurt the performance of graph applications when their overhead
  outweighs the benefits they provide.\n\nIn
  this preliminary study, we explore two high-level compiler optimizations,
  i.e., loop fusion and edge flipping, and one low-level compiler transform
 ation leveraging hardware support for remote atomic updates to address ove
 rheads arising from thread migration, creation, synchronization, and atomi
 c operations. We performed a preliminary evaluation of these compiler tran
 sformations by manually applying them to three graph applications
  (Conductance, Bellman-Ford’s algorithm for the single-source shortest
  path problem, and Triangle Counting) over a set of RMAT graphs from
  Graph500. Our evaluation, which targeted a single node of the Emu
  hardware prototype, showed an overall geometric mean reduction of
  22.08% in thread migrations.
URL:https://sc18.supercomputing.org/presentation/?id=ws_mchpc114&sess=sess
 149
END:VEVENT
END:VCALENDAR

