BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160904Z
LOCATION:C2/3/4 Ballroom
DTSTART;TZID=America/Chicago:20181115T083000
DTEND;TZID=America/Chicago:20181115T170000
UID:submissions.supercomputing.org_SC18_sess324_post149@linklings.com
SUMMARY:GPU Acceleration at Scale with OpenPower Platforms in Code_Saturne
DESCRIPTION:Poster\nTech Program Reg Pass, Exhibits Reg Pass\n\nGPU Accele
 ration at Scale with OpenPower Platforms in Code_Saturne\n\nAntao, Mouline
 c, Fournier, Sawko, Zimon...\n\nCode_Saturne is a widely used computationa
 l fluid dynamics software package that uses finite-volume methods to simul
 ate different kinds of flows and is tailored to tackle multi-billion-ce
 ll unstructured mesh simulations. This class of codes has proven chall
 enging to accelerate on GPUs because they consist of many kernels wi
 th regular inter-process communication in between. In this poster we s
 how how template pack expansion with CUDA can combine multiple kern
 els into a single one, reducing launch latencies, and how, together wit
 h the specification of data environments, it helps reduce host-devi
 ce communication. We tested these techniques on the ORNL Summit superco
 mputer, based on the OpenPOWER platform, delivering almost 3x speedup o
 ver CPU-only runs on 256 nodes. We also show how the latest-genera
 tion NVLink(TM) interconnect available in POWER9(TM) improves scalin
 g efficiency, enabling consistent GPU acceleration with just 100K cell
 s per process.
URL:https://sc18.supercomputing.org/presentation/?id=post149&sess=sess324
END:VEVENT
END:VCALENDAR