BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160905Z
LOCATION:C2/3/4 Ballroom
DTSTART;TZID=America/Chicago:20181114T083000
DTEND;TZID=America/Chicago:20181114T170000
UID:submissions.supercomputing.org_SC18_sess323_post259@linklings.com
SUMMARY:Which Architecture Is Better Suited for Matrix-Free Finite-Element
  Algorithms: Intel Skylake or Nvidia Volta?
DESCRIPTION:Poster\nTech Program Reg Pass, Exhibits Reg Pass\n\nWhich Arch
 itecture Is Better Suited for Matrix-Free Finite-Element Algorithms: Intel
  Skylake or Nvidia Volta?\n\nKronbichler, Allalen, Ohlerich, Wall\n\nThis
  work presents a performance comparison of highly tuned matrix-free
  finite element kernels from the finite element library on different
  contemporary computer architectures: NVIDIA V100 and P100 GPUs, an
  Intel Knights Landing Xeon Phi, and two multi-core Intel CPUs
  (Broadwell and Skylake). The algorithms are based on fast integration
  on hexahedra using sum-factorization techniques. For small problem
  sizes, where all data fits into the CPU caches, Skylake is very
  competitive with Volta. For larger sizes, however, the GPU holds an
  advantage of approximately a factor of three over Skylake, because
  all architectures then operate in the memory-bandwidth-limited
  regime and the GPU offers considerably higher memory bandwidth. A
  detailed performance analysis contrasts the throughput-oriented
  character of GPUs with the more latency-optimized design of CPUs for
  high-order finite element computations.
URL:https://sc18.supercomputing.org/presentation/?id=post259&sess=sess323
END:VEVENT
END:VCALENDAR

