BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160728Z
LOCATION:D161
DTSTART;TZID=America/Chicago:20181112T163000
DTEND;TZID=America/Chicago:20181112T165000
UID:submissions.supercomputing.org_SC18_sess158_ws_lasalss110@linklings.co
 m
SUMMARY:A General-Purpose Hierarchical Mesh Partitioning Method with Node 
 Balancing Strategies for Large-Scale Numerical Simulations
DESCRIPTION:Workshop\nAlgorithms, Heterogeneous Systems, Resiliency, Works
 hop Reg Pass\n\nA General-Purpose Hierarchical Mesh Partitioning Method wi
 th Node Balancing Strategies for Large-Scale Numerical Simulations\n\nKong
 , Stogner, Gaston, Peterson, Permann...\n\nLarge-scale parallel numerical 
 simulations are essential for a wide range of engineering problems that
  involve complex, coupled physical processes interacting across a broad ra
 nge of spatial and temporal scales. The data structures involved in suc
 h simulations (meshes, sparse matrices, etc.) are frequently represented a
 s graphs, and these graphs must be optimally partitioned across the availa
 ble computational resources in order for the underlying calculations to sc
 ale efficiently. Partitions which minimize the number of graph edges that 
 are cut (edge-cuts) while simultaneously maintaining a balance in the amou
 nt of work (i.e. graph nodes) assigned to each processor core are desirabl
 e, and the performance of most existing partitioning software begins to de
 grade in this metric for partitions with more than $O(10^3)$ processo
 r cores. In this work, we consider a general-purpose hierarchical partitio
 ner which takes into account the existence of multiple processor cores and
  shared memory in a compute node while partitioning a graph into an arbitr
 ary number of subgraphs. We demonstrate that our algorithms significantly 
 improve the preconditioning efficiency and overall performance of realisti
 c numerical simulations running on up to 32,768 processor cores with near
 ly $10^9$ unknowns.
URL:https://sc18.supercomputing.org/presentation/?id=ws_lasalss110&sess=se
 ss158
END:VEVENT
END:VCALENDAR