BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160908Z
LOCATION:D174
DTSTART;TZID=America/Chicago:20181116T083000
DTEND;TZID=America/Chicago:20181116T120000
UID:submissions.supercomputing.org_SC18_sess146@linklings.com
SUMMARY:Workshop on Fault-Tolerance for HPC at Extreme Scale (FTXS)
DESCRIPTION:Workshop\nResiliency, Scientific Computing, Workshop Reg Pass\
 n\nImproving Application Resilience by Extending Error Correction with Con
 textual Information\n\nPoulos, Wallace, Robey, Monroe, Job...\n\nExtreme-s
 cale systems are growing in scope and complexity as we approach exascale. 
 Uncorrectable faults in such systems are also increasing, so resilience ef
 forts addressing these are of great importance. In this paper, we extend a
  method that augments hardware error detection and correction (EDAC) ...\n
 \n---------------------\nIntroduction - Workshop on Fault-Tolerance for HP
 C at Extreme Scale (FTXS)\n\nDeBardeleben, Levy, Teranishi, Daly\n\nAddres
 sing failures in extreme-scale systems remains a significant challenge to 
 reaching exascale. Current projections suggest that at the scale necessar
 y to sustain exaflops of computation, systems could experience failures as
  frequently as once per hour. As a result, robust and efficient fault t..
 .\n\n---------------------\nInfluence of A-Posteriori Subcell Limiting on 
 Fault Frequency in Higher-Order DG Schemes\n\nReinarz, Gallard, Bader\n\nS
 oft error rates are increasing as modern architectures require increasingl
 y small features at low voltages. Due to the large number of components us
 ed in HPC architectures, these are particularly vulnerable to soft errors.
  Hence, when designing applications that run for long time periods on larg
 e m...\n\n---------------------\nExtending and Evaluating Fault-Tolerant P
 reconditioned Conjugate Gradient Methods\n\nPachajoa, Levonyak, Gansterer\
 n\nWe compare and refine exact and heuristic fault-tolerance extensions fo
 r the preconditioned conjugate gradient (PCG) and the split preconditioner
  conjugate gradient (SPCG) methods for recovering from failures of compute
  nodes of large-scale parallel computers. In the exact state reconstructio
 n (ESR)...\n\n---------------------\nWorkshop Morning Break\n\nLevy\n\n---
 ------------------\nA Comprehensive Informative Metric for Analyzing HPC S
 ystem Status Using the LogSCAN Platform\n\nHui, Park, Engelmann\n\nLog pro
 cessing by Spark and Cassandra-based ANalytics (LogSCAN) is a newly develo
 ped analytical platform that provides flexible and scalable data gathering
 , transformation and computation. One major challenge is to effectively su
 mmarize the status of a complex computer system, such as the Titan supe...
 \n\n---------------------\nAnalyzing the Impact of System Reliability Even
 ts on Applications in the Titan Supercomputer\n\nAshraf, Engelmann\n\nExtr
 eme-scale computing systems employ Reliability, Availability and Serviceab
 ility (RAS) mechanisms and infrastructure to log events from multiple syst
 em components. In this paper, we analyze RAS logs in conjunction with the 
 application placement and scheduling database, in order to understand the 
 ...\n\n---------------------\nFault Tolerant Cholesky Factorization on GPU
 s\n\nLoh, Saluja, Ramanathan\n\nDirect Cholesky-based solvers are typicall
 y used to solve large linear systems where the coefficient matrix is symme
 tric positive definite. These solvers offer faster performance in solving 
 such linear systems, compared to other more general solvers such as LU and
  QR solvers. In recent days, graphic...\n\n---------------------\nCPU Over
 heating Characterization in HPC Systems: a Case Study\n\nPlatini, Ropars, 
 Pelletier, De Palma\n\nWith the increase in size of supercomputers, the nu
 mber of abnormal events also increases. Some of these events might lead to
  an application failure. Others might simply impact the system efficiency.
  CPU overheating is one such event that decreases the system efficiency: w
 hen a CPU overheats, it red...\n\n---------------------\nToward Ad Hoc Rec
 overy For Soft Errors\n\nLosada, Bautista-Gomez, Keller, Unsal\n\nThe comi
 ng exascale era is a great opportunity for high performance computing (HPC
 ) applications. However, high failure rates on these systems will hazard t
 he successful completion of their execution. Bit-flip errors in dynamic ra
 ndom access memory (DRAM) account for a noticeable share of the failur...\
 n\n---------------------\nSaNSA - the Supercomputer and Node State Archite
 cture\n\nAgarwal, Greenberg, Blanchard, DeBardeleben\n\nIn this work, we p
 resent SaNSA, the Supercomputer and Node State Architecture, a software in
 frastructure for historical analysis and anomaly detection. SaNSA consumes
  data from multiple sources including system logs, the resource manager, s
 cheduler, and job logs. Furthermore, additional context such...\n
END:VEVENT
END:VCALENDAR

