Publications [Complete List]

Book Chapters

  • [B1] "Analyzing the Robustness of HPC Applications Using a Fine-Grained Soft Error Fault Injection Tool", Qiang Guan, Nathan DeBardeleben, Sean Blanchard, Song Fu, Claude H. Davis IV and William M. Jones, Innovative Research and Applications in Next-Generation High Performance Computing.

Peer-Reviewed Journals

  • [J4] "Using Virtualization to Quantify Power Conservation via Near-Threshold Voltage Reduction for Inherently Resilient Applications", Li Tan, Nathan DeBardeleben, Qiang Guan, Sean Blanchard, and Michael Lang, Journal of Parallel Computing (PARCO).
  • [J3] "A Failure Detection and Prediction Mechanism for Enhancing Dependability of Data Centers", Qiang Guan, Ziming Zhang and Song Fu, International Journal of Computer Theory and Engineering, Vol. 4(5), October 2012.
  • [J2] "Ensemble of Bayesian Predictors and Decision Trees for Proactive Failure Management in Cloud", Qiang Guan, Ziming Zhang and Song Fu, Journal of Communication, 2012.
  • [J1] "Robust Digital Image Watermarking Algorithm Using RBF Neural Network in DWT domain", Cheng-Ri Piao, Qiang Guan and Seung-soo Han, International Journal of Fuzzy Logic and Intelligent Systems.

Peer-Reviewed Conferences and Workshops

2017

  • [C37] “TensorView: Visualizing Training of Convolutional Neural Network Using Paraview”, Xinyu Chen, Qiang Guan, Xin Liang, Li-Ta Lo, Simon Su, Trilce Estrada, James Ahrens, DIDL, 2017.
  • [C36] “Lifetime Memory Reliability Data from the Field”, Taniya Siddiqua, Vilas Sridharan, Steven E. Raasch, Nathan DeBardeleben, Kurt B. Ferreira, Scott Levy, Elisabeth Baseman, Qiang Guan, Best Paper Final List, IEEE DFT 2017.
  • [C35] “RSVP: Soft Error Resilient Power Saving at Near-Threshold Voltage Using Register Vulnerability”, Li Tan, Nathan DeBardeleben, Qiang Guan, Sean Blanchard, Michael Lang, DSN-W, 2017.
  • [C34] “Resilience of Top K Selection Algorithms”, Ryan Slechta, Laura Monroe, Nathan DeBardeleben, Qiang Guan, Joanne Wendelberger, Sarah Michalak, EDCC '17.
  • [C33] “LetGo: A Lightweight Continuous Framework for HPC Applications Under Failures”, Bo Fang, Qiang Guan, Nathan DeBardeleben, Karthik Pattabiraman, Matei Ripeanu, HPDC'17. Acceptance Rate: 19%.
  • [C32] “Silent Data Corruption Resilient Two-sided Matrix Factorizations”, Panruo Wu, Nathan DeBardeleben, Qiang Guan, Sean Blanchard, Jieyang Chen, Dingwen Tao, Xin Liang, Ouyang Kaiming, Sihuan Li, and Zizhong Chen, PPoPP'17. Acceptance Rate: 21.9% (29/132).

2016

  • [C31] “Probabilistic Computing for HPC in the Post-Moore’s Era”, Laura Monroe, John Daly, Nathan DeBardeleben, Sarah Michalak, Qiang Guan and Kevin Rudd, in Post-Moore's Era Supercomputing (PMES) Workshop in conjunction with SC’16.
  • [C30] “Design, Use, and Evaluation of P-FSEFI: A Parallel Soft Error Fault Injection Framework for Emulating Soft Errors in Parallel Applications”, Qiang Guan, Nathan DeBardeleben, Panruo Wu, Stephan Eidenbenz, Sean Blanchard, Laura Monroe, Elisabeth Baseman, and Li Tan, SIMUTOOLS'16.
  • [C29] “Improving DRAM Fault Characterization Through Machine Learning”, Elisabeth Baseman, Nathan DeBardeleben, Kurt Ferreira, Scott Levy, Steven Raasch, Vilas Sridharan, Taniya Siddiqua and Qiang Guan, DSN'16.
  • [C28] “Towards Practical Algorithm Based Fault Tolerance in Dense Linear Algebra”, Panruo Wu, Qiang Guan, Nathan DeBardeleben, Sean Blanchard, Dingwen Tao, Xin Liang, Jieyang Chen, and Zizhong Chen, HPDC'16.
  • [C27] “SDC is in the Eye of the Beholder: A Survey and Preliminary Study”, Bo Fang, Panruo Wu, Qiang Guan, Nathan DeBardeleben, Laura Monroe, Sean Blanchard, Zizhong Chen, Karthik Pattabiraman, Matei Ripeanu, the 3rd IEEE International Workshop on Reliability and Security Data Analysis (RSDA), 2016.
  • [C26] "P-FSEFI: A Parallel Soft Error Fault Injection Framework for Parallel Applications", Qiang Guan, Nathan DeBardeleben, Sean Blanchard, Panruo Wu, Laura Monroe and Zizhong Chen, accepted in the 12th Workshop on Silicon Error in Logic-System Effect (SELSE'16), 2016. [View PDF]*
  • [C25] "On the Inherent Resilience of Integer Operators", Laura Monroe, William Jones, Claude Davis, Scott Lavigne, Qiang Guan and Nathan DeBardeleben, accepted in the 12th Workshop on Silicon Error in Logic-System Effect (SELSE'16), 2016. [View PDF]*

2015

  • [C24] "Differentiated Failure Remediation with Action Selection for Resilient Computing", Song Huang, Song Fu, Nathan DeBardeleben, Qiang Guan and Cheng-Zhong Xu, in The 21st IEEE Pacific Rim International Symposium on Dependable Computing (PRDC'15). [View PDF]*
  • [C23] "Addressing Statistical Significance of Fault Injection: Empirical Studies of the Soft Error Susceptibility", Qiang Guan, Nathan DeBardeleben, Sean Blanchard, and Song Fu, in The 21st IEEE Pacific Rim International Symposium on Dependable Computing (PRDC'15). [View PDF]*
  • [C22] "Towards Building Resilient Scientific Applications: Resilience Analysis on the Impact of Soft Error and Transient Error Tolerance with CLAMR Hydrodynamics Mini-App", Qiang Guan, Nathan DeBardeleben, Brian Atkinson, Robert Robey, and William Jones, in IEEE Cluster'15. [View PDF]*
  • [C21] "Empirical Studies of the Soft Error Susceptibility of Sorting Algorithms", Qiang Guan, Nathan DeBardeleben, Sean Blanchard and Song Fu, in the 5th Fault Tolerance for HPC at eXtreme Scale (FTXS) Workshop with HPDC 2015. [View PDF]*
  • [C20] "Empirical Studies of the Soft Error Susceptibility of Sorting Algorithms to Statistical Fault Injection", Qiang Guan, Nathan DeBardeleben, Sean Blanchard and Song Fu, in the 11th Workshop on Silicon Error in Logic-System Effect (SELSE'15), 2015. [View PDF]*

2014

  • [C19] "Fault Injection Experiments with the CLAMR Hydrodynamics Mini-App", Brian Atkinson, Nathan DeBardeleben, Qiang Guan, Robert Robey, and William Jones, in the 25th IEEE International Symposium on Software Reliability Engineering (ISSRE'14), 2014. [View PDF]*
  • [C18] "Towards Exploring the Soft Error Susceptibility of Heapsort Algorithm", Qiang Guan, Nathan DeBardeleben, Sean Blanchard and Song Fu, in the 44th annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'14), 2014. [View PDF]*
  • [C17] "F-SEFI: A Fine-Grained Soft Error Fault Injection Tool for Profiling Application Vulnerability", Qiang Guan, Song Fu, Nathan DeBardeleben and Sean Blanchard, in the 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS'14), May 2014. [View PDF]*

2013

  • [C16] "Exploring Time and Frequency Domains for Accurate and Automated Anomaly Detection in Cloud Computing Systems", Qiang Guan and Song Fu, in 19th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC'13), 2013. [View PDF]*
  • [C15] "Wavelet-Based Multi-scale Anomaly Identification in Cloud Computing Systems", Qiang Guan and Song Fu, in IEEE Global Communications Conference (GLOBECOM'13), 2013. [View PDF]*
  • [C14] "Autonomic Failure Identification and Diagnosis for Building Dependable Computing Systems", Qiang Guan, Song Fu, Nathan DeBardeleben and Sean Blanchard, in Ph.D. Showcase, IEEE/ACM Supercomputing Conference (SC), 2013. [View PDF]*
  • [C13] "Adaptive Anomaly Identification by Exploring Metric Subspace in Cloud Computing Infrastructures", Qiang Guan and Song Fu, in the 32nd IEEE International Symposium on Reliable Distributed Systems (SRDS'13), 2013. [View PDF]*

2012

  • [C12] "AFD: Adaptive Failure Detection System for Cloud Computing Infrastructures", Husanbir S Pannu, Jianguo Liu, Qiang Guan and Song Fu, 31st IEEE International Performance Computing and Communications Conference (IPCCC'12), 2012. [View PDF]*
  • [C11] "An Adaptive Power Management Framework for Autonomic Resource Configuration in Cloud Computing Infrastructures", Ziming Zhang, Qiang Guan and Song Fu, 31st IEEE International Performance Computing and Communications Conference (IPCCC'12), 2012. [View PDF]*
  • [C10] "A Cloud Dependability Analysis Framework for Characterizing System Dependability in Cloud Computing Infrastructures", Qiang Guan, Chi-Chen Chiu and Song Fu, 18th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC'12), 2012. [View PDF]*
  • [C9] "Efficient and Accurate Anomaly Identification Using Reduced Metric Space in Utility Clouds", Qiang Guan, Ziming Zhang and Song Fu, IEEE International Conference on Networking, Architecture, and Storage (NAS'12), June 2012. (Acceptance rate: 30%). [View PDF]*

2011 and Before

  • [C8] "Proactive Failure Management by Integrated Unsupervised and Semi-Supervised Learning for Dependable Cloud Systems", Qiang Guan, Ziming Zhang and Song Fu, IEEE International Conference on Availability, Reliability and Security (ARES'11), Aug 2011. (Acceptance rate: 24%). [View PDF]*
  • [C7] "Experimental Framework for Injecting Logic Errors in a Virtual Machine to Profile Applications for Soft Error Resilience", Nathan DeBardeleben, Sean Blanchard, Qiang Guan, Ziming Zhang and Song Fu, Resilience, Intl. European Conference on Parallel and Distributed Computing (Euro-Par), 2011. (Acceptance rate: 29.9%). [View PDF]*
  • [C6] "Ensemble of Bayesian Predictors for Autonomic Failure Management in Cloud Computing", Qiang Guan, Ziming Zhang and Song Fu, IEEE Intl. Conference on Computer Communications and Networks (ICCCN'11), 2011. (Acceptance rate: 29.6%). [View PDF]*
  • [C5] "auto-AID: A Data Mining Framework for Autonomic Anomaly Identification in Networked Computer Systems", Qiang Guan and Song Fu, IEEE Intl. Performance Computing and Communications Conference (IPCCC'10), 2010. (Acceptance rate: 28%). [View PDF]*
  • [C4] "Anomaly Detection in Large-Scale Coalition Clusters for Dependability Assurance", Qiang Guan, Derek Smith and Song Fu, IEEE Intl. Conference on High Performance Computing (HiPC'10), 2010. (Acceptance rate: 19.2%). [View PDF]*
  • [C3] "An Anomaly Detection Framework for Autonomic Management of Compute Cloud Systems" (invited paper), Derek Smith, Qiang Guan and Song Fu, in Proceedings of CloudApp, the 34th IEEE International Conference on Computer Software and Applications (COMPSAC'10), July 2010. (Acceptance rate: 20%). [View PDF]*
  • [C2] "Reliability and Dependability Analysis for Agent-Based Reliability Enhancement Technology (ARET) System", Qiang Guan and Seung-soo Han, International Conference on Electronic Computer Technology (ICECT), 2009. [View PDF]*
  • [C1] "Research on Optimization of Process Bus in IEC 61850 Based Substation Communication Network", Yang Liu, Qiang Guan, Seung-Soo Han, Myeon-Song Choi, Seung-Jae Lee, The International Conference on Electrical Engineering (ICEE), 2009. [View PDF]*

Posters

  • [P8] "F-SEFI: A Fine-grained Soft Error Fault Injector for Profiling Application Vulnerability", in Los Alamos National Lab Predictive Science Panel (PSP) Review, Mar 2014, Los Alamos.
  • [P7] "Wavelet-Based Multi-scale Anomaly Identification in Cloud Computing Systems", in IEEE Global Communications Conference (GLOBECOM), Dec 2013.
  • [P6] "F-SEFI: A Fine-grained Soft Error Fault Injector", in 5th Los Alamos National Laboratory Annual Mini Showcase, poster session, Jul 2013.
  • [P5] "SEFI: A Soft Error Fault Injector for Application Vulnerability Analysis", in 13th Los Alamos National Laboratory Annual Student Symposium “Championing Scientific Careers”, Jul 2013.
  • [P4] "Building Dependable and Energy-Efficient Cloud Computing Systems", DTX Technology, Business and Finance Forum, Dallas, Mar 2013.
  • [P3] "Performance Metric Selection for Efficient and Accurate Anomaly Detection on Cloud Computing Systems", in Research Lab Poster Demo, University of North Texas, Dec 2011.
  • [P2] "A Data Mining Framework for Autonomic Anomaly Identification in Networked Computer Systems", in annual Industrial Advisory Board Meeting of the Net-Centric IUCRC, Dec 2010.
  • [P1] "Security Based Dependability Analysis for IED System in SAS", the 2nd International Conference on Advanced Power System Automation and Protection (APAP), 2007.

Domestic Conference Papers (South Korea):

  • [D3] "IEC61850 Based FRTU Development Scheme", Qiang Guan, Myeon-Song Choi and Seung-Soo Han, in Proceedings of the 38th KIEE Summer Annual Conference, 2007.
  • [D2] "The Application of IEC 61850 in a New Anti-fault Substation System", Qiang Guan and Seung-Soo Han, in Proceedings of the 37th KIEE Summer Annual Conference, pp. 242-244, 2006.
  • [D1] "Analysis on Security and Dependability for IED System in Substation Automation System", Qiang Guan, Seung-Soo Han and Seung-Jae Lee, in Proceedings of 2006 KIEE Supplement Conference, pp. 21-23, 2006.

Talks:

  • [T8] "F-SEFI: A Fine-grained Soft Error Fault Injector for Profiling Application Vulnerability", in 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2014.
  • [T7] "Autonomic Failure Identification and Diagnosis for Building Dependable Computing Systems", in Ph.D. Showcase at IEEE/ACM Supercomputing Conference (SC), 2013.
  • [T6] "SEFI: A Soft Error Fault Injector for Application Vulnerability Analysis", in 5th Los Alamos National Laboratory Annual Mini Showcase, technical talk session, Jul 2013.
  • [T5] "AFD: Adaptive Failure Detection System for Cloud Computing Infrastructures", in 31st IEEE International Performance Computing and Communications Conference (IPCCC), Dec 2012.
  • [T4] "A Cloud Dependability Analysis Framework for Characterizing System Dependability in Cloud Computing Infrastructures", in IEEE Pacific Rim International Symposium on Dependable Computing (PRDC), Jul 2012.
  • [T3] "Efficient and Accurate Anomaly Identification Using Reduced Metric Space in Utility Clouds", in 7th IEEE International Conference on Networking, Architecture, and Storage (NAS), Jun 2012.
  • [T2] "Experimental Framework for Injecting Logic Errors in a Virtual Machine to Profile Applications for Soft Error Resilience", in 3rd Los Alamos National Laboratory Annual Mini Showcase, Ultra-scale System Research Center, Los Alamos National Lab, Aug 2011.
  • [T1] "auto-AID: A Data Mining Framework for Autonomic Anomaly Identification in Networked Computer Systems", in 29th IEEE Intl. Performance Computing and Communications Conference (IPCCC), Dec 2010.

* Personal use of the material provided here is permitted. Permission from respective authorities must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.