A Summarised Overview of Parallel Programming Languages
Pin (a dynamic instrumentation tool from Intel)
https://software.intel.com/en-us/articles/pintool/
Load Balancing and Bottleneck Detection
1. Kristof Du Bois, Jennifer B. Sartor, Stijn Eyerman and Lieven Eeckhout. Bottle Graphs: Visualizing Scalability Bottlenecks in Multi-Threaded Applications. OOPSLA ’13, October 29–31, 2013, Indianapolis, Indiana, USA.
2. John Demme and Simha Sethumadhavan. Rapid Identification of Architectural Bottlenecks via Precise Event Counting. ISCA’11, June 4–8, 2011, San Jose, California, USA.
Reusability analysis
1. Miquel Pericàs, Kenjiro Taura, and Satoshi Matsuoka. Scalable Analysis of Multicore Data Reuse and Sharing. ICS '14: Proceedings of the 28th ACM International Conference on Supercomputing, June 2014.
2. Derek L. Schuff, Milind Kulkarni, and Vijay S. Pai. Accelerating Multicore Reuse Distance Analysis with Sampling and Parallelization. PACT '10: Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, September 2010.
Transactional Memory
1. M. A. Gonzalez-Mesa, Eladio Gutierrez, Emilio L. Zapata and Oscar Plata. Effective Transactional Memory Execution Management for Improved Concurrency. ACM Transactions on Architecture and Code Optimization, Vol. 11, No. 3, Article 24, Publication date: July 2014.
2. Lihang Zhao and Jeffrey Draper. Consolidated Conflict Detection for Hardware Transactional Memory. PACT’14, August 24–27, 2014, Edmonton, AB, Canada.
Parallel design patterns and algorithmic skeletons
1. Kurt Keutzer and Tim Mattson. A Design Pattern Language for Engineering (Parallel) Software
2. Kyle Burke. Chapel: A Versatile Language for Teaching Parallel Programming: Conference Workshop. Journal of Computing Sciences in Colleges, Volume 30, June 2015.
Chunk-based record and replay of systems
QuickRec: Prototyping an Intel Architecture Extension for Record and Replay of Multithreaded Programs. ISCA ’13, Tel Aviv, Israel.
Work Stealing and Safety nets
Adam Morrison and Yehuda Afek. Fence-Free Work Stealing on Bounded TSO Processors. ASPLOS ’14, March 1–5, 2014, Salt Lake City, Utah, USA.
Kernel Instrumentation using Dynamic Binary Translation (DBT)
Prashanth P. Bungale and Chi-Keung Luk. PinOS: A Programmable Framework for Whole-System Dynamic Instrumentation. VEE ’07, June 13–15, 2007, San Diego, California, USA.