Testing of Pre-trained Large Language Models for Zero-shot Software Engineering Tasks (Ongoing)
[PI: Prof. Shouvick Mondal, ANRF - Prime Minister Early Career Research Grant]
In this research project, we (i) reveal adversarial LLM prompts that can produce misaligned responses to critical SE tasks, where such issues may prove fatal if deployed in safety-critical systems, (ii) focus on functional and non-functional aspects of software testing processes by and for LLMs, and (iii) investigate whether zero-shot LLM prompt engineering techniques can bring more Software Development Life Cycle phases under the umbrella of autonomous Software Engineering (SE).

Employing Generative AI in Software Development (Upcoming)
[PI: Prof. Shouvick Mondal, Google - Gemini Academic Program]
Traditional software engineering struggles to keep pace with the rapid evolution and complexity of modern applications, leading to challenges in code quality, documentation, and collaboration. This project aims to address these issues by harnessing Generative AI, such as Large Language Models (LLMs), to reshape software development and testing practices. Integrating LLMs into software development promises to expedite code generation, automate documentation, enhance bug detection, and streamline code reviews. The expected outcomes include accelerated development cycles, elevated code quality, and improved collaboration within teams, potentially redefining industry standards.

Incorporating Parallelization and Consensus Based Approaches in Software Testing Workflows (Ongoing)
[PI: Prof. Shouvick Mondal, IIT Gandhinagar]
The software testing life cycle is composed of multiple modules, some of which are inherently sequential while others are parallelizable. The parallelizable modules follow certain structures that allow several operations to be performed in parallel. The complexity of the transformation needed to extract parallelism from these modules determines not only the efficiency of parallelization but also its correctness and its benefit, i.e., the speedup obtained. A parallelization should produce an output deemed correct for the application, and meeting both efficiency and correctness is challenging. The first part of the project aims to automatically identify and devise parallelization opportunities in the software testing life cycle that balance efficiency and correctness. The second part employs techniques from social choice theoretic frameworks, such as consensus, in testing phases where no single strategy performs best. Such scenarios are resolved by combining individual techniques known to perform well under specific conditions and aggregating their outcomes for an overall benefit. (ASE 2023, FSE 2024, arXiv 2024, JSS 2025, India HCI 2025)

Studying the Impact of Prompt Mutations on Code Generating LLMs (Completed)
[co-PI: Prof. Shouvick Mondal, OpenAI API Researcher Access Program]
In this research project, we investigate whether LLMs can withstand mutations in code generation tasks and still generate valid code, and we perform a comparative study of multiple LLMs. (India HCI 2025)

Testing of the OpenAI DALL-E Detection Classifier System (Completed)
[co-PI: Prof. Shouvick Mondal, OpenAI DALL-E Detection Classifier Access]
In this research project, we investigate potential biases in the classifier and its resilience against manipulated or adversarial images.

Differential Analysis of Original and Reverse-engineered C source codes (Completed)
[PI: Prof. Shouvick Mondal, GCP Research Credits]
This research targets the challenge of testing functional equivalence between C source codes and their compiled binaries. Conventionally, a single compiler is used, yet the functional parity of the resulting executables remains uncertain. This project aims to identify compilation defects by comparatively analyzing executables produced by different compilers from the same source code. In cases where source code is unavailable, the project lifts binaries to an intermediate representation through reverse engineering, extracting source code for decompilation defect detection. The project also identifies code similarity across compiler optimization levels, applicable to bug detection in compilers and decompilers. (FSE 2024 x2)

Testing and Debugging of Ultra-Large-Scale Systems (Completed)
[PI: Prof. Tse-Hsun (Peter) Chen, Concordia University, Montreal, Canada]
Large-scale systems, such as Amazon.com and Google, have become an integral part of our daily life. However, the increasing complexity and scale of large-scale software systems pose many challenges to software reliability and quality assurance. To ensure software quality, developers follow two general practices: testing and debugging. Both have limitations as currently practiced. Testing is a preventative approach in which developers exercise the system using pre-defined inputs and verify whether the outputs are as expected. One major limitation is that test code itself may contain quality issues that result in unexpected outcomes. When failures happen in production, developers need to debug the root cause and propose a solution. Unfortunately, production failures are difficult to reproduce due to the lack of runtime information (e.g., users’ input, system environment, or configuration); developers can only rely on analyzing system logs to debug and diagnose the root causes. Even though developers have been leveraging logs for decades, there exists no industrial standard on how to best utilize logs for failure diagnosis. The goal of this research is to conduct studies and propose techniques that help practitioners improve testing and debugging practices. (EMSE 2024, TOSEM 2024)

Soundy Automated Parallelization of Test Execution (Completed)
[PI: Prof. Marcelo D'Amorim, Federal University of Pernambuco, Recife, Brazil]
Software regression testing is an important quality assurance practice that is widely adopted today, and optimizing it matters. Test parallelization has the potential to leverage the power of multi-core architectures to accelerate regression testing. Unfortunately, the parallelization options available in build systems and testing frameworks cannot be used directly without introducing test flakiness: tests can fail because of data races or broken test dependencies. Although those problems can be safely circumvented with the assistance of an automated tool that collects test dependencies (e.g., PRADET), the cost of that solution is prohibitive, defeating the purpose of test parallelization. This research proposes PASTE, an approach to automatically parallelize the execution of test suites. PASTE alternates parallel and sequential execution of test cases and test classes to circumvent provoked test failures. PASTE does not guarantee that flakiness will never manifest, but our results indicate that the strategy is sufficient to avoid it in practice. We evaluated PASTE on 25 projects mined from GitHub using objective selection criteria. Results show that (i) PASTE could circumvent the flakiness introduced by parallelization in all projects that manifested it, and (ii) 52% of the projects benefited from test parallelization, with a median speedup of 1.59x (best: 2.28x, average: 1.47x, worst: 0.93x). (ICSME 2021)

Software Regression Testing Powered by Parallelization Windows (Completed)
[PI: Prof. Rupesh Nasre, IIT Madras]
This research explores the parallelization of different phases of the regression testing cycle. Our approach begins with end-to-end parallelization of the three phases, (i) offline, (ii) online, and (iii) execution, using uniformly sized parallelization windows. Sequential bottlenecks and unevenly parallel workloads then turn our attention to the online and execution phases, respectively. A sub-process of the online phase is test-case prioritization, and our approach emphasizes heuristically improving the effectiveness of the prioritization exercise. To this end, test-case prioritization is performed by: (i) code-change relevance-and-confinedness (relcon), (ii) hybridization of relcon, and (iii) social choice theoretic consensus of multiple prioritized permutations. We show that different heuristics not only impact the effectiveness of prioritization followed by conventional sequential test execution, but also lead to different test-load distributions under parallel execution with uniformly sized windows. Prioritized test execution is further explored using windows of non-uniform parallelization factor, determined primarily by ties in the priority distribution. We explore different configurations of parallelization windows and show that different prioritization-parallelization combinations lead to different combinations of effectiveness-speedup benefits. (JSS 2019, JSS 2021, TSE 2021, IN Patent 386511)
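The two ideas above, a social-choice consensus over several prioritized permutations and parallelization windows whose sizes are determined by priority ties, can be sketched in a few lines. This is an illustrative toy, not the published technique: the choice of Borda count as the aggregation rule, all function names, and the sample orderings are assumptions for the sake of the example.

```python
# Hypothetical sketch: aggregate several prioritized test orderings into a
# consensus ordering, then group tied tests into parallelization windows.
from collections import defaultdict

def borda_consensus(permutations):
    """Borda count: each test earns (n - rank) points in each permutation;
    higher total score means higher consensus priority."""
    scores = defaultdict(int)
    n = len(permutations[0])
    for perm in permutations:
        for rank, test in enumerate(perm):
            scores[test] += n - rank
    # Break exact score ties deterministically (alphabetically) for the order.
    return sorted(scores, key=lambda t: (-scores[t], t)), scores

def tie_windows(order, scores):
    """Tests with equal consensus scores have no relative priority, so each
    maximal run of tied tests forms one (non-uniform) parallel window."""
    windows, current = [], [order[0]]
    for prev, test in zip(order, order[1:]):
        if scores[test] == scores[prev]:
            current.append(test)      # tie -> same window, may run in parallel
        else:
            windows.append(current)   # score drops -> close window, start new
            current = [test]
    windows.append(current)
    return windows

# Two heuristics produced two different prioritized permutations of four tests:
perms = [["t1", "t2", "t3", "t4"],
         ["t1", "t3", "t2", "t4"]]
order, scores = borda_consensus(perms)
print(order)                        # -> ['t1', 't2', 't3', 't4']
print(tie_windows(order, scores))   # -> [['t1'], ['t2', 't3'], ['t4']]
```

Here t2 and t3 tie with a Borda score of 5, so they fall into one window and could be executed in parallel, while t1 and t4 run in singleton windows before and after.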