Title: Evaluation of Using Large Language Models for Software Security: Learning from Analyzing a Few Cases
Abstract: The rapid advancement of Large Language Models (LLMs, e.g., Claude or ChatGPT) has stimulated the trend of leveraging LLM-based tools for supporting software security. One of the key motivators for experimenting with a bleeding-edge technology like LLMs in relatively conservative domains like security is the desire to augment human capabilities to develop and evolve secure software. While it is important to advance the SOTA of LLM-based methods/tools (e.g., new/advanced algorithms) for improved software security, it is equally important to systematically understand the technological, organisational, and socio-psychological, i.e., socio-technical, aspects of leveraging LLMs for the security-by-design paradigm. For this purpose, there is an important need to empirically analyze and reflect upon the reported studies that aim at empirically evaluating the use of LLMs for software security. To this end, we have been following a two-pronged strategy: conducting empirical studies to evaluate the application of LLMs for software security, and rigorously selecting and analyzing the evaluation aspects of studies reporting the use of LLMs for software security. We aim to assess the current evaluation R&D in this field to identify research problems and devise/evaluate appropriate methodological strategies, both to help practitioners decide which LLM tools to integrate into their software development lifecycle, and when, and to stimulate researchers to carry out new R&D for building an evidence-based body of knowledge supporting the reliable and trustworthy use of LLMs for software security. In this talk, I’ll share the key motivations, outcomes, and lessons from analyzing and reflecting upon our own R&D and a few cases from the literature reporting the use of LLMs for software security. I’ll also touch upon the methodological aspects of the analyzed studies, before elaborating on the kinds of challenges observed in conducting evaluation studies of the use of LLMs for software security.
M. Ali Babar is a Professor in the School of Computer Science, University of Adelaide, Australia. He leads a theme on architecture and platform for security as a service in the Cyber Security Cooperative Research Centre (CSCRC), a large initiative funded by the Australian government, industry, and research institutes. Professor Babar leads one of the largest projects on “Software Security” in the ANZEC region, funded by the CSCRC. The project, Software Security with a Focus on Critical Infrastructure (SOCRATES), brings together more than 75 researchers and practitioners from 10 organizations to develop and evaluate novel knowledge- and AI-based platforms, methods, and tools for software security. Professor Babar established an interdisciplinary research centre, CREST (Centre for Research on Engineering Software Technologies), where he leads the research, development, and education activities of more than 30 researchers and engineers in the areas of Software Systems Engineering, Security and Privacy, and Social Computing. Professor Babar has authored/co-authored more than 300 peer-reviewed research papers in premier software journals and conferences. He obtained a Ph.D. in Computer Science and Engineering from the School of Computer Science and Engineering at the University of New South Wales, Australia. He also holds an M.Sc. degree in Computing Sciences from the University of Technology Sydney, Australia. More information on Professor Babar can be found at http://alibabar.net