Employees in industry need to
have more than 2 years of C programming experience OR have known about Undefined Behavior beforehand.
Graduate students, in addition to meeting the above requirements for employees, also need to
have a background in software security or programming languages
have participated in at least one C project as the main developer (contribution > 500 LoC)
You are being invited to participate in a research survey on compiler-introduced security bugs. This research has obtained a local ethics review waiver. The survey should take no more than about 15 minutes to complete.
Participation
Your participation in this survey is voluntary. You may refuse to take part in this research or exit the survey at any time without penalty.
Purpose of the research
The purpose of our questionnaire is to investigate C programmers' knowledge and views of certain security issues introduced (or caused) by compilers, as well as their expectations for prospective research work.
Procedure of the research
The questionnaire will
(1) introduce the concept of “Undefined Behavior” in the C programming language to you through reading materials and ask you about your knowledge of it;
(2) introduce compiler-introduced security bugs related to (or caused by) “Undefined Behavior” to you through reading materials and ask you about your knowledge of them;
(3) ask for your estimate of programmers' actual experience of encountering compiler-introduced security bugs related to “Undefined Behavior” and your opinions of these bugs;
(4) introduce another kind of compiler-introduced security bug to you through reading materials and ask you about your knowledge and opinions of it;
(5) ask about your expectations for prospective research work in this area.
The expected duration of the subject's participation: generally less than 15 minutes.
Benefits
You are likely to increase your understanding of “Undefined Behavior”, how compilers interpret it, and related security issues.
Risks
There are no foreseeable risks involved in participating in this study other than those encountered in day-to-day life.
OR
(1) You may feel uncomfortable/offended by some inadvertent statements in the survey. (2) You may feel uncomfortable/nervous/embarrassed/tired when answering some of the survey questions.
Confidentiality Statement
You are a respondent who meets our recruitment requirements, and your participation is very important to our research. For the questions in the questionnaire that ask for your opinion, please answer according to what you actually think; there is no right or wrong opinion.
We promise to make every effort to keep all the information you provide strictly confidential and only use it for aggregated statistical analysis for research purposes. Only core experimenters have access to your data. No information which could identify you will be shared in publications about this study. We will not store your name or other personal information with the data.
Consent
Please confirm that you have read and understood the "Informed Consent Form" and agree to take part in the following questionnaire survey. If you have any questions, please feel free to contact our experimenters.
⚪ I have read and understood the "Informed Consent Form" and agree to take part in the following questionnaire survey.
Note: Please record the time it takes you to complete this questionnaire.
(1) What is your job?
⚪ A master's student
⚪ A PhD student
⚪ An academic researcher
⚪ A security analyst / security maintainer
⚪ A compiler developer or maintainer
⚪ A professional C programmer
⚪ Other (no need to be very specific) ___
(2) How much experience do you have in C programming?
⚪ _____ years
The following section investigates your understanding of and views on undefined behavior and related compiler-introduced security bugs.
(1) What is Undefined Behavior (UB)?
The behavior of a program whose execution result is not specified according to the language semantics. The compiler can make the assumption that such behavior does not exist.
For example:
Division by zero. The language specification does not specify the result of executing a "divide by 0" operation, and allows the compiler to assume that no such operation exists in the source program when compiling. If the compiler infers that a divisor must be 0, it could, e.g., simply remove the offending code.
Signed integer overflow. When an arithmetic operation causes a signed integer to exceed the representable range of its type (greater than the maximum value or less than the minimum value), the language specification does not specify what the result of the arithmetic operation is and allows the compiler to assume that such operations will not occur, potentially removing offending code.
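The two cases above can be sketched in C as follows. This is only an illustrative sketch, not part of the original questionnaire material; the function names are hypothetical. The "unchecked" functions contain the UB, while the "checked" variants test the operands first, so no undefined operation is ever executed.

```c
#include <limits.h>
#include <stdbool.h>

/* Undefined behavior if b == 0, or if a == INT_MIN and b == -1. */
int unchecked_div(int a, int b) {
    return a / b;
}

/* Defined for every input: the problematic cases are rejected first. */
bool checked_div(int a, int b, int *out) {
    if (b == 0 || (a == INT_MIN && b == -1)) {
        return false;
    }
    *out = a / b;
    return true;
}

/* Undefined behavior if the mathematical sum does not fit in an int. */
int unchecked_add(int a, int b) {
    return a + b;
}

/* Defined for every input: the range test itself never overflows. */
bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

A caller would use the checked variants as, e.g., `if (checked_add(x, y, &sum)) { ... }` and handle the failure case explicitly.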
(2) Why does the language specification introduce the concept of "undefined behavior"?
To reduce compiler complexity
To provide maximal compatibility across implementations and architectures
To reduce the runtime overhead of the compiled program
Before doing our questionnaire, did you know about "undefined behavior"?
⚪ Yes, I did
⚪ No, I didn’t
Before doing our questionnaire, did you know about the compiler's “UB does not exist” assumption?
⚪ Yes, I did
⚪ No, I didn’t
CISB (Compiler-Introduced Security Bugs):
We define a software bug as a CISB when
There are no security issues in its source code
During compilation, a code modification made by the compiler directly leads to security issues
Note: The term "introduced" in UB-CISB only describes the fact that the code changes that directly cause the security issues are made by the compiler during its analysis and optimization passes; it does not represent any subjective evaluation of such issues. You are welcome to have your own opinion on this matter.
UB-CISB (Compiler-Introduced Security Bugs related to Undefined Behavior)
We define a software bug as a UB-CISB when
There is UB in the source code
Triggering the UB in the source code does not by itself cause any security issue
When the compiler performs certain optimizations based on the assumption that "UB does not exist", a security problem appears in the program
Code examples for a UB bug and a UB-CISB.
A UB bug:
The triggering of the UB is a bug.
In this example, once the integer overflow in line 3 is triggered (e.g., when INT_MAX - 1 is provided as the length of the buffer, adding LEN_SUFFIX exceeds the maximum integer and “rolls over”), the memory allocated by malloc() will be smaller than intended, which may cause serious memory errors afterwards.
A UB-CISB:
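The original listing is not reproduced here, so the line numbers in the text refer to that listing. A minimal sketch of the kind of code being described, assuming a hypothetical function name and a hypothetical value for LEN_SUFFIX, might look like this:

```c
#include <stdlib.h>

#define LEN_SUFFIX 8  /* hypothetical suffix size, chosen for illustration */

/* Allocates room for len payload bytes plus a fixed-size suffix. */
char *alloc_with_suffix(int len) {
    /* Signed integer overflow (UB) whenever len > INT_MAX - LEN_SUFFIX:
     * the size passed to malloc() then no longer matches what the
     * caller expects to be able to write. */
    int total = len + LEN_SUFFIX;
    return malloc((size_t)total);
}
```

For small values of len the function behaves as intended; the problem only arises for lengths close to the maximum int value.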
The triggering of UB is in line with the programmer's expectations, but the compiler's behavior will make the code insecure.
In this example, the programmer adds a security check in line 4 to the code from the previous example in order to catch the "integer overflow" in advance. This should eliminate the security risk. However, because the compiler assumes that there is no UB, it determines that the result of the security check in line 4 must be False and removes the check, reintroducing the security issue into the patched code.
Before doing our questionnaire, did you know about UB-CISB?
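The original listing is not reproduced here, so the line numbers in the text refer to that listing. As a hedged sketch with hypothetical names, the first function below shows a check of the kind described (it inspects the result of an addition that has already overflowed), and the second shows one common way to express the same intent without relying on UB, by comparing against the limit before adding.

```c
#include <limits.h>
#include <stdlib.h>

#define LEN_SUFFIX 8  /* hypothetical suffix size, chosen for illustration */

/* Check performed on the possibly overflowed sum: it depends on UB. */
char *alloc_checked_after(int len) {
    /* Under the "UB does not exist" assumption, len + LEN_SUFFIX can
     * never be smaller than len, so an optimizing compiler may fold
     * this condition to false and delete the check. */
    if (len + LEN_SUFFIX < len) {
        return NULL;
    }
    return malloc((size_t)(len + LEN_SUFFIX));
}

/* Check performed before the addition: no UB is ever executed. */
char *alloc_checked_before(int len) {
    if (len < 0 || len > INT_MAX - LEN_SUFFIX) {
        return NULL;
    }
    return malloc((size_t)len + LEN_SUFFIX);
}
```

Whether the first check is actually removed depends on the compiler, its version, and the optimization level.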
⚪ Yes, I did
⚪ No, I didn't
Do you know about the existing mitigation measures that compilers provide for some UB-CISBs? (such as the configuration options -fno-delete-null-pointer-checks, -fno-strict-overflow)
⚪ No, I don’t
⚪ I have heard of them but don’t know the details
⚪ Yes, I know what they are and how to use them
Note: Since there is no language specification requirement, these compiler mitigation measures currently cover only the most common kinds of UB, and they are not guaranteed to always have the desired effect.
The following section asks for your estimate of programmers' actual experience of encountering UB-CISB and your opinions about UB-CISB.
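As an illustration of the pattern that -fno-delete-null-pointer-checks is aimed at, consider the following sketch. The struct and function names are hypothetical, and whether a given compiler actually removes the check depends on its version and optimization level.

```c
#include <stddef.h>

/* Possible GCC build line using the options named above:
 *   gcc -O2 -fno-delete-null-pointer-checks -fno-strict-overflow example.c
 */

struct device {
    int flags;
};

/* The dereference comes first, so dev == NULL would already be UB.
 * The compiler may therefore infer dev != NULL and drop the check. */
int read_flags_late_check(struct device *dev) {
    int flags = dev->flags;
    if (dev == NULL) {
        return -1;
    }
    return flags;
}

/* Checking before the dereference avoids the UB altogether. */
int read_flags_early_check(struct device *dev) {
    if (dev == NULL) {
        return -1;
    }
    return dev->flags;
}
```

Reordering the code as in the second function does not depend on any compiler option.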
UB that may involve UB-CISB
STACK [1] shows 10 common UBs that are related to real-world UB-CISBs.
In theory, all UBs that do not directly trigger security issues may involve UB-CISBs. There are about 180 such UBs. (There are 203 UBs listed in the C language specification (C17). According to a research project [2], there are about 10 UBs that trigger security issues directly.)
[1] Wang, Xi, et al. "Towards optimization-safe systems: Analyzing the impact of undefined behavior." Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. 2013.
[2] https://wiki.sei.cmu.edu/confluence/display/c/CC.+Undefined+Behavior
What the UB rules look like
Below we show the first few UB rules listed in the C language specification (for reference only; there is no need to read them completely or carefully).
A "shall" or "shall not" requirement that appears outside of a constraint is violated (clause 4).
A nonempty source file does not end in a new-line character which is not immediately preceded by a backslash character or ends in a partial preprocessing token or comment (5.1.1.2).
Token concatenation produces a character sequence matching the syntax of a universal character name (5.1.1.2).
A program in a hosted environment does not define a function named main using one of the specified forms (5.1.2.2.1).
The execution of a program contains a data race (5.1.2.4).
How difficult do you think it is for normal C programmers to learn and understand UB-CISB?
⚪ Extremely difficult
⚪ Very difficult
⚪ Moderately difficult
⚪ Slightly difficult
⚪ Not difficult at all
When UB-CISB appears in the code they maintain, how difficult do you think it is for normal C programmers to debug the root cause by themselves?
⚪ Extremely difficult
⚪ Very difficult
⚪ Moderately difficult
⚪ Slightly difficult
⚪ Not difficult at all
Have you ever encountered a UB-CISB? If you have, please select the longest time it took you to solve the related problem.
⚪ No, I have not
⚪ < 30 minutes
⚪ 30 minutes to 2 hours
⚪ 2 hours to 24 hours
⚪ > 24 hours
⚪ Not solved
If they are told about the compiler’s assumption that “UB does not exist” and are allowed to refer to the complete list of UB rules in the language specification, how difficult do you think it is for normal C programmers to avoid writing UB-CISBs?
⚪ Extremely difficult
⚪ Very difficult
⚪ Moderately difficult
⚪ Slightly difficult
⚪ Not difficult at all
What is your opinion on who should bear the responsibility for UB-CISB? (The number indicates the degree: 1 means you think this is completely the responsibility of the programmer (i.e., they introduced the bug), 5 means neutral, and 9 means you think this is completely the responsibility of the compiler. Please fill in an integer from 1 to 9 that represents your point of view.)
_____
The following asks about your knowledge and views on other types of CISB.
When a certain security property in the code exceeds the scope of the language specification (such as the lifetime or the memory boundary of sensitive data), compiler optimizations may also destroy this security property, thereby introducing security bugs.
Here are two typical examples:
The compiler eliminates sensitive data scrubbing
In order to prevent information leakage, some programmers use memset() to set sensitive variables to zero once they have finished using them, thereby scrubbing the sensitive data from memory. However, the compiler may infer that such a memset() is redundant because these variables will not be used later, so the “Dead Store Elimination” optimization removes the scrubbing. The sensitive data then remains in memory, which allows an attacker to take advantage of it by, e.g., reading it through a buffer over-read bug.
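The scrubbing pattern can be sketched as follows; the function names and the buffer size are hypothetical. The second function shows one commonly used workaround, writing through a volatile-qualified pointer. This is a sketch under the assumption that the compiler preserves such writes; where available, platform facilities such as explicit_bzero() or the optional C11 memset_s() are alternatives.

```c
#include <stddef.h>
#include <string.h>

/* Sums the bytes of a secret via a local copy, then tries to scrub it. */
unsigned sum_with_naive_scrub(const unsigned char *secret, size_t n) {
    unsigned char copy[64];
    unsigned sum = 0;
    if (n > sizeof copy) {
        n = sizeof copy;
    }
    memcpy(copy, secret, n);
    for (size_t i = 0; i < n; i++) {
        sum += copy[i];
    }
    /* copy is never read again, so this store is "dead" and may be
     * removed by Dead Store Elimination. */
    memset(copy, 0, sizeof copy);
    return sum;
}

/* Zeroes n bytes through a volatile pointer, one byte at a time. */
void scrub(void *p, size_t n) {
    volatile unsigned char *vp = (volatile unsigned char *)p;
    while (n-- > 0) {
        *vp++ = 0;
    }
}
```

In the first function, replacing the memset() call with scrub(copy, sizeof copy) would apply the workaround.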
The compiler introduces a timing side channel.
In order to prevent attackers from inferring sensitive data through differences in program execution time (i.e., a timing side channel attack), some programmers add redundant code to balance the execution time of different execution paths. However, the compiler is likely to remove these redundant instructions, so the attacker can still leverage the timing side channel to leak sensitive information.
Before doing our questionnaire, did you know about such CISBs, in which a security property of the source code exceeds the scope of the language specification?
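A sketch of such balancing code, using a hypothetical square-and-multiply step with made-up function names, is shown below. In the first version the "dummy" multiplication exists only to equalize the work done on both paths; since its result is never used, the compiler may delete it. The second version makes both results feed the return value through a mask. This is only a sketch: the language specification gives no timing guarantees for either form, and mod is assumed to be nonzero.

```c
#include <stdint.h>

/* (a * b) mod m, computed in 64 bits so the product cannot wrap. */
static uint32_t mulmod(uint32_t a, uint32_t b, uint32_t m) {
    return (uint32_t)(((uint64_t)a * b) % m);
}

/* Balanced by hand: the else-branch does the same work but discards it. */
uint32_t step_balanced(uint32_t acc, uint32_t base, uint32_t bit, uint32_t mod) {
    uint32_t dummy;
    acc = mulmod(acc, acc, mod);
    if (bit & 1u) {
        acc = mulmod(acc, base, mod);
    } else {
        dummy = mulmod(acc, base, mod);  /* never read: may be removed */
        (void)dummy;
    }
    return acc;
}

/* Both candidates are always computed; a mask selects one of them. */
uint32_t step_masked(uint32_t acc, uint32_t base, uint32_t bit, uint32_t mod) {
    uint32_t squared = mulmod(acc, acc, mod);
    uint32_t multiplied = mulmod(squared, base, mod);
    uint32_t mask = 0u - (bit & 1u);  /* all ones if bit is set, else zero */
    return (multiplied & mask) | (squared & ~mask);
}
```

Both functions return the same values; they differ only in how the secret-dependent choice is expressed.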
⚪ Yes, I did
⚪ No, I didn’t
For such CISBs, in which a security property of the source code exceeds the scope of the language specification, what is your opinion on who should bear the responsibility? (The number indicates the degree: 1 means you think this is completely the responsibility of the programmer (i.e., they introduced the bug), 5 means neutral, and 9 means you think this is completely the responsibility of the compiler. Please fill in an integer from 1 to 9 that represents your point of view.)
_____
The following asks about your expectations for CISB-related research work.
How necessary do you think it is to systematically collect all kinds of CISBs in the real world? (all possible UB-CISBs; CISBs related to compilers’ default assumptions other than "UB does not exist"; more CISBs in which a security property of the source code exceeds the scope of the language specification)
⚪ Extremely necessary
⚪ Very necessary
⚪ Moderately necessary
⚪ Slightly necessary
⚪ Not necessary at all
How necessary do you think it is to measure the distribution and influence of CISB in the real world?
⚪ Extremely necessary
⚪ Very necessary
⚪ Moderately necessary
⚪ Slightly necessary
⚪ Not necessary at all
(End of this questionnaire.)