The method is composed of three activities:
• CLUSTERING SIMILAR REQUIREMENTS
• CAPTURING VARIABLE PARTS
• GENERATING CORE REQUIREMENTS
The inputs for our method are functional requirements documents specifying behaviors of different software products.
In the first step, the products’ requirements are clustered using a hierarchical agglomerative clustering algorithm and different semantic similarity measures. Each of the emerging clusters is associated with a core requirement.
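The clustering step can be illustrated with a minimal sketch. It assumes average linkage, a token-overlap (Jaccard) score standing in for a semantic similarity measure, and a stopping threshold of 0.4; none of these choices come from the method itself, and the function names are hypothetical.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a simple stand-in for a semantic measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def agglomerate(reqs, threshold=0.4):
    """Average-linkage agglomerative clustering over requirement texts.

    Starts with one cluster per requirement and repeatedly merges the two
    most similar clusters until no pair reaches the (assumed) threshold.
    """
    clusters = [[i] for i in range(len(reqs))]

    def linkage(c1, c2):
        # Mean pairwise similarity between the members of two clusters.
        total = sum(jaccard(reqs[i], reqs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_pair, best_score = None, -1.0
        for x, y in combinations(range(len(clusters)), 2):
            score = linkage(clusters[x], clusters[y])
            if score > best_score:
                best_pair, best_score = (x, y), score
        if best_score < threshold:
            break
        x, y = best_pair  # x < y, so deleting index y leaves x valid
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [[reqs[i] for i in c] for c in clusters]


requirements = [
    "the user logs in with a password",
    "the user logs in with a fingerprint",
    "the system prints a monthly report",
    "the system prints a weekly report",
]
for cluster in agglomerate(requirements):
    print(cluster)
```

With these four illustrative requirements, the two login requirements and the two report requirements end up in separate clusters, each of which would then be associated with one core requirement.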
In the second step, the clusters are analyzed for capturing variable parts. This is done using Semantic Role Labeling (SRL), an NLP technique that labels the constituents of a phrase with their semantic roles in the phrase. Since the inputs are functional requirements, we refer to the following four semantic roles that are common for describing behaviors: (1) Agent – Who performs? (2) Action – What is performed? (3) Object – On what object is it performed? and (4) Instrument – How is it performed? A functional requirement may include several (ordered) quadruplets of the form (agent, action, object, instrument). We call them behavioral vectors. Some of the constituents of a behavioral vector may be missing, e.g., the agent in passive sentences or the object/instrument for “obvious” actions.
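A behavioral vector can be represented as a small record with four optional fields. The sketch below is only an illustration: the class name, field names, and example sentences are assumptions, and the role values are filled in by hand rather than produced by an actual SRL tool.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BehavioralVector:
    """One quadruplet of semantic roles; any role may be missing (None)."""
    agent: Optional[str] = None       # Who performs?
    action: Optional[str] = None      # What is performed?
    obj: Optional[str] = None         # On what object is it performed?
    instrument: Optional[str] = None  # How is it performed?

    def missing_roles(self) -> List[str]:
        """Names of the roles that have no value in this vector."""
        return [name for name, value in asdict(self).items() if value is None]


# "The user sends a message using the chat window."
active = BehavioralVector("user", "send", "message", "chat window")

# "The report is printed." (passive sentence: no agent, no instrument)
passive = BehavioralVector(action="print", obj="report")

print(active.missing_roles())   # []
print(passive.missing_roles())  # ['agent', 'instrument']
```

A requirement with several behaviors would simply be an ordered list of such records.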
In the third step, after the variable and common parts have been captured, the method generates core requirements (one for each cluster) using the variability framework. With respect to the product dimension:
• Similar behavioral vectors which appear in all analyzed products are defined as mandatory parts of the core requirements.
• Similar behavioral vectors which appear in a significant number of products, but not in all of them (e.g., in half of them), are considered optional parts of the core requirements.
• All other similar behavioral vectors are treated as product-specific extensions, even if appearing in more than one analyzed product.
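The three rules above amount to a simple decision on how many products contain a given group of similar behavioral vectors. The following sketch assumes that "a significant number" means at least half of the analyzed products; the function name and the `optional_ratio` parameter are hypothetical.

```python
def classify_vector(products_with_vector: int,
                    total_products: int,
                    optional_ratio: float = 0.5) -> str:
    """Classify a group of similar behavioral vectors on the product dimension.

    optional_ratio is an assumed cut-off for "a significant number of
    products"; the text gives half of the products only as an example.
    """
    if total_products <= 0 or not 0 < products_with_vector <= total_products:
        raise ValueError("counts must satisfy 0 < with_vector <= total")
    if products_with_vector == total_products:
        return "mandatory"
    if products_with_vector / total_products >= optional_ratio:
        return "optional"
    return "product-specific"


print(classify_vector(4, 4))  # mandatory
print(classify_vector(2, 4))  # optional
print(classify_vector(2, 5))  # product-specific
```

The last call shows the case mentioned in the third rule: the vector appears in more than one product (two out of five) and is still treated as a product-specific extension.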
With respect to the element dimension, the semantic parts (roles) of behavioral vectors which have identical values are classified as common, identifying the anchors of the core requirements. All other semantic parts are considered variants. The variants may be the result of syntactic, semantic, or domain-specific differences among the input product requirements.
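The element dimension can be sketched in the same spirit. The example below compares the roles of similar behavioral vectors, here given as plain dictionaries, and separates roles with identical values (common) from the rest (variants). The function name, the dictionary representation, and the payment example are assumptions made for illustration; roles that are missing from every vector are skipped.

```python
ROLES = ("agent", "action", "object", "instrument")


def split_roles(vectors):
    """Split the roles of similar behavioral vectors into common and variant.

    A role is common when every vector has the same value for it; otherwise
    its distinct values (in order of first appearance) are its variants.
    """
    common, variants = {}, {}
    for role in ROLES:
        values = [v.get(role) for v in vectors]
        distinct = list(dict.fromkeys(values))  # ordered de-duplication
        if len(distinct) == 1:
            if distinct[0] is not None:
                common[role] = distinct[0]
        else:
            variants[role] = distinct
    return common, variants


similar = [
    {"agent": "user", "action": "pay", "object": "order",
     "instrument": "credit card"},
    {"agent": "user", "action": "pay", "object": "order",
     "instrument": "bank transfer"},
]
common, variants = split_roles(similar)
print(common)    # {'agent': 'user', 'action': 'pay', 'object': 'order'}
print(variants)  # {'instrument': ['credit card', 'bank transfer']}
```

In this example the agent, action, and object would serve as anchors of the core requirement, while the instrument is a variant.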