(link to paper in title)
CONFERENCE: ICSE 2016: International Conference on Software Engineering.
Guides: Dr Sumit Gulwani (MSR), Dr Mark Marron (IMDEA), Dr Subhajit Roy(IITK), Dr Amey Karkare(IITK)
Students: Aditya Desai, Nidhi Jain, Vineet Hingorani, Sailesh Ramayanan.
DESCRIPTION:
There are lot of tasks that an end user of computer systems wants to do but is limited by his / her skills in basic programming. For example, a lot of users have to specifically learn the macro-language use in spread sheets. With an end goal of helping such users perform tasks effortlessly in natural language, we build a generic system to convert any natural language instruction into underlying domain grammar programs which can be interpreted easily. During the course of project, we tested our system on variety of domains like Text Editing, Air Travel Information System (ATIS) and Automata design. The system worked well with about 90% of the test cases giving correct program with rank 1. The project had multiple aspects to it.
1. User Studies : For the lack of data for Text Editing domains, we had to setup user studies to gather data. We make this data along with other data sets we used here . Data Sets for Studies
2. Grammar Design : We devised domain grammars for the Text Editing, Automata and ATIS. We also give an insight into how grammar design should be for best results considering the effectiveness of the entire system. More details can be found in the thesis of Nidhi Jain.
3. Dictionary Design: For each domain we had to define word to terminal mappings which will be used to generate programs in underlying grammar. For more details on how we achieved this in an automated manner please refer to a chapter from thesis of Vineet Hingorani
3. Algorithm for generating Programs and ML Data: We implemented the algorithm for generating all the programs from the natural language sentence in C#. This system was a central implementation and was used further to generate data required for ML problems.
4. Ranking: We posited the ranking score as a linear combination of three scores capturing different aspects of the translation. The scores were
Used words: this captured the information used from the sentence
Mapping score: likelihood of the mapping used being correct.
Connection score: likelihood of the grammar rules used to construct the program being correct. For the connection score we discovered 7 intuitive features for.
For each learning we used off-the-shelf Naïve Bayes classifer. Also, the three scores were combined using gradient descent to optimize a tailor-made objective function maximising the number of times the correct program was ranked at the top.
For more information of implementation and ranking problem and sub problems please refer to my thesis report
4. Optimizing the coefficients of the scores : The final score used for ranking is a linear combination of the individual scores defined above. we used gradient descent on the tailor made optimisation function to optimize the number of times a program is ranked at top to optimize for these coefficients. The nature of objective being discrete , we had to encode the objective in a slightly intelligent manner to keep it continuous and differentiable. For detailed information, please refer to thesis of Sailesh Ramanayanan or paper
(link to paper in title)
CONFERENCE: “APSEC 2013: Asia-Pacific Software Engineering Conference ”
GUIDES: Dr Subhajit Roy
Students: Aditya Desai, Era Jain
DESCRIPTION:
We devised a control refinement algorithm for transforming certain classes of loops into semantically equivalent code which is relatively easy to verify by the state-of-the-art verification technology.
Class of difficult loops: The loops having disjunctive invariant are difficult to verify and often prohibit verification of huge programs. However, it is possible for some sub classes of such loops to be transformed into semantically equivalent code which requires only conjunctive invariant.
Subclass of difficult loops catered : We devised an algorithm for finding loops with static iteration patterns in the path taken within the loop. Such loops can be unrolled period-number of times and resulting loop will not have disjunctive invariant.
Algorithm : We coded up description of the loop in a symbolic manner based on rules of Hoare’s logic. We then added a condition that behavior of conditional statements in the loop are periodic. This entire condition is solved by using off-the-shelf SMT solver Z3.
Parametric Verification: we introduced the idea of parametric verification which can be used to extend class of loops described above to cases when the pattern depends on certain parameters. In such cases we can verify under transformation for array of parameter values and hence verifying the program in probability.
Implementation: We implemented the algorithm : identifying and unrolling in LLVM.