Tools and Software Development

I have often found extra inspiration by developing system prototypes of the research models we study. In my opinion, implementation is the key to getting a better grasp of the problem you are studying: it helps you envisage immediate applications of the research problem, which catalyses a greater zeal to crack it. Though one might argue that it is better to focus on either theory or practice, I believe that each complements the other; hence, whenever you get bored with theory, it is a good idea to start implementing it, and vice versa. I find a similar analogy between front-end and back-end development: if you are a back-end developer, you may have experienced that a nice front-end can always motivate your back-end development, and vice versa. Details of the projects I was involved in over recent years, and my contributions to them, are listed below.

    1. XData: a generic SQL tutoring platform. Among the reasons SQL is so ubiquitous as a database query language are its effectiveness, its expressiveness, and formal properties that allow efficient implementations (under bag semantics and the closed-world assumption). However, its terse syntax and complex semantics are difficult for a novice (under)graduate student to digest. XData is a tool that comes in handy for automated tutoring and grading of SQL. Though it was initially intended for SQL course instructors in academia, its automated grading features and expressivity make it well suited for automatically verifying/testing programmer-written queries in an industrial setting. One of its uses is query grading for automated tutoring of SQL, where an instructor publishes a schema and a set of integrity constraints (e.g., foreign keys and primary keys) and sets questions by specifying them in English; the instructor can also supply SQL queries as answers to these questions. Any student-written query is evaluated against the instructor query by an equivalence-checking algorithm (equivalence checking is a notoriously hard problem). One key step is to generate critical datasets that are guaranteed to differentiate any non-equivalent query (the data generation phase). Another key step is to score a student query, despite its non-equivalence to the instructor query, by how syntactically close it is to the instructor query (the partial marking phase). My contributions included extending and implementing the partial marking module. The student and instructor SQL queries are represented as tree-shaped structures (I developed an interface to display SQL query trees; see figure below) and are canonicalized into standard forms; for instance, the WHERE clause is transformed into a disjunctive normal form of non-negated atomic relational conditions (e.g., B >= A AND A < 2 becomes A <= B AND A <= 1 over integer attributes; a small sketch of this rewriting follows the figure below), so that a judicious syntactic comparison can be made between the corresponding components of the instructor and student queries. Other tasks involved implementing query minimization by removing redundant tables and (outer) joins, eliminating non-recursive WITH clauses, and extending the JSQL query parser to support operations it does not handle.

Partial Marking Interface
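
    Below is a minimal, self-contained sketch of the atomic-condition canonicalization mentioned above, assuming integer-valued attributes; the class and method names are invented for illustration and are not XData's actual code.

```java
public class CondCanonicalizer {

    /** An atomic relational condition "lhs op rhs"; each side is an attribute
     *  name or an integer literal, op is one of <, <=, >, >=. */
    record Atom(String lhs, String op, String rhs) {
        @Override public String toString() { return lhs + " " + op + " " + rhs; }
    }

    static boolean isIntLiteral(String s) { return s.matches("-?\\d+"); }

    /** Canonicalize toward the form "x <= y": flip > and >= so the smaller side
     *  is on the left, and tighten strict < over integer literals to <=. */
    static Atom canonicalize(Atom a) {
        String lhs = a.lhs(), op = a.op(), rhs = a.rhs();
        if (op.equals(">") || op.equals(">=")) {            // B >= A  ->  A <= B
            String tmp = lhs; lhs = rhs; rhs = tmp;
            op = op.equals(">") ? "<" : "<=";
        }
        if (op.equals("<")) {                               // strict-to-nonstrict over integers
            if (isIntLiteral(rhs)) {                        // A < 2   ->  A <= 1
                rhs = String.valueOf(Integer.parseInt(rhs) - 1); op = "<=";
            } else if (isIntLiteral(lhs)) {                 // 2 < A   ->  3 <= A
                lhs = String.valueOf(Integer.parseInt(lhs) + 1); op = "<=";
            }                                               // attribute < attribute stays strict
        }
        return new Atom(lhs, op, rhs);
    }

    public static void main(String[] args) {
        System.out.println(canonicalize(new Atom("B", ">=", "A"))); // A <= B
        System.out.println(canonicalize(new Atom("A", "<", "2")));  // A <= 1
    }
}
```
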
    The performance of partial marking was compared with manual scoring by teaching assistants (TAs), and we found that the freedom to write SQL queries in multifarious syntactic ways badly distorts such a purely syntactic comparison. Hence, partial marking was used only for queries already verified to be incorrect. For instance, we discovered many cases of students using subqueries when the instructor query contained only select-project-join features. Though in our experimental analysis (an interface for graphical analysis of student results against the instructor query was developed; see figure below) partial marking performed much worse than manual scoring, we still found good correlation (correlation coefficient ~0.6) between the scores awarded by TAs and the partial marking scores on our real dataset of hundreds of student queries.

Partial marking analytics interface
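
    The correlation figure above is Pearson's r; a minimal sketch of how it is computed, with made-up score vectors standing in for the real dataset:

```java
public class ScoreCorrelation {

    /** Pearson correlation coefficient between two equal-length score vectors. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double cov = 0, vx = 0, vy = 0;
        for (int i = 0; i < n; i++) {
            cov += (x[i] - mx) * (y[i] - my);
            vx  += (x[i] - mx) * (x[i] - mx);
            vy  += (y[i] - my) * (y[i] - my);
        }
        return cov / Math.sqrt(vx * vy);
    }

    public static void main(String[] args) {
        double[] taScores      = {10, 8, 6, 9, 3, 7}; // hypothetical TA marks
        double[] partialScores = { 9, 8, 4, 7, 5, 6}; // hypothetical partial marks
        System.out.printf("r = %.2f%n", pearson(taScores, partialScores));
    }
}
```
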
    Guides: Prof. S. Sudarshan

    Programming Language: Java. Other Tools: JSQL query parser, PostgreSQL open-source DB, Apache Struts framework, DHTML JavaScript library (for tree and chart display)

    2. Search Bomb: a keyword-querying engine for RDF knowledge. A web search suite that includes an RDF crawler and indexes RDF files onto an Apache Solr server, which is then used for keyword search. An RDF file is scanned for entities (potentially any rdf:Resource), and for each entity, fields such as its rdfs:label and rdfs:comment, whether it is a class, a property, or an individual, its unique id (URI), and its provenance (the name of the source file in which it appears) are used to form a single Solr document that is added to the Solr server (a sketch of this step follows the figure below). An interface for uploading RDF files to the Solr server was also implemented (see figure below).

Index file interface
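
    A sketch of how one RDF entity becomes a Solr document, using SolrJ's HttpSolrClient (a newer client than the one the project used) and assuming a schema with the listed field names; all values are illustrative.

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class RdfEntityIndexer {
    public static void main(String[] args) throws Exception {
        // One Solr core holds all indexed RDF entities; URL and core name are illustrative.
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/rdf").build();

        // One Solr document per RDF entity, with the fields described above.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "http://example.org/onto#Person"); // the entity's URI
        doc.addField("label", "Person");                      // rdfs:label
        doc.addField("comment", "A human being.");            // rdfs:comment
        doc.addField("kind", "class");                        // class / property / individual
        doc.addField("provenance", "example.rdf");            // source file it appears in

        solr.add(doc);
        solr.commit();
        solr.close();
    }
}
```
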
    The indexed documents can then be keyword-searched (see snapshot of the interface below), leveraging Solr's advanced search features, which include stemming with the Porter algorithm, removal of delimiters, stopword removal, duplicate removal, case conversion, and whitespace removal. Keyword expansion was enabled by expanding the search keywords with synonyms, hypernyms, and hyponyms from the WordNet dictionary (a sketch follows the figure below).

Search Index Interface
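
    A sketch of the keyword-expansion step using the JAWS WordNet API, assuming a local WordNet installation (the directory path below is illustrative): for a noun keyword, collect its synonyms plus the word forms of its hypernyms and hyponyms, which can then be OR-ed into the Solr query.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

import edu.smu.tspell.wordnet.NounSynset;
import edu.smu.tspell.wordnet.Synset;
import edu.smu.tspell.wordnet.SynsetType;
import edu.smu.tspell.wordnet.WordNetDatabase;

public class KeywordExpander {

    /** Expand a noun keyword with its WordNet synonyms, hypernyms, and hyponyms. */
    public static Set<String> expand(String keyword) {
        WordNetDatabase wn = WordNetDatabase.getFileInstance();
        Set<String> expanded = new LinkedHashSet<>();
        expanded.add(keyword);
        for (Synset synset : wn.getSynsets(keyword, SynsetType.NOUN)) {
            Collections.addAll(expanded, synset.getWordForms());      // synonyms
            NounSynset noun = (NounSynset) synset;
            for (NounSynset broader : noun.getHypernyms())            // broader terms
                Collections.addAll(expanded, broader.getWordForms());
            for (NounSynset narrower : noun.getHyponyms())            // narrower terms
                Collections.addAll(expanded, narrower.getWordForms());
        }
        return expanded;
    }

    public static void main(String[] args) {
        // JAWS locates the WordNet data files via this system property.
        System.setProperty("wordnet.database.dir", "/usr/share/wordnet/dict");
        System.out.println(expand("car")); // e.g. car, auto, motor vehicle, cab, ...
    }
}
```
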
    Guides: Prof. S. Sudarshan

    Programming Language: Java. Other Tools: Apache Solr indexing engine, JAWS WordNet API, Apache Struts framework, OWL API RDF library, Apache Jena RDF library

    3. Contextualized Quad-Systems: developing chase-based algorithms for membership detection and deductive-closure (dChase) computation for the various contextualized quad-system classes (cAcyclic, safe, csafe, and range-restricted), with large datasets, for the evaluation part of my PhD thesis. The average dChase computation time, membership detection time, and query response time of the various quad-system classes were tabulated and graphically visualized. The contexts were implemented as OWLIM repositories. A minimal sketch of the chase fixpoint loop is given after this item's details.

    Guides: Dr. Luciano Serafini, Prof. Gabriel Kuper, Prof. Till Mossakowski

    Programming Language: Java. Other Tools: Sesame/rdf4j RDF library, OWLIM (GraphDB) OWL engine
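
    As promised above, a pure-Java sketch of the naive chase fixpoint: rules are fired repeatedly and their conclusions added until no new quad is derived. Real quad-system rules bind variables across contexts and may introduce blank nodes; this toy version only propagates ground quads, so it illustrates the loop structure rather than the thesis algorithms.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class NaiveChase {

    /** A contextualized triple: a triple together with the context it holds in. */
    record Quad(String ctx, String s, String p, String o) {}

    /** Naive chase: fire every rule against the current quad set and add the
     *  conclusions, repeating until a fixpoint is reached. For the decidable
     *  classes (e.g., cAcyclic, safe) the fixpoint is guaranteed to be finite. */
    static Set<Quad> chase(Set<Quad> input, List<Function<Set<Quad>, Set<Quad>>> rules) {
        Set<Quad> dchase = new LinkedHashSet<>(input);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Function<Set<Quad>, Set<Quad>> rule : rules)
                changed |= dchase.addAll(rule.apply(dchase));
        }
        return dchase;
    }

    public static void main(String[] args) {
        // Toy "bridge" rule: every quad in context c1 also holds in context c2.
        Function<Set<Quad>, Set<Quad>> bridge = quads -> {
            Set<Quad> derived = new LinkedHashSet<>();
            for (Quad q : quads)
                if (q.ctx().equals("c1")) derived.add(new Quad("c2", q.s(), q.p(), q.o()));
            return derived;
        };
        Set<Quad> input = new LinkedHashSet<>(List.of(new Quad("c1", ":a", ":knows", ":b")));
        System.out.println(chase(input, List.of(bridge)));
    }
}
```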

    4. Contextualized Knowledge Repository (CKR): a knowledge representation and reasoning framework, with an accompanying prototype, that builds on Semantic Web technologies to represent, store, query, and reason with contextualized knowledge, i.e., knowledge that holds under specific circumstances or contexts. The CKR addresses an emerging need in the Semantic Web: as large amounts of Linked Data are published on the Web, it is becoming apparent that the validity of published knowledge is not absolute but often depends on time, location, topic, and other contextual attributes. See the figure below for a snapshot of the multi-context querying interface, which was one of my contributions to the implementation; a query sketch is given after this item's details.

Multi-Context Query UI

    Guides: Dr. Luciano Serafini, Dr. Andrei Tamilin

    Programming Language and Tools: Java, Sesame/rdf4j RDF library, OWLIM (GraphDB) OWL engine, Apache Solr indexing engine
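
    A sketch of a multi-context query in the CKR setting: contexts are stored as named graphs, and SPARQL's GRAPH clause selects which context each pattern is evaluated in. The sketch uses the modern rdf4j API (the prototype itself used its predecessor, Sesame); the data and IRIs are illustrative.

```java
import org.eclipse.rdf4j.model.IRI;
import org.eclipse.rdf4j.model.ValueFactory;
import org.eclipse.rdf4j.model.vocabulary.RDF;
import org.eclipse.rdf4j.query.BindingSet;
import org.eclipse.rdf4j.query.TupleQueryResult;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.sail.SailRepository;
import org.eclipse.rdf4j.sail.memory.MemoryStore;

public class MultiContextQuery {
    public static void main(String[] args) {
        Repository repo = new SailRepository(new MemoryStore());
        repo.init();
        try (RepositoryConnection conn = repo.getConnection()) {
            ValueFactory vf = conn.getValueFactory();
            IRI ctxItaly = vf.createIRI("ctx:italy2013");
            IRI ctxEurope = vf.createIRI("ctx:europe");
            IRI person = vf.createIRI("ex:somePolitician");

            // Each context is a named graph holding its own triples.
            conn.add(person, RDF.TYPE, vf.createIRI("ex:Politician"), ctxItaly);
            conn.add(person, vf.createIRI("ex:memberOf"), vf.createIRI("ex:EUParliament"), ctxEurope);

            // A query that joins knowledge from two contexts.
            String sparql =
                "SELECT ?x WHERE { " +
                "  GRAPH <ctx:italy2013> { ?x a <ex:Politician> } " +
                "  GRAPH <ctx:europe>    { ?x <ex:memberOf> <ex:EUParliament> } }";
            try (TupleQueryResult result = conn.prepareTupleQuery(sparql).evaluate()) {
                while (result.hasNext()) {
                    BindingSet row = result.next();
                    System.out.println(row.getValue("x")); // ex:somePolitician
                }
            }
        }
        repo.shutDown();
    }
}
```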

    5. RDF approximator for OWL ontologies: query answering over OWL ontologies is often very inefficient, and the complexity of conjunctive query answering is still an open problem even for OWL DL. Subsumption and instance checking are 2NEXPTIME-complete for OWL 2 DL, and computing the deductive closure is impractical, as it can be infinite in the worst case. We developed a semantics, called the RDF-Reduct semantics, for approximating OWL ontologies in RDF; it can be used to partially axiomatize an OWL ontology as an RDF graph (a toy sketch of this direction is given after this item's details). The project was conceived at DFKI GmbH, Bremen, Germany, while I was on an internship at the University of Bremen.

    Guides: Prof. Till Mossakowski, Dr. Oliver Kutz, Dr. Christophe Lange

    Programming Language and Tools: Java, Pellet API, OWLIM (GraphDB) OWL engine, Apache Jena RDF library
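
    A toy sketch of the approximation direction, assuming the OWL API and modern Jena packages: keep only axioms with a direct RDF(S) rendering (here, SubClassOf between named classes) and emit them as an RDF graph. The actual RDF-Reduct semantics covers far more; this only illustrates the idea of partially axiomatizing an ontology as RDF.

```java
import java.io.File;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.vocabulary.RDFS;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.AxiomType;
import org.semanticweb.owlapi.model.OWLClassExpression;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLSubClassOfAxiom;

public class RdfApproximator {
    public static void main(String[] args) throws Exception {
        OWLOntology onto = OWLManager.createOWLOntologyManager()
                .loadOntologyFromOntologyDocument(new File("input.owl"));

        Model rdf = ModelFactory.createDefaultModel();
        for (OWLSubClassOfAxiom ax : onto.getAxioms(AxiomType.SUBCLASS_OF)) {
            OWLClassExpression sub = ax.getSubClass(), sup = ax.getSuperClass();
            if (!sub.isAnonymous() && !sup.isAnonymous()) { // only named classes map directly
                rdf.add(rdf.createResource(sub.asOWLClass().getIRI().toString()),
                        RDFS.subClassOf,
                        rdf.createResource(sup.asOWLClass().getIRI().toString()));
            }
        }
        rdf.write(System.out, "TURTLE"); // the RDF approximation of the ontology
    }
}
```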

    6. Owl2Latex: a LaTeX-like, easily readable scripting language for developing OWL/DL ontologies, called Ontex, had been developed in our group at FBK (for details see: https://dkm.fbk.eu/technologies/tex-owl). A tool acting as the inverse mapper was needed, so I implemented the Owl2Latex tool which, given an OWL ontology, parses it and converts it into an ontology serialized in the Ontex format (a minimal sketch is given after this item's details).

    Guides: Dr. Luciano Serafini, Dr. Marco Rospocher

    Programming Language: Java. Other Tools: OWL API
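
    A minimal sketch of the mapping direction, using the OWL API to walk an ontology's SubClassOf axioms and print each as a LaTeX-like line. The \subclass macro below is a made-up stand-in, not real Ontex; the actual syntax is documented at the link above.

```java
import java.io.File;

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.AxiomType;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;
import org.semanticweb.owlapi.model.OWLSubClassOfAxiom;

public class Owl2LatexSketch {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager mgr = OWLManager.createOWLOntologyManager();
        OWLOntology onto = mgr.loadOntologyFromOntologyDocument(new File("input.owl"));

        // Print each SubClassOf axiom between named classes as a LaTeX-like line.
        for (OWLSubClassOfAxiom ax : onto.getAxioms(AxiomType.SUBCLASS_OF)) {
            if (!ax.getSubClass().isAnonymous() && !ax.getSuperClass().isAnonymous()) {
                String sub = ax.getSubClass().asOWLClass().getIRI().getShortForm();
                String sup = ax.getSuperClass().asOWLClass().getIRI().getShortForm();
                System.out.printf("\\subclass{%s}{%s}%n", sub, sup); // hypothetical macro
            }
        }
    }
}
```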

    7. Ephemerizer: a system that ensures that keys expire after their validity time, so that data, once deleted (timed out), cannot be recovered. We implemented a public-key-based Ephemerizer system that uses the power of identity-based cryptography (IBC), in which an arbitrary public key can be chosen to encrypt a message for a recipient; the private key corresponding to this public key is generated at a convenient time with the help of an Ephemerizer server, which holds a master key and also plays the role of the private-key generator. The Ephemerizer holds the private keys for the encrypted (temporary) data on clients, data that will eventually be deleted. We implemented the Ephemerizer as a web service. A snapshot of the command-line interface that lists the various encryption options for the end user is depicted in the figure below; a structural sketch of the flow follows it.

Ephemerizer User Interface
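
    A structural, pure-Java sketch of the flow under identity-based encryption: the expiry time itself serves as the public key, and the server (holding the IBE master key) derives the matching private key only while the key is still valid. The IbeScheme interface and the toy implementation below are hypothetical stand-ins, not a real IBE scheme (which the JDK does not provide).

```java
import java.time.Instant;

public class EphemerizerSketch {

    /** Hypothetical IBE primitive: any string may act as a public key. */
    interface IbeScheme {
        byte[] encrypt(String publicKey, byte[] plaintext);
        byte[] extractPrivateKey(String publicKey); // needs the master key
        byte[] decrypt(byte[] privateKey, byte[] ciphertext);
    }

    /** The Ephemerizer server: refuses key extraction once the identity (an
     *  ISO-8601 expiry instant) has passed, making the data unrecoverable. */
    static class EphemerizerServer {
        private final IbeScheme ibe;
        EphemerizerServer(IbeScheme ibe) { this.ibe = ibe; }

        byte[] requestPrivateKey(String expiryId) {
            if (Instant.now().isAfter(Instant.parse(expiryId)))
                throw new IllegalStateException("key expired: data is unrecoverable");
            return ibe.extractPrivateKey(expiryId);
        }
    }

    /** Toy, insecure stand-in so the sketch runs: derives a key by folding the
     *  identity bytes and XORs it into the data. For structure only. */
    static IbeScheme toyIbe() {
        return new IbeScheme() {
            public byte[] extractPrivateKey(String id) {
                byte[] key = new byte[16];
                byte[] bytes = id.getBytes();
                for (int i = 0; i < bytes.length; i++) key[i % 16] ^= bytes[i];
                return key;
            }
            public byte[] encrypt(String id, byte[] p) { return xor(extractPrivateKey(id), p); }
            public byte[] decrypt(byte[] key, byte[] c) { return xor(key, c); }
            private byte[] xor(byte[] key, byte[] data) {
                byte[] out = data.clone();
                for (int i = 0; i < out.length; i++) out[i] ^= key[i % key.length];
                return out;
            }
        };
    }

    public static void main(String[] args) {
        IbeScheme ibe = toyIbe();
        EphemerizerServer server = new EphemerizerServer(ibe);

        // Client encrypts under the expiry-time identity; no private key exists yet.
        String expiryId = Instant.now().plusSeconds(3600).toString();
        byte[] ciphertext = ibe.encrypt(expiryId, "temporary secret".getBytes());

        // Before expiry the server hands back the private key; after expiry the
        // same call throws, and the ciphertext can never be opened again.
        byte[] privateKey = server.requestPrivateKey(expiryId);
        System.out.println(new String(ibe.decrypt(privateKey, ciphertext)));
    }
}
```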