Bioinformatics Workflow Modelling and Analysis

Next-generation sequencing (NGS) has introduced novel means of sequencing millions of DNA molecules simultaneously and has opened new avenues in bioinformatics that require high-performance computing technologies. Bioinformatics pipelines are constructed to carry out analyses quickly and efficiently, and workflow systems are developed to simplify the construction of pipelines and to automate analyses. Still, with the availability of large amounts of sequence data, obtaining results within a reasonable amount of time has become challenging. Hardware accelerators such as graphics processing units (GPUs), together with distributed computing, speed up the processing of big data in high-performance computing environments. They enable higher degrees of parallelism, thereby increasing throughput.

It is imperative to increase the productivity of existing systems while still executing the large jobs associated with the domain. Various scheduling techniques, ranging from classic First Come First Served to cloud technologies such as MapReduce, can be used to execute these jobs in parallel.
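
As an illustration of the MapReduce idea mentioned above, the following is a minimal in-memory sketch, not the platform's implementation. It assumes k-mer counting as a stand-in bioinformatics job; the function names (map_kmers, shuffle, reduce_counts, count_kmers) are hypothetical and chosen only for this example. In a real deployment the map and reduce steps would run on separate workers.

```python
from collections import defaultdict


def map_kmers(read, k):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def count_kmers(reads, k=3):
    """Run map, shuffle, and reduce sequentially (illustrative only)."""
    pairs = [pair for read in reads for pair in map_kmers(read, k)]
    return reduce_counts(shuffle(pairs))


if __name__ == "__main__":
    print(count_kmers(["ACGTAC", "GTACG"], k=3))
```

Because each read is mapped independently, the map calls are the natural unit to distribute across cores or nodes; only the shuffle needs to see all emitted pairs.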

This research addresses a generic, GPU-accelerated software framework that biologists and bioinformaticians with any level of programming expertise can use to construct bioinformatics workflows. The system allows analyses to be performed on dedicated GPU computing resources, while incorporating novel web technologies to support the specific requirements of bioinformatics software.

A second line of research addresses an optimized platform to support the execution of bioinformatics computations that deal with massive datasets. This platform comprises a MapReduce model that adopts a multilevel feedback queue algorithm to schedule such large-scale, time-consuming jobs in parallel on a multicore processor. A broad comparison of common existing scheduling algorithms is conducted to identify the most suitable one.
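
To make the multilevel feedback queue idea concrete, the following is a minimal simulation sketch under simplifying assumptions: a single core, all jobs arriving at time zero, and three queues with time quanta of 2, 4, and 8 units. These values, the Job class, and the mlfq_schedule function are hypothetical and are not taken from the platform itself.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    remaining: int  # time units of work still to do


def mlfq_schedule(jobs, quanta=(2, 4, 8)):
    """Simulate a single-core multilevel feedback queue.

    Returns a dict mapping each job name to its completion time.
    Assumes every job arrives at time 0 (a simplification).
    """
    queues = [deque() for _ in quanta]
    for job in jobs:
        # Copy each job so the caller's objects are not mutated.
        queues[0].append(Job(job.name, job.remaining))

    clock = 0
    finished = {}
    while any(queues):
        # Always serve the highest-priority non-empty queue.
        level = next(i for i, q in enumerate(queues) if q)
        job = queues[level].popleft()
        run = min(quanta[level], job.remaining)
        clock += run
        job.remaining -= run
        if job.remaining == 0:
            finished[job.name] = clock
        else:
            # Quantum used up: demote one level (stay at the lowest level).
            queues[min(level + 1, len(queues) - 1)].append(job)
    return finished


if __name__ == "__main__":
    jobs = [Job("align", 10), Job("qc", 1), Job("index", 3)]
    print(mlfq_schedule(jobs))
```

With these inputs the short qc job completes at time 3, whereas under First Come First Served in the same submission order it would wait behind the 10-unit align job and complete at time 11. This is the kind of trade-off a comparison of scheduling algorithms would measure.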

BioWorkflow is an interactive workflow management system that automates bioinformatics analyses and can schedule parallel tasks using GPU-accelerated and distributed computing. The tool is also developed as a learning tool to support bioinformatics teaching and research by visualizing concepts relevant to sequence alignment and workflow modelling.