Projects

Completed Funded Research Projects:

Project Title: Enhancing the Efficiency of Deep Learning in Big Data Analytics

(Funded By: Deanship of Scientific Research, Qassim University) - Project Budget: SAR 50,400

Recent advances in many-core architectures have opened new opportunities for harnessing their computing power for general-purpose computing. The goal of big data analytics is to analyse datasets that are high in volume, velocity, and variety for large-scale business intelligence problems. These workloads are normally distributed across massively parallel analytical systems. Deep learning is part of a broader family of machine learning methods based on learning representations of data. Deep learning plays a significant role in information analysis by adding value to massive amounts of unsupervised data. A core research domain is the development of deep learning algorithms that automatically extract complex data representations at higher levels of abstraction from massive volumes of data. In this project, we focus on exploring parallel algorithms, optimization techniques, tools, and libraries for big data analytics and deep learning on many-core architectures such as NVIDIA GPUs. We will also develop a set of GPU-optimized library routines, written in CUDA with multi-node execution capabilities, for deep learning algorithms related to big data analytics. The software library will be released as open source to maximize its use in academia and industry. We will perform data analytics with the designed library routines on open datasets from www.data.gov.sa in Health, Transport and Communication, and Education and Training.
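
As a minimal illustration of the kind of GPU library routine envisaged here (a sketch only; the kernel name, array size, and launch configuration are illustrative assumptions, not project deliverables), the following CUDA code applies a ReLU activation across a large array, a common element-wise building block in deep learning pipelines:

    #include <cuda_runtime.h>

    // Element-wise ReLU over a large array: one thread per element.
    __global__ void relu_kernel(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = in[i] > 0.0f ? in[i] : 0.0f;
    }

    int main()
    {
        const int n = 1 << 20;                 // illustrative size: ~1M elements
        float *d_in, *d_out;
        cudaMalloc(&d_in,  n * sizeof(float));
        cudaMalloc(&d_out, n * sizeof(float));
        // ... copy input data into d_in (omitted) ...

        int threads = 256;
        int blocks  = (n + threads - 1) / threads;
        relu_kernel<<<blocks, threads>>>(d_in, d_out, n);
        cudaDeviceSynchronize();

        cudaFree(d_in);
        cudaFree(d_out);
        return 0;
    }

Routines of this shape are the natural starting point for the library; the multi-node capability would layer a distribution framework (e.g., MPI) on top of such single-GPU kernels.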

Project Title: Enhancing the Efficiency of Massively Parallel Programs in Computational Science and Engineering Applications

Funded By: King Abdulaziz City for Science and Technology, Saudi Arabia (Project No. 12-INF-3008-04) - (Project Budget: SAR 955,000)

Massively Parallel Computing is gaining ground in high-performance computing. CUDA (an extension to C) is the most widely used parallel programming framework for general-purpose Graphics Processing Units (GPUs). However, the task of writing optimized CUDA programs is complex even for experts. We propose to develop an automatic restructuring tool to optimize CUDA programs for computational science and engineering applications, with the following features:

  • Identifying the conditions for maximizing utilization of GPU resources and establishing the relationships between the influencing parameters.
  • Developing algorithms that explore possible tiling solutions with coalesced memory access and resource optimizations that best meet the identified restructuring specifications. For this, we will tune the GPU constraints that govern performance, such as memory usage (global and shared memory), the number of blocks, and the number of threads per block. A restructuring tool (R-CUDA) will be developed to optimize the performance of CUDA programs based on these restructuring specifications (see the tiling sketch after this list).
  • Building a 2-D Fluid Flow simulator based on the Navier-Stokes Equations for fixed boundary conditions. The simulator code will be optimized using the above restructuring tool to expose maximum data parallelism in dense and sparse linear algebra solvers.
  • Extensive testing of the tool for correctness using benchmarks from the LAPACK/BLAS libraries such as DGEMM, SGEMM, and CAXPY, and use of profiling tools such as the CUDA Visual Profiler, Parallel Nsight, and TotalView to verify the restructuring specifications. The simulator will be tested and validated using typical test cases.
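
To make the target of the restructuring concrete, the sketch below (hand-written for illustration, not output of the R-CUDA tool) shows shared-memory tiling with coalesced global loads for a dense matrix multiply, the kind of transformation the tool is intended to derive from the restructuring specifications:

    #include <cuda_runtime.h>

    #define TILE 16

    // C = A * B for n x n row-major matrices, using shared-memory tiling.
    // Threads in a block cooperatively stage TILE x TILE tiles of A and B so
    // that global loads are coalesced and each loaded element is reused TILE times.
    __global__ void matmul_tiled(const float *A, const float *B, float *C, int n)
    {
        __shared__ float As[TILE][TILE];
        __shared__ float Bs[TILE][TILE];

        int row = blockIdx.y * TILE + threadIdx.y;
        int col = blockIdx.x * TILE + threadIdx.x;
        float acc = 0.0f;

        for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
            // Consecutive threadIdx.x values read consecutive addresses (coalesced).
            As[threadIdx.y][threadIdx.x] = (row < n && t * TILE + threadIdx.x < n)
                ? A[row * n + t * TILE + threadIdx.x] : 0.0f;
            Bs[threadIdx.y][threadIdx.x] = (col < n && t * TILE + threadIdx.y < n)
                ? B[(t * TILE + threadIdx.y) * n + col] : 0.0f;
            __syncthreads();

            for (int k = 0; k < TILE; ++k)
                acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
            __syncthreads();
        }

        if (row < n && col < n)
            C[row * n + col] = acc;
    }

The tile size, block shape, and shared-memory usage are exactly the parameters the restructuring specifications are meant to capture and tune automatically.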

The major outcomes of this project are: (1) an automatic restructuring tool for optimizing the performance of CUDA programs, focusing on dense and sparse linear algebra solvers; (2) an optimized 2-D Fluid Flow simulator based on the Navier-Stokes Equations for fixed boundary conditions; and (3) a research lab in Massively Parallel Computing and a graduate course in Computational Science and Engineering.

Our proposal is to develop a restructuring tool that eases the process of writing efficient CUDA programs and to use the tool to parallelize the 2-D Fluid Flow simulation based on the Navier-Stokes Equations. We want to build the expertise and know-how needed to write efficient parallel code for scientific simulators, serving the graduate research program and the Oil and Gas industry in the Kingdom of Saudi Arabia.

Prospective Funded Research Projects:

Project Title: Development of Scalable GIS Applications

(Submitted to HEC Pakistan for the National Center for GIS and Space Applications) - (Project Budget: PKR 34.375 million)

Recent advances in sensing and navigation technologies, together with emerging applications, produce rapidly growing geospatial data of high volume and variety. These include satellite imagery with spatial, temporal, and spectral resolutions, point-cloud data with rich structural information generated from airborne and mobile radar sensors, and GPS traces accumulated from a huge number of mobile devices equipped with locating and navigation capabilities. Processing such huge and variable datasets requires high-end computing facilities and processing architectures. Similarly, polygons derived from point data clustering (e.g., lidar point clouds, GPS locations) and raster data segmentations (e.g., satellite and airborne remote sensing imagery) are likely to be even larger in volume and more compute-intensive. To efficiently process these large-scale, dynamic, and diverse geospatial data and to effectively transform them into knowledge, a whole new set of data processing techniques is required.

The massive data-parallel computing power provided by inexpensive commodity Graphics Processing Units (GPUs) makes large-scale spatial data processing on GPUs and GPU-accelerated clusters attractive from both research and practical perspectives.

Our research focuses on the following goals in this application domain:

  • Data-parallel designs for geospatial data processing techniques such as spatial indexing, spatial joins, and other spatial operations including polygon rasterization, polygon decomposition, and point interpolation (a minimal indexing sketch follows this list).
  • Scaling out these data-parallel designs to distributed computing nodes by integrating single-node GPU implementations with High-Performance Computing (HPC) clusters and new-generation in-memory Big Data systems such as Google’s Bigtable.
  • Development of an integrated framework that combines multiple GPU-based geospatial processing modules into an open system that can be shared by the community.
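
As a minimal sketch of one such data-parallel primitive (the kernel name and parameters are illustrative assumptions, not part of the proposal), the following CUDA kernel assigns each 2-D point to a cell of a uniform grid, the first step in typical GPU spatial-indexing and point-aggregation pipelines:

    #include <cuda_runtime.h>

    // Assign each 2-D point to a cell of a uniform grid covering
    // [xmin, xmin + ncols*cell) x [ymin, ymin + nrows*cell).
    // One thread per point; the resulting cell ids can then be sorted on the
    // GPU to build a grid-based spatial index or to aggregate points per cell.
    __global__ void assign_cells(const float *x, const float *y, int npoints,
                                 float xmin, float ymin, float cell,
                                 int ncols, int nrows, int *cell_id)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= npoints) return;

        int cx = (int)((x[i] - xmin) / cell);
        int cy = (int)((y[i] - ymin) / cell);
        cx = min(max(cx, 0), ncols - 1);   // clamp points lying on the boundary
        cy = min(max(cy, 0), nrows - 1);
        cell_id[i] = cy * ncols + cx;
    }

Spatial joins and point interpolation can then operate on the sorted cell ids, so that each thread block only touches points from neighbouring cells.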

Project Title: Towards Smart Medical Services in Cancer Detection with High Accuracy and Efficiency using Deep Learning Approaches

(Submitted for Approval) - Project Budget: SAR 42,400

Cancer has become a leading cause of death worldwide and is the second leading cause of death in the United States. The most common types of cancer in males are lung cancer and prostate cancer, whereas in females they are breast cancer and colorectal cancer. In Saudi Arabia, colorectal cancer is the most common type of cancer in men and the second most common in women. Early detection of cancer is of vital importance and greatly enhances the chances of a successful treatment plan. Computer Aided Diagnosis (CAD) systems are applied to detect various kinds of cancer. Recently, deep learning has been applied to breast cancer classification, achieving an accuracy of 99.68%. In this project, we aim to create a dataset of medical images for training, validation, and testing. We will also develop parallel deep learning algorithms for Graphics Processing Units (GPUs) to classify cancer patients. Our aim is to devise smart algorithms with better efficiency and higher accuracy.

Further Projects of Interest: (suitable for BS/MS/PhD students; titles subject to revision based on the scope of the project for each level of student)

Research Project 1: Exploration of Structured Grid Computation on Many-Core Architectures

Recent advances in many-core architectures have opened new opportunities for harnessing their computing power for general-purpose computing. However, efficient utilization of these architectures is possible only with expert knowledge of the architecture and code optimizations. Several optimization techniques, tools, and libraries have been proposed for these architectures, focusing on diverse application domains such as numerical linear algebra, linear iterative solvers, machine learning algorithms, image processing, fluid-flow dynamics, and reservoir simulations. In this thesis, we will focus on accelerating computational science applications in the class of structured grid computation (SGC). Most of the SGC simulation time is spent repeatedly solving a large sparse linear system. Aggressive code optimizations are needed to achieve higher simulation accuracy by scaling up the application. Following are the key objectives of this work:

    • Discuss the mismatch between current compiler techniques and the requirements for implementing efficient iterative linear solvers.
    • Identify the basic and domain specific optimizations.
    • Explore the approaches used by computational scientists to program SGCs, such as manual optimizations, use of numerical libraries, and combinations of integrated libraries, user annotations defining regularity in data structures and algorithm properties, and auto-tuning.

    • Identify the main optimization functionalities for an integrated library to ease the process of defining complex SGC data structures and optimizing solver code using an intelligent, high-level interface and domain-specific annotations.
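
As a concrete example of the kernels that dominate SGC simulation time (a minimal sketch; the grid size, data layout, and scaling are illustrative assumptions), the following CUDA kernel performs one Jacobi sweep of a 5-point stencil on a 2-D structured grid:

    #include <cuda_runtime.h>

    // One Jacobi sweep of a 5-point stencil on an nx x ny structured grid,
    // updating interior points only. u_old, u_new, and rhs are row-major,
    // and rhs is assumed to already include the h^2 grid-spacing factor.
    __global__ void jacobi_sweep(const float *u_old, float *u_new,
                                 const float *rhs, int nx, int ny)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        if (i <= 0 || i >= nx - 1 || j <= 0 || j >= ny - 1) return;

        int idx = j * nx + i;
        u_new[idx] = 0.25f * (u_old[idx - 1] + u_old[idx + 1] +
                              u_old[idx - nx] + u_old[idx + nx] -
                              rhs[idx]);
    }

Iterative solvers repeatedly launch sweeps like this (swapping u_old and u_new between launches), which is why memory layout, tiling, and annotation-driven optimizations matter so much for SGC performance.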

Research Project 2: Evaluation of Sparse Linear Algebra Operators on Many-Core Architectures

The abundant data parallelism available in many-core architectures is needed to improve accuracy in scientific and engineering simulation (SES). In many cases, most of the simulation time is spent in a linear solver involving sparse matrix-vector multiply (SpMV). In forward petroleum (Oil and Gas) reservoir simulation (FRS), the application of a stencil relationship to a structured grid leads to a family of generalized heptadiagonal (GH) sparse solver matrices with some regularity and structural uniqueness. To solve large-scale systems of linear equations, one may use direct or iterative approaches. Direct methods are more reliable and accurate but require large memory and generally lack scalability on many-core architectures due to their sequential nature. Iterative methods expose abundant data parallelism for large sparse systems of linear equations. Among the most popular iterative methods for solving large sparse linear systems are Krylov subspace solvers, of which BiCGStab is one example. The most frequent operators are matrix-vector multiplication, vector inner product, and vector addition and scaling. SpMV, in which a large sparse matrix is multiplied by a dense vector, is considered one of the most important and time-consuming computational kernels for a wide range of engineering and scientific applications, from structural mechanics to quantum physics to fluid dynamics. Thus, to achieve higher performance on this operation, it is important to choose the sparse matrix storage format carefully and to utilize the underlying architecture effectively. Following are the key objectives of this work:

    • Exploration of current sparse matrix storage formats on many-core architectures using a generalized heptadiagonal sparse solver matrix.
    • Study and evaluation of various libraries for sparse matrix operations on many-core architectures.
    • Enhancement of one of these libraries to add sparse matrix formats to its linear operators.
    • Performance evaluation of the implemented linear operators using well-known iterative solvers such as BiCGStab.
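
For illustration, the sketch below (an example under stated assumptions, not the project's chosen format) shows SpMV for a generalized heptadiagonal matrix stored in DIA (diagonal) format, with the diagonals laid out so that global memory accesses are coalesced:

    #include <cuda_runtime.h>

    #define NDIAGS 7   // generalized heptadiagonal: seven non-zero diagonals

    // y = A * x, with A stored in DIA format: diag_vals holds NDIAGS diagonals
    // of length n each, stored diagonal-major (diag_vals[d * n + row]) so that
    // consecutive threads read consecutive addresses. offsets holds the seven
    // diagonal offsets, e.g. {-nx*ny, -nx, -1, 0, +1, +nx, +nx*ny} for a 3-D grid.
    __global__ void spmv_dia_hepta(const float *diag_vals, const int *offsets,
                                   const float *x, float *y, int n)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= n) return;

        float sum = 0.0f;
        for (int d = 0; d < NDIAGS; ++d) {
            int col = row + offsets[d];
            if (col >= 0 && col < n)
                sum += diag_vals[d * n + row] * x[col];
        }
        y[row] = sum;
    }

Within a BiCGStab iteration, this kernel would be combined with vector inner products and AXPY-style updates, the other frequent operators mentioned above.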

Research Project 3: Exploration of Deep Learning Algorithms using Parallel Programming Models

Deep learning is part of a broader family of machine learning methods based on learning representations of data. An observation (e.g., an image) can be represented in many ways, such as a vector of intensity values per pixel, or more abstractly as a set of edges, regions of particular shape, etc. Some representations make it easier to learn tasks (e.g., face recognition or facial expression recognition) from examples. One of the promises of deep learning is replacing handcrafted features with efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction. Various deep learning architectures such as deep neural networks, convolutional deep neural networks, deep belief networks, and recurrent neural networks have been applied to fields like computer vision, automatic speech recognition, natural language processing, audio recognition, and bioinformatics, where they have been shown to produce state-of-the-art results on various tasks. Deep learning can achieve outstanding results in various fields; however, it requires such significant computational power that massively parallel processors and/or numerous computers are often required for practical applications.

In this project, the students are expected to work on the following objectives:

    • Study various deep learning algorithms (any 3), their parallel implementations using different parallel programming models, and their applications.
    • Implement the above selected algorithms using any two parallel programming models/frameworks/libraries such as pthreads, CUDA, OpenMP, OpenACC, OpenCL, MPI, or a combination of these, with the best possible optimizations (a minimal CUDA sketch of one building block follows this list).
    • Performance evaluation of the above implementations in terms of execution time and cost.
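
To indicate the granularity of work expected, here is a minimal sketch of one deep learning building block written in CUDA, one of the candidate models listed above (the layer sizes, names, and single-input restriction are illustrative assumptions):

    #include <cuda_runtime.h>

    // Forward pass of one fully connected layer with ReLU activation for a
    // single input vector: out[j] = max(0, b[j] + sum_i W[j][i] * in[i]).
    // One thread per output neuron; W is row-major with shape (n_out x n_in).
    __global__ void dense_relu_forward(const float *in, const float *W,
                                       const float *b, float *out,
                                       int n_in, int n_out)
    {
        int j = blockIdx.x * blockDim.x + threadIdx.x;
        if (j >= n_out) return;

        float acc = b[j];
        for (int i = 0; i < n_in; ++i)
            acc += W[j * n_in + i] * in[i];
        out[j] = acc > 0.0f ? acc : 0.0f;
    }

A comparable OpenMP or MPI version of the same layer would allow a like-for-like evaluation of execution time across the chosen programming models.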

Research Project 4: Exploration of Deep Learning Software on Parallel Architectures

Deep learning is a rapidly growing segment of artificial intelligence. It is increasingly used to deliver near-human-level accuracy for image classification, voice recognition, natural language processing, sentiment analysis, recommendation engines, and more. Application areas include facial recognition, scene detection, advanced medical and pharmaceutical research, and autonomous, self-driving vehicles. Deep learning uses deep neural networks (DNNs), i.e., networks many layers deep, and large datasets to teach computers how to solve perceptual problems, such as detecting recognizable concepts in data, translating or understanding natural languages, interpreting information from input data, and more. Deep learning is used in the research community and in industry to help solve many big data problems such as computer vision, speech recognition, and natural language processing. Practical examples include vehicle, pedestrian, and landmark identification for driver assistance; image recognition; speech recognition; natural language processing; neural machine translation; and cancer detection. Various software solutions have been provided in the market for deep learning on different parallel computing architectures, including programming language extensions, libraries, and frameworks.

In this project, the students are expected to work on the following objectives:

    • Study and explore various software tools/frameworks (any 3) for deep learning on parallel architectures
    • Develop a case study using the above selected tools in a specific application area such as computer vision, speech recognition, face recognition, etc.
    • Provide the evaluations of the selected frameworks on large datasets in terms of execution time, accuracy, and implementation cost.

Research Project 5: Big Data Analytics using Many-Core Architectures

Recent advances in many-core architectures have opened new opportunities for harnessing their computing power for general-purpose computing. However, efficient utilization of these architectures is possible only with expert knowledge of the architecture and code optimizations. Several optimization techniques, tools, and libraries have been proposed for these architectures, focusing on diverse application domains such as numerical linear algebra, linear iterative solvers, machine learning algorithms, image processing, fluid-flow dynamics, and reservoir simulations. The goal of big data analytics is to analyse massive datasets, which increasingly arise in web-scale business intelligence problems. The common strategy for handling these workloads is to distribute the processing across massively parallel analysis systems or to use large machines able to handle the workload. Today, many different hardware architectures apart from traditional CPUs can be used to process data. GPUs (Graphics Processing Units), MICs (Many Integrated Cores), and FPGAs (Field Programmable Gate Arrays) are usually employed as co-processors to accelerate query execution. Such hardware architectures make it possible to distribute the workload among the CPU and other processors, enabling systems to process bigger workloads.

In this project, the students are expected to work on the following objectives:

    • Study and review the latest trends in parallel computing with massively parallel processors for big data analytics.
    • Explore various software tools/frameworks useful for big data analytics using parallel computing.
    • Develop a case study using these tools (any 3) to perform data processing and analysis on real datasets (a minimal reduction sketch follows this list).
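
As a minimal sketch of the kind of data-parallel primitive behind many analytics aggregations (the kernel name and launch details are illustrative assumptions), the following CUDA kernel computes per-block partial sums of a large array; SUM/COUNT/AVG-style aggregations in analytics queries map directly onto this pattern:

    #include <cuda_runtime.h>

    // Block-level sum reduction: each block reduces its chunk of the input to
    // one partial sum (assumes a power-of-two block size). Launch with shared
    // memory of blockDim.x * sizeof(float); the partial sums are then reduced
    // on the host or by launching the kernel again on the partial array.
    __global__ void block_sum(const float *in, float *partial, int n)
    {
        extern __shared__ float sdata[];
        int tid = threadIdx.x;
        int i   = blockIdx.x * blockDim.x + threadIdx.x;

        sdata[tid] = (i < n) ? in[i] : 0.0f;
        __syncthreads();

        // Tree reduction in shared memory.
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (tid < s)
                sdata[tid] += sdata[tid + s];
            __syncthreads();
        }
        if (tid == 0)
            partial[blockIdx.x] = sdata[0];
    }

More elaborate analytics operators (group-by, joins, filters) on GPUs and other co-processors are typically built from the same reduction, scan, and sort primitives.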