ICT – Algorithms and Optimization for Big Data

Title:  Algorithms and Optimization for Big Data

Credit Structure (Lecture-Lab-Total): 3-0-3

Instructor: Ratnik Gandhi

Prerequisites: Basics of discrete mathematics, linear algebra, calculus, and algorithms

 

Contents:

·         Linear Algebraic analysis

o   Shortest distances - linear regression

o   Error calculation

o   Clustering

o   SVD and eigenvalues/eigenvectors for approximation, compression, and security

o   Markovian analysis

o   Fourier/wavelet transforms - time/frequency analysis or compression

o   Principal Component Analysis

·         Solving linear and nonlinear optimization problems based on the data

·         Numerical and differential/integral (continuous) analysis for interpolation/extrapolation

o   Convergence/divergence analysis

     

      Use of the above techniques to solve big data problems with the help of relevant tools such as R, MapReduce, Hadoop, and Spark

 

Textbooks: As required; readings are based on research papers.

 

References/Detailed contents (some papers may change as we progress):

1.      Multi-Dimensional Regression Analysis of Time-Series Data Streams, Chen et al., Proceedings of the 28th VLDB Conference, Hong Kong, China, 2002.

2.      Linear Programming in the Semi-Streaming Model with Applications to the Maximum Matching Problem, Ahn and Guha, arXiv 2011.

3.      Fast Low-Rank Modifications of the Thin Singular Value Decomposition, M. Brand, Elsevier 2006.

4.      Parallel and Collaborative Filtering for Streaming Data, Ali, Johnson, Tang, 2011.

5.      Streaming Algorithm for the SVD, Strumpen, Hoffmann, Agarwal, MIT LCS Technical Memo 2003.

6.      Matrix Factorization for Collaborative Prediction, Kleeman, Hendersen, Denuit.

7.      Generalized Hebbian Algorithm for Incremental Latent Semantic Analysis, Gorrell and Webb.

8.      Analytic Challenges in Social Sensing, Abdelzaher and Wang.

9.      Detecting anomaly in data streams by fractal model, Zhang et al., WWW 2014.

10.  Eigenspace Method for Spatiotemporal Hotspot Detection, Fanaee-T and Gama, arXiv 2014.

 

 

Course outcomes

Students taking this course will develop the ability to independently take up a problem related to big data, model it, and design a relevant solution.

 

 

Assessment

-          3-week project – 30%

-          7-week project – 40%

-          Problem solving (take-home final practical exam; a solution is expected) – 30%

 

How will we learn?

      This will be a laboratory-based course. Every week we will meet for a 3-hour session. During this session we will discuss one or two ideas from the reference research papers. You (students) will then implement these ideas in relevant software systems during the same session.