Shahar Mendelson - Some thoughts on Mathematics of Data Science

Some thoughts on Maths of Data Science

Data Science is a rapidly developing area with tremendous impact on everyday life. Nowadays, almost every form of information extraction is labelled "Data Science". Within the academic world, data scientists are found in engineering, computer science, statistics, biology and physics departments, but an overwhelming majority of them focus on domain-specific problems rather than on a fundamental understanding of information extraction: namely, when and why it is possible.

As it happens, core questions in Data Science have strong ties to deep problems in pure mathematics, most notably to the study of high-dimensional phenomena. But despite the obvious significance of Data Science, its mathematical foundations are far from understood. Much of the progress we witness today is due to technological advances (e.g., cheap, readily available, and efficient hardware) and to the success of ad-hoc algorithms that are based on remarkable intuition. Without a much better understanding of the foundations of Data Science, that progress is likely to reach a plateau.

Improving our understanding of the foundations of Data Science is a challenge: it requires answering key mathematical questions. What I find truly exciting is that in many instances, once the questions are stripped of the "applied language", one is left with fundamental questions in pure maths. From that perspective, maths of data science is the best of both worlds: a beautiful theory that leads to meaningful applications.