HDM'25

The 12th ICDM workshop for high dimensional data mining

November 14 @ Washington DC, USA

Overview of the workshop topic

Unprecedented technological advances lead to increasingly high dimensional data sets in all areas of science, engineering and business. These include genomics and proteomics, biomedical imaging, signal processing, astrophysics, finance, and web and social network analysis, among many others. Propelled by a new awareness of the importance of data, practitioners from all areas maintain large repositories of high dimensional data; only some of these data are tagged or labelled, while most are unlabelled raw data waiting to be exploited. The number of features in such data is often of the order of thousands or millions, which is much larger than the available (labelled or unlabelled) sample size.

Moreover, high dimensional data with limited sample sizes come with a number of challenges:

- High dimensional geometry defeats our intuition rooted in low dimensional experiences, which makes data presentation and visualisation particularly challenging.
- Phenomena that occur in high dimensional probability spaces, such as the concentration of measure, are counter-intuitive for the data mining practitioner. For instance, distance concentration is the phenomenon whereby the contrast between pairwise distances may vanish as the dimensionality increases.
- Spurious correlations and misleading estimates may result when trying to fit complex models for which the effective dimensionality is too large compared to the number of data points available.
- The accumulation of noise may confound our ability to find low dimensional intrinsic structure hidden in the high dimensional data.
- The computational cost of processing high dimensional data or carrying out optimisation over high dimensional parameter domains is often prohibitive.
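
As an illustration of the distance concentration phenomenon mentioned above, the following minimal sketch estimates the relative contrast (d_max - d_min) / d_min between a random query point and a random sample, for increasing dimensionality. The function name relative_contrast, the uniform distribution on the unit cube, the sample size of 200 and the Euclidean distance are all assumptions chosen for illustration; they are not taken from the workshop description.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points points, all drawn uniformly from the
    unit cube [0, 1]^dim (an illustrative assumption)."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # The contrast is expected to shrink as the dimension grows.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

Under these assumptions the printed contrast should decrease markedly with the dimension, which suggests why nearest-neighbour style reasoning can become unreliable in high dimensional spaces. Other data distributions or distance functions may behave differently.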

Aims and Scope

This workshop aims to bring together researchers from databases, data science, machine learning and statistics to cross-pollinate ideas, facilitate collaboration, and expand the breadth and reach of methods and technology that address the curses and exploit the blessings of high dimensionality in data mining, and to forge new directions in data mining research.

This year we would like to particularly encourage work that counters the issues of low sample size and takes advantage of unlabelled or auxiliary data for high dimensional data mining. Topics of interest include (but are not limited to) the following:

- Learning and mining with weak supervision, exploiting unlabelled data in high dimensional settings.
- Managing the tradeoff between computational cost and statistical efficiency.
- Models of low intrinsic structure, such as sparse representation, manifold models, latent structure models, overparametrised models, compressible models.
- Effect and mitigation of noise and the curse of dimensionality on data mining methods.
- Theoretical underpinning of data mining where the data dimension can be larger than the sample size.
- New data mining techniques that exploit properties of high dimensional data spaces.
- Adaptive and non-adaptive dimensionality reduction for high dimensional data sets.
- Random projections and random matrix theory applied to high dimensional data mining.
- Functional data mining.
- Data mining applications to real problems in science, engineering, business, and the humanities, where the data is high dimensional.

Workshop Day

- Date: November 14 (Friday), 2025
- Time: 10.30 - 12.00
- Room: New York
- Session chair: Aman Shukla
- Format: 25-minute presentation followed by 5-minute discussion

Schedule

Program committee

Ata Kaban, University of Birmingham, UK
Dharmam Buch, Intuit, USA
Jakramate Bootkrajang, Chiang Mai University, Thailand
Mark Chai, Northwest Missouri State University, USA
Papangkorn Inkaew, Chiang Mai University, Thailand

Local organiser

Aman Shukla, Resonate, USA

Organiser contacts

Jakramate Bootkrajang

Chiang Mai University

jakramate.b@cmu.ac.th

Ata Kaban

University of Birmingham

a.kaban@bham.ac.uk
