Workshop Overview

Every field that collects data has seen growth in the amount of data being analyzed, and this has led to increased work in data mining and machine learning. These methods are moving out of academic and high-tech fields and into new, everyday applications. This workshop will focus on two primary issues in applying data mining in practice: how to incorporate the cost of data into the problem, and how to automate the data mining process.

Cost

From a data perspective, cost can be divided into three primary areas: 1) collection costs – costs associated with acquiring each data stream; 2) labeling costs – costs associated with assigning classes, acquiring response variable values in supervised learning, or acquiring the values of missing explanatory data; and 3) processing costs – costs associated with model training, prediction, and storage. In the data mining literature, algorithms and models are usually optimized for predictive accuracy, and little has been published on incorporating costs into the data mining process. However, costs are present in almost all real-world data mining applications and should be considered. In some applications, costs are constrained by a total cost budget. For data mining applications to be successful, they must meet the system requirements and objectives without exceeding that budget.

Automation

In recent years there has been considerable debate about both the need for and the feasibility of automating data mining and machine learning tasks. A recent blog post titled “Data Scientists Need More Automation” discusses the repeated effort required to configure and run services or scripts on a network of machines. Other discussions ask, “Can We Automate Data Mining?,” arguing that many tasks performed by data scientists “cannot be automated and need manual intervention”; in other words, each case requires individual expertise grounded in a clear understanding of the business and the data. The advancement, education, and adoption of data mining and machine learning practices require translating theory into application and feeding lessons from application back into theory. Tools that automate data mining efforts foster this exchange and also promote the development and adoption of standards. Standards for automation enable researchers and practitioners to communicate more effectively, sharing successes and challenges in a consistent, common language. In an age of software as a service and ever-increasing scalability requirements, such standards are necessary. Consistent adoption, application, and communication in turn promote research on and refinement of automated strategies, as well as growth of the community. To keep pace with the rapidly increasing volume and rate of data generation, standardizing and automating data mining activities is critical. The key challenges to be discussed concern the boundary between tasks that can be automated and the individual attention that each unique business and data scenario requires.


This workshop is held in conjunction with the 17th IEEE International Conference on Data Mining (IEEE ICDM 2017).

Technical Co-Sponsor: IEEE SMC Society TC on Human Perception in Multimedia Computing