Pragyansmita Nayak - Hitachi Vantara Federal

Title: Data Catalog with Machine-Learning based engines for Statistical Programming

Abstract:

The solution architecture of a data-driven application should ideally be both scalable and flexible, so that it can support the resolution of analytics-related problems down the road and drive knowledge growth, process automation, and an effective interplay between data and the organization's strategic and tactical goals. Data is continuously being generated at a scale that requires automated support for operational efficiency in the public sector if organizations are to meet their vision and mission. The varying formats of data (structured, unstructured, and semi-structured) add another level of complexity and hinder the usability of the data by its users (both person and non-person entities). In other words, the availability of data does not typically translate into usable or accessible data.


Data catalogs streamline this critical data curation process and make the jobs, and the associated communications, of data stewards and data custodians easier. This talk will explore how data catalogs with machine-learning-based engines go a few steps further, providing more advanced data tagging and characterization that can address the data visibility and accessibility challenges described above for statistical programming practitioners.