Abstract Entities

Definition

An abstract entity is an abstract concept that can be used to characterize a collection of concrete entities. The mapping from concrete entities to abstract entities allows us to reason about the relationships among symptoms, causes, remedies, and side-effects with a much smaller number of permutations of such relationships. In the IVAS ontology, there are 24 abstract entities, as detailed below. Using these 24 abstract entities, we can conduct abstract reasoning about the three sets of pairwise relationships, i.e., between symptoms and causes, between causes and remedies, and between remedies and side-effects.

Four Types of Visual Analytics Processes

  • Statistics -- a class of machine-centric processes that transform data to one or more statistical measures, e.g., means, standard deviations, Pearson product-moment correlation coefficient, and so on. The class also includes more complex processes of statistical inference, such as hypothesis testing, regression analysis, bootstrapping, and so on. Some of these processes may include simple algorithms, such as sorting.
  • Algorithm -- a class of machine-centric processes that transform data to different data that may represent some decisions (e.g., recognition, detection, and clustering), a subset of data (e.g., search, retrieval, and filtering), reorganized data (e.g., sorting), abstract models (e.g., machine learning), and so on. Some of these processes may be deployed as software tools and systems, and some may include components of statistical analysis.
  • Visualization -- a class of human-centric processes that transform data to visual representations and then to observations made by human viewers. As a broad definition, these processes also include the subsequent actions for detecting patterns, reaching conclusions, and making decisions, as well as non-visual human-centric processes for acquiring data, such as reading and hearing.
  • Interaction -- a class of human-centric processes in which human users interact with the computer to influence the running of some machine-centric processes (e.g., selecting commands, setting parameters, entering more data, and controlling navigation). One may also extend the definition of this class to include processes of human-human communication and collaboration.

See also: M. Chen, A. Trefethen, R. Banares-Alcantara, M. Jirotka, B. Coecke, T. Ertl and A. Schmidt, "From data analysis and visualization to causality discovery." IEEE Computer, 44(10):84-87, 2011.

Three Information-Theoretic Measures

The cost-benefit metric for analysing machine- and human-centric processes in data intelligence combines the following three measures.

  • Alphabet Compression (AC) -- It measures the amount of entropy reduction (or information loss) achieved by a process. Most visual analytics (VA) processes (e.g., statistical aggregation, sorting, clustering, visual mapping, and interaction) feature many-to-one mappings from input to output, hence losing information. Although information loss is commonly regarded as harmful, it cannot be all bad if it is a general trend of VA workflows. Thus the cost-benefit metric makes AC a positive component.
  • Potential Distortion (PD) -- It balances the positive nature of AC by measuring the errors typically due to information loss. Instead of measuring mapping errors using some third-party metrics, PD measures the potential distortion when one reconstructs inputs from outputs. The measurement takes into account human knowledge that can be used to improve the reconstruction processes. For example, given an average mark of 62%, the teacher who taught the class can normally guess the distribution of the marks among the students better than an arbitrary person.
  • Cost -- It measures the cost of a process, including the cost of the forward transformation from input to output and the cost of the inverse transformation of reconstruction. It provides a further balancing factor in the cost-benefit metric, in addition to the trade-off between AC and PD. In practice, one may measure the cost using time or a monetary measurement.
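
The way the three measures combine can be illustrated with a minimal sketch. It assumes the ratio form (AC - PD) / Cost, with AC computed as the difference between input and output Shannon entropy and PD computed as a Kullback-Leibler divergence between a reconstructed input distribution and the actual one. The function names, the probability distributions, and the cost value are all hypothetical and chosen only for illustration; the cited paper should be consulted for the precise definitions.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in bits.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cost_benefit_ratio(ac, pd, cost):
    """Assumed form of the metric: (AC - PD) / Cost."""
    return (ac - pd) / cost


# Hypothetical process: four equally likely input letters are mapped
# many-to-one onto two equally likely output letters.
p_input = [0.25, 0.25, 0.25, 0.25]
p_output = [0.5, 0.5]

# Alphabet compression: entropy reduction from input to output.
ac = entropy(p_input) - entropy(p_output)

# Potential distortion: divergence of a viewer's reconstructed guess
# about the input distribution from the actual input distribution.
p_reconstructed = [0.5, 0.25, 0.125, 0.125]
pd = kl_divergence(p_reconstructed, p_input)

# Hypothetical cost, e.g., seconds spent on the forward and inverse transformations.
cost = 3.0

ratio = cost_benefit_ratio(ac, pd, cost)
print(f"AC = {ac} bits, PD = {pd} bits, ratio = {ratio}")
```

In this sketch, a viewer with better knowledge of the data would produce a reconstructed distribution closer to the actual one, which lowers PD and raises the ratio, in the same spirit as the teacher in the example above.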

See also: M. Chen and A. Golan, "What May Visualization Processes Optimize?" IEEE Transactions on Visualization and Computer Graphics, 22(12):2619-2632, 2016.

The 24 Abstract Entities

Alphabet Compression

  • Stat-High-AC (Statistics, High Alphabet Compression) -- an increased or over-increased measure of AC by a Statistical Process. When this characterisation is associated with symptoms, causes, and side-effects, one considers the negative impact of entropy reduction (e.g., errors due to information loss). When it is considered with remedies, one considers the positive impact of entropy reduction (i.e., transforming data to decisions and knowledge).
  • Stat-Low-AC (Statistics, Low Alphabet Compression) -- a decreased or over-decreased measure of AC by a Statistical Process. When this characterisation is associated with symptoms, causes, and side-effects, one considers the negative impact of entropy preservation (e.g., too much data to handle). When it is considered with remedies, one considers the positive impact of entropy preservation (i.e., alleviating information loss).
  • Alg-High-AC (Algorithm, High Alphabet Compression) -- an increased or over-increased measure of AC by an Algorithmic Process. When this characterisation is associated with symptoms, causes, and side-effects, one considers the negative impact of entropy reduction (e.g., errors due to information loss). When it is considered with remedies, one considers the positive impact of entropy reduction (i.e., transforming data to decisions and knowledge).
  • Alg-Low-AC (Algorithm, Low Alphabet Compression) -- a decreased or over-decreased measure of AC by an Algorithmic Process. When this characterisation is associated with symptoms, causes, and side-effects, one considers the negative impact of entropy preservation (e.g., too much data to handle). When it is considered with remedies, one considers the positive impact of entropy preservation (i.e., alleviating information loss).
  • Vis-High-AC (Visualization, High Alphabet Compression) -- an increased or over-increased measure of AC by a Visualization Process. When this characterisation is associated with symptoms, causes, and side-effects, one considers the negative impact of entropy reduction (e.g., errors due to information loss). When it is considered with remedies, one considers the positive impact of entropy reduction (i.e., transforming data to decisions and knowledge).
  • Vis-Low-AC (Visualization, Low Alphabet Compression) -- a decreased or over-decreased measure of AC by a Visualization Process. When this characterisation is associated with symptoms, causes, and side-effects, one considers the negative impact of entropy preservation (e.g., too much data to handle). When it is considered with remedies, one considers the positive impact of entropy preservation (i.e., alleviating information loss).
  • Int-High-AC (Interaction, High Alphabet Compression) -- an increased or over-increased measure of AC by an Interaction Process. When this characterisation is associated with symptoms, causes, and side-effects, one considers the negative impact of entropy reduction (e.g., errors due to information loss). When it is considered with remedies, one considers the positive impact of entropy reduction (i.e., transforming data to decisions and knowledge).
  • Int-Low-AC (Interaction, Low Alphabet Compression) -- a decreased or over-decreased measure of AC by an Interaction Process. When this characterisation is associated with symptoms, causes, and side-effects, one considers the negative impact of entropy preservation (e.g., too much data to handle). When it is considered with remedies, one considers the positive impact of entropy preservation (i.e., alleviating information loss).

Potential Distortion

  • Stat-High-PD (Statistics, High Potential Distortion) -- an increased or over-increased measure of PD by a Statistical Process. Normally this characterisation is associated only with symptoms, causes, and side-effects.
  • Stat-Low-PD (Statistics, Low Potential Distortion) -- a decreased measure of PD by a Statistical Process. Normally this characterisation is associated only with remedies.
  • Alg-High-PD (Algorithm, High Potential Distortion) -- an increased or over-increased measure of PD by an Algorithmic Process. Normally this characterisation is associated only with symptoms, causes, and side-effects.
  • Alg-Low-PD (Algorithm, Low Potential Distortion) -- a decreased measure of PD by an Algorithmic Process. Normally this characterisation is associated only with remedies.
  • Vis-High-PD (Visualization, High Potential Distortion) -- an increased or over-increased measure of PD by a Visualization Process. Normally this characterisation is associated only with symptoms, causes, and side-effects.
  • Vis-Low-PD (Visualization, Low Potential Distortion) -- a decreased measure of PD by a Visualization Process. Normally this characterisation is associated only with remedies.
  • Int-High-PD (Interaction, High Potential Distortion) -- an increased or over-increased measure of PD by an Interaction Process. Normally this characterisation is associated only with symptoms, causes, and side-effects.
  • Int-Low-PD (Interaction, Low Potential Distortion) -- a decreased measure of PD by an Interaction Process. Normally this characterisation is associated only with remedies.

Cost

  • Stat-High-Cost (Statistics, High Cost) -- an increased or over-increased measure of Cost by a Statistical Process. Normally this characterisation is associated only with symptoms, causes, and side-effects.
  • Stat-Low-Cost (Statistics, Low Cost) -- a decreased measure of Cost by a Statistical Process. Normally this characterisation is associated only with remedies.
  • Alg-High-Cost (Algorithm, High Cost) -- an increased or over-increased measure of Cost by an Algorithmic Process. Normally this characterisation is associated only with symptoms, causes, and side-effects.
  • Alg-Low-Cost (Algorithm, Low Cost) -- a decreased measure of Cost by an Algorithmic Process. Normally this characterisation is associated only with remedies.
  • Vis-High-Cost (Visualization, High Cost) -- an increased or over-increased measure of Cost by a Visualization Process. Normally this characterisation is associated only with symptoms, causes, and side-effects.
  • Vis-Low-Cost (Visualization, Low Cost) -- a decreased measure of Cost by a Visualization Process. Normally this characterisation is associated only with remedies.
  • Int-High-Cost (Interaction, High Cost) -- an increased or over-increased measure of Cost by an Interaction Process. Normally this characterisation is associated only with symptoms, causes, and side-effects.
  • Int-Low-Cost (Interaction, Low Cost) -- a decreased measure of Cost by an Interaction Process. Normally this characterisation is associated only with remedies.
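
The 24 abstract entities listed above follow a regular pattern: four process types, two levels, and three measures. The following sketch enumerates them and records the roles each one is normally associated with, as stated in the definitions above. The dictionary layout and the function names are hypothetical and are not part of the IVAS ontology itself; they only illustrate the naming scheme.

```python
from itertools import product

# Abbreviations as used in the entity names above.
PROCESSES = {"Stat": "Statistics", "Alg": "Algorithm",
             "Vis": "Visualization", "Int": "Interaction"}
MEASURES = {"AC": "Alphabet Compression", "PD": "Potential Distortion",
            "Cost": "Cost"}
LEVELS = ("High", "Low")

PROBLEM_ROLES = ("symptom", "cause", "side-effect")


def typical_roles(level, measure):
    """Roles an abstract entity is normally associated with.

    AC entities may appear in any role (negative impact for problems,
    positive impact for remedies). For PD and Cost, High entities are
    normally problems and Low entities are normally remedies.
    """
    if measure == "AC":
        return PROBLEM_ROLES + ("remedy",)
    return PROBLEM_ROLES if level == "High" else ("remedy",)


def abstract_entities():
    """Map each entity name (e.g., 'Vis-High-PD') to its typical roles."""
    return {
        f"{process}-{level}-{measure}": typical_roles(level, measure)
        for measure, process, level in product(MEASURES, PROCESSES, LEVELS)
    }


entities = abstract_entities()
print(len(entities))
for name, roles in entities.items():
    print(name, "->", ", ".join(roles))
```

Iterating over the measures first reproduces the grouping used in the listing above, where the entities are organized under Alphabet Compression, Potential Distortion, and Cost.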