MC: Here is a Report of Abstract Reasoning for Case Study 2. I am sorry that my analysis may sound a bit critical, and I guess that it is the "fault" of abstract reasoning as it forces one to search for problems and solutions systematically.
Jiawei Zhang et al. A Visual Analytics Framework for Microblog Data Analysis at Multiple Scales of Aggregation, CGF, 2016.
This paper does not describe a particular original system; instead, some techniques from the literature are mentioned as the benchmark (reference) system. Nevertheless, the users are clearly defined, i.e., causal analysis experts in emergency and law enforcement agencies. From Section 3 of the paper, one can gather that the users have access to sentiment analysis and topic modelling in their workflow, but have found that these provide only a coarse-grained overview of the situation.
1. Symptoms of the Original Workflow:
(A). The lack of fine-grained, crisis-related categorizations. This implies that the existing analytical techniques perform a high level of alphabet compression, but the resulting output categories do not allow experts to reconstruct the parts of the original data that are useful to their task. Using the students’ examination marks as an example again, this is a bit like categorizing the students (cf. microblogs) by their calculus or algebra results when the users really want a categorization based on their sports interests. Inferring sports interests from categorizations of mathematical ability would involve a lot of potential distortion.
(B). The users wish to have some control over the scales of space, time, and data categorization, but pre-defined aggregation schemes cause abrupt changes. The lack of user control over the scales of statistical aggregation implies too much alphabet compression by the algorithms, which leaves few options (i.e., letters) to the users.
(C). The rapid arrival of the data results in rapid changes of the visualization, disrupting the ongoing analysis. This implies not enough alphabet compression by the visualization.
(D). Related to (A), (B), and (C), there is an obvious symptom that viewing overly fine-grained data (e.g., all microblogs) is too costly and not effective. Although this was not mentioned in the paper, we can assume that this option was ruled out at the beginning.
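To make symptom (A) a little more concrete, here is a minimal Python sketch of the examination-marks analogy. Everything in it is hypothetical: the eight students, the `maths_grade` and `sport` labels, and the helper names are made up for illustration. It measures alphabet compression as the drop in Shannon entropy, and uses mutual information as a rough stand-in for how much of the task-relevant variable a user could reconstruct from the categories. That stand-in is an assumption of this sketch, not the formal potential-distortion measure.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Hypothetical toy data: eight students (cf. microblogs).
students = list(range(8))  # raw alphabet, one letter per student
maths_grade = ["hi", "hi", "lo", "lo", "hi", "hi", "lo", "lo"]
sport = ["football", "tennis"] * 4  # what the users actually care about

# Alphabet compression achieved by the maths-based categorization.
compression = entropy(students) - entropy(maths_grade)

# How much each categorization tells us about the task variable.
info_from_maths = mutual_information(maths_grade, sport)
info_from_sport = mutual_information(sport, sport)

print(f"compression by maths categories: {compression:.2f} bits")
print(f"information about sport from maths categories: {info_from_maths:.2f} bits")
print(f"information about sport from sport categories: {info_from_sport:.2f} bits")
```

In this toy data both categorizations compress the alphabet by the same amount, but the maths-based one carries no information about sports interests. That is the sense in which the amount of compression is not the problem in (A); the choice of categories is.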
2. Analysis of Possible Causes:
(A) is caused primarily by an inappropriate algorithm, i.e., one designed for a different categorization. Thus simply reducing the level of alphabet compression will not necessarily solve this problem. When the domain experts said “we want to see more”, in this case they might have meant “we want to see different topic modelling or sentiment analysis.” It might also be caused by the same issue as (B), as the paper noted.
(B) is caused by the lack of interaction for controlling the algorithms.
(C) may be caused by a lack of alphabet compression, but may also be caused by humans’ poor memory of what has been observed, i.e., potential distortion in recall, the cognitive cost of remembering, and the effort of repeatedly going backward and forward. Thus (C) may really be caused by the lack of a suitable alphabet compression, namely one that uses visualization for memory externalization.
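As a rough illustration of the cause of (B), the sketch below treats the aggregation scale as a parameter that the user can set, rather than something fixed by the algorithm. The timestamps, the `aggregate` and `entropy_bits` helpers, and the chosen bin widths are all hypothetical; this is not how the system in the paper is implemented. For each bin width it reports the size of the output alphabet (the number of non-empty bins) and its entropy.

```python
import math
from collections import Counter

def aggregate(timestamps, bin_width):
    """Count posts per time bin; bin_width (seconds) is the user-chosen scale."""
    return Counter(int(t // bin_width) for t in timestamps)

def entropy_bits(counts):
    """Shannon entropy (bits) of the distribution of posts over bins."""
    n = sum(counts.values())
    return sum((c / n) * math.log2(n / c) for c in counts.values())

# Hypothetical arrival times (seconds) of twelve microblogs.
timestamps = [5, 12, 18, 31, 44, 47, 62, 75, 88, 93, 104, 119]

# A user exploring several scales instead of accepting a single preset.
for width in (10, 30, 60, 120):
    bins = aggregate(timestamps, width)
    print(f"bin width {width:>3}s: {len(bins):>2} bins, "
          f"{entropy_bits(bins):.2f} bits")
```

On this toy data the entropy falls from 2 bits at a 30-second bin to 1 bit at 60 seconds and 0 bits at 120 seconds. With only a few pre-defined widths, each step discards a whole bit at once, whereas exposing the width to the user lets them choose how much compression the algorithm performs.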