Zhang:2018:CHI

J. Zhang, C. Surakitbanharn, N. Elmqvist, R. Maciejewski, Z. Qian, and D. S. Ebert. TopoText: Context-preserving text data exploration across multiple spatial scales. In Proc. ACM CHI Conference on Human Factors in Computing Systems, Paper No. 37, Montreal QC, 2018. DOI.

This is one of the three post-hoc case studies discussed in the 2019 paper by Chen and Ebert, where the IVAS framework was proposed. The analysis was reported in Appendix C.3 of the paper. The second author (DSE) suggested this paper, which was previously unknown to the first author (MC). MC first read the paper and then wrote a report as an independent reviewer. The report took about 40-60 minutes to complete, including the effort of writing but excluding the time for the first reading. MC then emailed the report to DSE for comments. A few emails were then exchanged to discuss the report.

MC: This report of abstract reasoning is about the TopoText paper:

TopoText: Context-Preserving Text Data Exploration Across Multiple Spatial Scales.
Jiawei Zhang, Chittayong Surakitbanharn, Niklas Elmqvist, Ross Maciejewski, Zhenyu Qian, and David S. Ebert

This can be considered as an application of abstract reasoning to a visualization process in a workflow. The paper explores different design options for the process. There is fundamentally no change to the analytical algorithms preceding the visualization. We can also assume that there is no change to the visualization tasks that rely on the visualization process.

The empirical studies examined two main questions: one about the textual information, and one about the geographic and hierarchical relationships. These questions encode the visualization tasks concerned. The studies focused on response time (i.e., completion time), which can be considered as the cost of the process. The results show the following.

For gaining textual information:

Cost: M-bd > M-bh > M-sp

For understanding the geographic and hierarchical relationships:

Cost: M-bd < M-sp < M-bh

Abstract Reasoning is useful for (a) interpreting the empirical results, (b) applying such results in practice with additional contextual information, and (c) reasoning in situations where one cannot do empirical studies.

Starting with Symptoms and Causes:

The heat map and the tag cloud over a map are the two techniques used as benchmark (or reference) techniques. The heat map does not show textual information, and each keyword needs its own heat map. So there is not enough Alphabet Compression, and there is a huge Time Cost for looking at different heat maps for different keywords. Trying to build an overview in one's mind is cognitively challenging, hence more Cognitive Cost and Potential Distortion. Meanwhile, the heat map is intuitive and familiar to most users.

Time Cost is caused by insufficient Alphabet Compression (AC) by the visualization.
Cognitive Cost is caused by limited memory capacity, and thus also by the above cause.

The approach of a tag cloud over a map can convey the textual information as well as the numerical scale of each keyword at different locations. However, the changes in font size make it hard to reason about the location of each tag, as it would be more intuitive to relate the text to the region covered by the text. The users have to suppress this intuitive interpretation (a bit like trying to suppress a naturally occurring visual illusion). Hence there is more Cognitive Load and Potential Distortion.

Cognitive Cost is caused by an ineffective "mental algorithm" within the process, which usually transforms different sizes in a geographical map to the different levels of spatial coverages.

Neither visual representation can easily be extended to indicate multiple geographical scales (i.e., hierarchy). One important note here is about the observation of the map (or the Potential Distortion in observing the map). This depends on how familiar the users are with the areas, and how much detail is required for the tasks to be performed. In the empirical studies, these two factors are simplified, but in practical applications, they must be considered.

Abstract Reasoning about the Optional Remedies:

Four single-level remedies were proposed. Based on these, three multi-level remedies were proposed.

M-bd offers

more Alphabet Compression about the text (by discarding information, some of which may be useful),
more Alphabet Compression about the hierarchical structure (by discarding information that is not useful),
but less Alphabet Compression about the map (i.e., more geographical information is available, which is useful for some users and some tasks),
more Potential Distortion and Cost about text as the text orientations are not uniform.

M-bd is NOT a pure M-bd as its inner region is actually rendered using M-sp (or S-sp).

M-sp offers

less Alphabet Compression about the text,
less Alphabet Compression about the hierarchical structure,
but more Alphabet Compression about the map due to occlusion.

M-bh is an in-between remedy. However, the jagged edges caused by long and short keywords may affect the perception of the boundary, and hence of the hierarchy. Perceptually, the trade-off between AC and Potential Distortion (PD) or Cost is not easy to estimate.

The abstract reasoning would yield the following hypotheses for normalized users and tasks:

(a) Cost for gaining textual information: M-bd > M-bh > M-sp.
(b) Cost for gaining hierarchical information: M-bd < M-bh =?= M-sp.
(c) Cost for gaining geographical information: M-bd < M-bh < M-sp.

We can also hypothesize that if there were a pure M-bd, called PM-bd, we would have (extending (a), (b), and (c) above):

(a) PM-bd > M-bd
(b) PM-bd <= M-bd
(c) PM-bd < M-bd

The abstract reasoning is consistent with the study results:

(a) Study 1 shows: M-bd > M-bh > M-sp
(b) and (c): Study 2 shows: M-bd < M-sp < M-bh

Note that Study 2's results mix (b) and (c) together. Nevertheless, they indicate that, in the trade-off between AC and PD for (b), the PD and Cost due to the jagged edges of M-bh seem high.
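One way to make the comparison between the hypotheses and the study results explicit is to break each cost ordering into pairwise relations and compare them pair by pair. The following is only a minimal sketch under assumptions that are not part of the report: the function names (ordered_pairs, compare) and the encoding of "=?=" in hypothesis (b) as "no claim about that pair" are illustrative choices.

```python
from itertools import combinations


def ordered_pairs(ascending):
    """Return the (lower-cost, higher-cost) pairs implied by an ascending cost ordering."""
    return set(combinations(ascending, 2))


def compare(hypothesis, observed):
    """Split the hypothesized pairs into those the study agrees with and those it reverses."""
    agree = hypothesis & observed
    conflict = {(lo, hi) for (lo, hi) in hypothesis if (hi, lo) in observed}
    return agree, conflict


# Hypotheses from the abstract reasoning, written from lowest to highest cost.
hyp_a = ordered_pairs(["M-sp", "M-bh", "M-bd"])   # (a) M-bd > M-bh > M-sp
# (b) M-bd < M-bh =?= M-sp: the M-bh/M-sp pair is left open (assumed encoding).
hyp_b = {("M-bd", "M-bh"), ("M-bd", "M-sp")}
hyp_c = ordered_pairs(["M-bd", "M-bh", "M-sp"])   # (c) M-bd < M-bh < M-sp

# Orderings reported by the two studies, lowest to highest cost.
study1 = ordered_pairs(["M-sp", "M-bh", "M-bd"])  # M-bd > M-bh > M-sp
study2 = ordered_pairs(["M-bd", "M-sp", "M-bh"])  # M-bd < M-sp < M-bh

for label, hyp, obs in [("(a)", hyp_a, study1),
                        ("(b)", hyp_b, study2),
                        ("(c)", hyp_c, study2)]:
    agree, conflict = compare(hyp, obs)
    print(label, "agree:", sorted(agree), "conflict:", sorted(conflict))
```

Under this encoding, (a) and (b) agree with the studies on every pair they make a claim about, and M-bd is the lowest-cost option in Study 2 under both (b) and (c). The only reversed pair is M-bh versus M-sp when Study 2 is read against (c) alone. Since Study 2 measured (b) and (c) together, that pair cannot be attributed to either hypothesis on its own.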

Reasoning about Side-Effects:

M-bd's weakness about textual information can potentially be alleviated by some interactions, such as a tooltip window for showing keywords at different locations, while the permanent texts maintain the externalized memory.

M-sp's weakness about geographical information can potentially be alleviated by some interactions, such as a dial for changing the transparency of the overlaid text. For users with good local knowledge, such interaction will rarely be required.

Reasoning about the Applications based on Empirical Studies:

The studies rightly used student participants whose geographical knowledge and understanding of the tasks were controlled, in order to avoid confounding effects.

This means that when we apply the results to real-world applications, we must use abstract reasoning to re-analyze the impact of users’ geographical knowledge and the dependency of the tasks on different levels of geographical information shown on the map. If the users have good geographical knowledge about the area concerned, or can acquire such knowledge quickly after a few task-related observations, then M-sp could possibly be a preferred option.

In the paper, the authors described one application, and the decision is in favor of M-bd. The selection of a distant location based on a major event means that the users may not have good geographical knowledge. Hence the selection of M-bd is a sensible decision.