By treating this side-effect as a new symptom, the next symptom was that when we visualized texts as a bipartite graph, the Potential Distortion (PD) became too high, making it difficult to see patterns. Visually, because each connection between two similar words is a line, the display cost for each piece of information is high. From the perspective of each pixel along a line connection, the computer screen treats every pixel as an independent variable (i.e., with its own alphabet). However, from the perspective of the drawing algorithm, there is only one variable: the similarity between the two words. If the computer screen could use only one pixel, it would achieve a great deal of Alphabet Compression (AC). Hence, a possible remedy was to use a 2D pixel map or matrix to display each similarity connection using only one pixel. Humans can use patterns of pixels to judge similarity at the passage level, as shown on the right of Fig. 2. In comparison with displaying a similarity connection as a line, the side effect of the pixel- or matrix-based approach is increased cognitive load due to the need to find the source and destination nodes from a pixel. When there are only a few connections, the line-based approach incurs lower cognitive load, as lines guide viewers to the source and destination of each similarity connection. However, when the lines become cluttered, the cognitive load of following each line becomes very high, so the pixel-based approach offers the better trade-off. Following this reasoning, we consider the side-effect tolerable.
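The pixel-map idea above can be sketched in a few lines of Python. The similarity measure and the word lists below are illustrative assumptions, not the measure used in the actual system: each word pair maps to a single cell ("pixel") rather than to a drawn line, so the display cost per similarity value is one cell.

```python
# Sketch: rendering word-level similarity as a pixel map (one cell per
# word pair) instead of one line per connection. The Jaccard-style
# character-overlap measure here is a toy stand-in for any
# word-similarity definition a scholar might supply.

def similarity(word_a, word_b):
    # Toy measure: fraction of shared characters between the two words.
    a, b = set(word_a), set(word_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def similarity_matrix(passage_a, passage_b):
    # One matrix cell per word pair; rows index passage_a, columns passage_b.
    return [[similarity(wa, wb) for wb in passage_b] for wa in passage_a]

def render(matrix, threshold=0.5):
    # Map each cell to a single character: '#' for similar, '.' otherwise,
    # mimicking a one-pixel-per-connection display.
    return "\n".join(
        "".join("#" if cell >= threshold else "." for cell in row)
        for row in matrix
    )

m = similarity_matrix(["night", "dark", "moon"], ["knight", "darkness", "sun"])
print(render(m))
```

Viewed this way, a block of dark cells signals passage-level similarity at a glance, which is the pattern-judging task the matrix view supports.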
Going back to the earlier discussion about visualizing analytical results at the word level, there was another side effect. Because we noticed that the literature scholars had a great deal of knowledge and many different definitions for judging similarity, we considered that it would be difficult for machine learning to acquire such knowledge and definitions solely from the dataset available at that time. Ideally, different definitions would be served by different analytical algorithms, but through our discussions, we could not establish a relatively stable set of definitions, as the literature scholars formulated definitions dynamically, often influenced by their search tasks and the relevant data. So the side-effect was that the literature scholars might have to translate a definition into an algorithm on demand.
By treating this side-effect as a new symptom, the third symptom is the difficulty for literature scholars to construct algorithms. If one were to define an algorithm using a programming language or pseudo-code, there would be a huge amount of flexibility. This means the alphabet of possible algorithms is huge. We therefore reduced this alphabet by allowing users to select from predefined algorithmic components, so they only need to deal with the components rather than statements in a programming language. A "program" at the component level typically has only a few components, so the alphabet of such "programs" is smaller. We introduced interactions as shown in Fig. 3. A side effect was that literature scholars had to learn to build such algorithms. We organized a workshop involving a number of literature scholars. We found that they could learn to "program" their algorithms within two hours, and they enjoyed the learning process. So the side-effect was manageable and the learning cost was justifiable.
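The component-level "programming" described above can be illustrated with a minimal sketch. The component names, their parameters, and the composition function below are hypothetical, not the interface shown in Fig. 3; the point is that a user chooses and orders a few components rather than writing arbitrary statements.

```python
# Sketch: a "program" as an ordered list of predefined components.
# All component names and parameters here are illustrative assumptions.

def lowercase(words):
    # Component: normalize case.
    return [w.lower() for w in words]

def drop_short(min_len):
    # Parameterized component: the user configures a minimum word length.
    def component(words):
        return [w for w in words if len(w) >= min_len]
    return component

def shared_fraction(other):
    # Terminal component: fraction of the words also found in `other`,
    # a toy stand-in for a scholar-defined similarity judgement.
    def component(words):
        return len(set(words) & set(other)) / len(set(words)) if words else 0.0
    return component

def compose(*components):
    # A component-level "program": apply each component in order.
    def program(data):
        for c in components:
            data = c(data)
        return data
    return program

# A scholar-built "program" with three components:
judge = compose(lowercase, drop_short(4), shared_fraction(["night", "moon"]))
print(judge(["The", "Night", "and", "the", "Moon"]))  # -> 1.0
```

Because each "program" is just a short ordered selection from a fixed component set, the space of possible programs is far smaller than the space of arbitrary code, which is the alphabet reduction the paragraph describes.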