Chinese AnalogySpace

AnalogySpace Matrix

The AnalogySpace matrix represents each concept as a feature vector. A feature of a concept is a relation combined with the neighboring concept it links to. For example, "has fur" and "capable of flying" are features of bird.

Figure 1. Example of feature for concept "bird."

The assertions in our knowledge base (e.g., Chinese ConceptNet) can be converted into the AnalogySpace matrix. The rows of the matrix are concepts and the columns are their features. Each entry holds a real number: the number of sentences collected for the corresponding assertion. Figure 2 shows part of the AnalogySpace matrix.
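The conversion from assertions to matrix entries can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the sample assertions, the sentence counts, the function name build_matrix, and the (slot, relation, neighbor) feature encoding are all assumptions made for the example. It also assumes that each assertion contributes one feature to each of its two concepts.

```python
from collections import defaultdict

# Hypothetical sample assertions: (relation, left concept, right concept,
# number of collected sentences). The counts are made up for illustration.
ASSERTIONS = [
    ("PartOf", "fur", "cat", 3),
    ("PartOf", "fur", "dog", 2),
    ("IsA", "cat", "pet", 5),
    ("IsA", "dog", "pet", 4),
    ("CapableOf", "bird", "fly", 6),
]


def build_matrix(assertions):
    """Return a sparse concept-by-feature matrix as {concept: {feature: count}}.

    A feature is encoded as (slot, relation, neighbor), where slot says
    which side of the relation the neighbor occupies (an assumed encoding).
    """
    matrix = defaultdict(lambda: defaultdict(float))
    for relation, left, right, count in assertions:
        # The left concept gains the feature "relation(_, right)".
        matrix[left][("right", relation, right)] += count
        # The right concept gains the feature "relation(left, _)".
        matrix[right][("left", relation, left)] += count
    return {concept: dict(row) for concept, row in matrix.items()}


if __name__ == "__main__":
    for concept, row in sorted(build_matrix(ASSERTIONS).items()):
        print(concept, row)
```

Storing only the non-zero entries in nested dictionaries keeps the example small; a real knowledge base would need a proper sparse matrix structure.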

Figure 2. Part of the AnalogySpace matrix.

Semantics of AnalogySpace Matrix

When two rows of the AnalogySpace matrix have similar values for the same features, a sentence about one concept in an inference rule can be replaced by the corresponding sentence about the other concept and still give plausible inference results. For example, the sentences PartOf(fur, cat) and IsA(cat, pet) in a modus ponens rule can be replaced by PartOf(fur, dog) and IsA(dog, pet). The similarity of two concepts can therefore be defined as the number of features they share, and this similarity can be used to identify the semantic meaning of concepts.
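As a minimal sketch of this shared-feature similarity, the Python below counts the features that two rows have in common. The rows, the counts, and the feature labels such as "PartOf(fur, _)" are hypothetical and chosen only to mirror the cat and dog example.

```python
# Hypothetical rows of the matrix: {concept: {feature label: sentence count}}.
ROWS = {
    "cat": {"PartOf(fur, _)": 3, "IsA(_, pet)": 5},
    "dog": {"PartOf(fur, _)": 2, "IsA(_, pet)": 4},
    "bird": {"CapableOf(_, fly)": 6},
}


def shared_features(rows, a, b):
    """Return the sorted list of features present in both rows a and b."""
    return sorted(set(rows.get(a, {})) & set(rows.get(b, {})))


def shared_feature_count(rows, a, b):
    """Similarity of two concepts as the number of shared features."""
    return len(shared_features(rows, a, b))


if __name__ == "__main__":
    print(shared_features(ROWS, "cat", "dog"))
    print(shared_feature_count(ROWS, "cat", "bird"))
```

In this toy data, cat and dog share both of their features while cat and bird share none, which is why a sentence about cat can be swapped for the same sentence about dog.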

Build Chinese AnalogySpace

Because the knowledge base is very large, the AnalogySpace matrix is large and sparse. We apply truncated singular value decomposition (truncated SVD) to the AnalogySpace matrix to smooth the noisy data in the knowledge base. The concepts are then mapped into a k-dimensional vector space spanned by eigen-features. In this space, the proximity of two concepts reflects how much their features overlap, so the similarity of two concepts can be defined as the cosine similarity of their vectors. Figure 3 shows the projection of Chinese AnalogySpace onto its first and second dimensions. The first dimension groups together the things people do not want; the second mostly concerns objects found in daily life.
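The truncated SVD and cosine similarity steps can be illustrated with NumPy on a tiny dense matrix. This is a sketch under stated assumptions rather than the method used to build Chinese AnalogySpace: the concepts, features, and counts are invented, the helper names are hypothetical, and a dense SVD stands in for the sparse solver that a matrix of realistic size would need.

```python
import numpy as np

# Hypothetical toy matrix: rows are concepts, columns are features,
# entries are made-up sentence counts.
CONCEPTS = ["cat", "dog", "car", "truck"]
FEATURES = ["has fur", "is a pet", "has wheels", "used for driving"]
A = np.array(
    [
        [3.0, 5.0, 0.0, 0.0],
        [2.0, 4.0, 0.0, 0.0],
        [0.0, 0.0, 4.0, 3.0],
        [0.0, 0.0, 2.0, 3.0],
    ]
)


def concept_vectors(matrix, k):
    """Project each row onto the top-k singular directions (truncated SVD)."""
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep the k largest singular values; scale each axis by its weight.
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


VECTORS = concept_vectors(A, k=2)
INDEX = {name: i for i, name in enumerate(CONCEPTS)}


def similarity(x, y):
    """Cosine similarity of two concepts in the reduced k-dimensional space."""
    return cosine(VECTORS[INDEX[x]], VECTORS[INDEX[y]])


if __name__ == "__main__":
    print("cat vs dog:", similarity("cat", "dog"))
    print("cat vs car:", similarity("cat", "car"))
```

In this toy matrix the animals and the vehicles share no features, so with k = 2 each group falls on its own axis: cat and dog have a cosine close to 1, while cat and car have a cosine close to 0.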

Figure 3. The first two dimensions of Chinese AnalogySpace.

Use Chinese AnalogySpace

To use Chinese AnalogySpace to find similar concepts or concepts in the same category, you can use the Python library Divisi to build the Chinese AnalogySpace matrix, reduce its dimensionality with SVD, and compute the similarity of concepts.

Demo Video