Beyond Word Embeddings: Learning Entity and Concept Representations from Large Scale Knowledge Bases

  • This is the code and data repository for our project on learning entity and concept representations from Wikipedia and the Microsoft Knowledge Graph (aka Probase).
  • The paper has been accepted at the Information Retrieval Journal; check it here. A preprint is also available on arXiv here.

  • Embedding Evaluation [src]

    The commands below show how to reproduce the reported results on the analogical reasoning and concept clustering datasets. The reported results are already available under the "results" directory.

    • You can download the data from here
    • You can download the models (38GB) from here or email me for a compressed version.

    Word Analogies

    • CME model: python3 embedding-eval/concept_embed_eval_analogy.py --data_path data/analogy-questions-concepts/ --model_file models/cme.bin --model_format bin --split_size 0 --conceptualized

    • Word2Vec baseline model: python3 embedding-eval/concept_embed_eval_analogy.py --data_path data/analogy-questions/ --model_file models/w2v.bin --model_format bin --split_size 0
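
To make the analogy task concrete, here is a minimal sketch of how an analogy question "a is to b as c is to ?" is commonly answered with embeddings: the vector offset method, which picks the word closest in cosine similarity to b - a + c. This is an illustration only, not the code in concept_embed_eval_analogy.py, whose scoring may differ. The function names and toy vectors are hypothetical and do not come from the released models.

```python
import math


def unit(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def solve_analogy(embeddings, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector offset method.

    `embeddings` maps each word to a list of floats. The three question
    words are excluded from the candidates, as is usual for this task.
    """
    units = {word: unit(vec) for word, vec in embeddings.items()}
    target = [ub - ua + uc for ua, ub, uc in zip(units[a], units[b], units[c])]
    best_word, best_score = None, -math.inf
    for word, vec in units.items():
        if word in (a, b, c):
            continue
        score = sum(x * y for x, y in zip(vec, target))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# Hypothetical toy vectors, chosen by hand for illustration.
toy_embeddings = {
    "man": [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.0, -1.0],
}

print(solve_analogy(toy_embeddings, "man", "woman", "king"))  # prints "queen"
```

Accuracy on an analogy dataset would then be the fraction of questions for which the returned word matches the expected answer.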

    Concept Clustering

    • CME model with bootstrapping: python3 concept_embed_eval_clustering.py --data_path data/concept-clustering/ --model_file models/cme.bin --model_format bin --bootstrap --min_bs_score 0.0 --concepts --confusion

    • CME model without bootstrapping: python3 concept_embed_eval_clustering.py --data_path data/concept-clustering/ --model_file models/cme.bin --model_format bin --bootstrap --min_bs_score 2.0 --concepts --confusion

    • Word2Vec baseline model with bootstrapping: python3 concept_embed_eval_clustering.py --data_path data/concept-clustering/ --model_file models/w2v.bin --model_format bin --bootstrap --min_bs_score 0.0 --concepts --confusion

    • Word2Vec baseline model without bootstrapping: python3 concept_embed_eval_clustering.py --data_path data/concept-clustering/ --model_file models/w2v.bin --model_format bin --bootstrap --min_bs_score 2.0 --concepts --confusion
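
As a rough illustration of what a concept clustering evaluation measures, the sketch below assigns each instance to the category whose vector is most similar by cosine similarity, then tallies a confusion table against gold labels. This is an assumption about the general shape of such an evaluation, not a description of concept_embed_eval_clustering.py; in particular, it does not model the bootstrapping step or the --min_bs_score threshold. All names and vectors are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def assign_categories(instances, categories):
    """Map each instance name to its most similar category name."""
    return {
        name: max(categories, key=lambda cat: cosine(vec, categories[cat]))
        for name, vec in instances.items()
    }


def confusion_counts(gold, predicted):
    """Count (gold category, predicted category) pairs."""
    return Counter((gold[name], predicted[name]) for name in gold)


# Hypothetical toy data, chosen by hand for illustration.
categories = {"fruit": [1.0, 0.0], "animal": [0.0, 1.0]}
instances = {
    "apple": [0.9, 0.1],
    "banana": [0.8, 0.3],
    "dog": [0.2, 0.9],
    "bat": [0.6, 0.5],  # deliberately ambiguous: lands closer to "fruit"
}
gold = {"apple": "fruit", "banana": "fruit", "dog": "animal", "bat": "animal"}

predicted = assign_categories(instances, categories)
confusion = confusion_counts(gold, predicted)
accuracy = sum(n for (g, p), n in confusion.items() if g == p) / len(gold)

print(predicted)
print(dict(confusion))
print(accuracy)  # prints 0.75
```

The off-diagonal entries of the confusion table, such as ("animal", "fruit") here, show which categories the embeddings tend to mix up.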