Language Model

The data can be downloaded here:
Data (247 MB)
The .zip archive contains two semicolon-separated CSV files: one with the document embeddings and one with the word embeddings.

In addition, the document embeddings file contains the following metadata for each document:

  • doc_id

  • title

  • speaker

  • date

  • nwords

  • type

  • language

  • cb

  • country

  • country_code

  • currency

  • currency_code

  • link
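Once the document table is loaded, the metadata fields listed above can be separated from the remaining columns. The following is a minimal sketch, assuming that every column that is not a metadata field holds one embedding dimension; the helper name `split_doc_table` and the toy column names `V1` and `V2` are made up for illustration and are not taken from the data.

```r
# Metadata column names as listed above.
meta_cols <- c("doc_id", "title", "speaker", "date", "nwords", "type",
               "language", "cb", "country", "country_code",
               "currency", "currency_code", "link")

# Split a document table into its metadata and its remaining columns.
# Assumption: every non-metadata column is a numeric embedding dimension.
split_doc_table <- function(doc_tbl, meta = meta_cols) {
  is_meta <- names(doc_tbl) %in% meta
  list(
    meta      = doc_tbl[, is_meta, drop = FALSE],
    embedding = as.matrix(doc_tbl[, !is_meta, drop = FALSE])
  )
}

# Toy stand-in for the real table (hypothetical values and column names).
toy <- data.frame(doc_id = c("a", "b"), title = c("T1", "T2"),
                  V1 = c(0.1, 0.2), V2 = c(0.3, 0.4))
parts <- split_doc_table(toy)
dim(parts$embedding)
```

Keeping the embedding columns as a numeric matrix makes later steps such as distance or similarity computations more convenient, while the metadata stays available as a data frame with the same row order.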

Download via R code:

# Load the embeddings directly from the hosted CSV files.
# `type` selects which tables to load: "word", "doc", or both (the default).
load_cb_embeddings <- function(type = c("word", "doc"),
                               doc_url = "https://www.dropbox.com/s/ewt6t66rwhpdjld/document_embeddings.csv?raw=1",
                               word_url = "https://www.dropbox.com/s/71mm5md2utub04f/word_embeddings.csv?raw=1") {
  output <- list()
  if ("doc" %in% type) {
    output[["doc"]] <- readr::read_delim(doc_url, ";", escape_double = FALSE, trim_ws = TRUE)
  }
  if ("word" %in% type) {
    output[["word"]] <- readr::read_delim(word_url, ";", escape_double = FALSE, trim_ws = TRUE)
  }
  return(output)
}

# Returns a list with the elements "doc" and "word".
embeddings <- load_cb_embeddings()
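If the .zip archive has already been downloaded and unpacked, the files can also be read from disk instead of from the URLs. The sketch below is one possible way to do this in base R, so it needs no extra packages; the helper name `read_embeddings_file` is hypothetical, and the file names in the commented lines are taken from the download URLs on the assumption that the archive uses the same names.

```r
# Read one semicolon-separated embeddings file from disk.
# read.csv() with sep = ";" keeps "." as the decimal mark.
read_embeddings_file <- function(path) {
  utils::read.csv(path, sep = ";", strip.white = TRUE,
                  stringsAsFactors = FALSE, check.names = FALSE)
}

# Small self-contained demonstration with a temporary toy file
# (made-up values, not taken from the real data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("doc_id;V1;V2", "a;0.1;0.3", "b;0.2;0.4"), tmp)
toy <- read_embeddings_file(tmp)
str(toy)

# With the unpacked archive (paths are assumptions):
# doc  <- read_embeddings_file("path/to/document_embeddings.csv")
# word <- read_embeddings_file("path/to/word_embeddings.csv")
```

`readr::read_delim()` with the same arguments as in the download function can be pointed at a local path in the same way.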