Language Model

The data can be downloaded here:
Data (247 MB)
The .zip archive contains two semicolon-separated CSV files: one for the document embeddings and one for the word embeddings.

In addition, the document embeddings file includes the following meta information about each document:

doc_id
title
speaker
date
nwords
type
language
cb
country
country_code
currency
currency_code
link
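
The metadata fields listed above sit alongside the embedding values, so a common first step is to separate the two. The following is a minimal sketch, not part of the original material. It assumes that every column not named in the metadata list is a numeric embedding dimension. The toy table, its values, the dimension names V1 to V3, and the helper split_doc_table are all hypothetical and only stand in for the real file.

```r
# Metadata column names as listed above.
meta_cols <- c("doc_id", "title", "speaker", "date", "nwords", "type",
               "language", "cb", "country", "country_code",
               "currency", "currency_code", "link")

# Toy stand-in for the document table. The values and the
# dimension names V1..V3 are made up for illustration.
toy_docs <- data.frame(
  doc_id = c("d1", "d2"),
  title  = c("Speech A", "Speech B"),
  cb     = c("Bank A", "Bank B"),
  V1     = c(0.1, 0.4),
  V2     = c(0.2, 0.5),
  V3     = c(0.3, 0.6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split a document table into its metadata
# columns and a numeric matrix of embedding dimensions.
# Assumption: all non-metadata columns are numeric.
split_doc_table <- function(docs, meta = meta_cols) {
  is_meta <- names(docs) %in% meta
  list(
    meta    = docs[, is_meta, drop = FALSE],
    vectors = as.matrix(docs[, !is_meta, drop = FALSE])
  )
}

parts <- split_doc_table(toy_docs)
dim(parts$vectors)   # rows = documents, columns = embedding dimensions
```

With the real data, the same helper could be applied to the document table after checking its column names with names().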

Download via R Code:

load_cb_embeddings <- function(type = c("word", "doc"),
                               doc_url = "https://www.dropbox.com/s/ewt6t66rwhpdjld/document_embeddings.csv?raw=1",
                               word_url = "https://www.dropbox.com/s/71mm5md2utub04f/word_embeddings.csv?raw=1") {
  # Collect the requested tables in a named list
  output <- list()
  if ("doc" %in% type) {
    # Document embeddings (semicolon separated)
    output[["doc"]] <- readr::read_delim(doc_url, ";", escape_double = FALSE, trim_ws = TRUE)
  }
  if ("word" %in% type) {
    # Word embeddings (semicolon separated)
    output[["word"]] <- readr::read_delim(word_url, ";", escape_double = FALSE, trim_ws = TRUE)
  }
  return(output)
}

# Load both tables; pass type = "doc" or type = "word" to load only one
embeddings <- load_cb_embeddings()
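
Once the tables are loaded, a typical use of embeddings is to compare two vectors by cosine similarity. The sketch below is an illustration only and is not taken from the original material. The function cosine_similarity, the toy matrix, and its row names are hypothetical; the layout of the real word table (for example, which column holds the word) should be checked before applying it.

```r
# Cosine similarity between two numeric vectors of equal length.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Toy embedding matrix with made-up values: one row per word.
toy_vectors <- rbind(
  inflation = c(1, 0, 1),
  prices    = c(2, 0, 2),
  holiday   = c(0, 3, 0)
)

# Vectors pointing in the same direction: similarity close to 1.
cosine_similarity(toy_vectors["inflation", ], toy_vectors["prices", ])

# Orthogonal vectors: similarity of 0.
cosine_similarity(toy_vectors["inflation", ], toy_vectors["holiday", ])
```

The measure ignores vector length and depends only on direction, which is why the two parallel toy vectors score as near-identical despite their different magnitudes.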