We conduct a literature review to understand the current state of the art in research related to this topic and to determine how to implement our system. The review covers the following topics:
Web scraping to generate data: We look into Python frameworks and libraries for web scraping, such as Selenium and BeautifulSoup. We are staying with Python because we are comfortable with the language. The extraction code must run at a large scale, so the chosen framework or library must support that. Moreover, the information we extract must match that described in the paper by Azoulay et al. [1]. Therefore, to define our requirements and outline a method for extracting the required information, we must read the paper thoroughly and study the methods its authors used.
Diversity metric selection: We survey the existing pool of diversity/creativity metrics to select those appropriate for our data.
Data visualization: We identify which visualization techniques can be used to present the network.
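As a rough illustration of the web scraping topic above, the following is a minimal sketch of how BeautifulSoup could pull structured records out of a page. The markup, the class name "entry", and the function name extract_entries are hypothetical and chosen only for this example; the real pages would need to be inspected first, and the fields actually required are those defined in Azoulay et al. [1]. The sketch parses an inline HTML string so that it runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page. The real tag and
# class names are unknown here and would have to be read off the target site.
SAMPLE_HTML = """
<html><body>
  <ul id="results">
    <li class="entry"><a href="/item/1">First entry</a></li>
    <li class="entry"><a href="/item/2">Second entry</a></li>
    <li class="other">Not an entry</li>
  </ul>
</body></html>
"""


def extract_entries(html):
    """Return one dict (name, url) per <li class="entry"> in the markup."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for item in soup.find_all("li", class_="entry"):
        link = item.find("a")
        if link is None:
            # Skip entries without a link instead of failing the whole run.
            continue
        records.append({
            "name": link.get_text(strip=True),
            "url": link.get("href"),
        })
    return records


if __name__ == "__main__":
    for record in extract_entries(SAMPLE_HTML):
        print(record)
```

For pages that only render their content through JavaScript, a browser-driving tool such as Selenium would likely be needed to obtain the HTML before a parsing step like this one.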
Findings:
Confirmation that these are indeed creative links [1]
Information on the extraction method for all 112 superstars' links [2]
NLP technique [4]