The tutorial is split into four logical blocks that last about two and a half hours in total.
The first session introduces the main issues related to the exploration of Big Linked Data. We describe the most important problems related to the visualization and understanding of unknown LOD, of datasets that do not have a fixed structure, and of datasets that can contain a huge amount of data. We highlight how state-of-the-art approaches address these problems and what is still missing.
We also mention the impact that LOD might have on society. Several examples demonstrate the impact that LOD has and will have at a socio-economic level [Bauer 2011, Ding 2012, Keseru 2015]. In a study commissioned in 2011, the European Commission estimated the economic impact of open data at €140 billion a year, counting direct and indirect effects. Data plays a fundamental role in all aspects of human activity and social interest. Big Linked Data can introduce innovative solutions in the public and private sectors through the development of data-driven infrastructures and applications.
In the second session, we describe the state of the art of Linked Data visualization systems, with particular attention to tools able to navigate vast amounts of data [Dadzie 2011, Marie 2014]. We start by describing the evolution over time of the systems focused on Linked Data exploration: Linked Data Browsers, Linked Data Exploration Systems, Linked Data Graph Tools, and Ontology Visualization Systems. Finally, we turn to scalability issues.
Linked Data Browsers (such as DashSearch LD [Goto 2013], Disco, Explorator [de Araújo 2009], /facet [Hildebrand 2006], gFacet [Heim 2010a], Information Workbench (IWB) [Haase 2011]) were the first kind of tools implemented for the exploration and analysis of linked datasets. They are quite simple browsers that support link navigation and the representation of resources and their properties; they mainly use tabular views and links to navigate Linked Data.
Linked Data Exploration Systems (such as Rhizomer [Brunetti 2012], LODWheel [Stuhr 2011], SemLens [Heim 2011], Payola [Klímek 2013], LDVizWiz [Atemezing 2014], VisWizard [Tschinkel 2014], LinkDaViz [Thellmann 2015], ViCoMap [Ristoski 2015]) support different types of data (for example, numbers, temporal, graphical, spatial) and provide different types of visualization. Some systems offer recommendation mechanisms suggesting the most suitable form of visualization depending on the input data (LinkDaViz, VisWizard, LDVizWiz). With regard to visual scalability, most systems do not adopt approximation techniques such as sampling, filtering or aggregation.
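To give a flavor of what a recommendation mechanism does, the following minimal sketch maps the type of the input values to a visualization type with a few hand-written rules. It is an illustration only: the function name `recommend_chart` and the rules are our assumptions and do not reproduce the logic of LinkDaViz, VisWizard, or LDVizWiz.

```python
from datetime import date


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def recommend_chart(values):
    """Suggest a visualization type from the types of the values (illustrative rules)."""
    if not values:
        return "table"
    if all(isinstance(v, date) for v in values):
        return "timeline"  # temporal data
    if all(
        isinstance(v, tuple) and len(v) == 2 and all(_is_number(c) for c in v)
        for v in values
    ):
        return "map"  # (latitude, longitude) pairs, i.e. spatial data
    if all(_is_number(v) for v in values):
        return "bar chart"  # numeric data
    return "table"  # fallback for anything else


print(recommend_chart([date(2011, 1, 1), date(2012, 6, 30)]))
print(recommend_chart([(41.9, 12.5), (48.85, 2.35)]))
```

Real systems typically also inspect the schema and the number of data points before ranking candidate visualizations; the sketch ignores both.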
Linked Data Graph Tools (such as FlexViz web applications [Falconer 2010], RelFinder [Heim 2010], LODWheel [Stuhr 2011], Lodlive [Camarda 2012], LODeX [Benedetti 2014, Benedetti 2016], VOWL 2 [Lohmann 2015], graphVizdb [Bikakis 2016]) take advantage of the graph structure of the RDF data model. These systems visualize linked datasets using a graph-based (a.k.a. node-link) approach.
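As a minimal sketch of the node-link idea, the snippet below turns a handful of RDF-style triples into nodes, labeled edges, and per-node attributes. The sample triples, the `ex:` prefix, and the convention that a quoted string denotes a literal are assumptions made for illustration; they are not taken from any of the tools above.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, predicate, object) triples.
# Objects wrapped in double quotes stand for literals.
TRIPLES = [
    ("ex:Rome", "rdf:type", "ex:City"),
    ("ex:Rome", "ex:capitalOf", "ex:Italy"),
    ("ex:Italy", "rdf:type", "ex:Country"),
    ("ex:Rome", "rdfs:label", '"Rome"'),
]


def is_literal(term):
    return term.startswith('"')


def to_node_link(triples):
    """Map triples to a node-link structure.

    Resources become nodes, triples between resources become labeled
    edges, and literal-valued triples become attributes of the subject.
    """
    nodes = set()
    edges = []
    attributes = defaultdict(dict)
    for subject, predicate, obj in triples:
        nodes.add(subject)
        if is_literal(obj):
            attributes[subject][predicate] = obj.strip('"')
        else:
            nodes.add(obj)
            edges.append((subject, predicate, obj))
    return sorted(nodes), edges, dict(attributes)


nodes, edges, attributes = to_node_link(TRIPLES)
print(nodes)
print(edges)
print(attributes)
```

Keeping literals as node attributes rather than as separate nodes is one common way to reduce clutter in the drawing; other tools may draw literals as nodes of their own.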
Scalability issues: Most of the existing approaches assume that all objects can be presented on the screen and managed through traditional visualization techniques, which limits their applicability to datasets of limited size. Although several systems offer sampling or aggregation mechanisms, most of them load the entire graph into main memory. Because graph layout algorithms require a lot of memory to draw large graphs, current systems are limited to handling small graphs. Exceptions are SynopsViz [Bikakis 2017] and VizBoard [Voigt 2012], which exploit external memory at runtime.
In order to handle large graphs, modern systems should adopt more sophisticated techniques. These include hierarchical aggregation approaches, in which the graph is recursively decomposed into smaller subgroups (using clustering and partitioning techniques) to form a hierarchy of levels of abstraction [Archambault 2007, Auber 2004, Tong 2013, Li 2015], and edge grouping techniques that aggregate the edges of the graph into bundles [Cui 2008, Gansner 2011]. They should also treat scalability and performance as key requirements and investigate disk-based implementations, as in [Tong 2006, Sundara 2010].
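The hierarchical aggregation idea can be sketched in a few lines. The example below assumes that the group assignments for each level are already given (in practice they would come from a clustering or partitioning algorithm); the function names and the toy graph are hypothetical and do not correspond to any of the cited systems.

```python
from collections import Counter


def aggregate(weighted_edges, group_of):
    """Collapse nodes into groups and sum the weights of edges between groups.

    Edges whose endpoints fall into the same group disappear at this level.
    """
    merged = Counter()
    for (u, v), weight in weighted_edges.items():
        gu, gv = group_of[u], group_of[v]
        if gu != gv:
            merged[tuple(sorted((gu, gv)))] += weight
    return dict(merged)


def build_hierarchy(edges, levels):
    """Return one weighted edge set per abstraction level, finest first.

    `levels` is a list of dicts; each maps the nodes of the current level
    to their group at the next, coarser level.
    """
    current = dict(Counter(tuple(sorted(edge)) for edge in edges))
    hierarchy = [current]
    for group_of in levels:
        current = aggregate(current, group_of)
        hierarchy.append(current)
    return hierarchy


# Toy undirected graph and two hand-made levels of grouping.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "c")]
levels = [
    {"a": "g1", "b": "g1", "c": "g2", "d": "g2", "e": "g3"},
    {"g1": "h1", "g2": "h1", "g3": "h2"},
]
for depth, level_edges in enumerate(build_hierarchy(edges, levels)):
    print(depth, level_edges)
```

A viewer built on such a hierarchy would draw only the coarsest level at first and expand a group into its members on demand, so that the number of elements on screen stays bounded.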
In the hands-on session, the participants carry out exploratory searches of growing complexity using the described tools and create a personalized visualization on a subset of data extracted from a LOD dataset.
We introduce the main existing Linked Data tools for visualization, navigation, and exploration. We use online tools and test their main features, starting from simple examples and moving to complex visualization scenarios. We show how the representation of data is helpful not only for unskilled users but also for experts dealing with complex and big datasets. We experience how visualization and aggregation are beneficial in scenarios where huge amounts of data are accessed.
The LOD visualization tools available online that will be explored and used during the hands-on session are:
This last session is devoted to collecting feedback and questions, providing answers, and recapping the scope of the tutorial.
We discuss and identify open issues and new or emerging research directions that need greater research effort.
Some basic knowledge of Linked Data is expected, that is, of Uniform Resource Identifiers (URIs), the Hypertext Transfer Protocol (HTTP), the Resource Description Framework (RDF), and RDF Schema.
Knowledge of SPARQL (SPARQL Protocol and RDF Query Language) is a clear plus, but it is not mandatory for the tutorial.
The aim of this tutorial is two-fold. First, we aim to provide the participants with an overview of the state of the art in visualization, exploration, and navigation tools for big LOD and of the issues related to the representation of large LOD datasets. In particular, we will focus on the design and key features of these tools and highlight their main shortcomings.
In addition, we aim to provide the participants with practical insights and hands-on experience that will allow them to use these applications as well as to select the right tools or features for their purposes or even improve upon existing solutions in their future research.
After completing our tutorial, the participants will be able to:
- explain the main issues concerning the representation and visualization of large LOD datasets;
- use several tools for accessing a (previously unknown) LOD dataset;
- select an appropriate tool to navigate and explore a LOD dataset;
- create a personalized visualization on a subset of data extracted from a LOD dataset;
- outline the main challenges and open issues in the scenario of Big Linked Data visualization.