paper(s)

For the reading club next Monday 26 October I propose the following paper:

Web-Scale Querying through Linked Data Fragments by Ruben Verborgh, Miel Vander Sande, Pieter Colpaert, Sam Coppens, Erik Mannens and Rik Van de Walle.

Abstract: To unlock the full potential of Linked Data sources, we need flexible ways to query them. Public SPARQL endpoints aim to fulfill that need, but their availability is notoriously problematic. We therefore introduce Linked Data Fragments, a publishing method that allows efficient offloading of query execution from servers to clients through a lightweight partitioning strategy. It enables servers to maintain availability rates as high as any regular HTTP server, allowing querying to scale reliably to much larger numbers of clients. This paper explains the core concepts behind Linked Data Fragments and experimentally verifies their Web-level scalability, at the cost of increased query times. We show how trading server-side query execution for inexpensive data resources with relevant affordances enables a new generation of intelligent clients.

Notes / Discussion summary / Action points:

- The paper describes Linked Data Fragments as an alternative way of retrieving Linked Data from the web through querying, with part of the query processing moved to the client. This is an interesting idea; however, I would like to discuss whether it can be integrated into existing Linked Data access strategies instead of introducing yet another API-based interface.
- In the paper, an analogy is made between searching information on the regular web (documents) and within Linked Data (section dereferencing). Do you think this comparison holds? If not, why not and what would be a better analogy?
- Staying with analogies, what do you think of the general approach of LDF? Isn’t the low availability of SPARQL endpoints a more generic problem that has to do with aspects such as the maturity of implementations, server redundancy and, in the end, funding? By analogy, take web search: it is not partly implemented on the client side either, yet it works well, even though the web of documents is still significantly bigger than the web of data.
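
As a possible aid for the discussion, here is a minimal sketch of the general idea of moving query work to the client. It is not the algorithm from the paper: the "server" is just an in-memory function that answers a single triple pattern, and the "client" evaluates a conjunction of patterns with a naive nested lookup. The dataset, the `ex:` names and the functions `server_fragment`, `substitute` and `client_query` are all made up for illustration.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables
Binding = Dict[str, str]

# Hypothetical toy dataset standing in for a Linked Data source.
DATA: List[Triple] = [
    ("ex:alice", "ex:authorOf", "ex:paper1"),
    ("ex:bob", "ex:authorOf", "ex:paper1"),
    ("ex:paper1", "ex:topic", "ex:LinkedData"),
    ("ex:carol", "ex:authorOf", "ex:paper2"),
    ("ex:paper2", "ex:topic", "ex:Biomedicine"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def server_fragment(pattern: Pattern) -> List[Triple]:
    """'Server' side: answers one triple pattern only, never a full query."""
    return [
        triple for triple in DATA
        if all(is_var(p) or p == v for p, v in zip(pattern, triple))
    ]


def substitute(pattern: Pattern, binding: Binding) -> Pattern:
    """Replace variables that are already bound with their values."""
    s, p, o = pattern
    return (binding.get(s, s), binding.get(p, p), binding.get(o, o))


def client_query(
    patterns: List[Pattern], binding: Optional[Binding] = None
) -> Iterator[Binding]:
    """'Client' side: joins the patterns itself via repeated simple lookups."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = substitute(patterns[0], binding), patterns[1:]
    for triple in server_fragment(first):
        extended = dict(binding)
        consistent = True
        for term, value in zip(first, triple):
            if is_var(term):
                # A variable repeated within one pattern must match one value.
                if extended.get(term, value) != value:
                    consistent = False
                    break
                extended[term] = value
        if consistent:
            yield from client_query(rest, extended)


# Who wrote a paper about Linked Data?
query = [
    ("?person", "ex:authorOf", "?paper"),
    ("?paper", "ex:topic", "ex:LinkedData"),
]
print(sorted(b["?person"] for b in client_query(query)))
# prints ['ex:alice', 'ex:bob']
```

Note that the client issues one simple request per intermediate result, which is one way to picture the trade-off mentioned in the abstract: the server only does cheap lookups, while the client pays with more requests and longer query times.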

---

For the reading club next Monday 19 October I propose the following paper:

Micropublications: a semantic model for claims, evidence, arguments and annotations in biomedical communications

by Tim Clark, Paolo N Ciccarese and Carole A Goble.

Micropublications are web-friendly and machine-tractable models of scientific publications, which enable representing scientific claims together with the corresponding evidence (data and citations).

... 33 pages: you can read the workshop paper ...

Towards Linked Open Data enabled Data Mining: Strategies for Feature Generation, Propositionalization, Selection, and Consolidation

http://ub-madoc.bib.uni-mannheim.de/38852/1/Petar_Ristoski_-_PhD_Symposium_ESWC_2015.pdf

Why linked data is not enough for scientists

http://www.sciencedirect.com/science/article/pii/S0167739X11001439

It describes an additional Linked Data layer (research objects) which makes it possible to bundle different aspects of the scientific process (data, method, attribution, publication). This paper probably fits better with the research interests of Martine/Dena/Tobias, but here is why I am interested in reading it: I am exploring ways of supporting Digital Humanities researchers in doing research on the Rijksmuseum dataset. The paper is from 2011, published in the Future Generation Computer Systems journal, and one of the things I am curious about is how much of it has been realised in the meantime.

Further discussion points that came up while reading it:

- Would the proposed Research Objects approach be applicable to your domain?
- Is this one of many best practices?
- Is the practice widely adopted?
- Are subproblems solved by others (e.g. prov vocab)?
- Are there applications supporting researchers in creating these bundles?
- Did you ever try reusing data/methods?
- How would you address versioning of graphs?

Document with the overview of the upcoming papers: https://docs.google.com/document/d/1VIGX88GunenRCPe4fQslcS7hSrW5TU2cejDfTcmc1uE/edit

----

Ranking Buildings and Mining the Web for Popular Architectural Patterns

In this paper crowdsourcing, social media, linked open data and machine learning are combined in order to find influential architectural factors for buildings. I think this is a useful combination and a unique topic, but one that is also applicable to other domains.

...

For this paper, there seem to be some (at least superficially) considerable differences between the author's version of the paper (Martine's link) and the official version provided by ACM: http://dl.acm.org/citation.cfm?id=2001297

----

For the reading club next Monday 14 September I propose the following topic: "publishing negative results".

I think it is widely agreed that publishing negative results may provide valuable information and may benefit scientific progress.

In the discussion on Monday I would like to focus on how to actually write a publication presenting negative findings.

I suggest the following paper: Don't turn social media into another "Literary Digest" poll

In this paper the author aims to provide a balanced view of the actual possibilities of social media analytics. In order to do that, he repeats (in a much more detailed way) previous studies on electoral prediction from Twitter data. In contrast to the previous studies, his results show that the outcome of the 2008 U.S. Presidential Elections could not have been predicted from Twitter data by using commonly applied methods.

In an opinion paper the author discusses how he came to write this paper, and how hard it was to get it published.

As a preparation for Monday, I would like you to think about the following questions:

1) Your own research:

Did you ever obtain negative results that supported your null hypothesis, or that did not fit with the current scientific thinking?

Did you publish them, and how did you do that?

2) Ways of presenting:

How would you present your negative findings in an attractive way? The suggested paper shows one possible format, i.e., by redoing previous studies and comparing the results. Could you think of other ways, or do you have examples of such papers?

(see also the Google document on the reading club with the schedule and notes)