Social Media provides its users with a large-scale, easy-to-use platform for communicating and socializing, something that traditional media (such as newspapers and television) cannot offer. This platform builds on the technological foundations of Web 2.0, which enables collaboration and data sharing among Internet users. Social Media can therefore be defined as a group of software applications that allow users to share user-generated content.
Social Media users face two important problems when using this platform. The first problem is that when users receive data (user-generated content) through Social Media software, they may not know the quality of that data. They therefore cannot be sure how reliable and correct the data are or how much weight to give them, yet they may still help disseminate the data. As a result, situations such as information pollution can arise. The second problem is that Social Media software may change its privacy policies over time, and users may be unable to configure their privacy settings to match the level of privacy they require. These policies determine the copyrights of users' shared data. Data that a user intends to share only within a circle of friends may be disseminated further through resharing. Users are not aware of who can actually see their data or perform operations on it. As a result, problems such as copyright violations can arise.
To solve these two problems, users need information about the life cycle of Social Media data. Provenance is metadata that describes the origin, validity, quality and ownership of data, and it can therefore supply this information. Currently, however, there is a lack of methodologies for detecting information pollution and violations of the copyrights of users' shared data.
The goal of this project is to develop methodologies that collect, store, query and analyze the provenance of Social Media data, with a focus on developing algorithms and methods for detecting information pollution and violations of the copyrights of shared data. The project also has two sub-goals. The first is to investigate how existing popularity-based ranking algorithms can be improved by utilizing provenance data. The second is to investigate how to design and develop algorithms that convert distributed provenance graphs into a small-scale representation without information loss, so that they can easily be mined for useful information. Throughout this project, the privacy of personal information will not be violated: no data that is private to users will be collected or stored. Only the events that take place during the life cycle of the data will be recorded.
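As a rough illustration of what recording "only the events that take place during the life cycle of the data" might look like, the following Python sketch stores create and reshare events as a small provenance graph, with each event pointing to the event it derives from. It then answers two example queries: tracing a reshare back to its origin, and listing users who can see an item without having been in the author's original audience. All names here (`ProvenanceEvent`, `ProvenanceLog`, `derived_from`, the `"create"`/`"reshare"` action vocabulary) are hypothetical and do not come from the proposal. The sketch also assumes that each event records the audience able to see the item afterwards. The `unintended_viewers` query is a naive heuristic for spotting dissemination beyond the intended circle, not the project's detection method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    """One life-cycle event of a shared item (hypothetical schema; no content is stored)."""
    event_id: str
    item_id: str
    actor: str                          # user who performed the action
    action: str                         # assumed vocabulary: "create" or "reshare"
    audience: frozenset                 # users who can see the item after this event
    derived_from: Optional[str] = None  # parent event; None for the original "create"


class ProvenanceLog:
    """Keeps provenance events as a graph linked by derived_from edges."""

    def __init__(self) -> None:
        self._events: dict[str, ProvenanceEvent] = {}

    def record(self, event: ProvenanceEvent) -> None:
        # Reject events whose parent is unknown, so every chain reaches an origin.
        if event.derived_from is not None and event.derived_from not in self._events:
            raise ValueError(f"unknown parent event {event.derived_from!r}")
        self._events[event.event_id] = event

    def lineage(self, event_id: str) -> list[str]:
        """Follow derived_from edges from an event back to the originating event."""
        chain: list[str] = []
        current: Optional[str] = event_id
        while current is not None:
            chain.append(current)
            current = self._events[current].derived_from
        return chain

    def unintended_viewers(self, item_id: str) -> set[str]:
        """Users who can see the item but were not in the creator's original audience."""
        intended: set[str] = set()
        reached: set[str] = set()
        for e in self._events.values():
            if e.item_id != item_id:
                continue
            reached |= e.audience
            if e.action == "create":
                intended |= e.audience
        return reached - intended


# Usage: alice shares with friends; the post then spreads through two reshares.
log = ProvenanceLog()
log.record(ProvenanceEvent("e1", "post42", "alice", "create", frozenset({"bob", "carol"})))
log.record(ProvenanceEvent("e2", "post42", "bob", "reshare", frozenset({"dave", "erin"}), derived_from="e1"))
log.record(ProvenanceEvent("e3", "post42", "dave", "reshare", frozenset({"frank"}), derived_from="e2"))

print(log.lineage("e3"))                          # ['e3', 'e2', 'e1']
print(sorted(log.unintended_viewers("post42")))   # ['dave', 'erin', 'frank']
```

Keeping only event metadata (who acted, on which item, and who could see it afterwards) matches the proposal's constraint of not storing private content. The `derived_from` edges are what make origin tracing possible.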
The novel contributions of this project are twofold. The first is computing the quality of social data and developing algorithms and methodologies that use this information to detect information pollution. The second is developing algorithms and methodologies that detect violations of the copyrights of users' shared data. In addition, the investigations into how provenance data can improve existing popularity-based ranking algorithms, and into how to design and develop algorithms that convert distributed provenance graphs into a small-scale representation without information loss, are also novel contributions.
To reach our goals, we summarize the different stages of our research method as follows:
This project proposal mainly aims to develop methodologies that track provenance in Social Media software and provide facilities for analyzing it, with a focus on detecting information pollution and copyright violations.