PRONALIZ

Project Description

Social media gives its users a large-scale, easy-to-use platform for communicating and socializing, something that traditional media (such as newspapers and television) cannot deliver. This platform rests on the technological foundations of Web 2.0, which enables collaboration and data sharing among Internet users, and it is defined as a group of software applications that allow the sharing of user-generated content.

Social media users face two important problems when using this platform. The first problem is data quality: when users receive data (user-generated content) through social media software, they may not know its quality. As a result, they cannot be sure how reliable and correct the data is or how much weight to give it, yet they may still help disseminate it. Situations such as information pollution can arise as a result. The second problem is control over shared data: social media software may change its privacy policies over time, and users may not be able to configure their privacy settings precisely enough to match the privacy measures they want. These policies determine the copyrights of the user's shared data. Data that a user intends to share only within a circle of friends may spread further through resharing within the social media platform. Users do not know who can actually see or process their data. As a result, problems such as copyright violations can arise.

To solve these two problems, users need information about the life cycle of social media data. Provenance is defined as metadata that describes the origin, validity, quality and ownership of data. At present, methodologies for detecting information pollution and copyright violations involving users' shared data are lacking.

The goal of this project is to develop methodologies that collect, store, query and analyze the provenance of social media data, with a focus on developing algorithms and methods for detecting information pollution and copyright violations involving shared data. The project also has two sub-goals. The first is to investigate how existing popularity-based ranking algorithms can be improved by utilizing provenance data. The second is to investigate how to design and develop algorithms that convert distributed provenance graphs into a small-scale representation without information loss, so that the graphs can easily be mined for useful information. Throughout this project, the privacy of personal information will be respected: data that is private to users will not be collected or stored. Only the events that take place during the life cycle of the data will be recorded.
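As a rough illustration of what recording only life-cycle events could look like, the sketch below models a provenance event that stores identifiers but no content, and follows the links between events back to an item's origin. The class name `ProvenanceEvent`, its fields, the action vocabulary and the helper `trace_origin` are hypothetical choices made for this example, not the data model the project will adopt.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    """One life-cycle event; holds identifiers only, never the shared content."""
    item_id: str                          # item produced by this event
    actor_id: str                         # pseudonymous id of the acting user
    action: str                           # e.g. "create" or "reshare" (assumed vocabulary)
    source_item_id: Optional[str] = None  # item this one was derived from, if any


def trace_origin(events: List[ProvenanceEvent], item_id: str) -> List[ProvenanceEvent]:
    """Follow source links from an item back to the event that created the original."""
    by_item: Dict[str, ProvenanceEvent] = {e.item_id: e for e in events}
    chain: List[ProvenanceEvent] = []
    seen = set()  # guards against cycles in malformed records
    current: Optional[str] = item_id
    while current is not None and current in by_item and current not in seen:
        seen.add(current)
        event = by_item[current]
        chain.append(event)
        current = event.source_item_id
    return chain


events = [
    ProvenanceEvent("p1", "alice", "create"),
    ProvenanceEvent("p2", "bob", "reshare", source_item_id="p1"),
    ProvenanceEvent("p3", "carol", "reshare", source_item_id="p2"),
]
chain = trace_origin(events, "p3")
print([e.actor_id for e in chain])  # ['carol', 'bob', 'alice']
```

Because each record carries only identifiers and an action, the chain from a reshared item to its creator can be reconstructed without storing what was actually shared.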

The novel contributions of this project are two-fold. The first is to compute the quality of social data and to develop algorithms and methodologies that use this information to detect information pollution. The second is to develop algorithms and methodologies that detect copyright violations involving users' shared data. In addition, investigating how to improve existing popularity-based ranking algorithms by utilizing provenance data, and how to design and develop algorithms that convert distributed provenance graphs into a small-scale representation without information loss, are novel contributions as well.

To reach our goals, we divide our research method into the following stages:

During the conceptual studies, the scope of social media provenance, the requirements of the project and the usage scenarios will be determined.
The provenance description language will be chosen based on usability, scalability and performance principles. At this stage, the data model for the provenance will also be determined and implemented.
A large-scale provenance database for social media will be constructed based on the predefined usage scenarios.
We will design and develop algorithms that compute data quality and, by utilizing provenance, use this information to detect information pollution and copyright violations.
We will use provenance to improve the success rate of ranking algorithms and convert provenance graphs to a consolidated data representation.
We will design and develop methodologies that collect, store and query provenance. We will implement these methodologies and then use them to collect provenance from social media software.
We will test the proposed algorithms and methodologies on real social media data, perform optimizations if needed, and report the results.
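To make the detection stage more concrete, the following minimal sketch shows one way recorded reshare events could reveal that an item has reached users outside the audience its owner intended. The function `find_unintended_viewers`, its parameters and the simple follower-based visibility rule are assumptions made for illustration only; they are not the algorithms this project proposes to develop.

```python
from typing import Dict, List, Set, Tuple


def find_unintended_viewers(
    owner: str,
    intended_audience: Set[str],
    reshares: List[Tuple[str, str]],
    followers: Dict[str, Set[str]],
) -> Set[str]:
    """Return users who can see an item although the owner did not intend them to.

    reshares  -- (resharer, source_user) pairs taken from provenance events;
                 all pairs are assumed to concern the same item
    followers -- for each user, the users who see what that user shares
                 (an assumed, simplified visibility rule)
    """
    # Everyone who has published the item: the owner plus every resharer.
    sharers = {owner} | {resharer for resharer, _ in reshares}
    viewers: Set[str] = set(sharers)
    for user in sharers:
        viewers |= followers.get(user, set())
    # Anyone who sees the item but is neither the owner nor intended is a leak.
    return viewers - intended_audience - {owner}


followers = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
}
leaked = find_unintended_viewers(
    owner="alice",
    intended_audience={"bob", "carol"},
    reshares=[("bob", "alice")],
    followers=followers,
)
print(sorted(leaked))  # ['dave']
```

Here bob's reshare exposes alice's item to dave, who was never part of her intended circle; a real analysis would also need to account for platform privacy settings and for how those settings change over time.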

This project proposal mainly aims to develop methodologies that track provenance in social media software and provide facilities to analyze it, with a focus on detecting information pollution and copyright violations.
