Read Semantic Web
1) Applying the Semantic Web
Most Semantic Web practitioners apply the technology in one of two primary ways:
The first approach uses Semantic Web technologies for artificial intelligence.
The second approach uses Semantic Web technologies for flexible, transparent data management.
The first approach
Researchers use Semantic Web technologies to let machines infer new facts from existing facts and data.
This capability is most compelling in highly complex research scenarios involving data at a scale that is impractical or impossible for humans to truly comprehend. Although some specific, targeted applications of this approach exist, they have yet to reach the mainstream.
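To make "inferring new facts from existing facts" concrete, here is a minimal sketch, assuming facts are stored as (subject, predicate, object) triples. It applies two rules modeled on RDFS entailment (subclass transitivity and type propagation through subclasses) by naive forward chaining until no new triples appear. The `ex:` names and the `infer` helper are hypothetical; a production reasoner would use far more efficient algorithms and many more rules.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer(facts: Set[Triple]) -> Set[Triple]:
    """Naive forward chaining: apply two RDFS-style rules until nothing new appears."""
    closure = set(facts)
    while True:
        new: Set[Triple] = set()
        for s, p, o in closure:
            if p != SUBCLASS_OF:
                continue
            for s2, p2, o2 in closure:
                # Transitivity: A subClassOf B, B subClassOf C => A subClassOf C
                if p2 == SUBCLASS_OF and s2 == o:
                    new.add((s, SUBCLASS_OF, o2))
                # Type propagation: x type A, A subClassOf B => x type B
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        if new <= closure:  # fixpoint reached: no rule produced anything new
            return closure
        closure |= new


facts = {
    ("ex:rex", RDF_TYPE, "ex:Dog"),
    ("ex:Dog", SUBCLASS_OF, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
}
inferred = infer(facts) - facts
for triple in sorted(inferred):
    print(triple)
# ('ex:Dog', 'rdfs:subClassOf', 'ex:Animal')
# ('ex:rex', 'rdf:type', 'ex:Animal')
# ('ex:rex', 'rdf:type', 'ex:Mammal')
```

None of the three printed facts was stated explicitly; each follows from the stated ones, which is the kind of derivation that becomes valuable once the data is too large to inspect by hand.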
The second approach
For many applications, certain key characteristics give Semantic Web technologies a significant advantage in time, cost, and maintainability over traditional technologies such as relational databases.
The fundamental difference is that the Semantic Web starts from the Open World Assumption (OWA): the assumption that we do not know all the facts. Specifically, the OWA states that just because we do not know something does not mean it is false. Relational databases, by contrast, typically operate under the Closed World Assumption, in which anything not recorded is treated as false.
In practice, this means Semantic Web systems must make it easy to incorporate new facts as needed, including new kinds of data that were not anticipated at the outset.
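A minimal sketch of how the two assumptions change a query's answer, assuming a hypothetical in-memory set of triples and hypothetical helpers `ask_closed_world` and `ask_open_world`. Under the closed-world lookup a missing fact is reported as false; under the open-world lookup it is only unknown, so adding the fact later refines the earlier answer instead of contradicting it. The sketch ignores explicit negation, which a real OWA reasoner could use to conclude that something is false.

```python
from enum import Enum
from typing import Set, Tuple

Triple = Tuple[str, str, str]


class Answer(Enum):
    TRUE = "true"
    UNKNOWN = "unknown"


def ask_closed_world(facts: Set[Triple], query: Triple) -> bool:
    # Closed World Assumption: anything not recorded is treated as false.
    return query in facts


def ask_open_world(facts: Set[Triple], query: Triple) -> Answer:
    # Open World Assumption: a missing fact is merely unknown, not false.
    return Answer.TRUE if query in facts else Answer.UNKNOWN


facts: Set[Triple] = {("ex:alice", "ex:worksFor", "ex:acme")}
query = ("ex:alice", "ex:speaks", "ex:french")

print(ask_closed_world(facts, query))  # False
print(ask_open_world(facts, query))    # Answer.UNKNOWN

# A new kind of fact (a predicate never seen before) arrives later.
facts.add(("ex:alice", "ex:speaks", "ex:french"))
print(ask_open_world(facts, query))    # Answer.TRUE
```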
This flexibility means that new, unanticipated data sources can often be integrated on the fly.
Semantic Web technologies are also highly compatible with existing technologies; in fact, most Semantic Web applications begin with an existing relational database.
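As an illustration of starting from a relational database, here is a sketch loosely in the spirit of the W3C Direct Mapping: each row becomes a subject, each non-NULL column becomes a predicate, and a later, unanticipated source adds a new predicate without any schema migration. The IRI scheme (`table/key=value`, `table#column`), the `person` table, the `hr#badgeColor` predicate, and the `table_to_triples` helper are all hypothetical simplifications, not a conformant implementation.

```python
import sqlite3
from typing import List, Tuple

Triple = Tuple[str, str, str]


def table_to_triples(conn: sqlite3.Connection, table: str, key: str) -> List[Triple]:
    """Turn each row into triples: one subject per row, one predicate per column."""
    cur = conn.execute(f"SELECT * FROM {table}")  # table name is trusted, not user input
    columns = [d[0] for d in cur.description]
    triples: List[Triple] = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{table}/{key}={record[key]}"
        for col, value in record.items():
            if value is not None:  # NULL columns produce no triple
                triples.append((subject, f"{table}#{col}", str(value)))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [(1, "Alice", "Boston"), (2, "Bob", None)])

graph = set(table_to_triples(conn, "person", "id"))

# A later, unanticipated source adds a new predicate; no ALTER TABLE is needed.
graph.add(("person/id=2", "hr#badgeColor", "green"))
print(len(graph))  # 6
```

The design choice to note is that the graph has no fixed column list: the new `hr#badgeColor` fact is just one more triple, whereas the relational table would need a schema change to hold it.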
Challenges remain, however, in leveraging everything Hadoop makes possible, especially when it comes to empowering data scientists. Hadoop is composed of two core sub-projects: HDFS, a distributed file system built on a cluster of commodity hardware so that data stored on any node is accessible across all the servers, and the MapReduce framework for processing the data stored in those files.
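To show the division of labor that the MapReduce framework imposes, here is a single-process sketch of the map, shuffle, and reduce phases for the classic word-count task. It does not use Hadoop's API; the function names are hypothetical, and in a real cluster the map and reduce tasks run in parallel across nodes reading data from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["the data lake", "the data warehouse"])
print(counts["the"], counts["lake"])  # 2 1
```

Each phase only sees keys and values, never the meaning of the data, which is one reason the surrounding text finds core Hadoop weak at understanding what data sets contain and how they relate.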
One factor that may be holding companies back from fully exploiting Hadoop for such ends is that it offers little to no data management capability. “It really mainly is a place where you can store files. As far as understanding what data is in what files, how it is related to other data sets…there is not a lot of capability within the core Hadoop,” he said.
It is difficult not only to understand the metadata around data sets but even to determine a data set's schema, he said. “That’s something we take for granted in the relational database world because it’s always there.”
Additionally, MapReduce is not suited to every kind of data processing a company might want to do, such as certain transformations and analyses. “If you have to repeat something you want it in a place where people can find, reuse, and share it with each other. If you apply a transformation to a data set you want to know what happened,” he said. Data integration and provenance are also concerns.
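One way semantic technologies can address the provenance concern is to record every transformation as triples. Here is a minimal sketch using a few terms from the W3C PROV-O vocabulary (`prov:Activity`, `prov:used`, `prov:wasGeneratedBy`, `prov:wasDerivedFrom`); the data set and activity names and the `record_transformation` and `lineage` helpers are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def record_transformation(graph: Set[Triple], activity: str,
                          inputs: List[str], output: str) -> None:
    """Describe one transformation run with PROV-O terms."""
    graph.add((activity, "rdf:type", "prov:Activity"))
    graph.add((output, "prov:wasGeneratedBy", activity))
    for source in inputs:
        graph.add((activity, "prov:used", source))
        graph.add((output, "prov:wasDerivedFrom", source))


def lineage(graph: Set[Triple], dataset: str) -> Set[str]:
    """Return every upstream data set reachable via prov:wasDerivedFrom."""
    seen: Set[str] = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == "prov:wasDerivedFrom" and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


prov: Set[Triple] = set()
record_transformation(prov, "ex:cleanRun1", ["ex:rawClicks"], "ex:cleanClicks")
record_transformation(prov, "ex:joinRun1",
                      ["ex:cleanClicks", "ex:customers"], "ex:clicksByCustomer")

print(sorted(lineage(prov, "ex:clicksByCustomer")))
# ['ex:cleanClicks', 'ex:customers', 'ex:rawClicks']
```

Because the provenance lives alongside the data as ordinary triples, "what happened to this data set" becomes a query rather than tribal knowledge.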
Where semantics fits in the world of Hadoop, he explained, “is helping you prepare data, work with data and integrate it for some very specific analytic use case…or sometimes you just want a table of results and that is your analysis, and you can just pull that out of Hadoop.”