The Scientific Oeuvre

This paper was started in 2008, in Palo Alto, CA, USA. Last updated on Jan 25, 2012, in Montreal, Qc, Canada.
The history of this paper is recorded.

Abstract


This article is about the management and communication of scientific knowledge in the information age, for an increasingly social science. I put forward the concept of the scientific oeuvre, which, in my opinion, is the best alternative to the obsolete scientific paper.

The ideas put forward here might not be sufficient, and the solutions presented might seem inadequate to some readers. My primary intention is to open a dialog, to stimulate different thinking, and to incite participation in projects for adapting formal knowledge to our new reality. This article can also be seen as a brief description of a type of Internet-based tool for the management and publication of scientific information and knowledge, which could be implemented by some computer wizards somewhere, and later improved by others.

Sharing and collaboration, as well as free access to all scientific knowledge, are the values that form the ethical framework of this article. I strongly believe that these values have positive consequences in terms of economics (in the general sense of the word) within a strongly interconnected society, but I leave the argumentation for another day. My point is that, in the context of our modern society, the solution to the information crisis seems to be aligned with these values, which, in turn, engender positive outcomes in terms of the production of new knowledge and its embodiment in new technologies.

The current situation: the information crisis

We've been told that our society is going through an information crisis. In science, this crisis is even more acute, and it is felt not only by scientists, but also by technologists. The problems are not spawned by scarcity; they stem from abundance and diversity.

Behind the massive scale of the scientific literature and its dynamism, which are two facets of the information crisis, lies a very complex and ever growing scientific community. Some problems are social in character, others relate to the structure of our scientific institutions, as well as to the allocation of resources, and to incentive mechanisms. Let's analyze them one by one. 

Nowadays, a huge amount of scientific data and information is generated by millions of scientists, from hundreds of different disciplines. It has become impossible for a single person to acquire all the knowledge of a traditional discipline like physics or mathematics. This aspect of the information crisis is referred to by the term information overload.

Moreover, within the last two centuries, we have seen the creation of a multitude of scientific disciplines, followed by the segmentation of these disciplines into sub-domains. We can see this as the emergence of subcultures, with their own technical language, their specialized ritual-like communication events, their religion-like paradigms, their gurus, their own literature, their specific practices, their organization and bureaucracy, etc. The situation is such that information hardly passes between academic disciplines. Scientists find themselves trapped in their own "tribes", so captivated by their rituals, their politics, and their economy, that they rarely meet with their colleagues from other departments. Ask a college professor whom you know to produce a list of names of colleagues from his or her university, and another one with names of scientists from his or her own research field, and compare these two lists. In medicine, for instance, these widening divisions cause very acute problems, for doctors must be able to blend information from chemistry, biology, and physics, and to integrate it with knowledge from their own practice. Let's call this aspect of the information crisis the Babel problem. One natural reaction to the Babel problem was the creation of interdisciplinary fields.

A related problem was generated by the rapid development and implementation of information technologies. Different standards for encoding, transmitting, and presenting content were adopted, making it difficult for automatic engines to extract data and information, and to perform analysis. Let's call this the formatting problem. If the growth had been slower and more controlled, a standard solution might have crystallized, though probably not the best one. The obvious reaction was a movement towards standardization and interoperability, which goes against traditional business interests that normally prefer proprietary technologies, in order to secure markets by locking in consumers.

Power struggles within scientific communities and the fight for resources have a significant impact on the quality of the published scientific information, as well as on the dynamics of scientific research. As is the case in any community with limited access to resources, in every research domain we see the formation of centers of power and influence, which control the publication process through the peer review and editorial mechanisms, often enough for their own purposes, contrary to the spirit of science. I would call this the organizational bias problem. This reality is loudly whispered in university hallways, but it is rarely acknowledged in the media.

There are also genuine mistakes that get into print unknowingly and compromise the scientific literature. Once a mistake is discovered by someone knowledgeable enough, a correction can be published as an erratum to the paper. But the original cannot be modified, and its erratum doesn't always show up with the paper when someone discovers it for the first time. Moreover, many scientific papers are later proven inaccurate, based on better experimental results or on new theoretical developments. Again, the old paper will not be relegated to a "scientific paper museum", and the association between the paper and its critics will remain elusive for the layman. This other aspect of the information crisis, related to the static character of the scientific paper, can be named the trust and quality problem. The peer review process is put forward by the scientific publishing industry as the guarantee of quality. But the problem that I just described is real, despite the application of peer review. The scientific method IS fallible: with time, "bad science" gets mixed into the newly accepted corpus of scientific knowledge, which compromises the quality of formal scientific knowledge for its consumers, the technologists and others, who have no effective means to identify unreliable information. Social and economic aspects of science make these matters even worse.

The incentive mechanism that operates within the scientific community favors the number of scientific papers, rather than the completeness of the information delivered in a single document. This is the root of the segmentation problem. Specific scientific data, information, and knowledge are scattered in hundreds of small papers, with a high degree of redundancy. In order to follow somebody's work one must read and synthesize a number of articles, some of them very similar in content. The natural reaction to this problem was the review paper, which summarizes and structures information from different papers on a specific topic, produced by one, or many authors.  

In the 1960s, private interests hijacked science by getting control of the flow of scientific information, for the sake of making profits, and maybe for other more obscure reasons. Nowadays, their role is clearly parasitic: they add no real value to the formal scientific information, they charge both the producer and the consumer, and they set prices that are intended to optimize their income, not to foster progress. These entities foreign to science keep knowledge from flowing freely within society. We can name this the private publisher effect. Today, in the Internet era, where the price of reproduction and distribution of information has practically gone through the floor, where the formatting is done by the author using easy-to-use templates, and where the quality of the content is ensured by the authors themselves, there is no justification for the existence of these private publishers. A few serious movements to free scientific information were created, and some promising solutions have already been implemented.

For every type of problem identified above there was a corrective reaction from the system. However, the structures of scientific institutions are too rigid to adapt fast enough to the ever-changing reality. From time to time, there comes a point where major structural changes must occur. Unfortunately, not everyone who has a stake in science works FOR science, or for society as a whole, for that matter. Some entities, individuals or institutions, oppose change that would be beneficial for the entire knowledge enterprise, for purposes that are not in line with scientific values. First, knowledge is power. Governments, as well as other powerful national and transnational private institutions, need to control the process of knowledge development and its distribution. At this moment, they directly control the incentive mechanisms, important resources, and the most important information distribution channels. Second, it is possible to capitalize on the flow of information and knowledge. Although scientific publishers must be analyzed as an integrated component within the aforementioned web of power, we can say a few things about them as if they were acting independently. Profit is their main motivation. But they don't produce the scientific knowledge; they merely "package" and distribute it. We have to understand that the very existence of these organizations relies on distribution, which would be greatly affected by the change I propose here. It is normal to expect some resistance on their part if their existence is threatened.

The change must come from the creators of knowledge! They have the power to choose what to do with their creations, paid for by public funds. The problem is that these individuals are trapped within a web of necessities, which is created by the current system. The status of a member of a scientific community relies on reputation, which is formally calculated according to a system of points, based on the number of papers published, on the rank of the journals where the papers are published, and on the number of times the author is cited in other papers. With reputation come a job and research funds. Opposing the system is suicidal.

Another impediment to change is the general misconception that the scientific publication, in its current form, ensures a higher quality of scientific information through the editorial and peer review processes in place, and that the Internet is like a haystack, an indiscernible pile of good and bad information. Nothing could be further from the truth! First, the Internet is messy, but it can be structured (see the section on web3.0). Second, there are many examples of mechanisms for quality assurance and reputation attribution already operating on the Internet. Moreover, the peer review process is distinct from the publication process. Therefore, it can be implemented in conjunction with any other publication scheme.

The change proposed here is materially possible because of the Internet and all its applications. New possibilities were created that can fundamentally change the way we deal with information and knowledge. The storage of data and the processing of information and knowledge have become very convenient and inexpensive. Easy-to-use tools for content creation have become ubiquitous. Standard formats have emerged. Web2.0 makes it possible to collaborate on the production and management of content, with no geographical constraints, at no cost. Furthermore, web3.0 and AI applications promise smart search and automatic structuring, analysis, and synthesis of scientific literature scattered across the web. It is now possible for scientific knowledge creators to take full control of the management and the publication process of their creation. There is no reason, other than political, for an intermediary entity to exist anymore.
 
Another problem is the format: information must be interlinked, not linear.

In the next section I will define the concept of scientific oeuvre.

Scientific oeuvre

The scientific oeuvre represents the lifework of a scientist, that is, his/her entire contribution to our scientific knowledge. The entire oeuvre is presented in a structured way. This structure is not linear, as in a book. It is modular and interlinked internally, as well as with external sources of information. Its content is a mixture of text, graphics, video, sound, etc. And it is semantically labeled.
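To make the idea of a modular, interlinked, semantically labeled structure more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration only: the names Module and Oeuvre, the fields, and the flat string labels are hypothetical and are not part of any existing platform.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """One self-contained unit of an oeuvre (hypothetical structure)."""
    module_id: str
    media_type: str  # e.g. "text", "graphic", "video", "sound"
    content: str
    labels: set[str] = field(default_factory=set)          # semantic labels
    internal_links: set[str] = field(default_factory=set)  # ids of other modules
    external_links: set[str] = field(default_factory=set)  # outside sources


class Oeuvre:
    """A non-linear collection of modules belonging to one author."""

    def __init__(self, author: str) -> None:
        self.author = author
        self.modules: dict[str, Module] = {}

    def add(self, module: Module) -> None:
        self.modules[module.module_id] = module

    def link(self, source_id: str, target_id: str) -> None:
        """Create a directed internal link between two existing modules."""
        if source_id not in self.modules or target_id not in self.modules:
            raise KeyError("both modules must exist before they are linked")
        self.modules[source_id].internal_links.add(target_id)

    def with_label(self, label: str) -> list[str]:
        """Return the ids of all modules carrying a given semantic label."""
        return sorted(m.module_id for m in self.modules.values() if label in m.labels)

    def reachable_from(self, start_id: str) -> list[str]:
        """Follow internal links from one module; there is no fixed reading order."""
        seen: set[str] = set()
        stack = [start_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self.modules[current].internal_links - seen)
        return sorted(seen)
```

A reader, or an automatic agent, could then enter the oeuvre at any module and navigate by links or by labels, rather than reading from a first page to a last one.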

It is also a dynamic entity, in the sense that the author can, at any time, manage it (improve, enrich, augment it). And it is a historical entity: all past versions of it are stored and can be retrieved.

The author is the only person who can change the slightest thing in his/her oeuvre. More than one individual can be considered as an author, and a single individual can coauthor more than one scientific oeuvre. 
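The dynamic and historical character, together with the rule that only authors may modify the oeuvre, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class VersionedOeuvre and its methods are hypothetical names, authors are identified by plain strings, and a real system would need proper authentication and storage.

```python
from datetime import datetime, timezone


class VersionedOeuvre:
    """Keeps every past version; only listed authors may revise (hypothetical)."""

    def __init__(self, authors: set[str], content: str, editor: str) -> None:
        self.authors = frozenset(authors)  # more than one author is allowed
        self._versions: list[tuple[datetime, str, str]] = []
        self.revise(editor, content)

    def revise(self, editor: str, content: str) -> int:
        """Store a new version and return its 1-based version number."""
        if editor not in self.authors:
            raise PermissionError(f"{editor} is not an author of this oeuvre")
        self._versions.append((datetime.now(timezone.utc), editor, content))
        return len(self._versions)

    @property
    def current(self) -> str:
        """The latest version, which is what a reader sees by default."""
        return self._versions[-1][2]

    def version(self, number: int) -> str:
        """Retrieve any past version by its 1-based number."""
        if not 1 <= number <= len(self._versions):
            raise IndexError(f"no version {number}")
        return self._versions[number - 1][2]
```

Nothing is ever overwritten in this sketch: a correction becomes a new version, and the earlier text remains retrievable, which is the property the static scientific paper lacks.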

The scientific oeuvre “lives” on the Internet, not necessarily on a single server, and is accessible by everyone, from everywhere, at any time. All scientific oeuvres are built on a single common platform in order to facilitate integration into the semantic web, and to enable automatic information processes.

Knowledge has a strong individual component and, in some sense, the scientific oeuvre belongs to its author. However, knowledge cannot be produced in a vacuum and without resources. We all need to be grateful to our family, to our teachers, to the society we live in, as well as to all humans, dead or alive, who have contributed directly or indirectly to our collective knowledge. For this reason, the rights of the author to his/her creation must be limited. Knowledge is NOT diminished if shared with others, unlike material things. (Yes, one can lose some advantage by sharing knowledge with one's opponents, but in this case we are not in the realm of science anymore.) For this reason, we cannot apply the same economic treatment to information and knowledge as to material goods. The reward for knowledge production must be structured differently, and the rights to the knowledge created must obey their own set of rules. The scientific oeuvre must be offered to all! Nevertheless, the author has the exclusive right to modify his/her oeuvre. This remark seems to be self-evident for academia, where scientists are bound by a contract with society at large. It can be contested in the case of a private enterprise, where the knowledge creator is bound by a contract with his/her employer. This latter contract gets its legitimacy from stimulating the economy and social development, by stimulating private enterprise. I would gradually phase out this view.

Some history

Before the 20th century, most scientific and philosophical publications took the form of a book. Authors gathered a large amount of information and presented it in a coherent and structured way. Science was not a popular activity, and certainly not as dynamic as it is today. During the industrial revolution, science and technology became the very source of economic growth. The need to communicate scientific knowledge effectively and in real time became crucial. Printing technology was already there to respond to this growing need, and the era of the scientific paper in science communication began. Its role was to respond to fast-paced scientific development, to the immense quantity of scientific data generated, and to the significant segmentation of science into a multitude of domains and sub-domains. There was no time to wait for Dr. Einstein to publish his book on E=mc^2. A scientific journal containing small and easily written papers, produced at a very low cost, distributed important but small discoveries to the entire scientific community in a relatively short time. Specialized scientific publications rapidly appeared for all existing research areas. Massive catalogs were also created to assist in searching for and retrieving papers from the vast scientific literature that was generated.

In the 1960s, privatization appeared in the field of scientific publication. At this moment, the four largest for-profit companies account for 42% of articles published annually. These private companies are, in essence, parasitic. They add very little value to the scientific publication, and they actively oppose reforms in the scientific publication area that would benefit society.

Presently, authorship recognition is maintained by a system of references. The system of references also serves to complete the information of a paper in a very compact way, by referring to work covered in another paper rather than rewriting everything at length. Quality assurance systems were also implemented to weed out the "bad science".

Very large publishing organizations were quickly created, which covered wide scientific domains and produced a large number of different scientific journals. Some of these organizations have also specialized in other types of media. For example, they organize live events or publish proceedings. Others specialize in science popularization, targeting individuals outside of the scientific community.

As technology evolved, the presentation (layout) improved and publication costs dropped. Publishing organizations began to offer a better product to a larger population. However, the greatest change was introduced by the advent of the Internet, following the development of computer and telecommunication technologies. In my opinion, this will ultimately put an end to the era of the scientific paper, sooner rather than later.

The Internet era, web1.0

We can look at the Internet as a repository and a source of information: people store information on different machines, and the Internet makes sharing and distribution possible across the planet. But in reality, web1.0 applications of the Internet are more than just that. A first generation of search engines was created to search, sort, and structure information no matter how varied, or how scattered, it is on the network. Specific scientific information is retrieved within seconds, without the need to know where it is actually located. We say “I found it on the Internet”, which implicitly conveys the idea of non-locality. Moreover, the browser was invented to navigate the web, to assist the ordinary user in performing search and retrieval, and to display information in a human-readable format. Other important innovations were e-mail, forums, and presentation tools like slide presentations and tele- or video-conferences. They were rapidly implemented in the domain of scientific communication as e-mail-based scientific newsletters, scientific forums, and web-communications or conferences.

As I mentioned earlier, technology has also greatly contributed to the presentation of scientific concepts and results, making possible the inclusion of complex graphical presentations, images, videos, and digital sound files. All these possibilities introduced by computer and Internet technologies have already been implemented in the field of science communication, but we are far from harvesting the full potential offered by this marvelous technology.

The scientific oeuvre concept, with its dynamic and historical character, can be implemented with first-generation web applications alone, but its full potential cannot be realized without social tools and AI applications.

The Internet era, web2.0 - the social web

Recently, we have witnessed the emergence and the explosion in popularity of wiki and social networking technologies. This phenomenon has taken the name of web 2.0. The fundamental idea behind these applications is that a user (the author) can create content and store it on the Internet. Apart from the fact that this content is sharable (a web1.0 possibility), it can also be modified, and therefore managed, by the user at any time, from any geographical location. Using the same technology, and adding a set of rules of engagement, many authors can be assigned to the same content, with different degrees of decision-making power. In order to make these applications truly social, user-friendly interfaces were created. The result is that a group of people are now able, using very intuitive tools, to coordinate their efforts to produce content, which in some cases amounts to highly complex and complete bodies of scientific knowledge. In short, web2.0 is about the social aspects of science: collaboration, coordination, sharing, networking, community building, consensus building, democratization of scientific knowledge, deprofessionalization of knowledge activities, decentralization, etc.

Web 2.0 has not fully penetrated the scientific community yet. Its full implementation means, in my opinion, the extinction of the scientific paper, and its replacement with the scientific oeuvre, or something like it. 

In this context, the scientific oeuvre can become a collaborative oeuvre. Moreover, it can be augmented with forums and discussions, and critical and positive comments can be dynamically associated with it. This can constitute the basis of a new and more democratic form of reputation attribution and peer review process. Web2.0 makes the scientific oeuvre more dynamic, because it integrates it into a real debate, in real-time. 

During the deployment of web2.0 applications, progress was also made in search engines, semantic tools, and clustering technologies that enable one to generate domains of scientific knowledge and gigantic open databases. This is the phase-in of the next generation of web applications, web3.0.

The Internet era, web3.0 - the semantic web


First, web3.0 applications will greatly improve the search for scientific data, information, and knowledge scattered across the web. There will be fewer constraints on how or where the content is stored on the network. Second, powerful applications will be available to dynamically structure this content. This will facilitate the creation of open databases. Third, semantic applications will automate analysis and synthesis. I am talking here about the idea browser that has yet to be created, which will replace the current webpage browser.

In order to extract this potential from Internet technology, the scientific oeuvre must be built on a universal platform, using a unifying ontology; it must be semantically enriched. The infrastructure is being built; we now need to create the scientific oeuvre editor, a semi-automatic formatting tool.

Once implemented, it will become possible to dynamically create very complete review works, with minimal input from a human operator. In the current state this cannot be achieved, for several reasons:
  • Standards: different standards are used, and some information is lost in the formatting process.
  • Fragmentation: the information is fragmented into a large number of small publications.
  • Accuracy: many publications contain scientific information that has been proven invalid, and automatic agents can’t "know" that.
  • Ontology: across domains, automatic interpretation is impossible because of incompatible ontologies underlying the technical languages.
  • Access: most scientific publications are protected, and not free of charge.
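As a rough illustration of what such automatic review assembly could look like once these obstacles are removed, here is a toy Python sketch. It assumes, hypothetically, that every piece of content already carries shared semantic labels, a flag saying whether it has been refuted, and an open-access flag; the names Entry and assemble_review are invented for this example, and real semantic analysis and synthesis would be far more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """A piece of content from some oeuvre (hypothetical record)."""
    author: str
    title: str
    labels: frozenset[str]     # labels drawn from one shared ontology
    refuted: bool = False      # known to be invalid, so agents can "know" it
    open_access: bool = True   # freely readable by humans and machines


def assemble_review(entries: list[Entry], topic: str) -> dict[str, list[str]]:
    """Group the valid, accessible entries on a topic by author."""
    review: dict[str, list[str]] = {}
    for entry in entries:
        if topic not in entry.labels:
            continue  # not about this topic
        if entry.refuted or not entry.open_access:
            continue  # unreliable or unreachable content is left out
        review.setdefault(entry.author, []).append(entry.title)
    # Sort authors and titles so the outline is stable between runs.
    return {author: sorted(titles) for author, titles in sorted(review.items())}
```

The point of the sketch is that each filter corresponds to one of the obstacles listed above: without common labels, reliability information, and free access, even this trivial grouping cannot be computed.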

Conclusion

The scientific paper is outdated. It is clear that computers and the Internet will play a bigger role in the creation, the processing, the management, and the distribution of scientific content. This is how I view the future. What do you think?   


Other things



Video: Intro to the Semantic Web

Other ideas

Talk about the automatic "semantification" of articles. Create a web-based tool for scientific publication that will make it easier for scientists to format their publications and to capture knowledge.

You can professionalize a practice, but you cannot professionalize the production of knowledge.

Other people express similar ideas

Peter Murray-Rust

Peter Murray-Rust is Reader in Molecular Informatics at the University of Cambridge and Senior Research Fellow of Churchill College.

See his Google presentation HERE, and the abstract of this presentation below.

ABSTRACT: The millions of scientific papers published each year are an amazing source for scientific discovery but in most of them the experimental data is destroyed by the publication process. Publishers insist on converting semantic data into PDF which effectively destroys everything. We have been developing social and technical strategies to preserve and liberate this data and where this has happened have been able to create completely new mashups and other semantic resources.
Chemistry is the most tractable discipline for the semantic web - most chemistry can be turned into XML with little semantic loss, using Chemical Markup Language and complementary MLs such as XHTML, MathML and SVG. 
We have to mobilise a bottom-up revolution through modern Internet ideas - blogs, communal source development, interoperability. We have done this in chemistry through the Blue Obelisk movement - an informal but coherent group of young-at-heart hackers. We are adopting lightweight web technologies ("REST", etc.) to chemistry - an example will be CMLRSS which we run in a Bioclipse environment.

 

Andrew Walkingshaw

Researcher at the Unilever Centre for Molecular Informatics. Watch his online seminar "Web 2.0 for Scientists - an introduction" on his blog or on YouTube.


See also "the Internet, peer reviewed" at Hypothes.is, a very important step forward in the making of the Scientific Oeuvre.
