4. METHOD

In the previous chapter, a conceptual model was created based on the theory review. In this model, the research impact of an academic book is influenced by whether it is published in Open Access and by the channel used to disseminate the title. Based on the conceptual model, an experiment is conducted using 400 titles published by Amsterdam University Press.

4.1 Amsterdam University Press

Amsterdam University Press (AUP) is an academic publisher, mainly of books in the field of humanities and social sciences. It also publishes several journals in paper and in electronic form. AUP’s mission is to support and stimulate academic knowledge by way of academic publications. The quality of the content of these publications is therefore most important. The four editorial advisory councils, consisting of Dutch and Belgian scholars, judge all publications based on their academic standards. Popular scholarly titles can also be found in AUP’s publishing list. AUP is owned by the University of Amsterdam and works on a not-for-profit basis (AUP, 2009).

AUP publishes around 150 books per year. Of its journals, some are published both on paper and online, and some as online Open Access e-journals. All books are published as a traditional paper book; most titles are also made available in an e-book version, sold through specialised vendors. The impact on sales is as yet not very large: in 2008 the sales revenues of e-books amounted to 0.4% of the yearly total. In 2009 – when much emphasis was placed on making more e-books available – this share rose to 1%. Furthermore, a selection of titles is also published in Open Access and made available through the AUP repository. In August 2009, approximately 25 percent of all books in print were available in Open Access. A large part of these titles is published in cooperation with research networks such as IMISCOE – an acronym for International Migration, Integration and Social Cohesion (IMISCOE, 2009).

All books are promoted and disseminated through a number of channels: through the website and catalogue of AUP, and through the websites and databases of specialised national and international sales organisations such as Centraal Boekhuis, Bol.com, Nielsen Book Data, Chicago University Press and Amazon.com. Furthermore, AUP makes extensive use of the Google Book Search program, in order to make the content of its books searchable on the internet. By default, a portion of all books – 10 percent – can be read using Google Book Search.

AUP is currently coordinating a project called OAPEN, in which several academic publishers work together to create an Open Access library in the field of humanities and social sciences. When the project is finished, the lock-in effects will become visible and it is to be expected that other publishers will join (OAPEN, 2009).

4.2 Goal of the experiment and representativeness of AUP

The goal of this experiment is to understand the impact of Open Access publishing on the usage of scientific books. We have established a model where lower search costs – defined as discovery – lead to higher usage of books – which can be measured by online consultation, sales and citations. In this model, some constructs are under the control of AUP: accessibility and channel. Changes in accessibility and channel usage should lead to changes in sales and citations.

Amsterdam University Press offers a representative case, as it is an academic publisher that specialises in the field of humanities and social sciences and mainly publishes scientific books. Because it has published books for several decades and covers quite a large part of the humanities and social sciences, it has an extensive back catalogue. This allows for a selection of titles which is not restricted in subject or in publication year.

4.3 Setup of the experiment

The experiment consists of creating four equal sets of 100 titles. Each title is placed in one of the four sets. The different sets are defined using two variables: accessibility and channel. Each set is disseminated using a specified channel and accessibility settings. For a period of 9 months, starting in April 2009, the effect on discovery, online consultation, sales and citations is measured. Discovery and online consultation are measured using the number of views and downloads from the respective channels. As all titles are accessible in the Google Book Search program, its usage statistics help to determine the interest of readers. Among the usage statistics are ‘book visits’ – measuring the number of times the webpage of the book is accessed – and ‘page views’, which measure the number of pages opened. Furthermore, the AUP repository can deliver its own usage statistics: the number of times a record is accessed and the number of downloads of the digital books stored in the repository. Sales figures are provided by AUP and citations are measured using the Google Scholar search engine.

The division of titles can be summarised as follows:

- Set 1: Available ‘as usual’. An electronic version of almost all books by AUP is submitted to the Google Book Search website. By default, AUP allows a user of Google Book Search to see only 10% of the book’s contents. The full content of each book is indexed by the Google service. The titles in this set are not uploaded into the AUP repository. The accessibility of this set is the lowest.

- Set 2: Freely available via the repository; visible for 10% in Google Book Search. The titles of this set are uploaded in the AUP repository. For each title, a record is created in the repository database containing metadata and an electronic version of the book. The ‘visibility settings’ of Google Book Search are not changed and remain at 10%.

- Set 3: Visible for 100% in Google Book Search and freely available via the repository. The titles of this set are uploaded in the AUP repository, and the ‘visibility settings’ of Google Book Search are set to 100%. The titles in this set are fully accessible through both channels. The accessibility of this set is the highest.

- Set 4: Visible for 100% in Google Book Search; not available via the repository. For this set, the ‘visibility settings’ of Google Book Search are set to 100%. The books are not placed in the AUP repository.
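The four sets form a two-by-two design over the two variables: visibility in Google Book Search and availability in the repository. The layout can be sketched in Python as follows. The names `SetConfig` and `SETS` and the field names are illustrative assumptions, not identifiers used by AUP.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SetConfig:
    """Treatment applied to one set of titles (hypothetical names)."""
    google_visibility_pct: int  # share of the book readable in Google Book Search
    in_repository: bool         # True if the full text is in the AUP repository


# Settings per set, as described in the list above.
SETS = {
    1: SetConfig(google_visibility_pct=10, in_repository=False),
    2: SetConfig(google_visibility_pct=10, in_repository=True),
    3: SetConfig(google_visibility_pct=100, in_repository=True),
    4: SetConfig(google_visibility_pct=100, in_repository=False),
}

# Every combination of the two variables should occur exactly once.
expected = {SetConfig(v, r) for v, r in product((10, 100), (False, True))}
assert set(SETS.values()) == expected
```

The final assertion only checks that no combination of the two settings is missing or duplicated; it says nothing about the titles placed in each set.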

4.3.1 What to measure?

For a period of 9 months, on a monthly basis, the following data will be collected for each title separately:

- Sales

- Number of citations

- If in a repository:

o Number of record views

o Number of document downloads

- In Google Book Search:

o Number of document views

o Number of page views

Several attributes of each title are also recorded, so that bias can be removed from the sets:

- Publication year

- Language

- Subject

- Print run
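One way to hold the data listed above is as one record per title per month, plus a fixed record of title attributes. The Python sketch below is only an illustration: all class and field names are assumptions, and the repository counts are optional because only titles placed in the repository have them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TitleAttributes:
    """Fixed attributes used to balance the sets (hypothetical names)."""
    isbn: str
    publication_year: int
    language: str
    subject: str
    print_run: int


@dataclass
class MonthlyObservation:
    """Measurements for one title in one month (hypothetical names)."""
    isbn: str
    month: str                      # e.g. "2009-04"
    sales: int
    citations: int
    gbs_book_visits: int            # Google Book Search: visits to the book page
    gbs_page_views: int             # Google Book Search: pages opened
    repo_record_views: Optional[int] = None   # only for titles in the repository
    repo_downloads: Optional[int] = None      # only for titles in the repository

    def is_consistent(self, in_repository: bool) -> bool:
        """Repository counts must be present if and only if the title is in the repository."""
        has_both = self.repo_record_views is not None and self.repo_downloads is not None
        has_none = self.repo_record_views is None and self.repo_downloads is None
        return has_both if in_repository else has_none


# Placeholder values only; these are not data from the experiment.
obs = MonthlyObservation("ISBN-PLACEHOLDER", "2009-04", sales=0, citations=0,
                         gbs_book_visits=0, gbs_page_views=0)
assert obs.is_consistent(in_repository=False)
```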

4.3.2 Selection of titles

In April 2009 a selection was made from the 893 titles published by AUP which were in print at that time, in order to create the four sets of titles. Each title is identified by its ISBN (International Standard Book Number). To create a useful selection, several selection criteria were used:

- In print and available. The titles must be in print and remain available through the period of the experiment. Titles that become out of print during the period of the experiment cannot be ordered. This would have a direct effect on sales figures, one of the aspects measured.

- Imprint. AUP uses several imprints, each with their own characteristics. Furthermore, AUP distributes titles from other publishers. As AUP does not possess the right to make these titles freely available, they are excluded from the experiment.

- Available in Google Book Search. AUP makes almost all its titles available for searching in the Google Book Search program. Some titles are not placed on the Google Book Search website, depending on certain circumstances. These titles are also excluded from the experiment.

- In print in 2008 or before. In the first three months of publication, sales of a title are influenced by several factors such as fixed sales or pre-orders. In order to avoid these effects, all selected titles are published before January 2009.

- Exclusion of the IMISCOE, ISIM, ICAS, IIAS series. The IMISCOE, ISIM, ICAS and IIAS series are already published in Open Access. Therefore these titles are excluded from the experiment, as the effects of making them available in Open Access cannot be measured.

Using these criteria a list of 412 ISBNs was created. This list contained 22 ISBNs for titles which were published both as a hardback and as a paperback book. As this distinction is irrelevant in the digital domain, 11 of these ISBNs were removed from the list, leaving 401. Removing the oldest title – published in 1994 – completed the list of 400 titles.
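The selection criteria above amount to a filter over title records, followed by a simple count. The Python sketch below illustrates this under stated assumptions: the dictionary keys (`in_print`, `rights_held_by_aup`, `in_google_book_search`, `publication_year`, `series`) are hypothetical and do not reflect the actual AUP database.

```python
# Series already published in Open Access, excluded from the experiment.
EXCLUDED_SERIES = {"IMISCOE", "ISIM", "ICAS", "IIAS"}


def is_eligible(title: dict) -> bool:
    """Apply the five selection criteria to one title record (hypothetical keys)."""
    return (
        title["in_print"]                       # in print and available
        and title["rights_held_by_aup"]         # own imprint, not a distributed title
        and title["in_google_book_search"]      # available in Google Book Search
        and title["publication_year"] <= 2008   # published before January 2009
        and title.get("series") not in EXCLUDED_SERIES
    )


# Counts reported in the text: 412 ISBNs after filtering, 11 duplicate
# hardback/paperback ISBNs removed, then the oldest title removed.
isbns_after_criteria = 412
final_count = isbns_after_criteria - 11 - 1
assert final_count == 400
```

The filter is applied per record, so a title failing any single criterion is dropped regardless of the others.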

4.4 Validation

In order to conduct a useful experiment, both the internal and external validity must be addressed (Campbell, Stanley & Gage, 1969). Internal validity comprises the minimal requirements for the experiment to be interpretable. If the experiment is not internally valid, the results may be caused by occurrences other than the experimental treatment. Whether the results are more generally applicable is the question of external validity.

The first aspect of internal validity is history, described as the specific events occurring between measurements, other than the experimental variable. Here it is controlled by setting the accessibility of the sets and placing the documents in the repository in a very short time – two days. After the start, the accessibility settings and the dissemination through the channels are not changed for the duration of the experiment, and any event – such as the holiday period in the universities – should affect all titles in the same way. Another measure used is the selection of titles with a publication date before January 2009. In the first three months of publication, sales figures are influenced by several factors such as fixed sales or pre-orders.

The second aspect is maturation: the processes operating as a function of the passage of time per se, which are not specific to particular events. As we experiment with books and their sales figures, the influence of publication date on sales comes into play, as older titles usually sell less. To counter this, books of the same publication year are distributed evenly over the sets. The authors describe testing, the effect of taking a test upon the scores of a second test, as the third aspect. As it is not relevant to our experiment, this aspect can be set aside. In contrast, instrumentation is an important aspect of the experiment. Instrumentation refers to changes in the calibration of a measuring instrument, or changes in the observers or scorers used, which may produce changes in the obtained measurements. The outcomes of the experiment depend on web statistics, sales figures provided by AUP and citations measured in the Google Scholar search engine. In order to limit the impact of technical failures, the data will be collected on a monthly basis instead of accumulating all data in one session after the experimental period. If a problem arises, it will be confined to one month’s data.

The aspect of statistical regression is less relevant here. It refers to test scores of groups that have been selected on the basis of their extreme scores. In this experiment, the readers are not tested; just some aspects of their ‘online behaviour’ are recorded. The last aspect of internal validity discussed here is experimental mortality, or differential loss of respondents from the comparison groups. In order to remedy this, all books used in the experiment remain in print for the full period of the experiment. This ensures that all titles are fully available.

External validity aspects include the interaction effects of selection biases and the experimental variable. If the sets were biased – for instance by putting all bestselling titles in one set – the results of the experiment would be unrepresentative. As discussed below, much effort has been put into creating ‘equal’ sets. Another threat to external validity is the reactive effects of experimental arrangements, where an artificial laboratory environment may have an effect on the experiment. Here it is remedied by the setup of the experiment. No information is given about the experiment, and the readers of the 400 selected titles are not aware of the role they play during the testing period. Furthermore, there is no multiple-treatment interference, which is likely to occur whenever multiple treatments are applied to the same respondents. The accessibility settings and the dissemination channels for each book are set once and not altered for the period of the experiment.

4.4.1 Removal of bias

Considerable effort has been put into the removal of bias. This aspect – where the comparison groups are selected using several criteria – will be discussed in detail. As described before, the citation habits of different scientific fields may vary. The scientific discipline and the publication date may also influence the number of books sold. Sales expectations can be measured using the print run. The last aspect is language, which may influence discovery.

This experiment operates using four sets of books; therefore these sets must be as equal as possible. Each of the 400 books is compared using the following criteria: subject; type of work; language; expected sales. To address these criteria the series could be used. All titles within one series should share the same properties: the same language, type of document, sales expectation and subject. This seemed a promising way to proceed, as more than 70% (283) of the selected titles are part of a series. Unfortunately, not all series are centred on one broad subject. For instance the series LUP Dissertaties, UvA Proefschriften, Amsterdam Academic Archive, LUP Academic or Atheneum Boekhandel Canon cover a wide range of subjects, while language and document type are similar. Other series, like Film Culture in Transition, Changing Welfare States, Tekst in Context or Studies over Politieke Vernieuwing, also share a main topic. In order to overcome this, only the titles from such a ‘subject based series’ were assigned the same subject codes.

In the database of AUP each title is assigned several subject codes describing the content. In order to create relatively large groups with the same subject, the number of subject codes was reduced. The same principle was applied to the expected sales. In order to measure this, the print run of a title was used. While each individual title may have a different print run – from 0 for Print on Demand titles to 6500 – an amount rounded up to the next 500 was used. This again created relatively large groups, which could be evenly divided over the four sets.
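The rounding of print runs and the even division of groups over the four sets can be sketched in Python as follows. This is an illustration under assumptions, not the procedure actually used: the text does not state how titles within a group were allocated, so the round-robin dealing below is a stand-in, and all function names and dictionary keys are hypothetical. The sketch also assumes that a print run which is already a multiple of 500 is left unchanged.

```python
import math
from collections import defaultdict


def print_run_bucket(print_run: int, step: int = 500) -> int:
    """Round a print run up to the next multiple of `step` (0 stays 0)."""
    return math.ceil(print_run / step) * step


def assign_sets(titles: list, n_sets: int = 4) -> dict:
    """Deal titles over the sets so each group is spread as evenly as possible.

    Titles are grouped by publication year, language, subject and print-run
    bucket, then dealt round-robin. The counter runs on across groups, so the
    overall set sizes differ by at most one.
    """
    groups = defaultdict(list)
    for t in titles:
        key = (t["year"], t["language"], t["subject"],
               print_run_bucket(t["print_run"]))
        groups[key].append(t["isbn"])

    assignment = {}
    counter = 0
    for key in sorted(groups):
        for isbn in sorted(groups[key]):
            assignment[isbn] = counter % n_sets + 1  # sets are numbered 1..4
            counter += 1
    return assignment


# Placeholder titles; these are not data from the experiment.
sample = [
    {"isbn": f"T{i}", "year": 2005, "language": "en",
     "subject": "film", "print_run": 1200}
    for i in range(8)
]
result = assign_sets(sample)
```

With eight placeholder titles in a single group, each of the four sets receives two titles. Groups whose size is not a multiple of four cannot be split exactly evenly, which is why larger groups are easier to balance.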