5. FINDINGS AND RESULTS

The experiment’s data were analysed using ANOVA (analysis of variance) in order to test the various hypotheses. ANOVA measures the effect of the independent variable on the dependent variable. For each hypothesis, the dependent and independent variables are listed together with the results of the test. As a second analysis, several other variables are included in order to test their influence on the outcome, in other words the between-subjects effects. This is called ANCOVA (analysis of covariance). Furthermore, the influence of title properties on the dependent variables is assessed using multiple regression analysis.

5.1 Hypothesis 1: The discovery of fully accessible titles is significantly higher, compared to titles which are not fully accessible.

The dependent variable is discovery, which is measured as the number of monthly Book visits a title received during the period from April 2009 to December 2009 in the Google Book Search program. The independent variable is accessibility, which is measured as the set to which a title belongs. The variances across the sets were found to be significantly different; therefore the values of the Welch robust test of equality of means are reported here. There was a significant effect of accessibility on discovery, F(3, 1808) = 7.164, p < .01, ω = .07.
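The Welch robust test of equality of means can be illustrated with a minimal sketch. It uses only the Python standard library and computes the F statistic and its degrees of freedom, not the p value. The function name welch_anova and the four small data sets are hypothetical; they are not the study's data or the software used for the reported results.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's robust test of equality of means."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Weight each group by n / s^2, so groups with noisy means count for less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_weight = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_weight

    between = sum(w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_weight) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, k - 1, df2


# Hypothetical monthly Book visits for four small sets (illustration only).
set_1 = [12, 15, 11, 14, 13]
set_2 = [10, 30, 5, 22, 18]
set_3 = [40, 55, 35, 60, 48]
set_4 = [38, 70, 20, 52, 45]
f_stat, df1, df2 = welch_anova([set_1, set_2, set_3, set_4])
print(f"F({df1}, {df2:.1f}) = {f_stat:.2f}")
```

Because each set is weighted by its own variance, the second degrees of freedom value is generally not a whole number, which is why the reported values differ from those of a standard ANOVA.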

Planned contrasts revealed that the number of monthly Book visits a title received was increased by being disseminated through an Open Access channel (books in Set 2, Set 3 or Set 4), compared to titles which were not (books in Set 1), t(2117) = 1.99, p < .05 (one tailed), r = .04.
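A planned contrast of this kind can be sketched as follows. The contrast weights (-3, 1, 1, 1) for Set 1 versus Sets 2, 3 and 4 are an assumption, as the weights actually used are not stated here, and the function names are hypothetical. The conversion from t to the effect size r is the usual r = sqrt(t² / (t² + df)), which reproduces the r value reported above.

```python
from math import sqrt
from statistics import mean, variance


def contrast_t(groups, weights):
    """Return (t, df) for a planned contrast, not assuming equal variances."""
    assert abs(sum(weights)) < 1e-9, "contrast weights must sum to zero"
    estimate = sum(c * mean(g) for c, g in zip(weights, groups))
    # Contribution of each group to the variance of the contrast estimate.
    parts = [c ** 2 * variance(g) / len(g) for c, g in zip(weights, groups)]
    t = estimate / sqrt(sum(parts))
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = sum(parts) ** 2 / sum(p ** 2 / (len(g) - 1) for p, g in zip(parts, groups))
    return t, df


def effect_size_r(t, df):
    """Convert a contrast t statistic into the effect size r."""
    return sqrt(t ** 2 / (t ** 2 + df))


# Assumed weights: Set 1 against the three Open Access sets.
open_access_vs_control = [-3, 1, 1, 1]

# Effect size for the contrast reported above, t(2117) = 1.99.
print(round(effect_size_r(1.99, 2117), 2))  # 0.04
```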

As the assumption of homogeneity of variance was not met, the Games-Howell procedure was used for the post-hoc tests. The multiple comparisons carried out by the Games-Howell test show a significant difference in means between Set 2 and Set 3, and between Set 2 and Set 4. In other words, the post-hoc test revealed a difference in Book visits between the titles disseminated through the AUP repository and titles disseminated through the Google Book Search program or through the AUP repository combined with the Google Book Search program.
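The pairwise statistics behind a Games-Howell comparison can be sketched as below. This is an illustration under assumptions: the function name is hypothetical, and the sketch stops at the q statistic and its degrees of freedom. Deciding significance additionally requires the studentized range distribution, which is not part of the Python standard library.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def games_howell_statistics(groups):
    """Return {(i, j): (q, df)} for every pair of groups (zero-based indices)."""
    results = {}
    for i, j in combinations(range(len(groups)), 2):
        a, b = groups[i], groups[j]
        # Squared standard error of each group mean, using its own variance.
        se_a = variance(a) / len(a)
        se_b = variance(b) / len(b)
        # q is the Welch t statistic rescaled to the studentized range metric.
        q = abs(mean(a) - mean(b)) / sqrt((se_a + se_b) / 2)
        df = (se_a + se_b) ** 2 / (se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1))
        results[(i, j)] = (q, df)
    return results


# Hypothetical data; with four sets, key (1, 2) would compare Set 2 with Set 3.
pairs = games_howell_statistics([[1, 2, 3], [2, 4, 6, 8]])
q, df = pairs[(0, 1)]
print(f"q = {q:.2f}, df = {df:.2f}")
```

Each pair uses only the variances of the two sets being compared, which is what makes the procedure suitable when the variances differ.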

Table 7.1 Hypothesis 1: Descriptive statistics of Book visits

5.2 Hypothesis 2: The online consultation (e.g. pages read or number of downloads) of fully accessible titles is significantly higher, compared to titles which are not fully accessible.

The dependent variable is online consultation, which is measured as the number of monthly page views a title received in the Google Book Search program during the period from April 2009 to December 2009. Here the values from the AUP repository are not used, as they only apply to Set 2 and Set 3. The independent variable is accessibility, which is measured as the set to which a title belongs. The variances across the sets were found to be significantly different; therefore the values of the Welch robust test of equality of means are reported here. There was a significant effect of accessibility on online consultation, F(3, 1894) = 37.705, p < .01, ω = .18.

Planned contrasts revealed that the number of monthly page views a title received was increased by being disseminated through an Open Access channel (books in Set 2, Set 3 or Set 4), compared to titles which were not (books in Set 1), t(3420) = 4.92, p < .01 (one tailed), r = .08.

As the assumption of homogeneity of variance was not met, the Games-Howell procedure was used for the post-hoc tests. The multiple comparisons carried out by the Games-Howell test show a significant difference in means between Set 1 and Set 3, Set 1 and Set 4, Set 2 and Set 3, and Set 2 and Set 4. In other words, the post-hoc test revealed a difference in page views between, on the one hand, the control group and the titles disseminated through the AUP repository and, on the other hand, the titles disseminated through the Google Book Search program or through the AUP repository combined with the Google Book Search program.

Table 7.2 Hypothesis 2: Descriptive statistics of page views

5.3 Hypothesis 3: The citation rate of fully accessible titles is significantly higher, compared to titles which are not fully accessible.

The dependent variable is citation rate, which is measured as the difference in citation rate per title during the period from April 2009 to December 2009 as found in the Google Scholar search engine. The independent variable is accessibility, which is measured as the set to which a title belongs. The variances across the sets were not significantly different, and there was no significant effect of accessibility on citation, F(3, 396) = .785, p < .51, ω = .99.

Planned contrasts revealed that the number of citations a title received was neither increased nor decreased by being disseminated through an Open Access channel (books in Set 2, Set 3 or Set 4), compared to titles which were not (books in Set 1), t(396) = .99, p < .17 (one tailed), r = .05. Nor was the number of citations increased or decreased by being disseminated through both the AUP repository and the Google Book Search program (books in Set 3), compared to titles which were disseminated through one of those channels (books in Set 2 and Set 4), t(396) = 1.12, p < .14, r = .06.

The assumption of homogeneity of variance was met; therefore the Tukey and REGWQ post-hoc tests were used. The multiple comparisons of both tests show no significant difference in means between any of the sets.

Table 7.3 Hypothesis 3: Descriptive statistics of citations

5.4 Hypothesis 4: The sales figures of fully accessible titles are significantly higher, compared to titles which are not fully accessible.

The dependent variable is sales figures, which is measured as the monthly number of sales per title during the period from April 2009 to December 2009. The independent variable is accessibility, which is measured as the set to which a title belongs. The following figures were omitted from the collected data:

Table 7.4 Hypothesis 4: Omitted sales figures

These were unusually high monthly sales, caused by remaindering of those titles. When a title is remaindered, the complete stock is sold to a specialised vendor, causing a major effect on the sales figures.

The variances across the sets were not significantly different, and there was no significant effect of accessibility on sales, F(3, 396) = .554, p < .65, ω = .99.

Planned contrasts revealed that the number of sales a title received was not increased or decreased by being disseminated through an Open Access channel (books in Set 2, Set 3 or Set 4), compared to titles which were not (books in Set 1), t(396) = 1.21, p < .12 (one tailed), r = .06.

Table 7.5 Hypothesis 4: Descriptive statistics of sales

5.5 Hypothesis 5: The discovery of titles disseminated through both the institutional repository and the Google Book Search program is significantly higher, compared to titles disseminated through one of those channels.

The dependent variable is discovery, which is measured as the number of monthly Book visits a title received in the Google Book Search program during the period from April 2009 to December 2009. The independent variable is accessibility through a single channel or through multiple channels, which is measured as the set to which a title belongs. Titles in Set 2 and Set 4 are disseminated through one channel; titles in Set 3 are disseminated through multiple channels.

Testing Hypothesis 5 is an extension of the tests performed for Hypothesis 1. For Hypothesis 5, the focus is not on all titles in Open Access channels – the books in Set 2, Set 3 and Set 4 – versus the books in Set 1, but on the differences in discovery of the books in Set 2 and Set 4 compared to the discovery of titles in Set 3. In order to find these results, a planned contrast test is performed. Furthermore, as the descriptive statistics are the same as for Hypothesis 1, they are not repeated here.

Planned contrasts revealed that the number of monthly Book visits was not increased by being disseminated through both the AUP repository and the Google Book Search program (books in Set 3), compared to titles which were disseminated through one of those channels (books in Set 2 and Set 4), t(1816) = -.37, p < .36, r = .01. In this case, dissemination through multiple channels does not have a significant effect on discovery.

5.6 Hypothesis 6: The online consultation (e.g. pages read or number of downloads) of titles disseminated through both the institutional repository and the Google Book Search program is significantly higher, compared to titles disseminated through one of those channels.

In order to test Hypothesis 6, several dependent variables were used. The first variable is the number of monthly page views a title received in the Google Book Search program, which is an extension of the tests performed for Hypothesis 2. Similar to Hypothesis 1 and Hypothesis 5, the focus is not on all titles in Open Access channels – the books in Set 2, Set 3 and Set 4 – versus the books in Set 1, but on the differences in online consultation of the books in Set 2 and Set 4 compared to the online consultation of titles in Set 3. In order to find these results, a planned contrast test is performed. Furthermore, as the descriptive statistics are the same as for Hypothesis 2, they are not repeated here.

The second dependent variable is the number of monthly page views a title received in the AUP repository. This only applies to books in Set 2 and Set 3. The third dependent variable is the number of monthly downloads a title received in the AUP repository. This, too, is only measured for titles in Set 2 and Set 3.

5.6.1 Dependent variable: page views in the Google Book Search program

The dependent variable is online consultation, which is measured as the number of monthly page views a title received in the Google Book Search program during the period from April 2009 to December 2009. The independent variable is accessibility through a single channel or through multiple channels, which is measured as the set to which a title belongs. Titles in Set 2 and Set 4 are disseminated through one channel; titles in Set 3 are disseminated through multiple channels.

Planned contrasts revealed that the number of monthly page views was decreased by being disseminated through both the AUP repository and the Google Book Search program (books in Set 3), compared to titles which were disseminated through one of those channels (books in Set 2 and 4), t(1941) = -2.85, p < .05, r = .06. In this case, dissemination through multiple channels does have a significant negative effect on online consultation, which is most likely caused by the very low number of online consultations through the AUP repository, compared with the number of online consultations of titles through the Google Book Search program.

5.6.2 Dependent variable: page views in the AUP repository

The dependent variable is online consultation, which is measured as the number of monthly page views a title received in the AUP repository during the period from April 2009 to December 2009. The independent variable is accessibility through a single channel or through multiple channels, which is measured as the set to which a title belongs. Titles in Set 2 are disseminated through one channel; titles in Set 3 are disseminated through multiple channels.

The variances across the sets were not significantly different, and dissemination through multiple channels does not have a significant effect on page views in the AUP repository, F(1, 468) = .333, p < .57, ω = .99.

Planned contrasts or post-hoc tests were not performed because there are fewer than three groups.

Table 7.6 Hypothesis 6: Descriptive statistics of repository page views

5.6.3 Dependent variable: downloads from the AUP repository

The dependent variable is online consultation, which is measured as the number of monthly downloads a title received in the AUP repository during the period from April 2009 to December 2009. The independent variable is accessibility through a single channel or through multiple channels, which is measured as the set to which a title belongs. Titles in Set 2 are disseminated through one channel; titles in Set 3 are disseminated through multiple channels.

The variances across the sets were not significantly different, and dissemination through multiple channels does not have a significant effect on monthly downloads from the AUP repository, F(1, 468) = .209, p < .65, ω = .99.

Planned contrasts or post-hoc tests were not performed because there are fewer than three groups.

Table 7.7 Hypothesis 6: Descriptive statistics of repository downloads

5.7 Hypothesis 7: The sales figures of titles disseminated through both the institutional repository and the Google Book Search program are significantly higher, compared to titles disseminated through one of those channels.

Testing Hypothesis 7 is an extension of the tests performed for Hypothesis 4. As with Hypothesis 1 paired with Hypothesis 5, and Hypothesis 2 paired with Hypothesis 6, the focus is not on all titles in Open Access channels – the books in Set 2, Set 3 and Set 4 – versus the books in Set 1, but on the differences in the dependent variable of the books in Set 2 and Set 4 compared to Set 3. In order to find these results, a planned contrast test is performed. Furthermore, as the descriptive statistics are the same as for Hypothesis 4, they are not repeated here.

Table 7.8 Hypothesis 7: Omitted sales figures

Planned contrasts revealed that the number of sales was neither increased nor decreased by being disseminated through both the AUP repository and the Google Book Search program (books in Set 3), compared to titles which were disseminated through one of those channels (books in Set 2 and Set 4), t(396) = .36, p < .37, r = .02.

5.8 Analysis of covariance

The previous paragraphs described the hypotheses, in which one dependent variable – discovery, citations, online consultation or sales – was tested against accessibility. In the following paragraphs, a test is performed to find out whether other variables – the covariates – have an effect on the dependent variable. For instance, if discovery – measured as the number of Book visits – is the dependent variable, the covariates tested will be online usage – measured as the number of page views in the Google Book Search program – combined with sales and citations. Measures of online usage tied to the AUP repository are not used here, as they are only relevant for books in Set 2 and Set 3.

ANCOVA tests are based on two assumptions: firstly, independence of the covariate and treatment effect and secondly, homogeneity of regression slopes. Independence of the covariate and treatment effect means that the effects of the covariate do not ‘interfere’ with the effects caused by the independent variable. This can be tested using a customised ANCOVA model. For instance, choosing the number of repository downloads as a covariate would be a violation of this assumption, as it is closely related to the independent variable: the dissemination channels used. The second assumption can be tested statistically using Levene’s test of equality of error variances, combined with a test on the ratio – or critical value – between the highest and lowest variances, called Hartley’s test. The critical value associated with datasets containing more than 60 items is 1.00. In order to meet the assumption of homogeneity of regression slopes, the found critical value must be lower (Field, 2009).
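The ratio between the highest and lowest variances used in Hartley’s test can be computed as in the following minimal sketch. The function name and the example values are hypothetical, and the sketch only produces the ratio; comparing it against a critical value is left to the reader.

```python
from statistics import variance


def variance_ratio(groups):
    """Return the ratio of the largest to the smallest group variance."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


# Hypothetical values for two sets (illustration only).
print(variance_ratio([[1, 2, 3], [2, 4, 6]]))  # 4.0
```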

5.9 Discovery: ANCOVA

The dependent variable is discovery, which is measured as the number of monthly Book visits a title received in the Google Book Search program during the period from April 2009 to December 2009. The independent variable is accessibility, which is measured as the set to which a title belongs. The covariates used are page views, citations and sales.

While the assumption of independence of the covariate and treatment effect has been met, the assumption of homogeneity of regression slopes was violated. Therefore, no results will be reported.

Table 7.9 Discovery: Between-Subjects effects

R² = .791 (Adjusted R² = .785)

5.10 Online consultation: ANCOVA

The dependent variable is online consultation, which is measured as the number of monthly page views a title received in the Google Book Search program during the period from April 2009 to December 2009. The independent variable is accessibility, which is measured as the set to which a title belongs. The covariates used are Book visits, citations and sales.

While the assumption of independence of the covariate and treatment effect has been met, the assumption of homogeneity of regression slopes was violated. Therefore, no results will be reported.

Table 7.10 Online consultation: Between-Subjects effects

R² = .801 (Adjusted R² = .796)

5.11 Citation: ANCOVA

The dependent variable is citation rate, which is measured as the difference in citation rate per title during the period from April 2009 to December 2009 as found in the Google Scholar search engine. The independent variable is accessibility, which is measured as the set to which a title belongs. The covariates used are Book visits, page views and sales.

While the assumption of homogeneity of regression slopes has been met, the assumption of independence of the covariate and treatment effect was violated. Therefore, no results will be reported.

Table 7.11 Citation: Between-Subjects effects

R² = .060 (Adjusted R² = .035)

5.12 Sales: ANCOVA

None of the covariates was significantly related to the sales figures: Book visits, F(1, 389) = .23, p < .64, partial η² = .00; page views, F(1, 389) = 1.12, p < .29, partial η² = .00; citations, F(1, 389) = .30, p < .86, partial η² = .00. There was also no significant effect of accessibility on sales figures, F(3, 389) = .52, p < .68, partial η² = .00.

Table 7.12 Sales: Between-Subjects effects

R² = .009 (Adjusted R² = -.017)
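The partial η² values reported for the sales ANCOVA can be recovered from the F statistics and their degrees of freedom. The following minimal sketch uses the standard identity partial η² = (F × df effect) / (F × df effect + df error); the function name is hypothetical.

```python
def partial_eta_squared(f_stat, df_effect, df_error):
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


# Effect of accessibility on sales as reported above, F(3, 389) = .52.
print(round(partial_eta_squared(0.52, 3, 389), 2))  # 0.0
```

With an error term of 389 degrees of freedom, F values of this size account for well under one per cent of the variance, which is why every effect rounds to .00.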

5.13 Multiple regression

As described in the previous chapter, several intrinsic properties of the publications were used to place them into different sets. Those properties – year of publication, print run, language and subject of the publication – may influence the usage by scientists. If – for instance – a researcher is not able to read Dutch, relevant publications in that language are not likely to be used. Therefore, several analyses were carried out to measure whether these effects occur. Furthermore, discovery of a title is crucial to usage; this is the reason for taking it into account as well. In the following paragraphs the results for discovery, online consultation, citation and sales are described.

5.14 Discovery: multiple regression

The dependent variable is discovery, which is measured as the number of monthly Book visits a title received in the Google Book Search program during the period from April 2009 to December 2009. A multiple regression analysis is carried out, using a hierarchical model containing the following variables:

1. Accessibility, which is measured as the set to which a title belongs; Year of publication; Print run of the publication; Language of the publication;

2. Subject of the publication.

Table 7.13 Discovery: Multiple regression model summary

Table 7.14 Discovery: Multiple regression results

R² = .13 for Step 1, ΔR² = .11 for Step 2 (p < .05). * p < .05, ** p < .001.

Print run, publication subject “Art – History” and publication subject “Music” were significant predictors of Book visits.
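Whether the second step of a hierarchical model adds significantly to the first can be judged with an F test on the change in R². The sketch below shows the standard calculation; the function name is hypothetical, and the number of titles and the number of predictors per step are assumed values chosen purely for illustration, as they are not stated in this section.

```python
def f_change(r2_step1, r2_step2, k_step1, k_step2, n):
    """Return (F, df_change, df_error) for the R² change between nested models."""
    df_change = k_step2 - k_step1   # predictors added in Step 2
    df_error = n - k_step2 - 1      # residual degrees of freedom of the full model
    f_stat = ((r2_step2 - r2_step1) / df_change) / ((1 - r2_step2) / df_error)
    return f_stat, df_change, df_error


# R² = .13 for Step 1 and ΔR² = .11 for Step 2, as reported above.
# ASSUMED for illustration only: 400 titles, 6 predictors in Step 1, 20 in Step 2.
f_stat, df_change, df_error = f_change(0.13, 0.13 + 0.11, 6, 20, 400)
print(f"F({df_change}, {df_error}) = {f_stat:.2f}")
```

Categorical properties such as language and subject enter the model as a set of dummy variables, so a single property can add many predictors in one step.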

5.15 Online consultation: multiple regression

Online consultation can be measured by the number of page views a title received in the Google Book Search program, and by the usage through the AUP repository. As downloading a document from the repository points to actual usage, whereas opening a page in the repository does not always imply usage, only the number of repository downloads is analysed.

5.15.1 Page views: multiple regression

The dependent variable is page views, which is measured as the number of monthly page views a title received in the Google Book Search program during the period from April 2009 to December 2009. A multiple regression analysis is carried out, using a hierarchical model containing the following variables:

1. Accessibility, which is measured as the set to which a title belongs; Book visits, which is measured as the total number of Book visits a publication received during the period from April 2009 to December 2009; Year of publication; Print run of the publication; Language of the publication;

2. Subject of the publication.

Table 7.15 Page views: Multiple regression model summary

Table 7.16 Page views: Multiple regression results

R² = .80 for Step 1, ΔR² = .02 for Step 2 (p < .05). * p < .05, ** p < .001.

Set 3, Set 4 and Book visits were significant predictors of page views. Publication language “German” and publication subject “Music” have a negative relationship with page views.

5.15.2 Repository downloads: multiple regression

The dependent variable is downloads, which is measured as the number of monthly downloads a title received in the AUP repository during the period from April 2009 to December 2009. A multiple regression analysis is carried out, using a hierarchical model containing the following variables:

1. Accessibility, which is measured as the set to which a title belongs – Set 2 or Set 3; Book visits, which is measured as the total number of Book visits a publication received during the period from April 2009 to December 2009; Year of publication and print run of the publication; Language of the publication; Page views, which is measured as the total number of page views a title received in the AUP repository during the period from April 2009 to December 2009;

2. Subject of the publication.

Table 7.17 Downloads: Multiple regression model summary

Table 7.18 Downloads: Multiple regression results

R² = .14 for Step 1, ΔR² = .16 for Step 2 (p < .05). * p < .001.

Repository page views and publication subject “Culture” were significant predictors of downloads.

5.16 Citations: multiple regression

The dependent variable is citations, which is measured as the difference in citation rate per title during the period from April 2009 to December 2009 as found in the Google Scholar search engine. A multiple regression analysis is carried out, using a hierarchical model containing the following variables:

2. Subject of the publication.

Table 7.19 Citations: Multiple regression model summary

Table 7.20 Citations: Multiple regression results

R² = .04 for Step 1, ΔR² = .09 for Step 2 (p < .05). * p < .05.

Book visits, publication language “English” and publication subject “Science” were significant predictors of citations.

5.17 Sales: multiple regression

The dependent variable is sales, which is measured as the total number of sales of a publication during the period from April 2009 to December 2009. A multiple regression analysis is carried out, using a hierarchical model containing the following variables:

2. Subject of the publication.

Table 7.21 Sales: Multiple regression model summary

Table 7.22 Sales: Multiple regression results

R² = .11 for Step 1, ΔR² = .18 for Step 2 (p < .05). * p < .05, ** p < .001.

Print run, publication language “English”, publication subject “Dutch Literature – History” and publication subject “Japan; Culture – History” were significant predictors of sales.