APIDocEvolution

An Empirical Study on Evolution of API Documentation

PROJECT SUMMARY

With the evolution of an API library, its documentation also evolves across versions. The evolution of API documentation is common knowledge for programmers and library developers, but not in a quantitative form. Without such quantitative knowledge, programmers may neglect important revisions of API documentation, and library developers may not effectively improve API documentation based on its revision histories. There is a strong need to conduct a quantitative study on API documentation evolution. However, as API documentation is large in size and revisions can be complicated, it is quite challenging to conduct such a study. In this paper, we present an analysis methodology to analyze the evolution of API documentation. Based on the methodology, we conduct a quantitative study on API documentation evolution of five widely used libraries. The results reveal various valuable findings, and these findings allow programmers and library developers to better understand API documentation evolution.

The paper was submitted to FASE 2011.

PEOPLE

Lin Shi (iTechs, ISCAS), Hao Zhong (iTechs, ISCAS), Tao Xie (NCSU), and MingShu Li (iTechs, ISCAS)

SUBJECTS

The subjects of the empirical study are as shown in Table 1.

RQ1: Which parts of API documentation are frequently revised?

We compared 2131 revisions that cover the java.util package of J2SE and all of the other four libraries. The overall results are shown in Figure 1.

Figure 1. Frequently changed parts in API documentation.

(1) Annotation

The official guidance of Java documentation defines tags and annotations. In this paper, we refer to both tags and annotations as annotations for short. An annotation consists of the character "@" and a predefined keyword. Revisions of annotations can cause revisions in the corresponding API documents. Examples of revised annotations found across versions of API libraries are as follows:

(1.1) Version

The version of an API element is marked by the @version annotation or the @since annotation. Once marked, the version number in API documents is updated automatically:

ActiveMQ 5.0.0: org.apache.activemq.util.IOHelper

Version:

$Revision: 600891 $

ActiveMQ 5.1.0: org.apache.activemq.util.IOHelper

Version:

$Revision: 632961 $

(1.2) Exception

The thrown exceptions of an API method are marked by the @exception annotation or the @throws annotation. When library developers change the thrown exceptions of an API method, they typically revise these annotations, and the revisions update API documents:

J2SE 1.2.2: java.util.SortedMap. subMap(Object, Object)

java.lang.IllegalArgumentException - if fromKey is greater than toKey.

J2SE 1.3.1: java.util.SortedMap. subMap(Object, Object)

java.lang.IllegalArgumentException - if fromKey is greater than toKey; or if this map is itself a subMap, headMap, or tailMap, and fromKey or toKey are not within the specified range of the subMap, headMap, or tailMap
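
The revised exception text above can be checked against a current JDK. The following is a minimal sketch, assuming java.util.TreeMap as the SortedMap implementation; the class name SubMapRangeDemo and the helper viewRejects are illustrative names, not part of the study.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class SubMapRangeDemo {
    /**
     * Asks a restricted view of a sorted map for a further sub-map.
     *
     * @param fromKey low endpoint (inclusive) of the requested sub-map
     * @param toKey high endpoint (exclusive) of the requested sub-map
     * @return true if the request was rejected with IllegalArgumentException
     */
    public static boolean viewRejects(int fromKey, int toKey) {
        SortedMap<Integer, String> map = new TreeMap<>();
        for (int i = 0; i < 10; i++) {
            map.put(i, "v" + i);
        }
        // The view is itself a subMap covering keys 2 (inclusive) to 5 (exclusive).
        SortedMap<Integer, String> view = map.subMap(2, 5);
        try {
            view.subMap(fromKey, toKey);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(viewRejects(3, 4)); // request inside the view's range
        System.out.println(viewRejects(4, 3)); // fromKey greater than toKey
        System.out.println(viewRejects(0, 8)); // keys outside the view's range
    }
}
```

The second and third calls correspond to the two conditions named in the J2SE 1.3.1 text: the first condition was already documented in J2SE 1.2.2, while the second was only added in the revision.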

(1.3) Reference

The reference relations can be marked by the @see annotation or the @link annotation. Library developers can change these annotations to update URLs of API documents.

J2SE 1.4.2: java.util.Locale. getISO3Language()

The ISO 639-2 language codes can be found on-line at ftp://dkuug.dk/i18n/iso-639-2.txt

J2SE 1.5: java.util.Locale. getISO3Language()

The ISO 639-2 language codes can be found on-line at http://www.loc.gov/standards/iso639-2/englangn.html

(1.4) Parameter

The parameters of an API method and their values are marked by the @param annotation. Library developers can revise these annotations to update the documents of those parameters.

J2SE 1.2.2: java.util.Collections. max(Collection, Comparator)

Parameters

coll - the collection whose maximum element is to be determined.

J2SE 1.3.1: java.util.Collections. max(Collection, Comparator)

Parameters

coll - the collection whose maximum element is to be determined.

comp - the comparator with which to determine the maximum element. A null value indicates that the elements' natural ordering should be used.
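
To illustrate how @param annotations produce such parameter documents, the sketch below wraps Collections.max in a hypothetical helper (MaxParamDemo.maxOf is an illustrative name) whose Javadoc mirrors the two parameter entries above. It also exercises the documented meaning of a null comparator.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;

public class MaxParamDemo {
    /**
     * Returns the maximum element of the given collection.
     *
     * @param coll the collection whose maximum element is to be determined
     * @param comp the comparator with which to determine the maximum element;
     *             a null value indicates that the natural ordering is used
     * @return the maximum element of the collection
     */
    public static String maxOf(Collection<String> coll, Comparator<String> comp) {
        return Collections.max(coll, comp);
    }

    public static void main(String[] args) {
        Collection<String> words = Arrays.asList("a", "c", "b");
        // A null comparator falls back to the natural ordering of the elements.
        System.out.println(maxOf(words, null));
        // A reversed comparator makes the smallest string the "maximum".
        System.out.println(maxOf(words, Comparator.reverseOrder()));
    }
}
```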

(1.5) Return value

The return value of an API method is marked by the @return annotation. Library developers can revise these annotations to update documents of return values.

J2SE 1.3.1: java.util. ArrayList.contains(Object)

Returns

true if this collection contains the specified element.

J2SE 1.4.2: java.util. ArrayList.contains(Object)

Returns

true if the specified element is present; false otherwise.

(1.6) Inheritance

Similar documents caused by inheritance relations can be generated by the @inheritDoc annotation automatically. Also, these annotations can update API documents of a sub-class when the document of its super-class is revised. For example, the following document of J2SE 1.6 is automatically updated by the @inheritDoc annotation.

J2SE 1.5: java.util.AbstractCollection.add(E)

Ensures that this collection contains the specified element (optional operation). Returns true if the collection changed as a result of the call. (Returns false if this collection does not permit duplicates and already contains the specified element.) Collections that support this operation may place limitations on what elements may be added to the collection. In particular, some collections will refuse to add null elements, and others will impose restrictions on the type of elements that may be added. Collection classes should clearly specify in their documentation any restrictions on what elements may be added.

This implementation always throws an UnsupportedOperationException.

J2SE 1.6: java.util.AbstractCollection.add(E)

Ensures that this collection contains the specified element (optional operation). Returns true if this collection changed as a result of the call. (Returns false if this collection does not permit duplicates and already contains the specified element.)

Collections that support this operation may place limitations on what elements may be added to this collection. In particular, some collections will refuse to add null elements, and others will impose restrictions on the type of elements that may be added. Collection classes should clearly specify in their documentation any restrictions on what elements may be added.

If a collection refuses to add a particular element for any reason other than that it already contains the element, it must throw an exception (rather than returning false). This preserves the invariant that a collection always contains the specified element after this call returns.

This implementation always throws an UnsupportedOperationException.
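
As a rough illustration of how {@inheritDoc} is written in source code, the sketch below defines a hypothetical collection (InheritDocDemo.EmptyBag is an illustrative name) whose methods reuse the documents of their super-class. It also checks the last sentence of the document above, assuming the add implementation inherited from java.util.AbstractCollection.

```java
import java.util.AbstractCollection;
import java.util.Collections;
import java.util.Iterator;

public class InheritDocDemo {
    /** A hypothetical empty collection that keeps the inherited add behavior. */
    public static class EmptyBag extends AbstractCollection<String> {
        /** {@inheritDoc} */
        @Override
        public Iterator<String> iterator() {
            return Collections.emptyIterator();
        }

        /** {@inheritDoc} */
        @Override
        public int size() {
            return 0;
        }

        /** {@inheritDoc} */
        @Override
        public boolean add(String element) {
            // Delegates to AbstractCollection.add, so the inherited text still applies.
            return super.add(element);
        }
    }

    /** Returns true if add is refused with UnsupportedOperationException. */
    public static boolean addIsUnsupported() {
        try {
            new EmptyBag().add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(addIsUnsupported());
    }
}
```

Because the documents of EmptyBag are inherited, a revision of the super-class document changes the generated document of the sub-class without any edit to the sub-class itself.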

(1.7) Deprecation

When an API element becomes deprecated, it is typically marked by the @deprecated annotation. The annotation updates the document of the API element with a "Deprecated" notice.

Lucene 2.9.2: org.apache.lucene.search.function.CustomScoreQuery.customExplain(int, Explanation, Explanation)

Deprecated. Will be removed in Lucene 3.1. The doc is relative to the current reader, which is unknown to CustomScoreQuery when using per-segment search (since Lucene 2.9). Please override getCustomScoreProvider(org.apache.lucene.index.IndexReader) and return a subclass of CustomScoreProvider for the given IndexReader.

Explain the custom score.

Lucene 3.0.0: org.apache.lucene.search.function.CustomScoreQuery.customExplain(int, Explanation, Explanation)

Explain the custom score. Whenever overriding customScore(int, float, float), this method should also be overridden to provide the correct explanation for the part of the custom scoring.

(2) Literal polish

Literal polish includes those revisions that improve the wording of API documents without changing their basic meanings.

(2.1) Rephrasing

This category includes those revisions that do not change meanings, but improve accuracy. One such example is as follows.

J2SE 1.2.2: java.util.GregorianCalendar

Week 1 for a year is the first week that contains at least getMinimalDaysInFirstWeek() days from that year.

J2SE 1.3.1: java.util.GregorianCalendar

Week 1 for a year is the earliest seven day period starting on getFirstDayOfWeek() that contains at least getMinimalDaysInFirstWeek() days from that year.

(2.2) Syntax

This category includes those revisions that correct grammar errors. One such example is as follows.

J2SE 1.5: java.util. ArrayList. remove(int)

Parameters

index - the index of the element to removed

J2SE 1.6: java.util. ArrayList. remove(int)

Parameters

index - the index of the element to be removed

(2.3) Typo

This category includes those revisions that correct typos. One such example is as follows.

Struts 2.1.6: org.apache.struts2.interceptor.CookieInterceptor. setCookiesName(String)

Set the cookiesName which if matche will allow the cookie to be injected into action, could be comma-separated string.

Struts 2.1.8: org.apache.struts2.interceptor.CookieInterceptor. setCookiesName(String)

Set the cookiesName which if matched will allow the cookie to be injected into action, could be comma-separated string.

(2.4) Format

This category includes those revisions that improve the formats of some texts. One such example is as follows.

J2SE 1.5: java.util. ArrayList. remove(int)

Note that the set method reverses the order of the parameters, to more closely match array usage.

J2SE 1.6: java.util. ArrayList. remove(int)

Note that the set method reverses the order of the parameters, to more closely match array usage.

(3) Programming tips

The revisions of programming tips include those revisions that contain advice for programmers on how to use APIs.

(3.1) Notice

This category includes those revisions that revise notices within API documents.

Lucene 2.9.2: org.apache.lucene.util.CloseableThreadLocal

Java's builtin ThreadLocal has a serious flaw: it can take an arbitrarily long amount of time to dereference the things you had stored in it, even once the ThreadLocal instance itself is no longer referenced. This is because there is single, master map stored for each thread, which all ThreadLocals share, and that master map only periodically purges "stale" entries. While not technically a memory leak, because eventually the memory will be reclaimed, it can take a long time and you can easily hit OutOfMemoryError because from the GC's standpoint the stale entries are not reclaimable. This class works around that, by only enrolling WeakReference values into the ThreadLocal, and separately holding a hard reference to each stored value. When you call close(), these hard references are cleared and then GC is freely able to reclaim space by objects stored in it.

Lucene 3.0.0: org.apache.lucene.util.CloseableThreadLocal

Java's builtin ThreadLocal has a serious flaw: it can take an arbitrarily long amount of time to dereference the things you had stored in it, even once the ThreadLocal instance itself is no longer referenced. This is because there is single, master map stored for each thread, which all ThreadLocals share, and that master map only periodically purges "stale" entries. While not technically a memory leak, because eventually the memory will be reclaimed, it can take a long time and you can easily hit OutOfMemoryError because from the GC's standpoint the stale entries are not reclaimable. This class works around that, by only enrolling WeakReference values into the ThreadLocal, and separately holding a hard reference to each stored value. When you call close(), these hard references are cleared and then GC is freely able to reclaim space by objects stored in it. We can not rely on ThreadLocal.remove() as it only removes the value for the caller thread, whereas close() takes care of all threads. You should not call close() until all threads are done using the instance.

(3.2) Code Example

This category includes those revisions that revise code examples within API documents. One example is as follows.

J2SE 1.5: java.util.List. hashCode()

hashCode = 1;

Iterator i = list.iterator();

while (i.hasNext()) {

Object obj = i.next();

hashCode = 31*hashCode + (obj==null ? 0 : obj.hashCode());

}

J2SE 1.6: java.util.List. hashCode()

int hashCode = 1;

Iterator<E> i = list.iterator();

while (i.hasNext()) {

E obj = i.next();

hashCode = 31*hashCode + (obj==null ? 0 : obj.hashCode());

}
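
The J2SE 1.6 snippet above can be wrapped into a self-contained method and compared with the result of List.hashCode(). This is a minimal sketch; the class name ListHashCodeDemo and the method listHashCode are illustrative names.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ListHashCodeDemo {
    /** Computes the hash code of a list as described in the J2SE 1.6 document. */
    public static <E> int listHashCode(List<E> list) {
        int hashCode = 1;
        Iterator<E> i = list.iterator();
        while (i.hasNext()) {
            E obj = i.next();
            // A null element contributes 0, as in the documented formula.
            hashCode = 31 * hashCode + (obj == null ? 0 : obj.hashCode());
        }
        return hashCode;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", null, "b");
        // Both values are expected to agree for any List implementation
        // that follows the List.hashCode() contract.
        System.out.println(listHashCode(list) == list.hashCode());
    }
}
```

The revision from J2SE 1.5 to J2SE 1.6 adds the declaration of hashCode and the generic type E, which is what makes the documented snippet compile without raw types.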

(3.3) New Descriptions

This category includes those revisions that add new descriptions to give details of an API element. One example is as follows.

J2SE 1.5: java.util.ListResourceBundle. getContents()

See class description.

J2SE 1.6: java.util.ListResourceBundle. getContents()

Returns an array in which each item is a pair of objects in an Object array. The first element of each pair is the key, which must be a String, and the second element is the value associated with that key. See the class description for details.

Returns

an array of an Object array representing a key-value pair.

The detailed data are shown in Table 1.

Table 1. Detailed frequently changed parts in API documentation.

RQ2: To what degree do such revisions indicate behavioral differences of APIs?

We compared 2131 revisions that cover the java.util package of J2SE and all of the other four libraries. The overall results are shown in Figure 2.

Figure 2. Frequently changed parts of behavioral differences.

(1) Exceptions

We find that revisions of exceptions can reflect behavioral differences.

(1.1) Addition

An added exception reflects that an API method can throw additional exceptions.

J2SE 1.2.2: java.util. ResourceBundle.getObject(String)

Throws

MissingResourceException

J2SE 1.3.1: java.util. ResourceBundle.getObject(String)

Throws

java.lang.NullPointerException - if key is null.

MissingResourceException
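
The two documented exceptions can be observed with a small bundle. The sketch below assumes java.util.ListResourceBundle as the concrete bundle type; the class name GetObjectDemo and its helper methods are illustrative names.

```java
import java.util.ListResourceBundle;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

public class GetObjectDemo {
    /** Builds a hypothetical bundle with a single key for illustration. */
    public static ResourceBundle sampleBundle() {
        return new ListResourceBundle() {
            @Override
            protected Object[][] getContents() {
                return new Object[][] { { "greeting", "hello" } };
            }
        };
    }

    /** Returns the simple name of the exception thrown for the key, or "none". */
    public static String outcome(String key) {
        try {
            sampleBundle().getObject(key);
            return "none";
        } catch (NullPointerException | MissingResourceException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome("greeting")); // key present in the bundle
        System.out.println(outcome("absent"));   // key missing from the bundle
        System.out.println(outcome(null));       // null key
    }
}
```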

(1.2) Modification

A modified exception reflects that an API method can throw different exceptions.

J2SE 1.3.1: java.util.Vector. addAll(Collection)

Throws

java.lang.ArrayIndexOutOfBoundsException - index out of range (index < 0 || index > size()).

J2SE 1.4.2: java.util.Vector. addAll(Collection)

Throws

java.lang.NullPointerException - if the specified collection is null.

(1.3) Deletion

A deleted exception reflects that an API method will not throw such exceptions any more.

ActiveMQ 5.1.0: org.apache.activemq.transport.tcp.SslTransportFactory. setKeyAndTrustManagers(KeyManager[], TrustManager[], SecureRandom)

Sets the key and trust managers used in constructed socket factories. Passes given arguments to SSLContext.init(...).

Parameters

km - The sources of authentication keys or null.

tm - The sources of peer authentication trust decisions or null.

random - The source of randomness for this generator or null.

Throws

java.security.KeyManagementException

ActiveMQ 5.2.0: org.apache.activemq.transport.tcp.SslTransportFactory. setKeyAndTrustManagers(KeyManager[], TrustManager[], SecureRandom)

Deprecated. "Do not use anymore... using static initializers like this method only allows the JVM to use 1 SSL configuration per broker."

Parameters

km -

tm -

object -

(2) API usage

We find that revisions of API usages can reflect behavioral differences.

Log4j 1.2.15: org.apache.log4j.WriterAppender. immediateFlush

Immediate flush means that the underlying writer or output stream will be flushed at the end of each append operation.

Log4j 1.2.16: org.apache.log4j.WriterAppender. immediateFlush

Immediate flush means that the underlying writer or output stream will be flushed at the end of each append operation unless shouldFlush() is overridden.

(2.1) Notices

Revisions of notices reflect that corresponding API elements may have behavioral differences.

(2.2) Code examples

Revisions of code examples reflect that API elements may have behavioral differences, since library developers often use these code examples to illustrate API usages. One example is as follows:

Lucene 2.9.0: org.apache.lucene.search.Hits

TopScoreDocCollector collector = new TopScoreDocCollector(hitsPerPage);

searcher.search(query, collector);

ScoreDoc[] hits = collector.topDocs().scoreDocs;

for (int i = 0; i < hits.length; i++) {

int docId = hits[i].doc;

Document d = searcher.doc(docId);

// do something with current hit

...

Lucene 2.9.1: org.apache.lucene.search.Hits

TopDocs topDocs = searcher.search(query, numHits);

ScoreDoc[] hits = topDocs.scoreDocs;

for (int i = 0; i < hits.length; i++) {

int docId = hits[i].doc;

Document d = searcher.doc(docId);

// do something with current hit

...

(2.3) Alternatives

API documents may introduce alternative APIs of an API element, and revisions of alternatives can also reflect behavioral differences.

Lucene 2.9.0: org.apache.lucene.analysis.StopAnalyzer.StopAnalyzer(Set)

Deprecated. Use StopAnalyzer(Set, boolean) instead

Lucene 2.9.1: org.apache.lucene.analysis.StopAnalyzer.StopAnalyzer(Set)

Deprecated. Use StopAnalyzer(Version, Set) instead

(2.4) Call sequences

API documents may introduce call sequences of API methods, and revisions of these documents can also reflect behavioral differences.

Lucene 2.9.2: org.apache.lucene.search.Scorer. score()

Returns the score of the current document matching the query. Initially invalid, until DocIdSetIterator.next() or DocIdSetIterator.skipTo(int) is called the first time, or when called from within Collector.collect(int).

Lucene 3.0.0: org.apache.lucene.search.Scorer. score()

Returns the score of the current document matching the query. Initially invalid, until DocIdSetIterator.nextDoc() or DocIdSetIterator.advance(int) is called the first time, or when called from within Collector.collect(int).

(3) Functionalities

We find that revisions of API functionalities can reflect behavioral differences.

(3.1) New functionalities

Revisions of API documents may introduce new functionalities that reflect behavioral differences.

J2SE 1.4.2: java.util.AbstractList

public abstract class AbstractList extends AbstractCollection implements List

J2SE 1.5: java.util.AbstractList <E>

public abstract class AbstractList<E> extends AbstractCollection<E> implements List<E>

(3.2) Input values

Revisions of API documents on input values can reflect behavioral differences of preparing input values.

J2SE 1.3.1: java.util.Properties. store(OutputStream, String)

The ASCII characters \, tab, newline, and carriage return are written as \\, \t, \n, and \r, respectively.

J2SE 1.4.2: java.util.Properties. store(OutputStream, String)

The ASCII characters \, tab, form feed, newline, and carriage return are written as \\, \t, \f \n, and \r, respectively.
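
The escaping described above can be observed by storing a property and inspecting the written text. The following is a minimal sketch, assuming a current JDK in which Properties.store still applies these escapes; the class name PropertiesEscapeDemo and the method storedForm are illustrative names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesEscapeDemo {
    /** Stores a single property and returns the text written by Properties.store. */
    public static String storedForm(String key, String value) {
        Properties props = new Properties();
        props.setProperty(key, value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // A null comment means that only the date line precedes the entries.
            props.store(out, null);
            // Properties.store writes its output in the ISO 8859-1 encoding.
            return out.toString("ISO-8859-1");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The value contains a form feed and a tab character.
        System.out.println(storedForm("k", "a\fb\tc"));
    }
}
```

In the written text, the form feed appears as the two characters \f, which is the behavior that the J2SE 1.4.2 document adds to the J2SE 1.3.1 wording.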

(3.3) Default values

Revisions of API documents on default values can reflect behavioral differences of preparing default values.

J2SE 1.3.1: java.util. Hashtable. Hashtable(Map)

The hashtable is created with a capacity of twice the number of entries in the given Map or 11 (whichever is greater), and a default load factor, which is 0.75.

J2SE 1.4.2: java.util. Hashtable. Hashtable(Map)

The hashtable is created with an initial capacity sufficient to hold the mappings in the given Map and a default load factor, which is 0.75.

(3.4) Output values

Revisions of API documents on output values can reflect behavioral differences of preparing output values.

Log4j 1.2.15: org.apache.log4j.Appender. getName()

Get the name of this appender. The name uniquely identifies the appender.

Log4j 1.2.16: org.apache.log4j.Appender. getName()

Get the name of this appender.

Returns

name, may be null.

The detailed data are shown in Table 2 and here.

Table 2. Detailed frequently changed parts in API documentation.

RQ3: How frequently are API elements and their documentation changed?

Table 3 of the submitted draft lists the percentages of API differences by type. Table 3 below lists the numbers of API differences by type.

Table 3. Number of API differences by type.