APIDocEvolution
An Empirical Study on Evolution of API Documentation
PROJECT SUMMARY
With the evolution of an API library, its documentation also evolves across versions. The evolution of API documentation is common knowledge for programmers and library developers, but not in a quantitative form. Without such quantitative knowledge, programmers may neglect important revisions of API documentation, and library developers may not effectively improve API documentation based on its revision histories. There is a strong need to conduct a quantitative study on API documentation evolution. However, as API documentation is large in size and revisions can be complicated, it is quite challenging to conduct such a study. In this paper, we present an analysis methodology to analyze the evolution of API documentation. Based on the methodology, we conduct a quantitative study on API documentation evolution of five widely used libraries. The results reveal various valuable findings, and these findings allow programmers and library developers to better understand API documentation evolution.
The paper was submitted to FASE 2011.
PEOPLE
Lin Shi (iTechs, ISCAS), Hao Zhong (iTechs, ISCAS), Tao Xie (NCSU), and MingShu Li (iTechs, ISCAS)
SUBJECTS
The subjects of the empirical study are shown in Table 1.
RQ1: Which parts of API documentation are frequently revised?
We compared 2131 revisions that cover the java.util package of J2SE and all of the other four libraries. The overall results are shown in Figure 1.
Figure 1. Frequently changed parts in API documentation.
(1) Annotation
The official guidance of Java documentation defines tags and annotations. In this paper, we refer to both tags and annotations as annotations for short. An annotation consists of the character "@" and a predefined keyword. Revisions of annotations can cause revisions in the corresponding API documents. Examples of revised annotations found across versions of API libraries are as follows:
(1.1) Version
The version of an API element is marked by the @version annotation or the @since annotation. Once marked, the version number in API documents will be updated automatically:
ActiveMQ 5.0.0: org.apache.activemq.util.IOHelper
Version:
$Revision: 600891 $
ActiveMQ 5.1.0: org.apache.activemq.util.IOHelper
Version:
$Revision: 632961 $
(1.2) Exception
The exceptions thrown by an API method are marked by the @exception annotation or the @throws annotation. When library developers change the exceptions thrown by an API method, they typically revise these annotations, and the revisions update API documents:
J2SE 1.2.2: java.util.SortedMap. subMap(Object, Object)
java.lang.IllegalArgumentException - if fromKey is greater than toKey.
J2SE 1.3.1: java.util.SortedMap. subMap(Object, Object)
java.lang.IllegalArgumentException - if fromKey is greater than toKey; or if this map is itself a subMap, headMap, or tailMap, and fromKey or toKey are not within the specified range of the subMap, headMap, or tailMap
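The two conditions named in the revised exception text can be observed on a concrete sorted map. The following is a minimal sketch, assuming java.util.TreeMap as the SortedMap implementation; the class SubMapRangeDemo and its helper methods are hypothetical names introduced only for this illustration.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical demo class; TreeMap is assumed as the SortedMap implementation.
class SubMapRangeDemo {

    // Builds a map with the keys 0..9.
    static SortedMap<Integer, String> sample() {
        SortedMap<Integer, String> map = new TreeMap<>();
        for (int i = 0; i < 10; i++) {
            map.put(i, "v" + i);
        }
        return map;
    }

    // Returns true if running the action raises IllegalArgumentException.
    static boolean throwsIllegalArgument(Runnable action) {
        try {
            action.run();
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // Condition documented in both versions: fromKey is greater than toKey.
    static boolean reversedKeysRejected() {
        SortedMap<Integer, String> map = sample();
        return throwsIllegalArgument(() -> map.subMap(5, 1));
    }

    // Condition added in the revised text: the map is itself a sub-map view
    // and fromKey lies outside the range of that view.
    static boolean outOfViewKeysRejected() {
        SortedMap<Integer, String> view = sample().subMap(2, 8);
        return throwsIllegalArgument(() -> view.subMap(0, 4));
    }

    public static void main(String[] args) {
        System.out.println(reversedKeysRejected());
        System.out.println(outOfViewKeysRejected());
    }
}
```

With TreeMap, both helper methods are expected to return true, which matches the two clauses of the J2SE 1.3.1 text.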
(1.3) Reference
The reference relations can be marked by the @see annotation or the @link annotation. Library developers can change these annotations to update URLs of API documents.
J2SE 1.4.2: java.util.Locale. getISO3Language()
The ISO 639-2 language codes can be found on-line at ftp://dkuug.dk/i18n/iso-639-2.txt
J2SE 1.5: java.util.Locale. getISO3Language()
The ISO 639-2 language codes can be found on-line at http://www.loc.gov/standards/iso639-2/englangn.html
(1.4) Parameter
The parameters of an API method and their values are marked by the @param annotation. Library developers can revise these annotations to update the documents of those parameters.
J2SE 1.2.2: java.util.Collections. max(Collection, Comparator)
Parameters
coll - the collection whose maximum element is to be determined.
J2SE 1.3.1: java.util.Collections. max(Collection, Comparator)
Parameters
coll - the collection whose maximum element is to be determined.
comp - the comparator with which to determine the maximum element. A null value indicates that the elements' natural ordering should be used.
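The meaning of the added comp entry can be checked directly against Collections.max. The following is a small sketch; the class MaxComparatorDemo and its method names are hypothetical and serve only as an illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class illustrating the documented meaning of the comp parameter.
class MaxComparatorDemo {

    // Passing null for comp falls back to the elements' natural ordering.
    static String maxNatural(List<String> coll) {
        return Collections.max(coll, null);
    }

    // Passing an explicit comparator replaces the natural ordering.
    static String maxByLength(List<String> coll) {
        return Collections.max(coll, Comparator.comparingInt(String::length));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "banana");
        System.out.println(maxNatural(words));  // natural (lexicographic) ordering
        System.out.println(maxByLength(words)); // ordering by string length
    }
}
```

For the sample list, the natural ordering selects "pear" and the length comparator selects "banana".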
(1.5) Return value
The return value of an API method is marked by the @return annotation. Library developers can revise these annotations to update documents of return values.
J2SE 1.3.1: java.util. ArrayList.contains(Object)
Returns
true if this collection contains the specified element.
J2SE 1.4.2: java.util. ArrayList.contains(Object)
Returns
true if the specified element is present; false otherwise.
(1.6) Inheritance
Similar documents caused by inheritance relations can be generated by the @inheritDoc annotation automatically. Also, these annotations can update API documents of a sub-class when the document of its super-class is revised. For example, the following document of J2SE 1.6 is automatically updated by the @inheritDoc annotation.
J2SE 1.5: java.util.AbstractCollection.add(E)
Ensures that this collection contains the specified element (optional operation). Returns true if the collection changed as a result of the call. (Returns false if this collection does not permit duplicates and already contains the specified element.) Collections that support this operation may place limitations on what elements may be added to the collection. In particular, some collections will refuse to add null elements, and others will impose restrictions on the type of elements that may be added. Collection classes should clearly specify in their documentation any restrictions on what elements may be added.
This implementation always throws an UnsupportedOperationException.
J2SE 1.6: java.util.AbstractCollection.add(E)
Ensures that this collection contains the specified element (optional operation). Returns true if this collection changed as a result of the call. (Returns false if this collection does not permit duplicates and already contains the specified element.)
Collections that support this operation may place limitations on what elements may be added to this collection. In particular, some collections will refuse to add null elements, and others will impose restrictions on the type of elements that may be added. Collection classes should clearly specify in their documentation any restrictions on what elements may be added.
If a collection refuses to add a particular element for any reason other than that it already contains the element, it must throw an exception (rather than returning false). This preserves the invariant that a collection always contains the specified element after this call returns.
This implementation always throws an UnsupportedOperationException.
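As an illustration of how a subclass can pick up its super-class document, the following sketch defines a hypothetical collection, EmptyBag, whose methods use {@inheritDoc}. The class is not part of any library; it is assumed here only to show the tag and the inherited behavior of add.

```java
import java.util.AbstractCollection;
import java.util.Collections;
import java.util.Iterator;

// Hypothetical read-only collection used only to illustrate {@inheritDoc}.
class EmptyBag<E> extends AbstractCollection<E> {

    /**
     * {@inheritDoc}
     *
     * <p>This implementation always returns an empty iterator.
     */
    @Override
    public Iterator<E> iterator() {
        return Collections.emptyIterator();
    }

    /** {@inheritDoc} */
    @Override
    public int size() {
        return 0;
    }

    // add(E) is not overridden, so the inherited AbstractCollection.add(E)
    // applies and throws UnsupportedOperationException, as its document states.
    static boolean addIsUnsupported() {
        try {
            new EmptyBag<String>().add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```

When the document of the overridden super-class method is revised, the generated document of such a subclass method changes with it, without any edit to the subclass source.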
(1.7) Deprecation
When an API element becomes deprecated, it is typically marked by the @deprecated annotation. The annotation updates the document of the API element with a "Deprecated" notice.
Lucene 2.9.2: org.apache.lucene.search.function.CustomScoreQuery.customExplain(int, Explanation, Explanation)
Deprecated. Will be removed in Lucene 3.1. The doc is relative to the current reader, which is unknown to CustomScoreQuery when using per-segment search (since Lucene 2.9). Please override getCustomScoreProvider(org.apache.lucene.index.IndexReader) and return a subclass of CustomScoreProvider for the given IndexReader.
Explain the custom score.
Lucene 3.0.0: org.apache.lucene.search.function.CustomScoreQuery.customExplain(int, Explanation, Explanation)
Explain the custom score. Whenever overriding customScore(int, float, float), this method should also be overridden to provide the correct explanation for the part of the custom scoring.
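A deprecation notice of this kind is usually written in source as a @deprecated tag in the doc comment, often paired with the @Deprecated annotation. The following is a minimal sketch; the class ScoreExplainer and its methods are hypothetical and are not taken from Lucene.

```java
import java.lang.reflect.Method;

// Hypothetical class; the method names are illustrative, not taken from any library.
class ScoreExplainer {

    /**
     * Explains a score.
     *
     * @deprecated Use {@link #explain(int, float)} instead.
     */
    @Deprecated
    String explain(int doc) {
        return explain(doc, 1.0f);
    }

    /** Explains a score with an explicit boost. */
    String explain(int doc, float boost) {
        return "doc=" + doc + ", boost=" + boost;
    }

    // Reports whether the single-argument overload carries @Deprecated at run time.
    static boolean isDeprecated() {
        try {
            Method m = ScoreExplainer.class.getDeclaredMethod("explain", int.class);
            return m.isAnnotationPresent(Deprecated.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The @deprecated tag produces the notice in the generated document, while the @Deprecated annotation is what compilers and reflection see.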
(2) Literal polish
Literal polish covers revisions that enhance the wording of API documents without changing their basic meaning.
(2.1) Rephrasing
This category includes those revisions that do not change meanings but improve accuracy. One such example is as follows.
J2SE 1.2.2: java.util.GregorianCalendar
Week 1 for a year is the first week that contains at least getMinimalDaysInFirstWeek() days from that year.
J2SE 1.3.1: java.util.GregorianCalendar
Week 1 for a year is the earliest seven day period starting on getFirstDayOfWeek() that contains at least getMinimalDaysInFirstWeek() days from that year.
(2.2) Syntax
This category includes those revisions that correct grammatical errors. One such example is as follows.
J2SE 1.5: java.util. ArrayList. remove(int)
Parameters
index - the index of the element to removed
J2SE 1.6: java.util. ArrayList. remove(int)
Parameters
index - the index of the element to be removed
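Small revisions like this one differ by a single word. The following sketch shows one possible way to locate the added words between two versions of a document, assuming whitespace tokenization and a longest-common-subsequence alignment. It is an illustration only and is not the analysis methodology used in the study; the class DocWordDiff and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the methodology of the study.
class DocWordDiff {

    // Returns the words of newDoc that are left unmatched by an LCS alignment with oldDoc.
    static List<String> addedWords(String oldDoc, String newDoc) {
        String[] a = oldDoc.trim().split("\\s+");
        String[] b = newDoc.trim().split("\\s+");

        // lcs[i][j] holds the LCS length of the suffixes a[i..] and b[j..].
        int[][] lcs = new int[a.length + 1][b.length + 1];
        for (int i = a.length - 1; i >= 0; i--) {
            for (int j = b.length - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk the table from the start and collect unmatched words of the new version.
        List<String> added = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i].equals(b[j])) {
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                i++; // word only in the old version
            } else {
                added.add(b[j]); // word only in the new version
                j++;
            }
        }
        while (j < b.length) {
            added.add(b[j++]);
        }
        return added;
    }
}
```

Applied to the two parameter descriptions above, the method returns the single word "be".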
(2.3) Typo
This category includes those revisions that correct typos. One such example is as follows.
Struts 2.1.6: org.apache.struts2.interceptor.CookieInterceptor. setCookiesName(String)
Set the cookiesName which if matche will allow the cookie to be injected into action, could be comma-separated string.
Struts 2.1.8: org.apache.struts2.interceptor.CookieInterceptor. setCookiesName(String)
Set the cookiesName which if matched will allow the cookie to be injected into action, could be comma-separated string.
(2.4) Format
This category includes those revisions that improve the formatting of some text. One such example is as follows.
J2SE 1.5: java.util. ArrayList. remove(int)
Note that the set method reverses the order of the parameters, to more closely match array usage.
J2SE 1.6: java.util. ArrayList. remove(int)
Note that the set method reverses the order of the parameters, to more closely match array usage.
(3) Programming tips
The revisions of programming tips include those revisions that contain advice for programmers on how to use APIs.
(3.1) Notice
This category includes those revisions that revise notices within API documents.
Lucene 2.9.2: org.apache.lucene.util.CloseableThreadLocal
Java's builtin ThreadLocal has a serious flaw: it can take an arbitrarily long amount of time to dereference the things you had stored in it, even once the ThreadLocal instance itself is no longer referenced. This is because there is single, master map stored for each thread, which all ThreadLocals share, and that master map only periodically purges "stale" entries. While not technically a memory leak, because eventually the memory will be reclaimed, it can take a long time and you can easily hit OutOfMemoryError because from the GC's standpoint the stale entries are not reclaimable. This class works around that, by only enrolling WeakReference values into the ThreadLocal, and separately holding a hard reference to each stored value. When you call close(), these hard references are cleared and then GC is freely able to reclaim space by objects stored in it.
Lucene 3.0.0: org.apache.lucene.util.CloseableThreadLocal
Java's builtin ThreadLocal has a serious flaw: it can take an arbitrarily long amount of time to dereference the things you had stored in it, even once the ThreadLocal instance itself is no longer referenced. This is because there is single, master map stored for each thread, which all ThreadLocals share, and that master map only periodically purges "stale" entries. While not technically a memory leak, because eventually the memory will be reclaimed, it can take a long time and you can easily hit OutOfMemoryError because from the GC's standpoint the stale entries are not reclaimable. This class works around that, by only enrolling WeakReference values into the ThreadLocal, and separately holding a hard reference to each stored value. When you call close(), these hard references are cleared and then GC is freely able to reclaim space by objects stored in it. We can not rely on ThreadLocal.remove() as it only removes the value for the caller thread, whereas close() takes care of all threads. You should not call close() until all threads are done using the instance.
(3.2) Code Example
This category includes those revisions that revise code examples within API documents. One example is as follows.
J2SE 1.5: java.util.List. hashCode()
hashCode = 1;
Iterator i = list.iterator();
while (i.hasNext()) {
Object obj = i.next();
hashCode = 31*hashCode + (obj==null ? 0 : obj.hashCode());
}
J2SE 1.6: java.util.List. hashCode()
int hashCode = 1;
Iterator<E> i = list.iterator();
while (i.hasNext()) {
E obj = i.next();
hashCode = 31*hashCode + (obj==null ? 0 : obj.hashCode());
}
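The documented calculation can be checked against an actual list. The following sketch wraps the J2SE 1.6 version of the snippet in a method and compares its result with List.hashCode(); the class name ListHashCodeCheck is hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical wrapper around the documented hash code calculation.
class ListHashCodeCheck {

    // Computes the hash code as described in the document of List.hashCode().
    static <E> int documentedHashCode(List<E> list) {
        int hashCode = 1;
        Iterator<E> i = list.iterator();
        while (i.hasNext()) {
            E obj = i.next();
            hashCode = 31 * hashCode + (obj == null ? 0 : obj.hashCode());
        }
        return hashCode;
    }

    public static void main(String[] args) {
        // A null element exercises the (obj == null ? 0 : ...) branch.
        List<String> list = Arrays.asList("a", null, "c");
        System.out.println(documentedHashCode(list) == list.hashCode());
    }
}
```

The revision between the two versions only adds the declaration and generic types, so the computed value is the same for both snippets.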
(3.3) New Descriptions
This category includes those revisions that add new descriptions to give details of an API element. One example is as follows.
J2SE 1.5: java.util.ListResourceBundle. getContents()
See class description.
J2SE 1.6: java.util.ListResourceBundle. getContents()
Returns an array in which each item is a pair of objects in an Object array. The first element of each pair is the key, which must be a String, and the second element is the value associated with that key. See the class description for details.
Returns
an array of an Object array representing a key-value pair.
The detailed data are shown in Table 1.
Table 1. Detailed frequently changed parts in API documentation.
RQ2: To what degree do such revisions indicate behavioral differences of APIs?
We compared 2131 revisions that cover the java.util package of J2SE and all of the other four libraries. The overall results are shown in Figure 2.
Figure 2. Frequently changed parts of behavioral differences.
(1) Exceptions
We find that revisions of exceptions can reflect behavioral differences.
(1.1) Addition
An added exception reflects that an API method deals with more exceptions.
J2SE 1.2.2: java.util. ResourceBundle.getObject(String)
Throws
MissingResourceException
J2SE 1.3.1: java.util. ResourceBundle.getObject(String)
Throws
java.lang.NullPointerException - if key is null.
MissingResourceException
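Both documented exceptions can be triggered with a small bundle. The following is a minimal sketch, assuming a hypothetical ListResourceBundle subclass named DemoBundle with a single entry; the helper lookupOutcome is likewise introduced only for illustration.

```java
import java.util.ListResourceBundle;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

// Hypothetical bundle with a single entry, used only for illustration.
class DemoBundle extends ListResourceBundle {

    @Override
    protected Object[][] getContents() {
        // Each item is a key-value pair; the key must be a String.
        return new Object[][] { { "greeting", "hello" } };
    }

    // Returns the simple name of the exception raised by getObject(key),
    // or "none" if the lookup succeeds.
    static String lookupOutcome(String key) {
        ResourceBundle bundle = new DemoBundle();
        try {
            bundle.getObject(key);
            return "none";
        } catch (NullPointerException | MissingResourceException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupOutcome("greeting"));
        System.out.println(lookupOutcome("absent"));
        System.out.println(lookupOutcome(null));
    }
}
```

A null key yields NullPointerException, while a key without an entry yields MissingResourceException, which corresponds to the two entries of the J2SE 1.3.1 Throws list.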
(1.2) Modification
A modified exception reflects that an API method can throw different exceptions.
J2SE 1.3.1: java.util.Vector. addAll(Collection)
Throws
java.lang.ArrayIndexOutOfBoundsException - index out of range (index < 0 || index > size()).
J2SE 1.4.2: java.util.Vector. addAll(Collection)
Throws
java.lang.NullPointerException - if the specified collection is null.
(1.3) Deletion
A deleted exception reflects that an API method will not throw such exceptions any more.
ActiveMQ 5.1.0: org.apache.activemq.transport.tcp.SslTransportFactory. setKeyAndTrustManagers(KeyManager[], TrustManager[], SecureRandom)
Sets the key and trust managers used in constructed socket factories. Passes given arguments to SSLContext.init(...).
Parameters
km - The sources of authentication keys or null.
tm - The sources of peer authentication trust decisions or null.
random - The source of randomness for this generator or null.
Throws
java.security.KeyManagementException
ActiveMQ 5.2.0: org.apache.activemq.transport.tcp.SslTransportFactory. setKeyAndTrustManagers(KeyManager[], TrustManager[], SecureRandom)
Deprecated. "Do not use anymore... using static initializers like this method only allows the JVM to use 1 SSL configuration per broker."
Parameters
km -
tm -
object -
(2) API usage
We find that revisions of API usages can reflect behavioral differences.
Log4j 1.2.15: org.apache.log4j.WriterAppender. immediateFlush
Immediate flush means that the underlying writer or output stream will be flushed at the end of each append operation.
Log4j 1.2.16: org.apache.log4j.WriterAppender. immediateFlush
Immediate flush means that the underlying writer or output stream will be flushed at the end of each append operation unless shouldFlush() is overridden.
(2.1) Notices
Revisions of notices reflect that corresponding API elements may have behavioral differences.
(2.2) Code examples
Revisions of code examples reflect that API elements may have behavioral differences, since library developers often use these code examples to illustrate API usages. One example is as follows:
Lucene 2.9.0: org.apache.lucene.search.Hits
TopScoreDocCollector collector = new TopScoreDocCollector(hitsPerPage);
searcher.search(query, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
for (int i = 0; i < hits.length; i++) {
int docId = hits[i].doc;
Document d = searcher.doc(docId);
// do something with current hit
...
Lucene 2.9.1: org.apache.lucene.search.Hits
TopDocs topDocs = searcher.search(query, numHits);
ScoreDoc[] hits = topDocs.scoreDocs;
for (int i = 0; i < hits.length; i++) {
int docId = hits[i].doc;
Document d = searcher.doc(docId);
// do something with current hit
...
(2.3) Alternatives
API documents may introduce alternative APIs of an API element, and revisions of alternatives can also reflect behavioral difference.
Lucene 2.9.0: org.apache.lucene.analysis.StopAnalyzer.StopAnalyzer(Set)
Deprecated. Use StopAnalyzer(Set, boolean) instead
Lucene 2.9.1: org.apache.lucene.analysis.StopAnalyzer.StopAnalyzer(Set)
Deprecated. Use StopAnalyzer(Version, Set) instead
(2.4) Call sequences
API documents may introduce call sequences of API methods, and revisions of these documents can also reflect behavioral difference.
Lucene 2.9.2: org.apache.lucene.search.Scorer. score()
Returns the score of the current document matching the query. Initially invalid, until DocIdSetIterator.next() or DocIdSetIterator.skipTo(int) is called the first time, or when called from within Collector.collect(int).
Lucene 3.0.0: org.apache.lucene.search.Scorer. score()
Returns the score of the current document matching the query. Initially invalid, until DocIdSetIterator.nextDoc() or DocIdSetIterator.advance(int) is called the first time, or when called from within Collector.collect(int).
(3) Functionalities
We find that revisions of API functionalities can reflect behavioral differences.
(3.1) New functionalities
Revisions of API documents may introduce new functionalities that reflect behavioral differences.
J2SE 1.4.2: java.util.AbstractList
public abstract class AbstractList extends AbstractCollection implements List
J2SE 1.5: java.util.AbstractList <E>
public abstract class AbstractList<E> extends AbstractCollection<E> implements List<E>
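To illustrate what the parameterized declaration offers to client code, the following sketch defines a hypothetical subclass of AbstractList<E>. The class Repeated is an assumption made for this example and does not appear in J2SE.

```java
import java.util.AbstractList;
import java.util.List;

// Hypothetical fixed-size list view; illustrates the parameterized AbstractList<E>.
class Repeated<E> extends AbstractList<E> {
    private final E value;
    private final int count;

    Repeated(E value, int count) {
        this.value = value;
        this.count = count;
    }

    @Override
    public E get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return value;
    }

    @Override
    public int size() {
        return count;
    }

    public static void main(String[] args) {
        List<String> list = new Repeated<>("x", 3);
        String first = list.get(0); // no cast is needed with the generic declaration
        System.out.println(first + " " + list.size());
    }
}
```

With the J2SE 1.4.2 declaration, get would return Object and the caller would need an explicit cast to String.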
(3.2) Input values
Revisions of API documents on input values can reflect behavioral differences of preparing input values.
J2SE 1.3.1: java.util.Properties. store(OutputStream, String)
The ASCII characters \, tab, newline, and carriage return are written as \\, \t, \n, and \r, respectively.
J2SE 1.4.2: java.util.Properties. store(OutputStream, String)
The ASCII characters \, tab, form feed, newline, and carriage return are written as \\, \t, \f \n, and \r, respectively.
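The escaping described in the revised document can be observed by storing a property whose value contains these characters. The following is a minimal sketch run against a current Java runtime; the class PropertiesEscapeDemo and its method are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class for the escaping performed by Properties.store.
class PropertiesEscapeDemo {

    // Stores a single property and returns the text written by Properties.store.
    static String stored(String key, String value) {
        Properties props = new Properties();
        props.setProperty(key, value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            props.store(out, null); // null: no leading comment text
            return out.toString("ISO-8859-1");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The value contains a form feed, a tab, and a backslash.
        System.out.println(stored("k", "a\fb\tc\\d"));
    }
}
```

The stored text contains a date comment line followed by the entry, in which the form feed, tab, and backslash appear as the two-character sequences \f, \t, and \\.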
(3.3) Default values
Revisions of API documents on default values can reflect behavioral differences of preparing default values.
J2SE 1.3.1: java.util. Hashtable. Hashtable(Map)
The hashtable is created with a capacity of twice the number of entries in the given Map or 11 (whichever is greater), and a default load factor, which is 0.75.
J2SE 1.4.2: java.util. Hashtable. Hashtable(Map)
The hashtable is created with an initial capacity sufficient to hold the mappings in the given Map and a default load factor, which is 0.75.
(3.4) Output values
Revisions of API documents on output values can reflect behavioral differences of preparing output values.
Log4j 1.2.15: org.apache.log4j.Appender. getName()
Get the name of this appender. The name uniquely identifies the appender.
Log4j 1.2.16: org.apache.log4j.Appender. getName()
Get the name of this appender.
Returns
name, may be null.
The detailed data are shown in Table 2 and here.
Table 2. Detailed frequently changed parts in API documentation.
RQ3: How frequently are API elements and their documentation changed?
Table 3 of the submitted draft lists the percentages of API differences by type. Table 3 below lists the numbers of API differences by type.
Table 3. Number of API differences by type.