Why All Genealogy Records Are Not Free

The following article is from Eastman's Online Genealogy Newsletter and is copyright by Richard W. Eastman. It is re-published here with the permission of the author. Information about the newsletter is available at http://www.eogn.com.  

- Why Isn't It Free?

The following was published as a Plus Edition article a couple of weeks
ago. At the suggestion of several newsletter readers, I am now
republishing it in the Standard Edition newsletter for all to see. If
you wish, you do have permission to forward this article to others or to
republish it elsewhere for non-profit use. For details, look at the
menus on the upper right side of this web page and click on COPYRIGHTS.

One topic that surprises me has appeared several times recently in
comments from this newsletter's readers. Some people have questioned the
idea of placing public domain data online and charging for access to
that information, as is done by Ancestry.com, Footnote.com, FindMyPast,
WorldVitalRecords, and others. One person claimed that it is illegal to
charge for access to public domain data, and another reader stated that
the online sites are "violating my constitutional rights to view the

Sorry, folks, but that simply isn't true.

Indeed, in the U.S. and Canada, governmental records are public domain,
available free of charge to those who can travel to the repositories
where the original records are stored. Many private records, such as
church records, may not be public domain, but they are also often
available at no charge if one can travel to view them. When travel is
not an option, a trip to a local library may suffice if that library has
microfilms of the original records that patrons can view for free. (For
this article, I will ignore the costs of sending a filming crew to a
repository to make the microfilms and the expenses of reproducing and
distributing microfilms. However, those expenses are not trivial.)

Given the fact that the records are already available "free of charge,"
one might question the need to pay $50 or $100 or more per year to
access the same records on a subscription service such as Footnote.com,
Ancestry.com, Origins.net, NewEnglandAncestors.org, and other genealogy
web sites.

First of all, the idea that the records are available "free" is only
true for those who live near the repository that houses the original
records or photocopies of the records and can walk to that repository.
If you have to travel some distance to a library that houses the records
you seek, you will incur travel expenses. Even a trip to a library a few
miles away will incur costs for gasoline and perhaps for parking. Such
records are not truly "free."

While perhaps the visitor doesn't pay anything to view records in books
or in microfilms, that library had to pay someone for the books, the
microfilm, the microfilm reader, the building, the employees, heat,
electricity, etc. The library may not charge the patron to look at the
microfilms, but the process certainly is not free. Information in a
library is never really free. Someone always pays, usually the taxpayers.

A longer trip will incur airfare or automobile expenses, along with
hotel rooms and meals. I can go to Salt Lake City to view the “free”
records available at the Family History Library. The last time I made
that trip, it certainly was not “free.”

A three-day trip to a distant repository can easily cost $500 or more.
If I want to go back to the "old country" to look at records, expenses
will be much higher, of course. For many who do not live near major
genealogy libraries, this quickly changes the concept of "free."

From the genealogist's viewpoint, accessing records published on the
Internet greatly increases convenience and reduces travel expenses. From
the publisher's viewpoint, the financial realities of publishing on the
web add up rather quickly when one looks at the expenses involved with
acquiring, digitizing, and electronically publishing records of interest
to genealogists. Such an effort is not cheap.

To be sure, there are hundreds of web pages available today at no charge
that contain transcribed records from a variety of sources. RootsWeb has
many such pages, as do freebmd.org.uk, genuki.org.uk, Find-A-Grave,
hundreds of local society web sites, and many others. These web sites
contain records transcribed by volunteers, and someone pays for the web
servers, often without passing those expenses on to users. In most
cases, the expenses are not huge, and advertising can help pay the
bills. A few of these web sites may even contain images of the original
records. Most of these sites have databases that contain hundreds or
even thousands of records. In contrast, commercial services typically
provide millions of records, usually many millions. With larger
databases come larger expenses.

Let's assume that a company or even a genealogy society, such as the New
England Historic Genealogical Society, decides to make state vital
records available on the World Wide Web. Once an agreement has been
negotiated with the state, the company or society starts work. I will
make some rough estimates of the expenses involved.

In our example, let's say that the project entails 25 million
handwritten records that were recorded over a 50-year period. (This
would be for a state with a rather small population; many states will
have more records than that in a 50-year period.) Digitizing these
records will require thousands of man-hours. It is doubtful that anyone can
find that number of unpaid volunteers to travel to the repository, run
the scanners, and enter the data. In fact, the repository may not even
have room for a crew of that size.

If you own a scanner, calculate how many pages you can scan in one hour.
Then calculate how long it would take you to scan twenty-five million
pages. Using a scanner purchased at a local computer store, I can scan
one page every 2 minutes. Assuming a 40-hour work week, I will need
20,833 weeks for this project. Clearly, hobbyist-grade scanners will
never get the job done. Expensive, high-speed scanners need to be
purchased. Five thousand dollars is a typical price for high-volume
scanners, and this project will probably require two or more of them.
Next, operators need to be hired to sit at the scanners 40 hours a week
to create the digitized images. Those operators need to be paid.
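
For anyone who wants to check the arithmetic, here is a minimal Python sketch. The page count, the two-minutes-per-page rate, and the 40-hour week come from the example above; the 52-week working year is an assumption added only to convert weeks into years.

```python
# Scanning-time estimate using the figures from the example above.
PAGES = 25_000_000        # records to scan (from the example)
MINUTES_PER_PAGE = 2      # hobbyist-grade scanner rate (from the example)
HOURS_PER_WEEK = 40       # one operator working full time
WEEKS_PER_YEAR = 52       # assumption: no holidays, no downtime


def weeks_to_scan(pages, minutes_per_page, hours_per_week=HOURS_PER_WEEK):
    """Return the number of work weeks a single operator would need."""
    total_hours = pages * minutes_per_page / 60
    return total_hours / hours_per_week


weeks = weeks_to_scan(PAGES, MINUTES_PER_PAGE)
years = weeks / WEEKS_PER_YEAR
print(f"{weeks:,.0f} weeks, or about {years:,.0f} years for one operator")
```

In other words, a single person with a store-bought scanner would need roughly four centuries.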

This process only makes scanned images of the records, probably the
simplest and least-expensive part of the project. Somebody else then
needs to make indexes as well. The process will vary, depending upon
what is already available. In many cases, someone sitting at a computer
will need to index each and every one of the millions of entries. Add in
many more thousands of dollars in labor charges.

Now we have created images, plus indexes to those images. We need some
skilled programmers to combine all the data into one huge database.
Skilled database administrators' labor also is not cheap.

Once the records have been digitized and a database has been created,
the real expenses begin. This database with twenty-five million
high-quality images requires several terabytes of disk storage. (A
terabyte equals one thousand gigabytes, the same as one million
megabytes.) The purchase of a high-uptime, high-throughput disk array of
that size, along with built-in backup capabilities, easily costs $25,000
or more per terabyte. Add in the expense of a web server, a database,
and the required software, and the cost soon exceeds $100,000 for the
required hardware and software to make these records available online to
genealogists. This figure does not include the labor charges mentioned
earlier. All this is for a small web site. High-activity web sites such
as Ancestry.com will cost much, much more.
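
The storage figures can be sketched the same way. The image count and the $25,000-per-terabyte price come from the paragraph above; the 200 KB average image size is purely an assumption for illustration, since real scans may be much smaller or larger.

```python
# Rough disk-array cost for the example project.
IMAGES = 25_000_000       # scanned records (from the example)
KB_PER_IMAGE = 200        # ASSUMPTION: average size of one image, in kilobytes
COST_PER_TB = 25_000      # dollars per terabyte of disk array (from the text)


def storage_terabytes(images, kb_per_image):
    """Convert an image count to terabytes, using decimal units
    (1 TB = 1,000 GB = 1,000,000 MB = 1,000,000,000 KB)."""
    return images * kb_per_image / 1_000_000_000


tb = storage_terabytes(IMAGES, KB_PER_IMAGE)
disk_cost = tb * COST_PER_TB
print(f"{tb:.1f} TB of images, about ${disk_cost:,.0f} for the disk array")
```

Under that assumed image size, the disk array alone passes the $100,000 mark before the web server, database, and software are added.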

Next, we need very high-speed connections to connect the hardware to the
Internet so that we can serve 100 or more simultaneous users who wish to
view these large graphics files. A single T-1 line is the minimum
requirement for 20 or 30 simultaneous users, but most commercial web
servers today are connected by multiple OC-3 connections. (I'll skip the
technical discussion of T-1 and OC-3 connections. Let's just say that
they are very high-speed lines, capable of handling many simultaneous
users. They also cost a lot of money.)
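
For readers who do want a little of that technical discussion, here is a hedged sketch. A T-1 line carries 1.544 megabits per second and an OC-3 carries 155.52 megabits per second; those are the standard line rates. The 200 KB image size is an assumption, and the calculation ignores protocol overhead and assumes every user downloads at the same moment.

```python
# How a shared line divides among simultaneous users.
T1_MBPS = 1.544      # standard T-1 line rate, megabits per second
OC3_MBPS = 155.52    # standard OC-3 line rate, megabits per second
IMAGE_KB = 200       # ASSUMPTION: size of one record image, in kilobytes


def per_user_kbps(line_mbps, users):
    """Kilobits per second each user gets if the line is split evenly."""
    return line_mbps * 1000 / users


def seconds_per_image(line_mbps, users, image_kb=IMAGE_KB):
    """Seconds for each user to download one image (8 bits per byte)."""
    return image_kb * 8 / per_user_kbps(line_mbps, users)


for name, rate, users in (("T-1", T1_MBPS, 25), ("OC-3", OC3_MBPS, 100)):
    wait = seconds_per_image(rate, users)
    print(f"{name} with {users} users: {wait:.1f} seconds per image")
```

With these assumed numbers, 25 users sharing one T-1 line would each wait close to half a minute per image, which is why busy sites need the faster lines.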

In most cases, it is cheaper to install the disk array, database server,
and web server at a commercial web hosting service than to build one's
own data center. Hosting fees for a high-usage database start at $1,000
a month and quickly go up. Way up. Commercial genealogy companies with
lots of users typically pay $10,000 or more per month in hosting fees.
This may seem high, but it is still much less expensive than building
your own data center.

The bottom line is clear to anyone with a calculator: more than a
quarter million dollars is easily expended to make high-quality original
source records available to genealogists. Following that cost are
monthly fees to keep this data available.
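
To see what that means for subscription prices, consider this rough break-even sketch. The $250,000 up-front cost, the $1,000 monthly hosting fee, and the $50 annual subscription are figures mentioned in this article; the five-year period for recovering the up-front cost is my own assumption, and customer service, marketing, and profit are left out entirely.

```python
import math

UPFRONT_COST = 250_000     # dollars to digitize and publish (from the text)
MONTHLY_HOSTING = 1_000    # dollars per month, the low end quoted in the text
ANNUAL_FEE = 50            # dollars per subscriber per year (from the text)
PAYBACK_YEARS = 5          # ASSUMPTION: years over which the up-front cost is recovered


def subscribers_to_break_even(upfront, monthly_hosting, annual_fee, years):
    """Smallest number of paying subscribers that covers each year's costs."""
    annual_cost = upfront / years + monthly_hosting * 12
    return math.ceil(annual_cost / annual_fee)


needed = subscribers_to_break_even(
    UPFRONT_COST, MONTHLY_HOSTING, ANNUAL_FEE, PAYBACK_YEARS
)
print(f"Subscribers needed just to break even: {needed:,}")
```

Raise the hosting fee to $10,000 a month and the same function calls for several thousand subscribers, all for a single small database.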

The result is a database in which one can search for a name, find it,
double-click on the entry, and then see an image of the original record.
In other words, primary source records are visible to anyone in Virginia
or California or Australia or anywhere else in the world with no travel
expenses required.

Of course, I have ignored many other expenses. When a popular database
of this sort is placed online, users will have questions. Someone needs
to answer those questions, so we must create a customer service
department. In the case of a society, a few members might step forward
to answer questions. In the case of Ancestry.com, it means several
hundred employees and a large building with telephones, computers, and
high-speed data connections. Again, you can guess at the expenses.

Where does this money come from?

Yes, it would be nice to provide genealogy information online at no
cost. However, if you are the person who wishes to provide that
information, a few minutes with a calculator will quickly bring you back
to reality.

I like to use the analogy of water. Water is free. If I wish, I can
obtain all the water I want at no charge. All I have to do is go to
where the water is located. I can leave buckets on the lawn when it
rains to obtain free water. If that is insufficient to meet my needs, I
can walk to the nearest river or lake with buckets, scoop up all the
water I want, and carry it home at no charge. Our ancestors did that
centuries ago, and we can still do that today if we want. Nothing has
changed. Water is still free.

However, if we want the convenience of having water delivered to our
homes, we will incur expenses. Our ancestors did not have this option.

Someone paid to purchase large pumps, and they paid for the pipes to be
buried underground to connect our house to the water mains. The entire
construction effort cost many thousands of dollars. In addition,
employees were hired to maintain the pumps and the pipes to make sure
everything continues to work correctly. As a result, those who consume
the water must pay a fee. Yes, the water is free, but the pipes, the
pumps, and the employees are not. Almost all urban homeowners today pay a
water bill. We pay for the convenience of home delivery. Those who do
not want to pay the delivery fee could elect to have the water shut off
and then obtain free water in the same manner that our ancestors did.

In my mind, public domain information is the same. The information is
free, always has been free, and probably always will be free. I can
still obtain information today at no charge in the same manner I always
have: by going to the source records and looking at them in person. If I
want to go to the location where the information is located, I can do so
at no charge, assuming I am willing to walk. If the information is
located hundreds or thousands of miles away, I may encounter significant
travel expenses, but the information itself remains free of charge.

HOWEVER, if I want someone to conveniently deliver the information to my
home at any hour of the day or night that I might want it, I have to pay
for "the pipes" and for the labor of those who provide that convenient
access. We might consider the information to still be free, but the
"pipes" (the servers, the high-speed data connections, the data centers,
and the air conditioning to keep the equipment cooled, etc.) are not
free, nor is all the labor of the hundreds of people who are involved in
delivering that information to me. Those who invest millions of dollars
in high-speed data "pipes" and all the associated labor certainly do
deserve fair compensation for their investments.

Yes, the data was free once, and it is still free today. As always, I
still may go to the location where the information is stored and, in
most cases, I can look at that information free of charge. Nothing has
changed. The only significant change is that we all now have another
option: we can still do things the old way at no charge, or we may use
new, convenient delivery options if we are willing to pay for that
convenience.

Personally, I cannot afford to travel to Maine or Texas or England or
Sweden to look at every single bit of information about my ancestors
that I want to see. I find it much cheaper to sit at home and pay $10 or
$30 a month to look at that information. Heck, ten bucks won't even pay
for the shuttle bus to the airport, much less airline tickets, hotels,
restaurant meals, and other required expenses to look at the "free"
records.
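
A quick comparison makes the point. The $500 cost of a three-day trip and the $10 to $30 monthly fees are figures from this article; everything else is simple division.

```python
TRIP_COST = 500            # dollars for one three-day research trip (from the text)
MONTHLY_FEES = (10, 30)    # dollars per month, the range quoted in the text


def months_covered(trip_cost, monthly_fee):
    """Months of subscription that the price of one trip would buy."""
    return trip_cost / monthly_fee


for fee in MONTHLY_FEES:
    months = months_covered(TRIP_COST, fee)
    print(f"At ${fee} a month, one trip equals {months:.1f} months of access")
```

Even at the higher fee, the price of one short trip pays for well over a year of online access.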

The only practical method of placing large amounts of genealogy
information on the web is to have someone pay the expenses of acquiring,
digitizing, and providing the data. In most cases, this means that the
customers who benefit will pay. If the genealogy public does not wish to
pay the expenses of "piping" the information to our homes, we can always
do what all the genealogists of yesteryear used to do: travel to the
repositories where the documents are kept.

As for me, I will choose the cheaper option and pay a modest fee for
someone to "pipe" the information directly to my home.

Do you have comments, questions, or corrections to this article? If so,
please post your words at:

You may find that other newsletter readers have already posted comments,
questions, or corrections to this article at the same place.