1,446 American Civil War battles and incidents in 1 minute

This is the video that plots 1,446 Civil War battles and incidents in 1 minute.

The program that displays the animation is here. The code.exe program reads data.txt, a list of latitude/longitude pairs and dates, and plots them on a map.

Delphi 6 source code is included.
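The description above implies two mappings: each event's date onto a moment in the 1-minute animation, and each lat/long onto a point on the map image. The Python sketch below shows both under stated assumptions. It is not the logic of the Delphi code.exe program. The data.txt line format (`YYYY-MM-DD,lat,lon`), the 30 fps frame rate, the map bounding box, and all function names are hypothetical.

```python
from datetime import date

FPS = 30      # assumed frame rate
SECONDS = 60  # "1 minute", from the text above


def parse_line(line):
    """Parse one line in a hypothetical "YYYY-MM-DD,lat,lon" format."""
    day, lat, lon = line.strip().split(",")
    return date.fromisoformat(day), float(lat), float(lon)


def frame_for(day, start, end, total_frames):
    """Map a date linearly onto a frame index in [0, total_frames - 1]."""
    span = (end - start).days or 1  # guard against a single-day data set
    return round((day - start).days / span * (total_frames - 1))


def to_pixel(lat, lon, bbox, width, height):
    """Equirectangular projection into a width x height image.

    bbox = (min_lat, max_lat, min_lon, max_lon) is an assumed map extent.
    """
    min_lat, max_lat, min_lon, max_lon = bbox
    x = (lon - min_lon) / (max_lon - min_lon) * (width - 1)
    y = (max_lat - lat) / (max_lat - min_lat) * (height - 1)  # image y grows downward
    return round(x), round(y)


if __name__ == "__main__":
    # Two illustrative lines with approximate coordinates (not taken from data.txt).
    lines = ["1861-04-12,32.75,-79.87", "1865-04-09,37.38,-78.80"]
    events = [parse_line(line) for line in lines]
    start = min(d for d, _, _ in events)
    end = max(d for d, _, _ in events)
    for d, lat, lon in events:
        print(d, frame_for(d, start, end, FPS * SECONDS))
```

Linear scaling preserves the relative spacing of events in time; an alternative design would give every event an equal share of frames.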

master.xls is a spreadsheet database of Civil War battle data. The cwdata.xls spreadsheet was extracted from it and used to generate data.txt, the input file for the code.exe animation program.

While consolidating and reconciling data from various sources to prepare master.xls, we found the following:

1. Data accuracy, consistency, and completeness are huge problems. Every name and number needs to be taken with a grain of salt, and sometimes a source does more harm than good (for example, all the dates in one database were one day too early). Inconsistency is rampant: this page lists Manassas II as starting August 28, but this page on the same web site has it starting August 29.
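One way to catch a systematic shift like the one-day error described above is to compare the dates two sources give for the same events and look for a single dominant difference. The Python sketch below does that; the function name, the dict-of-dates input, and the sample "shifted" source are hypothetical.

```python
from collections import Counter
from datetime import date


def common_offset(ref, other):
    """Compare two {event name: date} mappings.

    Returns (offset_days, share): the most common difference other - ref over
    events present in both, and the fraction of shared events with that
    difference. Returns None when the sources share no event names.
    """
    shared = ref.keys() & other.keys()
    diffs = [(other[name] - ref[name]).days for name in shared]
    if not diffs:
        return None
    offset, count = Counter(diffs).most_common(1)[0]
    return offset, count / len(diffs)


if __name__ == "__main__":
    source_a = {"Hoke's Run": date(1861, 7, 2), "Bull Run I": date(1861, 7, 21)}
    # Hypothetical second source whose dates are all one day early.
    source_b = {"Hoke's Run": date(1861, 7, 1), "Bull Run I": date(1861, 7, 20)}
    print(common_offset(source_a, source_b))  # (-1, 1.0)
```

A nonzero offset shared by nearly all events points to a systematic shift rather than scattered disagreements. Matching on names only works once alternate names have been resolved to a primary name, which is the subject of the next point.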

2. Many battles and places are known by more than one name, and lists tend to handle this in one of two ways. One method is to key on the "preferred" name and list the alternate name alongside it:

Common name   Other name       Date
-----------   --------------   ---------------
Hoke's Run    Falling Waters   July 02, 1861
Bull Run I    Manassas I       July 21, 1861
Bull Run II   Manassas II      August 29, 1862

One problem with this is that there are often more than two variations of a name:

Common name   Other names
-----------   -----------------------------------------------------------------
Hoke's Run    Falling Waters, Hainesville
Bull Run I    Manassas I, First Battle of Bull Run, First Battle of Manassas
Bull Run II   Manassas II, Second Battle of Bull Run, Second Battle of Manassas

With this approach, finding a name requires searching two different columns. The other approach avoids that by listing every name in a single column:

Name                        Date
-------------------------   ---------------
Bull Run I                  July 21, 1861
Bull Run II                 August 29, 1862
Falling Waters              July 02, 1861
First Battle of Bull Run    July 21, 1861
First Battle of Manassas    July 21, 1861
Hoke's Run                  July 02, 1861
Manassas I                  July 21, 1861
Manassas II                 August 29, 1862
Second Battle of Bull Run   August 29, 1862
Second Battle of Manassas   August 29, 1862

Although every name can now be found by searching a single column, the date is repeated for every alias (which gets worse as more columns, such as casualties, are added), and nothing shows that two different rows are the same event.

We recommend this instead:

Name                        Date              Casualties   Instead see
-------------------------   ---------------   ----------   -----------
Bull Run I                  July 21, 1861          3,461
Bull Run II                 August 29, 1862       19,307
Falling Waters              ...............   ..........   Hoke's Run
First Battle of Bull Run    ...............   ..........   Bull Run I
First Battle of Manassas    ...............   ..........   Bull Run I
Hoke's Run                  July 02, 1861            114
Manassas I                  ...............   ..........   Bull Run I
Manassas II                 ...............   ..........   Bull Run II
Second Battle of Bull Run   ...............   ..........   Bull Run II
Second Battle of Manassas   ...............   ..........   Bull Run II

Now every name can be found by searching a single column, additional data (such as casualties) needs to be entered in only one row, and there's no risk of mistaking two names for two different events.
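The "Instead see" layout is also easy to use in code: a lookup follows the pointer from any alternate name to the single primary row. A minimal Python sketch, using rows from the table above; the dict layout and the `resolve` name are assumptions.

```python
# Rows from the "Instead see" table: a primary row holds the data, an
# alternate-name row holds only a pointer to its primary name.
ROWS = {
    "Bull Run I": {"date": "July 21, 1861", "casualties": 3461},
    "Hoke's Run": {"date": "July 02, 1861", "casualties": 114},
    "Falling Waters": {"instead_see": "Hoke's Run"},
    "First Battle of Bull Run": {"instead_see": "Bull Run I"},
    "First Battle of Manassas": {"instead_see": "Bull Run I"},
    "Manassas I": {"instead_see": "Bull Run I"},
}


def resolve(name, rows=ROWS):
    """Follow "Instead see" pointers until a primary row is reached.

    Returns (primary_name, row). Raises KeyError for an unknown name and
    ValueError if the pointers loop back on themselves.
    """
    seen = set()
    while "instead_see" in rows[name]:
        if name in seen:
            raise ValueError(f"'Instead see' cycle at {name!r}")
        seen.add(name)
        name = rows[name]["instead_see"]
    return name, rows[name]


if __name__ == "__main__":
    print(resolve("Manassas I"))
```

Because every alias resolves to one row, a total computed over primary rows counts each battle exactly once.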

The rule that every alternate name gets an "Instead see" entry pointing to its primary name should also be applied to cities, campaigns, generals, etc. Yes, you end up with a lot of primary and alternate columns, but computers handle that easily, and the resulting consistency, flexibility, and ease of use are well worth the verbose structure.

3. Killed and wounded are obviously "casualties", but are missing and captured "casualties" or "losses"? Unfortunately, lists are inconsistent about which is which, producing lots of numbers that can't easily be reconciled. Every list should state clearly whether missing and captured are counted as casualties. A layout that is as detailed and compact as possible works best:

          Casualties              Other losses
          ----------------------  ------------------------
Event     Killed  Wounded  Total  Missing  Captured  Total  Grand total
--------  ------  -------  -----  -------  --------  -----  -----------
Battle 1     100      100    200      100              100          300
Battle 2     200      200    400                200    200          600

Are the blank entries above unknown or zero? Again, databases are inconsistent. They should always document explicitly that a blank entry means unknown and a zero means a known count of zero, for example:


          Casualties              Other losses
          ----------------------  ------------------------
Event     Killed  Wounded  Total  Missing  Captured  Total  Grand total
--------  ------  -------  -----  -------  --------  -----  -----------
Battle 1     100      100    200      100         0    100          300
Battle 2     200      200    400                200    200          600

which shows there were zero captured in Battle 1 but an unknown number of missing in Battle 2.
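In code, the blank-versus-zero convention maps naturally onto an explicit "unknown" value such as Python's `None`. The sketch below (class and function names are hypothetical) sums whatever is known and reports whether the sum is exact or only a lower bound; with the figures from the table above it reproduces that table's totals.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


def total(*parts: Optional[int]) -> tuple[int, bool]:
    """Sum the known parts (None = unknown, 0 = known zero).

    Returns (sum, exact): exact is True only if no part is unknown;
    otherwise the sum is just a lower bound.
    """
    known = [p for p in parts if p is not None]
    return sum(known), len(known) == len(parts)


@dataclass
class Losses:
    killed: Optional[int]
    wounded: Optional[int]
    missing: Optional[int]
    captured: Optional[int]

    def casualties(self) -> tuple[int, bool]:
        return total(self.killed, self.wounded)

    def other_losses(self) -> tuple[int, bool]:
        return total(self.missing, self.captured)

    def grand_total(self) -> tuple[int, bool]:
        return total(self.killed, self.wounded, self.missing, self.captured)


if __name__ == "__main__":
    battle_1 = Losses(killed=100, wounded=100, missing=100, captured=0)
    battle_2 = Losses(killed=200, wounded=200, missing=None, captured=200)
    print(battle_1.grand_total())  # (300, True)
    print(battle_2.grand_total())  # (600, False): missing is unknown
```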

4. Last but not least, almost every statistic seems to have several claimed values. Troop strengths, casualty counts, etc., are often vague estimates, and different sources claim different numbers. Occasionally a value is documented well enough to be considered irrefutable, but that is fairly rare, and even "documented" numbers often disagree.

master.xls took the quick-and-dirty way out, combining irreconcilable differences into entries like "100 or 200", but that prevents calculations such as computing totals.

A better solution may be to treat every claimed number as either an "estimate" or a documented "known", and to show a low-high range when more than one estimate exists:

          Killed
          ----------------
Event     Low  Known  High
--------  ---  -----  ----
Battle 1  100          150   <-- only a range of estimates is available
Battle 2         200         <-- documented, considered accurate
Battle 3               300   <-- not a range, still only an estimate (could instead be entered as "Low")
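Storing a range instead of text like "100 or 200" keeps totals computable, because ranges add endpoint by endpoint (interval addition). A minimal Python sketch, assuming the Low/Known/High columns above and the rule that a lone estimate may sit in either Low or High; the class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Killed:
    low: Optional[int] = None
    known: Optional[int] = None
    high: Optional[int] = None

    def bounds(self) -> Optional[tuple[int, int]]:
        """Return (low, high), or None if there is no claim at all.

        A documented value collapses the range to a point; a lone estimate
        in either Low or High is treated as a one-point range.
        """
        if self.known is not None:
            return self.known, self.known
        lo = self.low if self.low is not None else self.high
        hi = self.high if self.high is not None else self.low
        if lo is None:
            return None
        return lo, hi


def range_total(entries: Iterable[Killed]) -> Optional[tuple[int, int]]:
    """Interval addition: sum the lows and sum the highs.

    Returns None if any entry has no claim, since the total is then unknown.
    """
    lo_sum = hi_sum = 0
    for entry in entries:
        b = entry.bounds()
        if b is None:
            return None
        lo_sum += b[0]
        hi_sum += b[1]
    return lo_sum, hi_sum


if __name__ == "__main__":
    table = [Killed(low=100, high=150), Killed(known=200), Killed(high=300)]
    print(range_total(table))  # (600, 650)
```

Under this scheme, a master.xls entry such as "100 or 200" would become `Killed(low=100, high=200)`.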

It gets messy when trying to combine Low/Known/High ranges with multiple columns, such as the casualties versus other-losses columns suggested in #3, and this still leaves the problem of how to list multiple "knowns" claimed by different documents or sources (perhaps they should then be treated only as estimates?). But the more attention a database gives to these kinds of details, the more respected and useful it will be.
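The open question of several conflicting "known" values can at least be made into an explicit policy. The sketch below implements one possible reading of the suggestion above: documented claims that agree become Known, documented claims that disagree are demoted to estimates, and estimates become a Low-High range. The `(value, documented)` input format, the policy details, and the function name are all hypothetical.

```python
def consolidate(claims):
    """Merge claims from several sources into Low/Known/High columns.

    claims: list of (value, documented) pairs, one per source.
    Assumed policy:
      - exactly one distinct documented value -> Known (estimates ignored);
      - otherwise every claim counts as an estimate -> Low-High range;
      - a single distinct estimate goes in High (it could equally go in Low).
    """
    documented = {value for value, is_documented in claims if is_documented}
    if len(documented) == 1:
        return {"low": None, "known": documented.pop(), "high": None}
    values = [value for value, _ in claims]
    if not values:
        return {"low": None, "known": None, "high": None}
    lo, hi = min(values), max(values)
    if lo == hi:
        return {"low": None, "known": None, "high": hi}
    return {"low": lo, "known": None, "high": hi}


if __name__ == "__main__":
    # Hypothetical claims for one battle from three sources.
    print(consolidate([(200, True), (250, True), (180, False)]))
```

Demoting disagreeing "knowns" to estimates keeps every source's claim visible in the range instead of silently picking one.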