Measuring Knowledge
The bucket, not the contents
There is no intrinsic definition of knowledge. There is no unit for knowledge. There is no way to empirically measure it. There is no independent yardstick against which we can empirically quantify knowledge in DNA, in a person, in a piece of machinery, in a book, or in a computer program.
This presents us with problems.
As we have seen earlier, the only way we can qualify knowledge (assert it is “correct”) is to compare it against another source of knowledge (which, of course, would in its turn need to be compared against yet another source of knowledge). The same is also true of quantifying knowledge. But quantification is arguably even less well-defined than qualification—we are usually better able to say “…this knowledge is ‘correct’…” than we are to say “…this knowledge is ‘enough’…”
Both qualification and quantification of knowledge in a person (and other media) are usually achieved through some examination process. After undergoing some standardized test under controlled circumstances, a person’s accessible knowledge is assessed by comparing it against some standardized knowledge store. This comparison purports to assess both the quality (how close the answers come to the “correct” answers) and the quantity (how many answers came close to the “correct” answers). In highly standardized tests, these are often congruent: for a well-defined multiple-choice test, any answer that does not match the “correct” answer is considered to be simply wrong and doesn’t count toward the quantity total. For essay-type questions, the quality and quantity might be somewhat decoupled: the person might get some of the ideas “correct” but not all of them. This is where the professor’s skill (and knowledge base and biases) in marking exam results comes in.
Notice that I have put “correct” in quotes. An answer is deemed correct only insofar as it matches the “correct” answer, as determined by the comparative knowledge source and comparison process. Since the “correct” answer was itself compared against something else, which in turn was compared against something else, ad infinitum, there is no point at which empirical correctness can be truly asserted (or, in most cases, proven [1]).
So, we cannot measure knowledge directly...
...but we can measure the physical characteristics of its substrate.
Weighing in on Books
As noted, there is no way for us to calculate how much knowledge is in, say, a book. Equally, there is no empirical way to determine the “quality,” or correctness, of a book. But we can weigh the book. We can count the number of chapters, pages, and words. We could even use lexical analysis to form some opinion of the depth and complexity of the text. Each of these metrics, and many more, can be considered indicators of the quantity and quality of knowledge in a book.
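To make the point concrete, here is a minimal sketch of the kind of lexical analysis meant above. The function name and the particular metrics are my own illustrative choices; none of them measures knowledge itself, they are merely countable proxies extracted from the text's substrate.

```python
import re

def crude_book_metrics(text: str) -> dict:
    """Compute crude lexical proxies for the 'quantity' and
    'complexity' of a text. These are indicators only --
    none of them measures knowledge itself."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    unique = set(words)
    return {
        "word_count": len(words),                            # a 'quantity' proxy
        "vocabulary_size": len(unique),                      # breadth proxy
        "type_token_ratio": len(unique) / len(words),        # lexical diversity
        "avg_sentence_length": len(words) / len(sentences),  # complexity proxy
    }

sample = "Knowledge is hard to measure. We measure its substrate instead."
m = crude_book_metrics(sample)
```

Note that two books could score identically on every one of these metrics while containing utterly different amounts of (and qualities of) knowledge, which is precisely the limitation discussed above.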
If we have two books, each on the same subject, written in similar styles, at similar levels of comprehension and complexity, by equally accredited authors, and one book is twice the weight, with twice the number of chapters, pages, and words of the other, it is not unreasonable to assert that the bigger, heavier book contains around twice the amount of knowledge. Right?
Probably not. The caveats throw up so many noisy qualifications that such a calculation is almost certainly off, perhaps by a long way. And, of course, there is no (other) way to “prove” that the 2x relationship holds. In the context of two books, this sounds almost silly. However, in the fields of software development, project management, and project estimation, such calculations are routine. Rather than counting textual words, people count “lines of code” [2] to deduce the likely effort, staffing, and cost of a project.
FOOTNOTES
[1] That said, most mathematical and many philosophical systems start from an axiomatic basis. Axioms, by definition, are considered self-evidently true and do not require proof. Indeed, true axioms usually cannot be “proven” (against what?). This means such systems are built on foundations that we acknowledge cannot be asserted to be intrinsically “correct.” But we have to start somewhere, right? From such a basis, it is entirely appropriate to assert that well-formed theorems built on the axiomatic basis are, in fact, proven. We will talk about this later.
However, the universally accepted practice of building enormous, elaborate logic structures based on an acknowledged unprovable foundation is further evidence of the non-measurability of knowledge.
[2] More correctly, the lines of code count is a prediction (calculation, guess, wish,...) of the likely final number of lines of pre-compiled code that will end up in a finished system. It is the profession's best attempt to assess the quantity, if not quality, of knowledge in the system that will be finally delivered to a user or customer. This number is usually used in a formula that determines what effort/staffing/resources would be required to produce the system.
There are many arguments against this approach, the most pertinent being that the effort involved in producing a system is mostly dependent on knowledge the developers don't have. The approach is used mostly because there really isn't any better approach.
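The best-known formula of this kind is probably Boehm's Basic COCOMO model, which converts a predicted line count into person-months of effort and a schedule. A minimal sketch follows; the constants are the published values for Boehm's “organic” project class, while the 32,000-line input figure is purely illustrative:

```python
def basic_cocomo_organic(kloc: float) -> tuple[float, float]:
    """Basic COCOMO (Boehm, 1981), 'organic' project class.
    Converts a predicted size in thousands of lines of code (KLOC)
    into estimated effort (person-months) and schedule (months).
    The input is itself a guess, so the output inherits all of
    that guess's uncertainty."""
    effort = 2.4 * kloc ** 1.05      # person-months
    schedule = 2.5 * effort ** 0.38  # calendar months
    return effort, schedule

# Illustrative only: a system predicted to come in at 32,000 lines.
effort, schedule = basic_cocomo_organic(32.0)
staff = effort / schedule  # implied average team size
```

The exponents greater and less than one encode the model's core assumptions: effort grows slightly faster than size, while adding calendar time yields diminishing returns. But note what the formula cannot see, namely the knowledge the developers do not yet have, which is the pertinent objection raised above.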