Research Behind the Tool

Flesch Score

The Flesch score is based on the average number of syllables per word and the average sentence length in words. Long sentences full of long, complex words are much harder to read. The tool simplifies the score down to Good, Challenging or Difficult.
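One common form of the Flesch score, the Flesch Reading Ease formula, is 206.835 - 1.015 × (words ÷ sentences) - 84.6 × (syllables ÷ words), where higher scores mean easier text. The minimal Python sketch below computes it under two loud assumptions: sentences end in `.`, `!` or `?`, and syllables are estimated by counting vowel groups. The helper names are hypothetical, not the tool's own code, and because the cut-offs the tool uses for Good, Challenging and Difficult are not given here, the sketch stops at the raw score.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): count groups of consecutive vowels, treating y as a vowel."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent 'e' ("make") usually adds no syllable,
    # but a final "-le" ("table") usually does, so only the first is dropped.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)  # every word has at least one syllable

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher means easier to read."""
    # Assumption: a sentence is any run of text ending in '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

For example, `flesch_reading_ease("The cat sat on the mat.")` gives about 116.1: six one-syllable words in a single sentence is very easy text. The vowel-group counter is the weak point, which is one reason syllable counts are never quite perfect.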

It will also let you know if a document has long sentences (anything above 25 words per sentence; the UK Government has a great document explaining this).
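The long-sentence check is simple enough to sketch in a few lines of Python. This assumes sentences end at `.`, `!` or `?` followed by whitespace, and uses the 25-word threshold stated above; the function name `long_sentences` is hypothetical.

```python
import re

LONG_SENTENCE_WORDS = 25  # more than 25 words counts as a long sentence

def long_sentences(text: str, limit: int = LONG_SENTENCE_WORDS) -> list[str]:
    """Return the sentences whose word count is greater than `limit`."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace or end of text.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if len(s.split()) > limit]
```

A sentence of exactly 25 words is not flagged; the 26th word tips it over.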

It is not a perfect implementation: the algorithms rely on full stops to find where sentences end, and the syllable counts are not always accurate. It is, however, certainly good enough to give you an idea of how difficult a text might be to read. That said, it can be misleading for some poetry, where line breaks are used as a form of punctuation.
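To see why poetry trips up a full-stop-based splitter, here is a small Python sketch comparing two hypothetical splitting rules on a made-up, unpunctuated three-line verse. Neither function is the tool's code; the second is simply one possible alternative.

```python
import re

def split_on_full_stops(text: str) -> list[str]:
    """Sentence boundaries only at '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def split_on_stops_and_line_breaks(text: str) -> list[str]:
    """Hypothetical alternative: also treat every line break as a boundary."""
    return [s.strip() for s in re.split(r"[.!?]+|\n+", text) if s.strip()]

def words_per_sentence(sentences: list[str]) -> float:
    return sum(len(s.split()) for s in sentences) / len(sentences)

# Made-up verse with no punctuation: line breaks do the work of full stops.
poem = (
    "The rain comes down on the old stone wall\n"
    "and the wind runs cold through the empty hall\n"
    "we wait by the fire as the evening falls\n"
)
```

With full stops only, the verse is read as a single 27-word sentence, which is both "long" and heavily penalised by the Flesch formula. Splitting on line breaks gives three 9-word sentences instead. The catch is that hard-wrapped prose would then be chopped into fragments, so neither rule is right for every text.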

You can open ▶ to see more details. You know you want to see those hidden stats!

Vocabulary Difficulty

These tools will check the vocabulary in your document and show you where your words lie on this scale. The Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR) has further details.

Beginner (A1)

  • Can understand and use familiar everyday expressions and very basic phrases aimed at the satisfaction of needs of a concrete type.

  • Can introduce themselves and others and can ask and answer questions about personal details such as where they live, people they know and things they have.

  • Can interact in a simple way provided the other person talks slowly and clearly and is prepared to help.

Elementary (A2)

  • Can understand sentences and frequently used expressions related to areas of most immediate relevance (e.g. very basic personal and family information, shopping, local geography, employment).

  • Can communicate in simple and routine tasks requiring a simple and direct exchange of information on familiar and routine matters.

  • Can describe in simple terms aspects of their background, immediate environment and matters in areas of immediate need.

Intermediate (B1)

  • Can understand the main points of clear standard input on familiar matters regularly encountered in work, school, leisure, etc.

  • Can deal with most situations likely to arise while travelling in an area where the language is spoken.

  • Can produce simple connected text on topics that are familiar or of personal interest.

  • Can describe experiences and events, dreams, hopes and ambitions and briefly give reasons and explanations for opinions and plans.

Upper intermediate (B2)

  • Can understand the main ideas of complex text on both concrete and abstract topics, including technical discussions in their field of specialization.

  • Can interact with a degree of fluency and spontaneity that makes regular interaction with native speakers quite possible without strain for either party.

  • Can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options.

Difficult Words (Low frequency)

These are words that only proficient users would be expected to understand unless they have been directly taught them.
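A vocabulary profiler of this kind can be sketched as "look each word up in graded lists, lowest level first, and call anything not found difficult". The Python below assumes exactly that. The tiny word lists are hypothetical placeholders, not the tool's real lists, which are far larger.

```python
import re

# Hypothetical, tiny word lists for illustration only.
LEVEL_LISTS = {
    "A1": {"i", "you", "the", "a", "is", "my", "name", "live", "in", "house"},
    "A2": {"shopping", "family", "work", "town"},
    "B1": {"travel", "hope", "ambition", "explain"},
    "B2": {"abstract", "technical", "viewpoint", "option"},
}
LEVEL_ORDER = ["A1", "A2", "B1", "B2"]

def word_level(word: str) -> str:
    """Return the lowest level whose list contains the word, else 'Difficult'."""
    w = word.lower()
    for level in LEVEL_ORDER:
        if w in LEVEL_LISTS[level]:
            return level
    return "Difficult"  # low frequency: not on any list

def profile(text: str) -> dict[str, int]:
    """Count how many words of the text fall at each level."""
    counts = {level: 0 for level in LEVEL_ORDER + ["Difficult"]}
    for word in re.findall(r"[A-Za-z']+", text):
        counts[word_level(word)] += 1
    return counts
```

For example, `profile("My name is Sam. I live in a house in town.")` puts nine words at A1, "town" at A2 and "Sam" under Difficult. That last result shows why a naive lookup needs help with names and brands.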

Reading Age

Two types of reading age are available. The first is the Coleman–Liau reading age, which is a good overall test but is not context-aware. The second is a variation on the Dale–Chall readability formula, which really provides the gold standard for reading ages. Because it is based on difficult vocabulary, you can see how changes to your document bring the reading age down. It is actually quite shocking how high a reading level most texts require. Both tests are based on US research and work best for students aged 11 and above (secondary school in the UK). For younger readers, it is best to keep the words to intermediate and elementary vocabulary as much as possible. See the Reading Age page for more of a breakdown.
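The Python sketch below shows the two published formulas. Coleman–Liau uses only letters and sentences per 100 words: 0.0588·L - 0.296·S - 15.8, giving a US grade. Dale–Chall uses the percentage of words missing from a list of familiar words, plus average sentence length, and adds 3.6365 when more than 5% of words are difficult. This is the standard Dale–Chall formula, not the tool's own variation. The familiar-word set is a parameter you must supply, and `grade_to_reading_age` assumes the common rule of thumb that US grade N corresponds to roughly age N + 5.

```python
import re

def _words(text: str) -> list[str]:
    return re.findall(r"[A-Za-z']+", text)

def _sentences(text: str) -> list[str]:
    # Assumption: sentences end in '.', '!' or '?'.
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def coleman_liau_grade(text: str) -> float:
    """Coleman–Liau index: a US grade level from letters and sentences per 100 words."""
    words = _words(text)
    if not words:
        raise ValueError("text contains no words")
    letters = sum(ch.isalpha() for w in words for ch in w)
    L = letters / len(words) * 100                # letters per 100 words
    S = len(_sentences(text)) / len(words) * 100  # sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8

def dale_chall_score(text: str, familiar_words: set[str]) -> float:
    """Standard Dale–Chall score; `familiar_words` stands in for the familiar-word list."""
    words = _words(text)
    if not words:
        raise ValueError("text contains no words")
    difficult = sum(1 for w in words if w.lower() not in familiar_words)
    pct_difficult = difficult / len(words) * 100
    score = 0.1579 * pct_difficult + 0.0496 * (len(words) / len(_sentences(text)))
    if pct_difficult > 5:
        score += 3.6365  # adjustment applied when difficult words exceed 5%
    return score

def grade_to_reading_age(grade: float) -> float:
    # Assumption: US grade N is roughly age N + 5.
    return grade + 5
```

The Dale–Chall result is a score that is then looked up in a table of grade bands rather than a grade itself. Because the score depends directly on which words are "difficult", swapping a rare word for a familiar one visibly lowers it. That is the feedback loop described above.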

Reading Time

These times are based on Marc Brysbaert's meta-analysis of reading rates. Reading speeds are measured in words per minute (wpm), where a "word" is counted as 5 characters. The average grade 4 child will read at 150 wpm (2.5 words per second) if they are familiar with the vocabulary, and a young adult will read at 180 wpm.
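As a worked example, the estimate is characters ÷ 5 to get standard words, then ÷ wpm to get minutes. The Python sketch below assumes every character counts, spaces included. The original does not say whether spaces count, and the constant names are hypothetical.

```python
GRADE_4_WPM = 150      # average grade 4 child, familiar vocabulary
YOUNG_ADULT_WPM = 180  # young adult

def reading_time_seconds(text: str, wpm: float = GRADE_4_WPM) -> float:
    """Estimate reading time, counting one standard 'word' per 5 characters."""
    standard_words = len(text) / 5  # assumption: spaces and punctuation count as characters
    return standard_words / wpm * 60
```

A 750-character passage is 150 standard words, so it takes a grade 4 reader about 60 seconds and a young adult about 50 seconds.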

High Frequency Word List

You can see the word lists being used. They are based on the research that I did for Taylor's University, were updated in 2020 and are now in the app. I also added common brands, because they are often loanwords, and who hasn't heard of McDonald's or Google?

Ngram

In linguistics, an n-gram is a contiguous sequence of n items from a given sample of text or speech. The program uses Google's Ngram Viewer to give you an idea of how popular a word is. As well as showing a word's popularity, it shows how that popularity has changed over time. For example, photostat reached a peak in 1955 and declined steeply thereafter, just as photocopy began to take off. The two lines cross in 1966, and from then on photocopy dominates.
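The Ngram Viewer reports a phrase's frequency as a percentage of all n-grams of the same length in its corpus for a given year. The Python sketch below applies the same idea to a single tokenised text. It does not query Google, it ignores the per-year breakdown, and the function names are hypothetical.

```python
from collections import Counter

def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous sequences of n tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def relative_frequency(tokens: list[str], phrase: tuple[str, ...]) -> float:
    """Percentage of all n-grams of the phrase's length that equal the phrase."""
    grams = ngrams(tokens, len(phrase))
    if not grams:
        return 0.0
    return Counter(grams)[phrase] / len(grams) * 100
```

In "the cat sat on the mat", "the" makes up 2 of the 6 one-grams, or about 33%. Over a corpus of millions of books, even "the" falls to a few percent.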

Here are a few other examples:

  • the: 6–4%

  • game: 0.08–0.04%

  • football: high of 0.0017%

  • law: 0.04–0.02% (notice the skew here: Google's corpus includes a lot of statutes)

  • Linguistics: high of 0.0003%

Another downside is that the corpus is a huge mixture of fiction, nonfiction, reports, proceedings and a lot of scientific literature, and none of these are weighted by popularity. Read this Wired article for more details.