Digitisation

Digitisation is the process of converting text from a PDF into machine-readable text, for example values in an Excel spreadsheet.

These are some instructions for digitising data for a MARVEL project. Your group may use a slightly different approach (ask your tutor), but the general strategy is the same.

Identifying, Gathering, Cleaning and Formatting Experimental Data

Skill Development (for UCAS personal statements etc.):

  • Searching and parsing large documents for relevant information
  • Using Optical Character Recognition (OCR) software
  • Time management
  • Teamwork
  • Excel skills

Identifying sources of data

Looking at the papers you found in your literature search, your first task is to go through each paper systematically, searching for the following:

  • Experimental Wavelengths (or wavenumbers)
  • Term Values
  • Molecular constants (these can be either theoretical or experimental).

A hint: look at the tables and the abstract, which will tell you the main results of the paper. You can also press Ctrl+F to search the PDF for keywords.
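If you have already copied the text of a paper into a plain string, the keyword search can be sketched in Python as below. This is only an illustration: the keyword list, the function name `find_keyword_lines` and the sample text are all made up for the example, and you should adjust the keywords to suit your molecule.

```python
import re

# Keywords based on the checklist above (an assumption; extend as needed).
KEYWORDS = ["wavelength", "wavenumber", "term value", "constant"]


def find_keyword_lines(text, keywords=KEYWORDS):
    """Return (line_number, keyword, line) for each line that mentions a keyword."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for keyword in keywords:
            # Case-insensitive substring match, so "Wavenumber" and
            # "wavenumbers" both count as hits for "wavenumber".
            if re.search(re.escape(keyword), line, flags=re.IGNORECASE):
                hits.append((number, keyword, line.strip()))
    return hits


if __name__ == "__main__":
    # Made-up sample text standing in for the text of a real paper.
    sample = (
        "Table 2. Observed wavenumbers (cm-1)\n"
        "The sample was heated.\n"
        "Term values are listed in Table 3."
    )
    for number, keyword, line in find_keyword_lines(sample):
        print(number, keyword, line)
```

Each hit tells you which line to look at, so you can jump straight to the tables that are likely to hold data.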

Now create a Word file and copy and paste into it the BibTags from your literature search spreadsheets: these will be used as your headings. Then, under each heading, list the sources of data in the corresponding paper.
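If you would rather build this list as plain text first and paste it into Word afterwards, a minimal Python sketch is given below. The BibTags and table descriptions are placeholders invented for the example, and `build_outline` is a hypothetical helper, not part of any MARVEL tool.

```python
def build_outline(sources_by_bibtag):
    """Return plain text with each BibTag as a heading and its data sources listed underneath."""
    lines = []
    for bibtag, sources in sources_by_bibtag.items():
        lines.append(bibtag)
        for source in sources:
            lines.append("  • " + source)
        lines.append("")  # blank line between papers
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder BibTags and sources; replace with your own.
    example = {
        "EXAMPLE01": ["Table 1: experimental wavenumbers", "Table 3: term values"],
        "EXAMPLE02": ["Table 2: molecular constants"],
    }
    print(build_outline(example))
```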

Using online OCR tools to copy this data

When you are faced with a paper containing tables of data that you want to transcribe, you have a few options:

a. Do it manually, the tedious way, copying line by line.

b. Try to select the data in the PDF and copy it. Unfortunately, you may well find that this does not work well with older papers.

c. Use OCR tools. OCR stands for “Optical Character Recognition”.

http://www.onlineocr.net/

I am now going to work through an example of how to use this online tool to extract data from a paper.

The first thing you need to do is identify which page of the PDF contains the data you wish to copy. You then need to “print” this page (that is, save just that page as its own file).

You then upload this file to the OCR tool and download the Word file it produces. You can then copy directly from this Word file into Excel.

A word of warning: when you paste into Excel you may need to use Paste Special so that the data is not all entered into a single column.
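If the data still ends up in a single column, one possible workaround is to split each line on spaces yourself and save the result as a CSV file, which Excel opens with one value per cell. The Python sketch below assumes the values in each row are separated by spaces or tabs; the function names and the sample numbers are invented for the example.

```python
import csv
import io


def ocr_text_to_rows(text):
    """Split each non-empty line of OCR output on whitespace, giving one list per table row."""
    return [line.split() for line in text.splitlines() if line.strip()]


def rows_to_csv(rows):
    """Return the rows as CSV text (one table row per line, values separated by commas)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up numbers standing in for text copied from the OCR output.
    pasted = "1 0 1234.567\n2 1 1240.123\n"
    rows = ocr_text_to_rows(pasted)
    # Save as a .csv file, which Excel can open directly.
    with open("table.csv", "w", newline="") as handle:
        handle.write(rows_to_csv(rows))
```

Be aware that if the OCR tool has put a stray space inside a number, that number will be split across two cells, so it is worth checking that every row has the same number of columns.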

As you go, make sure you save all your files and name them sensibly so that we can backtrack if we ever need to.
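One possible naming scheme is to combine the paper's BibTag with the table and page number. The Python sketch below shows the idea; the pattern itself and the helper name `make_file_name` are only suggestions, so check with your tutor whether your group already has a convention.

```python
import re


def make_file_name(bibtag, table, page, extension):
    """Build a name such as 'EXAMPLE01_table2_p5.xlsx' from the paper's BibTag, table and page."""
    # Replace any run of characters that are not letters, digits or hyphens
    # with an underscore, so the name avoids spaces and punctuation.
    safe_tag = re.sub(r"[^A-Za-z0-9-]+", "_", bibtag.strip())
    return f"{safe_tag}_table{table}_p{page}.{extension.lstrip('.')}"


if __name__ == "__main__":
    # "EXAMPLE01" is a placeholder BibTag.
    print(make_file_name("EXAMPLE01", 2, 5, "xlsx"))
```

Using the same pattern for the printed page, the OCR output and the spreadsheet makes it easy to trace a number in Excel back to the page it came from.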