Technical Blog‎ > ‎

Reading Data from Sheets via SpreadsheetApp: Best Practice

posted Apr 1, 2015, 7:50 PM by astillman@newvisions.org   [ updated Apr 9, 2015, 5:44 AM ]

There are two mental models for handling data in Sheets, the first of which -- what we'll call "reading in values as an Array of Arrays" -- is native to Apps Script's SpreadsheetApp service, and despite being the most obvious turns out to have some major limitations that can be solved by the second method -- which we'll call "reading in values as an Array of Object-literals."   


The first method is as "simple" as getting a range object type and then calling .getValues() on it, which to the newbie might not feel so simple the first time through... so let's break it down:


read1.png


Spreadsheets are organized in a coordinate grid, whose cell values are naturally understood in a computer language as a list of ordered lists… e.g. where list1 is [‘First Name’,’Last Name’,’Email Address’] and list2 is [‘Joe’,’Smith’,’test1@youpd.org’], etc.


read2.png


Calling forth the value “Jane” from the values array involves accessing the correct array (values[2]) in the list, and then the correct value within that list… e.g.  values[2][0] where the 2 and the 0 are just the integer indices that give the location of a given value within a list. In this case we have to access a value in an array that is inside another array, hence the two sets of square brackets.


If the range in question is a single cell, range.getValue() will return a single value -- the value in the cell -- not as an array but as a single value. If the range spans multiple cells, range.getValues() returns "an Array of Arrays," or a list of lists that represent the rows in the range.


Notice in the debugger panel screenshotted here that ss, sheet, and range are all returned as type “Object” and that there are no visible values shown for them.  This is because SpreadsheetApp.getActiveSpreadsheet() returns a specially-defined Spreadsheet type object, or class, that (like a sheet, or a range) serves largely as a container for of a bunch of private (visible to Google, but not to you) properties and a handful of useful public methods.   Public methods, like getValues(), are functions that extend a particular class (in this case the Range class), and that typically return another type of object, that may have its own methods and properties.  When working with Google's Apps Script services, realize that some of these return types have primitive values, whereas others are just "black boxes" that merely have other methods attached to them.  For example, there are several ways of obtaining a range object from a sheet object.


getRange.png


From the options above, seasoned spreadsheet users will gravitate toward the sheet.getRange(a1Notation) approach, where the argument passed into .getRange looks like "Sheet-o-Fun!A2:D300" but this quickly proves itself to be a very limited method.  E.g. sheet names and range size often change, so hard-coding them is only workable in the world of duct-taped, temporary solutions. Dealing with ranges more robustly requires bigger guns.  


When working with rectangular data sets in sheets that contain headers in the first row, a frequently-used practice on our team is to grab the headers in a single, one-dimensional array for later re-use, and then the rest of the values in the sheet:


var headers = sheet.getRange(1, 1, 1, sheet.getLastColumn()).getValues()[0];


var values = sheet.getRange(2, 1, sheet.getLastRow()-1, sheet.getLastColumn()).getValues();


This makes use of getLastRow and getLastColumn methods to return integers that will always adjust to the current extent of the range, and “method chaining” wherein the range object type (returned by the getRange method) can then be acted upon by the getValues method. 


Notice how in retrieving the headers we call the [0]th element in the array returned by getValues.  This is because getValues() always returns a 2-D array, which in this case would be a list one element long that is an array three elements wide. To get to a simple 1D array of headers, we have to access the first element in this array via [0].


A more elegant, if somewhat harder to understand approach that makes use of the getDataRange method, which always returns a range corresponding to the largest extent of cells that has values.  The native Javascript array.shift() method here “plucks” the first row of values from the array, returning the header row and leaving behind only a 2D array of values.


var values = sheet.getDataRange().getValues();


var headers = values.shift();


With either of these, the values array will look like this upon inspection in the debugger:


[[‘Joe’,’Smith’,’test1@youpd.org’],[‘Jane’,’Doe’,’test2@youpd.org’],[‘Amanda’,’Jacobs’,’test3@youpd.org’]]


If we wanted to loop through this array of arrays and do something like send an email from MailApp service, we might construct a for loop whose index allows us to access and work with each row… e.g.


for (var i=0; i<values.length; i++) {

 var firstName = values[i][0];

 var lastName = values[i][1];

 var email = values[i][2];

 MailApp.sendEmail(email, “Hello “ + firstName + “!”, “Testing testing 123”);

}


This for loop calls forth the standard practice of using the length of the array as a termination condition, which means that it will grow to accomodate any length of data set in our sheet, however there’s still a major problem with accessing Sheets column fields purely using numeric array indices:  namely that any change of column order, or column insertions in the sheet will run amok on our firstName, lastName, and email assignments.


Problem: When users change column order in the sheet, or make column insertions or deletions, any code that uses a set of hard-coded assumptions about value order will potentially fail.


Solution #1:  Dynamically identify correct indices using the indexOf method on the sheet headers array.


One solution to this dilemma would be to use the native Javascript indexOf method on the headers array to dynamically identify the correct column index for each field prior to running the for loop.  In code:

var fnIndex = headers.indexOf(“First Name”);

var lnIndex = headers.indexOf(“Last Name”);

var emailIndex = headers.indexOf(“Email Address”);


for (var i=0; i<values.length; i++) {

 var firstName = values[i][fnIndex];

 var lastName = values[i][lnIndex ];

 var email = values[i][emailIndex];

 MailApp.sendEmail(email, “Hello “ + firstName + “!”, “Testing testing 123”);

}


The indexOf method returns the zero-indexed position of a value within a string or array.  When called on an array, it returns the array index that matches the passed in value, or -1 if the value is not found.  


Solution #2: Convert the sheets data into an array of Javascript object literals, and access row values using keys instead of array indices.


This second approach is truly where you want to be headed if you’re going to be doing a lot of work with data sets in Sheets, or writing a Sheets Add-on that is robust enough to handle lots of user eccentricity.


First, it might be worth an introduction to what are known as object literals in Javascript.  The declaration of an object literal representing a single row in the spreadsheet from this example might look like this:



var student1 = {firstName: “Jane”, lastName: “Doe”, emailAddress: “test1@youpd.org”};



Often, object literals are written in easier to read, or “prettified” notation like this:



var student1 = {  

                          firstName: “Jane”,

                          lastName: “Doe”,

                          emailAddress: “test1@youpd.org

                        };



What you see above is a set of “key-value pairs” where the values are all strings, though this needn’t be the case.  The values could just as easily be numbers, dates, or other objects, and importantly, it makes no difference in reading values which order the key-value pairs appear in the object.


The values in an object literal can be accessed much as they would be for an array, using a bracket notation:


var firstName = student1[‘firstName’];


...in this case would assign the value “Jane” to the variable firstName.


A handy dot notation is also available, provided the keys themselves are valid (or normalized) javascript names (e.g. no whitespaces, dots, special characters, or leading numbers).  Conventionally, Javascript key names are written in “camel case” to help improve the human readability of multi-word field names.


var firstName = student1.firstName;


And so ideally, what we want is for our Sheets data to be returned as an array of object-literals whose keys are camelCase versions of the column headings… turning this:


read1.png


… into this …


[{firstName : “Joe”, lastName : “Smith”, emailAddress : “test1@youpd.org”},

{firstName : “Jane”, lastName : “Doe”, emailAddress : “test2@youpd.org”},

{firstName : “Amanda”, lastName : “Jacobs”, emailAddress : “test3@youpd.org”}]


Which is really how many modern web and database technologies now handle data objects (e.g. as JSON-encoded).... so that we can loop through them like this …


for (var i=0; i<values.length; i++) {

 var dataRow = values[i];

 var emailAddress = dataRow.firstName;

 var firstName = dataRow.firstName;

 var lastName = dataRow.lastName;

 MailApp.sendEmail(email, “Hello “ + firstName + “!”, “Testing testing 123”);

}



So how can we easily pull values from a sheet as an array of object-literals with normalized keys?


Some early Google Apps Script tutorials -- inexplicably removed from the Apps Script documentation site -- contained a very helpful set of library functions for working with Sheets data in this manner, and the New Visions CloudLab team has modified these slightly to remove some bugs.  We use them so often that we’ve packaged them in our own internal library for quick re-use across many of our projects.  


For the purposes of reading sheets data into arrays of objects, the only function you’re likely to call directly in this set is getRowsData which depends on the other functions bundled here to normalize sheet headers into proper Javascript keys, etc.  Just beware that this method will fail if you are relying on leading numbers or special characters (e.g. 1. Name, 2. Name, etc.) to make your headers unique...


// getRowsData iterates row by row in the input range and returns an array of objects.

// Each object contains all the data for a given row, indexed by its normalized column name.

// Arguments:

//   - sheet: the sheet object that contains the data to be processed

//   - range: the exact range of cells where the data is stored

//       This argument is optional and it defaults to all the cells except those in the first row

//       or all the cells below columnHeadersRowIndex (if defined).

//   - columnHeadersRowIndex: specifies the row number where the column names are stored.

//       This argument is optional and it defaults to the row immediately above range;

// Returns an Array of objects.

/*

* @param {sheet} sheet with data to be pulled from.

* @param {range} range where the data is in the sheet, headers are above

* @param {row}

*/

function getRowsData(sheet, range, columnHeadersRowIndex) {

 if (sheet.getLastRow() < 2){

   return [];

 }

 var headersIndex = columnHeadersRowIndex || (range ? range.getRowIndex() - 1 : 1);

 var dataRange = range ||

   sheet.getRange(headersIndex+1, 1, sheet.getLastRow() - headersIndex, sheet.getLastColumn());

 var numColumns = dataRange.getLastColumn() - dataRange.getColumn() + 1;

 var headersRange = sheet.getRange(headersIndex, dataRange.getColumn(), 1, numColumns);

 var headers = headersRange.getValues()[0];

 return getObjects_(dataRange.getValues(), normalizeHeaders(headers));

}


// For every row of data in data, generates an object that contains the data. Names of

// object fields are defined in keys.

// Arguments:

//   - data: JavaScript 2d array

//   - keys: Array of Strings that define the property names for the objects to create

function getObjects_(data, keys) {

 var objects = [];

 var timeZone = Session.getScriptTimeZone();


 for (var i = 0; i < data.length; ++i) {

   var object = {};

   var hasData = false;

   for (var j = 0; j < data[i].length; ++j) {

     var cellData = data[i][j];

     if (isCellEmpty_(cellData)) {

       object[keys[j]] = '';

       continue;

     }

     object[keys[j]] = cellData;

     hasData = true;

   }

   if (hasData) {

     objects.push(object);

   }

 }

 return objects;

}



// Returns an Array of normalized Strings.

// Empty Strings are returned for all Strings that could not be successfully normalized.

// Arguments:

//   - headers: Array of Strings to normalize

function normalizeHeaders(headers) {

 var keys = [];

 for (var i = 0; i < headers.length; ++i) {

   keys.push(normalizeHeader(headers[i]));

 }

 return keys;

}


// Normalizes a string, by removing all alphanumeric characters and using mixed case

// to separate words. The output will always start with a lower case letter.

// This function is designed to produce JavaScript object property names.

// Arguments:

//   - header: string to normalize

// Examples:

//   "First Name" -> "firstName"

//   "Market Cap (millions) -> "marketCapMillions

//   "1 number at the beginning is ignored" -> "numberAtTheBeginningIsIgnored"

function normalizeHeader(header) {

 var key = "";

 var upperCase = false;

 for (var i = 0; i < header.length; ++i) {

   var letter = header[i];

   if (letter == " " && key.length > 0) {

     upperCase = true;

     continue;

   }

   if (!isAlnum_(letter)) {

     continue;

   }

   if (key.length == 0 && isDigit_(letter)) {

     continue; // first character must be a letter

   }

   if (upperCase) {

     upperCase = false;

     key += letter.toUpperCase();

   } else {

     key += letter.toLowerCase();

   }

 }

 return key;

}


// Returns true if the cell where cellData was read from is empty.

// Arguments:

//   - cellData: string

function isCellEmpty_(cellData) {

 return typeof(cellData) == "string" && cellData == "";

}


// Returns true if the character char is alphabetical, false otherwise.

function isAlnum_(char) {

 return char >= 'A' && char <= 'Z' ||

   char >= 'a' && char <= 'z' ||

   isDigit_(char);

}


// Returns true if the character char is a digit, false otherwise.

function isDigit_(char) {

 return char >= '0' && char <= '9';

}




And so, the best practice for handling this example would look like this:

var ss = SpreadsheetApp.getActiveSpreadsheet();

var sheet = ss.getSheetByName(‘Student Roster’);

var data = getRowsData(sheet);


for (var i=0; i<data.length; i++) {

 var emailAddress = data[i].firstName;

 var firstName = data[i].firstName;

 var lastName = data[i].lastName;

 MailApp.sendEmail(email, “Hello “ + firstName + “!”, “Testing testing 123”);

}



I view this is the best practice because it is immune to column-reordering, it adheres to an industry standard data representation (JSON), and is quick and easy to apply to any Sheets-based data set.   If you haven’t already, I recommend you create yourself an Apps Script library and add the getRowsData method to it!


Bonus: If you’re interested in a method that can write arrays of objects with normalized headers to a sheet, check out setRowsData!



// setRowsData fills in one row of data per object defined in the objects Array.

// For every Column, it checks if data objects define a value for it.

// Arguments:

//   - sheet: the Sheet Object where the data will be written

//   - objects: an Array of Objects, each of which contains data for a row

//   - optHeadersRange: a Range of cells where the column headers are defined. This

//     defaults to the entire first row in sheet.

//   - optFirstDataRowIndex: index of the first row where data should be written. This

//     defaults to the row immediately below the headers.


function setRowsData_nonNormalized(sheet, objects, optHeadersRange, optFirstDataRowIndex) {

 var headersRange = optHeadersRange || sheet.getRange(1, 1, 1, sheet.getMaxColumns());

 var firstDataRowIndex = optFirstDataRowIndex || headersRange.getRowIndex() + 1;

 var headers = headersRange.getValues()[0];

 var data = []

 

 for (var i = 0; i < objects.length; ++i) {

   var values = []

   for (var j = 0; j < headers.length; ++j) {

     var header = headers[j];


     // If the header is non-empty and the object value is 0...

     if ((header.length > 0)&&(objects[i][header] === 0)&&(!(isNaN(parseInt(objects[i][header]))))) {

       values.push(0);

     }

     // If the header is empty or the object value is empty...

     else if ((!(header.length > 0)) || (objects[i][header]=='') || (!objects[i][header])) {

       values.push('');

     }

     else {

       values.push(objects[i][header]);

     }

   }

   data.push(values);

 }


 var destinationRange = sheet.getRange(firstDataRowIndex, headersRange.getColumnIndex(),

                                       objects.length, headers.length);


 destinationRange.setValues(data);

}




Comments