Parallel Processing

For a few of my answers on Web Applications, I keep looking for a way to retrieve all information via the GmailApp.search() method. As explained in the documentation, this method is only usable for small numbers of threads. If you query a label containing 1000 emails, for example, you will only get 500 items back. The GmailApp.search(query, start, max) method is meant for retrieving the results programmatically, one range at a time. I tried to build a script that had the following line-up:

var emails;
do {
  try {
    emails = doStuff(); // placeholder: fetch and process the next batch
  } catch (e) {
    // max reached: write to Google Drive or ScriptProperties and create a trigger
    break;
  } finally {
    // round up all data
  }
} while (emails.length !== 0);

At first it worked, but it became way too complex: creating triggers, writing to Google Drive, and still hitting the execution time limit. It's a shame, as I haven't seen many examples that use the finally clause of try/catch, and it just looks nice.
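For reference, the paging idea behind GmailApp.search(query, start, max) can be sketched as a simple loop. This is only a minimal sketch, not the script I abandoned: countAllThreads, fakeSearch and the 1234-thread mailbox are made-up names and data for illustration, and the search function is passed in so the loop can be tried outside Apps Script.

```javascript
// Sketch: keep requesting pages until one comes back short.
// searchFn(query, start, max) is an injected stand-in (assumption);
// in Apps Script it could wrap GmailApp.search(query, start, max).
function countAllThreads(searchFn, query, pageSize) {
  var total = 0;
  var start = 0;
  while (true) {
    var page = searchFn(query, start, pageSize);
    total += page.length;
    if (page.length < pageSize) {
      break; // a short (or empty) page means nothing is left
    }
    start += pageSize;
  }
  return total;
}

// Hypothetical in-memory mailbox with 1234 threads, only for trying the loop
function fakeSearch(query, start, max) {
  var mailbox = [];
  for (var i = 0; i < 1234; i++) {
    mailbox.push('thread-' + i);
  }
  return mailbox.slice(start, start + max);
}

var total = countAllThreads(fakeSearch, 'label:all', 500);
```

Inside Apps Script the first argument could be function (q, s, m) { return GmailApp.search(q, s, m); }. A sequential loop like this still runs into the execution time limit on large mailboxes, which is the reason for splitting the work up.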

In the meantime, Bruce McPherson was hitting the road with his awesome posts, starting with Running things in parallel using HTML service. This post (see slide 14) on the Google+ Google Apps Script community made me think of a way to incorporate some sort of while loop.
Basically, I'm restarting the script so it continues retrieving information until everything has been fetched. Read below to find out how I did that. Make sure to catch up on all the posts Bruce wrote on this topic to understand what I'm doing.

Code.gs

The beginning of the code has some extra global vars. These are passed on to the profiles that need to be created. The rest is the same.


var ADDONNAME = "async";
var CHUNK = 5;
var MAX = 500;
var QUERY = 'all';

....

function showSidebar() {
  // kicking off the sidebar executes the orchestration
  var startTime = new Date().getTime();
  var startOpt = {"chunk": CHUNK, "cycle": 0, "max": MAX, "starttime": startTime, "query": QUERY};
  libSidebar('asyncService',ADDONNAME, gmailProfile(startOpt)); 
}

Profiles.gs 

Now we jump all the way to Profiles.gs, because the rest is the same, as Bruce has explained in his posts. This file contains four profiles. The first (currCycle) merely shows the current cycle the process is in.

// create cycle counter
var cycleCounter = [];
cycleCounter.push({
  "name": 'Cycle number: ' + opt.cycle,
  "functionName": 'currCycle',
  "options": {}
});

The next profile will start retrieving the items based on the getAllEmails function. The global vars are being passed on yet again through the options.

// create count loop
var countThreads = [];
for(var i = 0; i < opt.chunk; i++) {
  var counter = i + (opt.chunk * opt.cycle);
  countThreads.push({
    "name": 'Gmail count: ' + counter,
    "functionName":'getAllEmails',
    "options": {
      "query": 'label: ' + opt.query,
      "start": counter * opt.max,
      "max": opt.max
    }
  });
}
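To make the offset arithmetic concrete, here is a small sketch that reproduces the start values the loop above generates for a given cycle. The helper name startOffsets is made up for this illustration; the formula is the one from the profile.

```javascript
// Sketch: list the "start" offsets one cycle hands to getAllEmails.
// Same arithmetic as the profile loop: counter = i + chunk * cycle.
function startOffsets(opt) {
  var offsets = [];
  for (var i = 0; i < opt.chunk; i++) {
    var counter = i + (opt.chunk * opt.cycle);
    offsets.push(counter * opt.max);
  }
  return offsets;
}

var cycle0 = startOffsets({ chunk: 5, cycle: 0, max: 500 });
// [0, 500, 1000, 1500, 2000]
var cycle1 = startOffsets({ chunk: 5, cycle: 1, max: 500 });
// [2500, 3000, 3500, 4000, 4500]
```

With CHUNK = 5 and MAX = 500, every cycle covers up to 2500 threads, and the next cycle picks up exactly where the previous one stopped.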

The reduceTheResults function will collect the information obtained from the previous profile. 

// next reduce the test data to one
var threadsReduction = [];
threadsReduction.push({
  "name": 'Reduction: ',
  "functionName":'reduceTheResults',
  "options": {}
});

The last profile is the most important one. It functions as a logger (for the spreadsheet) but also steers the process via the options.

// finally log the results
var profileLog = [];
profileLog.push({
  "name": 'Logging: ',
  "functionName": 'logTheResults',
  "skip": false,
  "options": {
    "cycle": opt.cycle,
    "sum": opt.sum,
    "starttime": opt.starttime,
    "driver": 'cDriverSheet',
    "clear": false,
    "parameters": {
      "siloid": 'Sheet1',
      "dbid": 'id of your Google Spreadsheet',
      "peanut": 'jacobjan'
    }
  }
});

Process.gs 

The first function simply returns the current cycle number, so that it shows up in the sidebar.

function currCycle(curCycle) {
  return curCycle;
}

The second function runs the search described by the options and returns the number of threads found (the count), wrapped in an array.

function getAllEmails(options) {
  var threads = GmailApp.search(options.query, options.start, options.max);
  return [threads.length];
}

The last function validates and prepares the results. If the last element of the results array is not zero (a zero means that search returned no emails), the sidebar is restarted for the next cycle. Otherwise the totals are saved.

function logTheResults(options, reduceResults) {
  // prepare handler  
  var handler = new cDbAbstraction.DbAbstraction ( eval(options.driver),  options.parameters ); 
  assert(handler.isHappy(), 'unable to get handler',options.driver);
  
  // clear sheet if needed
  if (options.clear) {
    var result = handler.remove();
    if (result.handleCode < 0) {
      throw result.handleError;
    }
  }  
  
  // prepare data
  var data = reduceResults[0].results;
  var intSum = data.reduce(function(a, b) { return a + b; }, 0);
  if(options.cycle === 0) {
    options.sum = 0;
  }
  var cumSum = intSum + options.sum;
  
  // set options   
  var opt = {
    "sum": cumSum,
    "starttime": options.starttime,
    "max": MAX,
    "query": QUERY,
    "chunk": CHUNK 
  };  
  
  if(data.lastIndexOf(0) !== data.length - 1) {
    opt.cycle = options.cycle + 1;
    libSidebar('asyncService', ADDONNAME, gmailProfile(opt));
  } else {
    var totalTime = Math.round((new Date().getTime() - opt.starttime) / 1000);
    result = handler.save({"total count": cumSum, "total time (s)": totalTime});
    if (result.handleCode < 0) { 
      throw JSON.stringify(result);
    }
  }
}
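The stop test and the running total can be tried in isolation. The sketch below uses made-up per-cycle counts for a mailbox of 2730 threads (with CHUNK = 5 and MAX = 500); needsAnotherCycle and addToSum are illustrative helper names, not part of the actual script.

```javascript
// Sketch: same stop test as in logTheResults.
// Another cycle is needed unless the last count in the array is 0.
function needsAnotherCycle(data) {
  return data.lastIndexOf(0) !== data.length - 1;
}

// Add this cycle's counts to the sum carried over from earlier cycles.
function addToSum(data, previousSum) {
  return data.reduce(function (a, b) { return a + b; }, previousSum);
}

// Hypothetical counts returned per cycle (assumption, not real output)
var cycles = [
  [500, 500, 500, 500, 500], // last search was full: keep going
  [230, 0, 0, 0, 0]          // last search was empty: stop
];

var sum = 0;
var cyclesRun = 0;
for (var c = 0; c < cycles.length; c++) {
  sum = addToSum(cycles[c], sum);
  cyclesRun++;
  if (!needsAnotherCycle(cycles[c])) {
    break;
  }
}
// sum is 2730 after 2 cycles
```

Note that a mailbox holding an exact multiple of 2500 threads costs one extra cycle, in which every search returns zero, before the process stops.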

Notes

In this case I've passed the results along through the options as well, as part of the summation, but I could have used any of the drivers Bruce made available.

Screenshots

[Screenshot: summary output]
[Screenshot: result output]

See the two subpages, Process and Profiles, for the complete code.