11/24/2014 09:00 am ET Updated Dec 06, 2017

Why We Can't Ignore The Outliers


Surveying and mapping data calls attention to "outliers" -- like water source points where women spend unusually long periods each day collecting water

Chrissy, a young woman in her late 20s, opens her eyes and looks out of the small window of her mud hut. The sky is stained with the hint of the coming morning, animated with the sounds of grasshoppers serenading the sun.

It's another day in Kashoni village, as the daily routine resumes.

She gets up, wipes the sleep from her eyes, quietly gathers the metal buckets and tiptoes out so as not to wake her children and husband. With the buckets dangling at her sides, she joins the other women, chattering about the latest gossip, as they stream towards the local borehole to draw water while the village sleeps. They settle under a tree, waiting their turn, and Chrissy, first in line, places her bucket under the pump outlet. After several strong strokes at the pump handle, she joins the others to continue their gossip as they all wait for the water to fill the bucket.
They wait. And they wait. And they wait as the bucket fills up, one little drop at a time.


Chrissy has been doing this every day for six years. But unbeknown to her, this is more than just a routine for her and thousands of Malawian women. Chrissy is a statistical anomaly embedded within thousands of lines of data on a cluttered spreadsheet displayed on an old Toshiba laptop some 75 kilometers north of her home.

There is a word to describe her -- she is what mathematicians and statisticians call an outlier:

"An outlying observation that appears to be markedly different from other observations of the same sample being observed."

Outliers are special because they are rare and out of the ordinary. But what exactly was so special and extraordinary about Chrissy?

In July of this year, Chrissy was one of more than two thousand respondents who participated in a water-systems survey using Water For People's data collection app, AkvoFLOW. The survey was an effort to understand community access to specific water points and to track progress toward achieving the Millennium Development Goals. The app allows us to capture a variety of parameters, one of which is water quantity. Before we can settle for running water, we need to understand whether the water point even supplies enough water.

The survey required field enumerators to accurately measure how long it took to fill a 20-liter bucket/jerry can with water. Then, using simple arithmetic, the yield (expressed in liters per second) is calculated and compared with the local government standards to isolate any overtly worrisome results for possible future intervention by government or NGOs.

Chrissy's water point was one of those particularly worrisome results that stuck out like a sore thumb among the many rows and columns of data. As standard procedure, I had the unenviable task of cleaning the raw data spreadsheet once it was downloaded online. I reviewed 2000 lines of data looking for anything that was clearly inaccurate or just plain weird. Any responses that seemed out of the ordinary I simply deleted and marked as "Outlier" in my clean-up report, including Chrissy's.

So just how weird was Chrissy's data? It turned out that the average water point took about 99 seconds, or just over a minute and a half, to fill a 20-liter jerry can -- Chrissy's took 7,200 seconds, or two hours.
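The yield arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `yield_lps` is hypothetical, and no government threshold is included because the figure is not given here.

```python
def yield_lps(volume_liters: float, fill_seconds: float) -> float:
    """Return a water point's yield in liters per second."""
    if fill_seconds <= 0:
        raise ValueError("fill time must be positive")
    return volume_liters / fill_seconds


CAN_LITERS = 20.0  # size of the bucket/jerry can used in the survey

average_yield = yield_lps(CAN_LITERS, 99)    # typical water point
chrissy_yield = yield_lps(CAN_LITERS, 7200)  # Chrissy's water point
slowdown = 7200 / 99                         # how many times slower her point is

print(f"average: {average_yield:.3f} L/s")
print(f"Chrissy: {chrissy_yield:.4f} L/s")
print(f"ratio:   {slowdown:.0f}x slower")
```

In other words, a typical point delivers roughly 0.2 liters per second, while Chrissy's delivers under 0.003, about 73 times slower.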

I approached the enumerator who had interviewed Chrissy to validate the data. What ensued was an impassioned battle of conviction between a field enumerator and his survey supervisor -- each unmoved in his own sense of rightness. The enumerator was adamant that the data was correct. He even recalled waiting 30 minutes at the water point after Chrissy pumped, with very little water discharged. He ended up asking Chrissy how long it usually took to fill a bucket, and she told him two hours.

So with support from local extension workers, we conducted a small field verification exercise to confirm the enumerator's argument. Simply put, the outlier turned out to be accurate.

I wonder how many Chrissys are out there that we don't notice because they don't fit within our definition of "statistically accurate" or the nominal range of a normal distribution curve. It scares me, but excites me, too. It reminds me that monitoring and evaluation (M&E) is not some boring job about numbers, figures and percentiles -- it's about learning, relearning and unlearning. It's about finding the story behind the numbers and what they mean, no matter how uncomfortable that story may be. And ultimately, it's an important step in the vision to ensure access to clean, continuous water for everyone, not just today, but forever.

Because of this experience, Water For People is enhancing its field monitoring work to ensure more "outliers" -- rare and out of the ordinary as they may be -- are not ignored for the sake of statistical correctness.
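One way to picture "flag, don't delete" is the short Python sketch below. It is not Water For People's actual procedure: the function `flag_slow_points`, the rule of five times the median fill time, and the water point IDs are all assumptions made for illustration, and every fill time except the 7,200 seconds is made up.

```python
from statistics import median


def flag_slow_points(readings, factor=5.0):
    """Return IDs of readings whose fill time exceeds `factor` times the median.

    Nothing is deleted: flagged rows stay in the data set and are
    queued for a field check instead.
    """
    typical = median(seconds for _, seconds in readings)
    return [point_id for point_id, seconds in readings if seconds > factor * typical]


# Illustrative (water point ID, seconds to fill a 20-liter can) pairs.
# Only the 7200-second value comes from the survey; the rest are invented.
readings = [
    ("wp-01", 85),
    ("wp-02", 99),
    ("wp-03", 110),
    ("wp-04", 7200),
    ("wp-05", 92),
]

to_verify = flag_slow_points(readings)
print(to_verify)  # IDs to send back to the field for verification
```

The median is used rather than the mean so that a single extreme reading does not drag the "typical" value toward itself and hide the very point that needs a visit.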

As the last drop of water finally falls, filling her bucket to the brim, Chrissy's village of Kashoni is typed into another spreadsheet: villages tagged for rehabilitation in 2015.

More is More: Learn more about the value of monitoring and evaluation and the significant role it plays in providing access to clean water and sanitation.

Water For People is a partner of Cisco CSR. Cisco sponsors The Huffington Post's ImpactX section.