01/08/2016 02:00 pm ET Updated Dec 06, 2017

Can Data Aggregation Save Sharks?


In the fall of 2013, there seemed to be more shark sightings at my local surf spot than there had been in several years. Over the course of a few months, the local news showed up several times to cover the sightings in what was, in my opinion, the typical "scare the public" fashion. During this time, more news was also surfacing about Australia, where the government was actively killing sharks over a certain size along the western coast in the name of public safety. Frustrated, I started an online presence dedicated to shark education and conservation. I sold T-shirts and stickers with branding that I created and donated a portion of the sales to a shark-oriented charity. The presence grew, but I was still frustrated with the hype and the killing and really wanted to do more.

A prominent shark researcher runs a website with shark sighting data along the Pacific coast from 2003 to the present. Nearly 65% of these sightings come from surfers and stand-up paddleboarders, which makes sense since we are in the water more often and for longer periods of time per visit than the average person. The data does have some bias, but it is usable. For example, in nearly every report that names a species, the shark is reported as a white shark. While this may or may not be the case, most of the reports from those who spend so much time in the water are genuine shark sightings. Sharks move and behave much differently than dolphins or other inhabitants of the ocean, and it doesn't take long to learn the difference.

I contacted the researcher, Ralph Collier, and asked whether the data that appeared on the website was stored in a database somewhere. Unfortunately, it was not. First, I tried to use natural language processing to get the data into a usable form. It didn't work. The nature of the data is such that you really need to know a bit about the California coast and ocean/surf culture to interpret it. The code that I would have had to write to get the data into a usable form would have been immense and cumbersome. So I did the only thing I could do and started going through each sighting and manually entering the data into a spreadsheet. Yeah, it's a long process. However, I have learned quite a bit by becoming so intimate with this data set.

In addition to Ralph's data, I found other great sources of sighting data. Whale watching boats, fishing boats, Instagram, YouTube and online news reports hold a wealth of information. Some of these have APIs that can be tapped into, and other websites are easily scraped with Python. Together, they provide a nice picture of nearshore and offshore sightings. Of course, quite a bit of quality control happens along the way. A single sighting may be reported on Ralph's website, on social media, and on the local news website, so it's important that it is counted only once.
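To make the "count it only once" idea concrete, here is a minimal sketch in Python. It is not the project's actual code. It assumes each report has already been reduced to a source label, a date, and a latitude/longitude, and it treats two reports as the same sighting when they fall within a couple of kilometers and a day of each other. The `Sighting` fields, the `dedupe` function, and both thresholds are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Sighting:
    source: str   # hypothetical label, e.g. "website", "news", "instagram"
    day: date
    lat: float
    lon: float


def km_between(a, b):
    """Great-circle distance in kilometers (haversine formula)."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def dedupe(sightings, max_km=2.0, max_days=1):
    """Keep the earliest report of each event and drop later reports that
    fall within max_km and max_days of one already kept.
    Both thresholds are assumed values, not validated ones."""
    kept = []
    for s in sorted(sightings, key=lambda s: s.day):
        is_duplicate = any(
            abs((s.day - k.day).days) <= max_days
            and km_between(s, k) <= max_km
            for k in kept
        )
        if not is_duplicate:
            kept.append(s)
    return kept


# Made-up example: one event reported twice, plus an unrelated report far away.
reports = [
    Sighting("website", date(2013, 10, 5), 33.372, -117.565),
    Sighting("news", date(2013, 10, 5), 33.373, -117.566),
    Sighting("instagram", date(2013, 10, 5), 36.950, -122.020),
]
print(len(dedupe(reports)))  # prints 2
```

A simple rule like this will still need a human check. Two different sharks can show up at the same beach on the same day, and a news story may run several days after the sighting it describes.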

What's the point of all of this data collection? I want to create a clearer picture of shark activity in order to help non-profits, fisheries, scientists, and government agencies manage their conservation efforts. In addition, I am deeply passionate about educating the public on why sharks are so important to our ecosystem and economy. For example, a study showed that the loss of shark populations led to an increase in ray populations. The rays then ate all of the bay scallops, and a fishery was forced to close. When the scallops were gone, the rays started feeding on other bivalve species.

Will data aggregation save sharks? We won't know for some time. First, the California data needs to be finalized and the model validated. After that, I will be going after global data from a number of sources. The more sources, the better. In the meantime you can follow the project's progress here.