Big Data Farm

02/10/2015 09:30 am ET | Updated Apr 12, 2015

Welcome to the wonderful world of Big Data! Big Data is big! Really, really, really big! Major corporations like Google, Facebook, and Twitter are already using Big Data to control, we mean improve, our lives. This tutorial will teach you all you need to know to get in on the fun and profit.

First, some terminology. The smallest unit of Big Data is called a NEEDLE.

NEEDLES are stored in data clusters called HAYSTACKS. A single HAYSTACK may contain an infinite number of NEEDLES.

Note: HAYSTACKS can only be instantiated intraday. For example, to create a new HAYSTACK, enter the following command:

MAKE HAYSTACK haystack_name WHILE SUN SHINES

HAYSTACKS run on SERVER FARMS. There are two sizes of SERVER FARMS: SMALL FAMILY SERVER FARMS and BIG INDUSTRIAL SERVER FARMS.

Note: SMALL FAMILY SERVER FARMS are being rapidly deprecated. We strongly recommend using only BIG INDUSTRIAL SERVER FARMS.

SERVER FARMS are managed by processes called FARMERS. Due to a technical limitation, FARMERS can only run on Dell (TM) computers. For example, to instantiate three FARMER processes, enter the following commands:

FARMER farmer_name1 IN THE DELL
FARMER farmer_name2 IN THE DELL
/* hi ho the merry oh */
FARMER farmer_name3 IN THE DELL

Note: /* hi ho the merry oh */ is a comment, and has been included only for clarity; it is not mandatory.

Now some common use cases. To find a specific NEEDLE in a HAYSTACK, enter the following command:

SEARCH NEEDLE needle_name IN HAYSTACK haystack_name ON SERVER FARM farm_name

The FARMER processes you previously created would search that specific HAYSTACK on that specific SERVER FARM for that specific NEEDLE.

But suppose you didn't know the specific HAYSTACK and SERVER FARM. You could enter the following command:

SEARCH NEEDLE needle_name IN HAYSTACK ALL ON SERVER FARM ALL

But, depending upon the number of HAYSTACKS and SERVER FARMS in your implementation, this query could take a long time to run, and you may forget what you wanted NEEDLE needle_name for in the first place.

To run a more efficient query, first you must create supporting processes called MILKMAIDS. For example, to instantiate eight MILKMAID processes, enter the following commands:

CREATE MAID maid_name1 A-MILKING
CREATE MAID maid_name2 A-MILKING
.
.
.
CREATE MAID maid_name8 A-MILKING

Note: Similar commands can be used to create SWANS A-SWIMMING, GEESE A-LAYING, etc., but that is beyond the scope of this tutorial.

Now that you have FARMERS and MILKMAIDS, you can link them in a special search algorithm called ROLLING:

ROLL FARMER farmer_name1 MILKMAID maid_name1 IN HAYSTACK ALL ON SERVER FARM ALL

But this is still less than optimal. You don't have to link a specific FARMER to a specific MILKMAID; instead, we recommend using wildcards in WILD OATS mode:

WILD OATS ROLL FARMER ALL MILKMAID ALL IN HAYSTACK ALL ON SERVER FARM ALL
/* hi ho the merry oh */

If any of the FARMER-MILKMAID permutations encounters a NEEDLE in a HAYSTACK while ROLLING, it will throw an exception called an OUCH. You can then inspect the NEEDLE to see if it is the one you want. If not, you can discard it and the FARMERS and MILKMAIDS will continue ROLLING.

Note: After ROLLING in the HAYSTACKS, FARMERS and MILKMAIDS may spawn CHILD processes. This is perfectly natural, and may even be beneficial; the CHILD processes will eventually grow up to be FARMERS or MILKMAIDS, and perform their own ROLLING.

Once you have found the NEEDLE you are searching for, what can you do with it? Another common use case is to apply a security patch. First, you will need to create a processing thread for the NEEDLE:

MILKMAID maid_name THREAD NEEDLE needle_name
/* why is this my job? I've got enough CHILD processes to worry about */

Next, you will need to apply the security patch to the FARMER'S container, called OVERALLS:

MILKMAID maid_name STITCH FARMER farmer_name OVERALLS WITH NEEDLE needle_name
/* why don't you stitch your own damn overalls? */

Note: A STITCH is a special memory device that saves nine times as much data when applied in a timely fashion.

Finally, you must disconnect the processing thread from the NEEDLE:

MILKMAID maid_name BYTE THREAD NEEDLE needle_name
/* you're sleeping in the barn by yourself tonight, mister */

There you have it! Now you know all you need to know to enter the wonderful world of Big Data! Good luck acing that job interview with the NSA!