Drinking from the (data) Firehose of Terror

Between classic business transactions and social interactions and machine-generated observations, the digital data tap has been turned on and it will never be turned off. The flow of data is everlasting. Which is why you see a lot of things in the loop around real time frameworks and streaming frameworks. – Mike Hoskins, CTO Actian

From Mike Hoskins to Mike Richards (yes, we can make that kind of leap in logic; it’s the weekend)…

Oh, Joel Miller, you just found the marble in the oatmeal! You’re a lucky, lucky, lucky little boy – because you know why? You get to drink from… the firehose! Okay, ready? Open wide! – Stanley Spadowski, UHF

Firehose of Terror

I think you get the picture – a potentially frightening picture for those unprepared to handle the torrent of data that is coming down the pipe. Unfortunately, for those who are unprepared, the disaster will not merely overwhelm them. Worse – I believe they will be consumed by irrelevancy.

If you’re still with me, let me explain.

I agree that the tap has been turned on, maybe not at full blast or under maximum control, yet the data is coming and is already well beyond a trickle. The helter-skelter implementations of big data solutions out there have, perhaps, created more of a turbulent, blasting firehose than a meandering stream of data. And this is only the beginning.

Success is Still Newsworthy

We are still at the stage where designing a (successful) enterprise system built around streaming data, for example, is big news. Why is it news? Because building is harder than merely planning, especially when new open source projects continue to push us beyond the bleeding edge. New tools are helping us see how we can handle more data and find more value, but they are also making the pool of data so much larger that the tools themselves are often irrelevant by the time they are adopted.

For example, MapReduce was awesome, until it began to be so widely adopted that its limitations became apparent. It’s like finding the marble in a sandbox filled with oatmeal – not easy, but when you find it, you’re a winner! Oh, the prize is a fierce typhoon of even more data coming your way. Congratulations! (Sorry you didn’t prepare for that.)

So where does this leave organisations that have no ability to handle more than a trickle of data?

It’s a win-or-lose scenario – either you can do something about it or you can’t. As software developers or data managers, we won’t be judged along some smooth gradation of skills and capabilities.

We’ll be judged against a checklist:
Yes or no.
Pass or fail.
Win or lose.
Firehose or … an icky pail to hide in the closet.

Data’s Need for Speed

Why is it a pass/fail scenario? Consider your car – is it successful when it mostly starts in the morning? Never. Anything short of fully starting is a complete failure, because starting is what it is designed to do.

I argue that today’s data streams are being designed to handle data at maximum velocity. Sure, many services aren’t producing millions of records per second, but as we gear up with the latest toolsets, we make the tools themselves hunger and thirst to get more and more data into their greedy little hands. Feed the beast – or ignore it at your peril.

Systems today are designed to run at full throttle – 100% – all out – maximum overdrive. None of us would like to pay for premium broadband and then find that we only use 10% of our bandwidth. Likewise, our systems are waiting for us to crank up the volume to see what we’ll do next.

Our data economy inherently wants to run at maximum, but much of the old plumbing needs upgrades to function at that rate. You’d be irate if the fire department ran their hoses at 10% power when trying to douse your burning home. However, if they told you later that max pressure would have burst the old hoses and left you with no water, then you might be a little more appreciative.

Patch up those hoses so they are tested and ready. Buckle up the survival suit. Start digging through the oatmeal and open wide. Be forewarned – if you are not searching for the marble, you can never find it. If you cannot find it, you’ll be sent home empty-handed.

---

p.s. If you don’t win, I highly doubt you’ll even receive a lousy copy of the home edition.

About the Author

About Tyler Mitchell

Director Product Marketing @ OmniSci.com GPU-accelerate data analytics | Sr. Product Manager @ Couchbase.com - next generation Data Platform for System of Engagement! Former Eng. Director @Actian.com, author and technology writer in NoSQL, big data, graph analytics, geospatial and Internet of Things. Follow me @1tylermitchell or get my book from http://locatepress.com/.

