The full video of my talk from Hadoop Summit (San Jose, June 28, 2016) is now available. In this talk I cover performance considerations when moving analytic workloads into production. I even give away the game-changing secret sauce for extreme performance in Actian’s Vector in Hadoop product for SQL analytics. VID: Solving Performance Problems […]
You are browsing archives for
Tag: hadoop
Serverspec checks settings on a Hadoop cluster
Serverspec is a Ruby-based system that can run RSpec-formatted tests against a host. It can check a long list of system details and statuses, such as total memory, CPU count, and running services, and it supports custom shell callouts. It compares the results of the checks with predefined thresholds and reports a pass or […]
Hadoop Options for SQL Databases
Drowning while trying to understand your options for SQL-based database management in Hadoop? This graphic is a simplified comparison of the various features of several popular products in use today. I outline some of my biggest differentiators in this post. While this is a marketing slide for Actian’s SQL in Hadoop enterprise solution, I wish I had seen it earlier so I could […]
“Big Data” off 2015 Hype Cycle?
See this official 2015 hype cycle video here to get it straight from Gartner. In the video, the presenter says, first, that big data has passed over the hump and is no longer just hype. Second, it is now embedded within other items throughout the cycle. I can understand how this can get confusing to track and qualify, but isn’t […]
Partitioned Data & Why It Matters
As data volumes grow, so does your need to understand how to partition your data. Until you understand this distributed storage concept, you will be unable to choose the best approach for the job. This post gives an introductory explanation of partitioning and you will see why it is integral to the Hadoop Distributed File System (HDFS) increasingly […]
Web console for Kafka messaging system
Running Kafka for a streaming collection service can feel somewhat opaque at times, which is why I was thrilled to find the Kafka Web Console project on GitHub yesterday. This Scala application can be easily downloaded and installed in a couple of steps. An included web server can then be launched to serve it up quickly. Here’s […]
Drinking from the (data) Firehose of Terror
Between classic business transactions and social interactions and machine-generated observations, the digital data tap has been turned on and it will never be turned off. The flow of data is everlasting. Which is why you see a lot of things in the loop around real time frameworks and streaming frameworks. – Mike Hoskins, CTO Actian […]
HBase queries from Bash – a couple simple REST examples
Learn how to run some simple queries to extract data from the Hadoop/HDFS-based HBase database using its REST API. Are you getting stuck trying to figure out how to query HBase via the REST API? Me too. The main HBase docs are pretty limited in terms of examples, but I guess it’s all there, just not […]