Serverspec is a Ruby-based tool that runs RSpec-formatted tests against a host.  It can check a long list of system details and statuses, such as total memory, CPU count, and running services, and it also supports custom shell commands.  It compares the result of each check with a predefined expectation and reports a pass or fail condition for each test.

A single test can be as simple as:

describe file('/etc/hadoop/conf/hdfs-site.xml') do
  it { should be_file }
end

Or it can be more complex, pulling in Ruby gems such as Nokogiri to parse XML.
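A minimal sketch of that kind of check, assuming a Hadoop-style configuration file. It uses REXML, which ships with standard Ruby installs, in place of Nokogiri so that it runs without extra gems. The helper name `hadoop_property` and the sample XML are illustrative only and are not taken from my actual test suite.

```ruby
require 'rexml/document'

# Hypothetical helper: return the <value> of a named <property> from
# Hadoop-style configuration XML, or nil if the property is absent.
def hadoop_property(xml, name)
  doc = REXML::Document.new(xml)
  doc.elements.each('configuration/property') do |prop|
    return prop.elements['value']&.text if prop.elements['name']&.text == name
  end
  nil
end

# Illustrative sample; a real check would read the file from the host.
SAMPLE_HDFS_SITE = <<~XML
  <?xml version="1.0"?>
  <configuration>
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>
  </configuration>
XML

replication = hadoop_property(SAMPLE_HDFS_SITE, 'dfs.replication')
puts "dfs.replication = #{replication}"
```

Inside a Serverspec spec, the same helper could presumably be fed the content of the file resource, for example `file('/etc/hadoop/conf/hdfs-site.xml').content`, and the result compared with an ordinary RSpec matcher.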

Most Serverspec-based systems produce a report for a given host and a given set of tests.  They usually either dump out a report listing all the errors or produce a set of JSON files with the details of each success and failure.  I needed to query a bunch of hosts, an entire cluster actually.  There aren’t as many people doing that as I had thought.

Taking it up a notch, Vincent Bernat’s GitHub project, serverspec-example, does almost everything I need, including formatting the JSON into a web-accessible report.

[Screenshot: table showing the report built from Serverspec output]

I can then mouse over or click on a status to get more details.  It’s a great example of using Serverspec.
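To give a feel for the JSON side of this, here is a rough sketch of rolling per-host results up into a cluster-wide summary. It assumes each host's results were written with RSpec's JSON formatter, which records an `examples` array with a `status` and `full_description` for each test. The host names, sample data, and the `summarize_host` helper are made up for illustration; this is not how serverspec-example builds its report.

```ruby
require 'json'

# Summarise one host's results from RSpec JSON formatter output.
def summarize_host(host, json_text)
  examples = JSON.parse(json_text).fetch('examples', [])
  failed = examples.select { |ex| ex['status'] == 'failed' }
  {
    host: host,
    total: examples.size,
    failed: failed.size,
    failures: failed.map { |ex| ex['full_description'] }
  }
end

# Illustrative sample results keyed by hypothetical host names.
SAMPLE_RESULTS = {
  'node1.example.com' => JSON.generate(
    'examples' => [
      { 'full_description' => 'File hdfs-site.xml should be file', 'status' => 'passed' }
    ]
  ),
  'node2.example.com' => JSON.generate(
    'examples' => [
      { 'full_description' => 'File hdfs-site.xml should be file', 'status' => 'passed' },
      { 'full_description' => 'Service hadoop-hdfs-datanode should be running', 'status' => 'failed' }
    ]
  )
}.freeze

summaries = SAMPLE_RESULTS.map { |host, json| summarize_host(host, json) }

# Print one line per host, followed by any failing test descriptions.
summaries.each do |s|
  status = s[:failed].zero? ? 'OK' : 'FAIL'
  puts format('%-20s %-4s %d/%d passed', s[:host], status, s[:total] - s[:failed], s[:total])
  s[:failures].each { |desc| puts "    #{desc}" }
end
```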

I made a few tweaks – querying YARN for a list of nodes instead of predefining nodes in a text file.  I also wrote a custom set of checks specifically for HDFS and the hardware.  I set some thresholds that are part of a larger effort to make sure system requirements are met for apps I am going to run on the cluster (Actian Vortex SQL Analytics on Hadoop).
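As a sketch of the YARN lookup, the ResourceManager REST API exposes the cluster's nodes at `/ws/v1/cluster/nodes`, and the host names can be pulled out of that response. The ResourceManager address below is a placeholder, the sample response is trimmed to the fields used, and the helper names are my own for illustration, so treat this as one possible approach rather than the exact code I run.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Extract the host names of RUNNING nodes from the JSON returned by the
# ResourceManager's /ws/v1/cluster/nodes endpoint.
def running_node_hosts(json_text)
  nodes = JSON.parse(json_text).dig('nodes', 'node') || []
  nodes.select { |n| n['state'] == 'RUNNING' }
       .map { |n| n['nodeHostName'] }
       .sort
end

# Fetch the node list over HTTP. The address is an assumption; 8088 is the
# usual default ResourceManager web port.
def fetch_node_hosts(resource_manager = 'http://resourcemanager.example.com:8088')
  uri = URI.join(resource_manager, '/ws/v1/cluster/nodes')
  request = Net::HTTP::Get.new(uri)
  request['Accept'] = 'application/json' # ask explicitly for JSON, not XML
  response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
  running_node_hosts(response.body)
end

# Illustrative response, trimmed to the fields this sketch reads.
SAMPLE_RESPONSE = JSON.generate(
  'nodes' => {
    'node' => [
      { 'id' => 'node2.example.com:45454', 'nodeHostName' => 'node2.example.com', 'state' => 'RUNNING' },
      { 'id' => 'node1.example.com:45454', 'nodeHostName' => 'node1.example.com', 'state' => 'RUNNING' },
      { 'id' => 'node3.example.com:45454', 'nodeHostName' => 'node3.example.com', 'state' => 'LOST' }
    ]
  }
)

puts running_node_hosts(SAMPLE_RESPONSE)
```

The resulting list of host names can then stand in for the static text file of nodes when deciding which hosts to run the tests against.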

Vincent groups hosts into roles, each of which runs different tests and outputs its report on a different tab (the multi-tab “Disk” example in the screenshot).  I don’t need that yet, as I am not predefining roles but checking across the whole cluster and treating everything as a data node.  But I could see doing that at some point, when I break out leader nodes and data nodes, which will have different requirements.
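If I do get to that point, the role split could be as simple as a lookup from host to role and from role to the set of tests to run. The sketch below is purely hypothetical: the role names, the spec directory layout, and the rule that leader hosts start with "master" are assumptions for illustration, not how serverspec-example or my own setup is organised.

```ruby
# Hypothetical role definitions: which spec directories each role runs.
ROLE_SPECS = {
  'leader' => %w[spec/common spec/leader],
  'data'   => %w[spec/common spec/hdfs spec/hardware]
}.freeze

# Hypothetical naming rule: hosts starting with "master" are leaders;
# everything else is treated as a data node, as in the current setup.
def role_for(host)
  host.start_with?('master') ? 'leader' : 'data'
end

# Spec directories to run against a given host.
def specs_for(host)
  ROLE_SPECS.fetch(role_for(host))
end

# Group hosts by role, as a report with one tab per role would need.
def hosts_by_role(hosts)
  hosts.group_by { |h| role_for(h) }
end

hosts = %w[master1.example.com node1.example.com node2.example.com]
hosts_by_role(hosts).each do |role, members|
  puts "#{role}: #{members.join(', ')}"
end
```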

I’m still working on finalising my version of these changes but wanted to share my progress so far and encourage you to try this very productive way of assessing your cluster.

About Tyler Mitchell

Director Product Marketing @ OmniSci.com GPU-accelerate data analytics | Sr. Product Manager @ Couchbase.com - next generation Data Platform for System of Engagement! Former Eng. Director @Actian.com, author and technology writer in NoSQL, big data, graph analytics, geospatial and Internet of Things. Follow me @1tylermitchell or get my book from http://locatepress.com/.