Category: Hadoop

VID: Solving Performance Problems on Hadoop

July 5, 2016. Written by Tyler Mitchell

The full video of my talk from Hadoop Summit (San Jose, June 28, 2016) is now available. In this talk I cover performance considerations when moving analytic workloads into production. I even give away the game-changing secret sauce behind the extreme performance of Actian’s Vector in Hadoop product for SQL analytics. VID: Solving Performance Problems […]

Actian, Analytics, Hadoop, SQL

Spark Analysis of Global Place Names (GeoNames)

January 20, 2016. Written by Tyler Mitchell

Spark Analysis on a Large File

GeoNames.org has free gazetteer data, by country or for the whole world, provided in tab-separated text files. In this post I show you how to do some simple analysis using DataFrames in Spark. The global file is 280 MB compressed and 1.2 GB uncompressed. This size of file makes it difficult to […]

Analytics, Geospatial, Hadoop, Programming, Spark, SQL
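
The post itself works with Spark DataFrames, which are not reproduced here. As a rough stand-in (plain Python standard library, not Spark), the sketch below streams a GeoNames-style tab-separated file one line at a time and counts places per country, so the file never has to be held in memory all at once. It assumes the main-dump column layout described in the GeoNames readme, where the ISO country code is the ninth field (zero-based index 8); the function name and sample rows are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Assumption: GeoNames main-dump layout, where the ISO country code
# is the 9th tab-separated column (zero-based index 8).
COUNTRY_CODE_COL = 8


def count_places_by_country(lines: Iterable[str]) -> Counter:
    """Count rows per country code, reading one line at a time."""
    counts: Counter = Counter()
    for line in lines:
        # Fields are plain tab-separated text; strip the line ending first.
        fields = line.rstrip("\r\n").split("\t")
        # Skip short or malformed rows that lack a country code column.
        if len(fields) > COUNTRY_CODE_COL:
            counts[fields[COUNTRY_CODE_COL]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical, heavily trimmed sample rows (only the first 9 columns).
    sample = [
        "1\tPlace A\tPlace A\t\t49.0\t-123.0\tP\tPPL\tCA",
        "2\tPlace B\tPlace B\t\t45.0\t-75.0\tP\tPPL\tCA",
        "3\tPlace C\tPlace C\t\t37.0\t-122.0\tP\tPPL\tUS",
    ]
    print(count_places_by_country(sample).most_common())
```

For the real file you would pass an open file handle instead of a list, for example `with open(path, encoding="utf-8") as f: count_places_by_country(f)`, where `path` is wherever you unzipped the dump.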

Serverspec checks settings on a Hadoop cluster

December 8, 2015. Written by Tyler Mitchell

Serverspec is a Ruby-based tool that runs RSpec-formatted tests against a host. It can check a long list of system details and states, such as total memory, CPU count and running services, and it also supports custom shell callouts. It compares the results of the checks with predefined thresholds and reports a pass or […]

Hadoop, Programming

Hadoop Options for SQL Databases

October 15, 2015. Written by Tyler Mitchell

Drowning while trying to understand your options for SQL-based database management in Hadoop? This graphic is a simplified comparison of the features of several popular products in use today. In this post I outline what I see as the biggest differentiators. While this is a marketing slide for Actian’s SQL in Hadoop enterprise solution, I wish I had seen it earlier so I could […]

Actian, Analytics, Hadoop

“Big Data” off 2015 Hype Cycle?

August 18, 2015. Written by Tyler Mitchell

See Gartner’s official 2015 Hype Cycle video to get it straight from the source. In the video she makes two points about big data: first, it has passed over the hump and is no longer just hype; second, it is now embedded within other items throughout the cycle. I can understand how this can get confusing to track and qualify, but isn’t […]

Actian, Analytics, Hadoop

Zeppelin Notebook Tutorial Walkthrough

May 5, 2015. Written by Tyler Mitchell

This is my short (14-minute) video showing how to build and launch the Apache Zeppelin notebook platform – a web UI for interactive query and analysis. Everything runs locally on OS X on a MacBook. In this video we focus on using the tutorial notebook that comes with Zeppelin and discuss each step – including interactive querying and charting – […]

Analytics, Hadoop, Programming, Spark, SQL, Visualization

Partitioned Data & Why It Matters

April 28, 2015. Written by Tyler Mitchell

As data volumes grow, so does your need to understand how to partition your data. Until you understand this distributed storage concept, you can’t choose the best approach for the job. This post gives an introductory explanation of partitioning, and you will see why it is integral to the Hadoop Distributed File System (HDFS) increasingly […]

Analytics, Hadoop, Programming
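
To make the idea of partitioning concrete, here is a minimal Python sketch of one common scheme, hash partitioning by key: each record’s key is hashed and the hash is taken modulo the number of partitions, so every record with the same key always lands in the same partition. This is similar in spirit to the default hash partitioner in Hadoop MapReduce, but it is an illustration, not necessarily the approach the post describes. The function names and sample records are hypothetical; `zlib.crc32` is used because Python’s built-in `hash()` for strings is randomized per process and would not give stable assignments across runs.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # hypothetical (key, value) record


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition number in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # crc32 returns a stable, non-negative 32-bit integer for the same bytes.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_records(records: Iterable[Record],
                      num_partitions: int) -> Dict[int, List[Record]]:
    """Group records into partitions by the hash of their key."""
    partitions: Dict[int, List[Record]] = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return dict(partitions)


if __name__ == "__main__":
    records = [("CA", "Vancouver"), ("US", "San Jose"),
               ("CA", "Ottawa"), ("DE", "Berlin")]
    for pid, rows in sorted(partition_records(records, 3).items()):
        print(pid, rows)
```

Because both `("CA", …)` records share a key, they always end up together, which is what lets later per-key work (grouping, joining, filtering) touch only one partition instead of scanning all of them.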

Web console for Kafka messaging system

March 18, 2015. Written by Tyler Mitchell

Running Kafka for a streaming collection service can feel somewhat opaque at times, which is why I was thrilled to find the Kafka Web Console project on GitHub yesterday. This Scala application can be downloaded and installed in a couple of steps, and its included web server can then be launched to serve it up quickly. Here’s […]

Hadoop

Drinking from the (data) Firehose of Terror

March 14, 2015. Written by Tyler Mitchell

Between classic business transactions and social interactions and machine-generated observations, the digital data tap has been turned on and it will never be turned off. The flow of data is everlasting. Which is why you see a lot of things in the loop around real time frameworks and streaming frameworks. – Mike Hoskins, CTO Actian […]

Analytics, Hadoop, Programming

Kafka Consumer – Simple Python Script and Tips

February 20, 2015. Written by Tyler Mitchell

[UPDATE: Check out the Kafka Web Console, which lets you manage topics and watch the traffic going through them – all in a browser!] When you’re pushing data into a Kafka topic, it’s always helpful to monitor the traffic using a simple Kafka consumer script. Here’s a simple script I’ve been using that […]

Analytics, Hadoop

SPARQL Query for Graph Density Analysis

February 5, 2015. Written by Tyler Mitchell

I’ve been spending a lot of time this past year running queries against the open-source SPARQLverse graph analytic engine. It’s amazing how simple some queries can look and yet how much work is being done behind the scenes. My current project requires building up a set of query examples that allow typical kinds of graph/network […]

Actian, Analytics, graph, Hadoop
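
Graph density is the ratio of edges present to edges possible. For a directed graph with |V| nodes and |E| edges it is |E| / (|V| * (|V| - 1)); for an undirected graph it is 2|E| / (|V| * (|V| - 1)). The post’s SPARQL queries are not reproduced here. As a hedged illustration only, the Python sketch below computes directed density from a list of hypothetical (subject, predicate, object) triples, counting each distinct subject/object pair once and ignoring self-loops. Those modelling choices are assumptions, not necessarily how SPARQLverse or the post counts edges.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # hypothetical (subject, predicate, object)


def directed_density(triples: Iterable[Triple]) -> float:
    """Return |E| / (|V| * (|V| - 1)) for the graph implied by the triples."""
    nodes = set()
    edges = set()
    for subj, _pred, obj in triples:
        nodes.add(subj)
        nodes.add(obj)
        if subj != obj:  # assumption: self-loops are not counted as edges
            edges.add((subj, obj))  # repeated subject/object pairs count once
    n = len(nodes)
    if n < 2:
        return 0.0  # density is undefined below two nodes; report 0.0
    return len(edges) / (n * (n - 1))


if __name__ == "__main__":
    triples = [
        ("alice", "knows", "bob"),
        ("bob", "knows", "carol"),
        ("carol", "knows", "alice"),
        ("alice", "follows", "bob"),  # same pair as the first triple
    ]
    print(directed_density(triples))  # 3 distinct edges / (3 * 2) = 0.5
```

Collapsing predicates into a single edge per node pair is one design choice among several; counting each predicate as its own edge would turn the graph into a multigraph, and the simple density formula above would no longer be bounded by 1.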

Kafka Topic Clearing after Producing Messages

January 7, 2015. Written by Tyler Mitchell

[UPDATE: Check out the Kafka Web Console to more easily administer your Kafka topics] This week I’ve been working with the Kafka messaging system in a project.

Basic C# Methods for Kafka Producer

To publish to Kafka I built a C# app that uses the Kafka4n libraries – it doesn’t get much simpler than this:

using Kafka.Client; Connector […]

Hadoop

From zero to HDFS in 60 min.

September 26, 2014. Written by Tyler Mitchell

(Okay, so you can be up and running quicker if you have a better internet connection than me.) Want to get your hands dirty with Hadoop-related technologies but don’t have time to waste? I’ve spent way too much time trying to get HBase, for example, running on my MacBook with Homebrew and wish I had […]

Hadoop

Category: Hadoop

VID: Solving Performance Problems on Hadoop

Spark Analysis of Global Place Names (GeoNames)

Serverspec checks settings on a Hadoop cluster

Hadoop Options for SQL Databases

“Big Data” off 2015 Hype Cycle?

Zeppelin Notebook Tutorial Walkthrough

Partitioned Data & Why It Matters

Web console for Kafka messaging system

Drinking from the (data) Firehose of Terror

Kafka Consumer – Simple Python Script and Tips

SPARQL Query for Graph Density Analysis

Kafka Topic Clearing after Producing Messages

From zero to HDFS in 60 min.