The full video of my talk from Hadoop Summit (San Jose, June 28, 2016) is now available. In this talk I cover performance considerations when moving analytic workloads into production. I even give away the game-changing secret sauce for extreme performance in Actian’s Vector in Hadoop product for SQL analytics. VID: Solving Performance Problems […]
Tag: analytics
Storing Zeppelin Notebooks in AWS S3 Buckets
Zeppelin has an option to change the storage backend of its notebook system so that notebooks can live in AWS S3. I’m not sure how long this has been around, but I know it isn’t particularly new. However, I wanted to make a note of it as more users of cluster environments are spinning up resources to […]
Zeppelin Notebook Quick Start on OSX v0.5.6
This is a follow-up to my post from last year, Apache Zeppelin on OSX – Ultra Quick Start, but this time without building from source. Today I tested the latest version of Zeppelin (0.5.6) and, using their distributed binaries, was instantly able to launch Zeppelin and run both Scala and Python jobs on my MacBook. This was with zero configuration, […]
“Big Data” off 2015 Hype Cycle?
See this official 2015 hype cycle video here to get it straight from Gartner. In the video she says, first, that big data has passed over the hump and is no longer just hype; second, that it’s now embedded within other items throughout the cycle. I can understand how this can get confusing to track and qualify, but isn’t […]
Common Zeppelin Errors
A few different errors have popped up during my initiation into Apache Zeppelin; here are a few of them, summarised with workarounds if you need them. Tutorial Failure Due To Spark Versions: default Zeppelin comes with Spark 1.1 (though it may be updated by the time you read this), while the current Zeppelin tutorial assumes Spark 1.3 or greater […]
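A quick way to confirm the mismatch is to print the version from a notebook paragraph. This is a minimal sketch, assuming the stock %pyspark interpreter and the SparkContext that Zeppelin injects as sc:

```python
# Zeppelin paragraph (prefix it with %pyspark to use the Python interpreter).
# Assumes the notebook exposes the default SparkContext as `sc`.
print(sc.version)   # the tutorial notebook expects 1.3 or greater
print(sc.master)    # shows whether you are bound to local[*] or a cluster
```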
Python Spark SQL – Zeppelin Tutorial – No Scala
My latest notebook aims to mimic the original Scala-based Spark SQL tutorial with one that uses Python instead. Above you can see the two parallel translations side by side. Python Spark SQL Tutorial Code: here is the resulting Python data loading code. The SQL code is identical to the Tutorial notebook, so copy and paste if you need it. I […]
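The excerpt cuts off before the code itself; below is a minimal PySpark sketch of the same idea rather than a quote from the post. The file path, column choices, and Spark 1.3-era API are assumptions, and sc and sqlContext are the contexts Zeppelin injects:

```python
# Zeppelin paragraph using the %pyspark interpreter.
from pyspark.sql import Row

# Hypothetical path: a small semicolon-delimited bank marketing dataset.
bank_text = sc.textFile("bank/bank-full.csv")

bank = (bank_text
        .map(lambda line: line.split(";"))
        .filter(lambda f: f[0] != '"age"')               # drop the header row
        .map(lambda f: Row(age=int(f[0]),
                           job=f[1].strip('"'),
                           marital=f[2].strip('"'),
                           education=f[3].strip('"'),
                           balance=int(f[5]))))

# Register as a table so %sql paragraphs can query and chart it.
sqlContext.createDataFrame(bank).registerTempTable("bank")
```

From there a %sql paragraph along the lines of select age, count(1) from bank group by age drives Zeppelin’s built-in charts.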
Zeppelin Notebook Tutorial Walkthrough
This is my short video (14 min) showing how to build and launch the Apache Zeppelin notebook platform – a web UI for interactive query and analysis. This is all done running locally on a MacBook under OS X. In this video we focus on using the tutorial notebook that comes with Zeppelin and discuss each step – including interactive querying and charting – […]
Apache Zeppelin on OSX – Ultra Quick Start
The Zeppelin project provides a powerful web-based notebook platform for data analysis and discovery. Behind the scenes it supports Spark distributed contexts as well as other language bindings on top of Spark. This post is a very simple introduction to show the first few steps to get started. You’ll find all you need to know […]
Home Energy Monitor Series – Internet of Things
Recently I started an Internet of Things series on my experiences installing and using a smart electrical meter and analysing the data it produces. This included a BC Hydro smart meter, an Eagle monitoring gateway from Rainforest Automation, and a cloud-based analytics service from Bidgely. I’ve collated all the posts on the topic for you below. More will be […]
IoT Day 4: Bidgely Cloud Energy Monitor Dashboard
After a week of collecting smart meter readings, I’m now ready to show results in a cloud-based energy monitor system – Bidgely – complete with graphs showing readings, cost and machine learning results breaking down my usage by appliance. This is part 4 of a series of posts about the Internet of Things applied to Home Energy […]
IoT Day 2: Cloud Services for Energy Monitoring
Energy monitoring isn’t only about knowing what’s happening right now, but also understanding what happened historically. Often that includes knowing not just what was happening but also when and why. Enter cloud services for energy monitoring. They range from simple charts and graphs to predicting your usage over time – essentially storing, analysing and enriching your […]
Drinking from the (data) Firehose of Terror
Between classic business transactions and social interactions and machine-generated observations, the digital data tap has been turned on and it will never be turned off. The flow of data is everlasting. Which is why you see a lot of things in the loop around real time frameworks and streaming frameworks. – Mike Hoskins, CTO Actian […]
Neo4j Cypher Query for Graph Density Analysis
Graph analysis is all about finding relationships. In this post I show how to compute graph density (a measure of how well connected a graph is: the ratio of relationships that exist to the number that could exist) using a Cypher query with Neo4j. This is a follow-up to the earlier post: SPARQL Query for Graph Density Analysis. Installing Neo4j Graph Database: in this example we launch Neo4j and […]
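For a directed graph with V nodes and E relationships the density is E / (V × (V − 1)). The sketch below runs a Cypher query of that shape through the official neo4j Python driver; the Bolt URI and credentials are placeholders, and the post itself may simply run the query in the Neo4j browser:

```python
# Minimal sketch: compute directed graph density with a single Cypher query.
# URI and credentials are placeholders; adjust for your Neo4j install.
from neo4j import GraphDatabase

DENSITY_QUERY = """
MATCH (n) WITH count(n) AS nodes
MATCH ()-[r]->() WITH nodes, count(r) AS rels
RETURN toFloat(rels) / (nodes * (nodes - 1)) AS density
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    record = session.run(DENSITY_QUERY).single()
    print("graph density:", record["density"])
driver.close()
```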
Graph relations in Neo4j – simple load example
In preparation for a post about doing graph analytics in Neo4j (paralleling SPARQLverse from this earlier post), I had to learn to load text/CSV data into Neo. This post just shows the steps I took to load nodes and then establish edges/relationships in the database. My head hurt trying to find a simple example of […]
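As an illustration only, not the post’s own example: a sketch that loads nodes from one CSV and then wires up relationships from a second, again via the neo4j Python driver. The file names, the Person label, and the KNOWS relationship type are made up, and LOAD CSV expects the files to sit in Neo4j’s import directory:

```python
# Sketch of loading nodes from a CSV file and then creating relationships.
# Labels, property names, and the people.csv / knows.csv files are hypothetical.
from neo4j import GraphDatabase

LOAD_NODES = """
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
MERGE (:Person {name: row.name})
"""

LOAD_EDGES = """
LOAD CSV WITH HEADERS FROM 'file:///knows.csv' AS row
MATCH (a:Person {name: row.src}), (b:Person {name: row.dst})
MERGE (a)-[:KNOWS]->(b)
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    session.run(LOAD_NODES)   # pass 1: create the nodes
    session.run(LOAD_EDGES)   # pass 2: wire up the edges/relationships
driver.close()
```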
SPARQL Query for Graph Density Analysis
I’ve been spending a lot of time this past year running queries against the open source SPARQLverse graph analytic engine. It’s amazing how simple some queries can look and yet how much work is being done behind the scenes. My current project requires building up a set of query examples that allow typical kinds of graph/network […]
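A minimal sketch of the same density calculation against a SPARQL endpoint, using the SPARQLWrapper library. The endpoint URL is a placeholder, and treating every triple as an edge and every subject or object as a node is a simplification rather than the post’s exact query:

```python
# Sketch: compute directed graph density from a SPARQL endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = """
SELECT ?edges ?vertices WHERE {
  { SELECT (COUNT(*) AS ?edges) WHERE { ?s ?p ?o } }
  { SELECT (COUNT(DISTINCT ?n) AS ?vertices)
    WHERE { { ?n ?p1 ?o1 } UNION { ?s2 ?p2 ?n } } }
}
"""

sparql = SPARQLWrapper("http://localhost:8890/sparql")  # hypothetical endpoint
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
row = sparql.query().convert()["results"]["bindings"][0]

edges = int(row["edges"]["value"])
vertices = int(row["vertices"]["value"])
print("density:", edges / (vertices * (vertices - 1.0)))
```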
Data Sharing Saved My Life – or How an Insurer Reduced My Healthcare Claim Costs
It’s not every day that you receive snail mail with life-changing information in it, but when it does come, it can come from the unlikeliest sources. A year ago, when doing a simple change of health insurance vendors, I had to give the requisite blood sample. I knew the drill… nurse comes to the house, takes blood, […]
Analytics Dashboard – Kibana 3 – a few short quick tips
After you’ve loaded log files into Elasticsearch, you can start to visualize them using the Kibana web app and build your own dashboard. While using Kibana for a week or so, I found it tricky to find the docs or tutorials to get me up to speed quickly with some of the more advanced/hidden features. In […]
From zero to HDFS in 60 min.
(Okay, so you can be up and running quicker if you have a better internet connection than I do.) Want to get your hands dirty with Hadoop-related technologies but don’t have time to waste? I’ve spent way too much time trying to get HBase, for example, running on my MacBook with Brew, and wish I had […]
Leveraging Analytics for Personal Health
I spent the last 6 months undergoing some dramatic health changes (ping me for details), primarily diet, and now I’m getting around to refactoring my fitness. Naturally, I want to try some apps that both collect lots of sensor data and present it back to me in a meaningful (and hopefully motivating) way. While I’m […]