You are browsing archives for

Category: Spark

Storing Zeppelin Notebooks in AWS S3 Buckets

June 7, 2016 | Written by Tyler Mitchell

Zeppelin lets you change where its notebooks are stored, which allows you to use AWS S3. I’m not sure how long this has been around but I know it isn’t particularly new. However, I wanted to make a note of it as more users of cluster environments are spinning up resources to […]

Analytics, Cloud, Spark


January 20, 2016 | Written by Tyler Mitchell

Spark Analysis on a Large File

GeoNames.org provides free gazetteer data, by country or for the whole world, in tab-separated text files. In this post I show you how to do some simple analysis using DataFrames in Spark. The global file is 280M compressed and 1.2G uncompressed. This size of file makes it difficult to […]

Analytics, Geospatial, Hadoop, Programming, Spark, SQL

1 Comment

May 17, 2015 | Written by Tyler Mitchell

A few different errors have popped up during my initiation into Apache Zeppelin. Here are a few of them, summarised with workarounds if you need them.

Tutorial Failure Due To Spark Versions

By default, Zeppelin comes with Spark 1.1 (though it may be updated by the time you read this). The current Zeppelin tutorial assumes Spark 1.3 or greater […]

Analytics, Programming, Spark, SQL

2 Comments

May 9, 2015 | Written by Tyler Mitchell

Zeppelin tutorial example using Python instead of Scala for Spark SQL

My latest notebook aims to mimic the original Scala-based Spark SQL tutorial with one that uses Python instead. Above you can see the two parallel translations side by side.

Python Spark SQL Tutorial Code

Here is the resulting Python data loading code. The SQL code is identical to the Tutorial notebook, so copy and paste if you need it. I […]

Analytics, Programming, Spark, SQL

11 Comments

May 5, 2015 | Written by Tyler Mitchell

This is my short video (14 min) showing how to build and launch the Apache Zeppelin notebook platform – a web UI for interactive query and analysis. This is all done running locally via OSX on a MacBook. In this video we focus on using the tutorial notebook that comes with Zeppelin and discuss each step – including interactive querying and charting – […]

Analytics, Hadoop, Programming, Spark, SQL, Visualization

8 Comments

May 1, 2015 | Written by Tyler Mitchell

The Zeppelin project provides a powerful web-based notebook platform for data analysis and discovery. Behind the scenes it supports Spark distributed contexts as well as other language bindings on top of Spark. This post is a very simple introduction to show the first few steps to get started. You’ll find all you need to know […]

Analytics, Programming, Spark, Visualization

2 Comments

Category: Spark

Storing Zeppelin Notebooks in AWS S3 Buckets

Zeppelin Notebook Quick Start on OSX v0.5.6

Spark Analysis of Global Place Names (GeoNames)

Common Zeppelin Errors

Python Spark SQL – Zeppelin Tutorial – No Scala

Zeppelin Notebook Tutorial Walkthrough

Apache Zeppelin on OSX – Ultra Quick Start