Update the VM extension pack on OS X for improved performance and features. But when the auto update fails you likely have to manually remove current packs and then reinitiate the installation process or do it manually. This very short video shows you how.
You are browsing archives for
Category: Programming
Spark Analysis of Global Place Names (GeoNames)
Spark Analysis on a Large File GeoNames.org has free gazetteer data by country or for the world, provided in tab-separated text files. In this post I show you how to do some simple analysis using DataFrames in Spark. As the global file is 280M compressed and 1.2G uncompressed. This size of file makes it difficult to […]
Serverspec checks settings on a Hadoop cluster
Serverspec is a Ruby-based system that can run Rspec formatted tests against a host. It can check for a long list of system details and status such as total memory, CPU count, running services, and more, including custom shell callouts. It compares the results of the checks with predefined thresholds and reports a pass or […]
Common Zeppelin Errors
A few different errors have popped during my initiation into Apache Zeppelin, here are a few of them, summarised with workarounds if you need them. Tutorial Failure Due To Spark Versions Default Zeppelin comes with Spark 1.1 (though it may be updated by the time you read this). The current Zeppelin tutorial assumes Spark 1.3 or greater […]
Python Spark SQL – Zeppelin Tutorial – No Scala
My latest notebook aims to mimic the original Scala-based Spark SQL tutorial with one that uses Python instead. Above you can see the two parallel translations side-by-side. Python Spark SQL Tutorial Code Here is the resulting Python data loading code. The SQL code is identical to the Tutorial notebook, so copy and paste if you need it. I […]
Zeppelin Notebook Tutorial Walkthrough
This is my short video (14 min) showing how to build and launch the Apache Zeppelin notebook platform – a web UI for interactive query and analysis. This is all done running locally via OSX on a Macbook. In this video we focus on using the tutorial notebook that comes with Zeppelin and discuss each step – including interactive querying and charting – […]
Apache Zeppelin on OSX – Ultra Quick Start
The Zeppelin project provides a powerful web-based notebook platform for data analysis and discovery. Behind the scenes it supports Spark distributed contexts as well as other language bindings on top of Spark. This post is a very simple introduction to show the first few steps to get started. You’ll find all you need to know […]
Partitioned Data & Why It Matters
As data volumes grow, so does your need to understand how to partition your data. Until you understand this distributed storage concept, you will be unable to choose the best approach for the job. This post gives an introductory explanation of partitioning and you will see why it is integral to the Hadoop Distributed File System (HDFS) increasingly […]
Review of 3 Recent Internet of Thing (IoT) Announcements
Working in the big data and analytics space, I’m always interested in parts of the Internet of Things (IoT) that will produce more data, require more backend systems, and help users/customers get on with their day better. The past week has shown a few interesting announcements relating to Internet of Things topics. Here are just […]
Drinking from the (data) Firehose of Terror
Between classic business transactions and social interactions and machine-generated observations, the digital data tap has been turned on and it will never be turned off. The flow of data is everlasting. Which is why you see a lot of things in the loop around real time frameworks and streaming frameworks. – Mike Hoskins, CTO Actian […]
HBase queries from Bash – a couple simple REST examples
Learn how to do some simple queries to extract data from the Hadoop/HDFS based HBase database using its REST API. Are you getting stuck trying to figure out HBase query via the REST API? Me too. The main HBase docs are pretty limited in terms of examples but I guess it’s all there, just not […]