Make Data Useful

Geography + Data

July 15, 2021 Written by Tyler Mitchell

My post from Locate Press on helping define what geospatial refers to in the user and developer communities.

Geospatial

January 17, 2021 Written by Tyler Mitchell

Analytics

May 1, 2020 Written by Tyler Mitchell

A friend recently pointed me to the book They Will Be Giants by Robert Wilson and I immediately saw why. Most of my friends know that my ideal job is working in a business with any of them. Regardless of role or financial success, just working with them has always been inspirational. I’ve always stated […]

Analytics, Business

April 1, 2020 Written by Tyler Mitchell

As mentioned Monday, I have a week of different webinars lined up to watch and learn from. If you missed the first two, I’m sure you can find them on demand via the original links. Here is the first one I watched – from TigerGraph and Expero – with their key talking points for context and my takeaways added. #1 – Energy […]

Analytics

March 30, 2020 Written by Tyler Mitchell

4 Webinars I’m Watching This Week – GPU, 5G, graph analytics, cloud…

Accelerating Data Science Workflows With RAPIDS: how RAPIDS accelerates your Python data science toolchain with minimal code changes and no new tools to learn; how RAPIDS can accelerate model training and time to deployment; the wealth of accelerated apps available to maximize datacenter […]

Analytics

February 28, 2017 Written by Tyler Mitchell

Photo of Tyler Mitchell speaking at Couchbase Connect 2016 event

With my recent move to working for Couchbase I’ve also made the leap from Big Data & SQL Analytics to NoSQL and document databases. I’m going to ease into blogging more about it here (and in my channel on the corporate blog) so for the first step let’s go through a few basic concepts. Then […]

Analytics

July 5, 2016 Written by Tyler Mitchell

The full video of my talk from Hadoop Summit (San Jose, June 28, 2016) is now available. In this talk I cover performance considerations when moving analytic workloads into production. I even give away the game-changing secret sauce for extreme performance in Actian’s Vector in Hadoop product for SQL analytics. VID: Solving Performance Problems […]

Actian, Analytics, Hadoop, SQL

June 7, 2016 Written by Tyler Mitchell

Zeppelin lets you change where its notebook system stores notebooks, which means you can use AWS S3. I’m not sure how long this has been around but I know it isn’t particularly new. However, I wanted to make a note of it as more users of cluster environments are spinning up resources to […]

Analytics, Cloud, Spark

April 11, 2016 Written by Tyler Mitchell

Updating the VirtualBox extension pack on OS X brings improved performance and features. But when the auto-update fails, you will likely have to remove the current packs manually and then either restart the installation process or install the pack by hand. This very short video shows you how.

Programming, Virtualization

April 4, 2016 Written by Tyler Mitchell

Zeppelin tutorial example using Python instead of Scala for Spark SQL

This is a follow-up to my post from last year, Apache Zeppelin on OSX – Ultra Quick Start, but without building from source. Today I tested the latest version of Zeppelin (0.5.6) and, using their distributed binaries, was instantly able to launch Zeppelin and run both Scala and Python jobs on my MacBook. This was with zero configuration, […]

Analytics, Spark

January 20, 2016 Written by Tyler Mitchell

Spark Analysis on a Large File

GeoNames.org has free gazetteer data by country or for the world, provided in tab-separated text files. In this post I show you how to do some simple analysis using DataFrames in Spark. The global file is 280M compressed and 1.2G uncompressed. This size of file makes it difficult to […]

Analytics, Geospatial, Hadoop, Programming, Spark, SQL

1 Comment
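
To give a feel for the kind of simple analysis the GeoNames excerpt above describes, here is a minimal sketch. It is not the Spark DataFrame code from the post; it is a plain-Python stand-in that streams tab-separated rows one at a time, so a large file never has to fit in memory. The function name, the column position of the country code, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Assumption: the country code sits in the 9th tab-separated field (index 8).
COUNTRY_CODE_COLUMN = 8


def count_places_by_country(lines):
    """Stream tab-separated rows and count how many fall in each country."""
    # QUOTE_NONE treats quote characters in place names as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        # Skip short or malformed rows instead of failing on them.
        if len(row) > COUNTRY_CODE_COLUMN:
            counts[row[COUNTRY_CODE_COLUMN]] += 1
    return counts


# Made-up sample rows standing in for a real download.
sample = io.StringIO(
    "1\tAlpha\tAlpha\t\t49.0\t-123.0\tP\tPPL\tCA\n"
    "2\tBeta\tBeta\t\t45.0\t-75.0\tP\tPPL\tCA\n"
    "3\tGamma\tGamma\t\t40.0\t-74.0\tP\tPPL\tUS\n"
)
sample_counts = count_places_by_country(sample)
```

The same function accepts an open file handle in place of the sample, since it only needs an iterable of lines.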

December 8, 2015 Written by Tyler Mitchell

Table showing report from serverspec output

Serverspec is a Ruby-based system that can run Rspec formatted tests against a host. It can check for a long list of system details and status such as total memory, CPU count, running services, and more, including custom shell callouts. It compares the results of the checks with predefined thresholds and reports a pass or […]

Hadoop, Programming

October 15, 2015 Written by Tyler Mitchell

Drowning while trying to understand your options for SQL-based database management in Hadoop? This graphic is a simplified comparison of the various features of several popular products being used today. I outline some of my biggest differentiators in this post. While this is a marketing slide for Actian’s SQL in Hadoop enterprise solution, I wish I had seen it earlier so I could […]

Actian, Analytics, Hadoop

August 18, 2015 Written by Tyler Mitchell

See this official 2015 hype cycle video here to get it straight from Gartner. In the video she says, first, it’s passed over the hump and is no longer just hype. Second, it’s embedded within other items throughout the cycle now. I can understand how this can get confusing to track and qualify, but isn’t […]

Actian, Analytics, Hadoop

May 29, 2015 Written by Tyler Mitchell

Everyone who deals with geographic/spatial/geospatial data knows they need specialised tools for the job. Fortunately, there are a ton of open source tools up for the challenge. In my latest book, Geospatial Power Tools, I show how to use an advanced set of command line tools that you can start using today. From re-projecting point coordinates […]

Analytics

May 17, 2015 Written by Tyler Mitchell

A few different errors have popped up during my initiation into Apache Zeppelin. Here are a few of them, summarised with workarounds if you need them.

Tutorial Failure Due To Spark Versions

Default Zeppelin comes with Spark 1.1 (though it may be updated by the time you read this). The current Zeppelin tutorial assumes Spark 1.3 or greater […]

Analytics, Programming, Spark, SQL

2 Comments

May 9, 2015 Written by Tyler Mitchell

My latest notebook aims to mimic the original Scala-based Spark SQL tutorial with one that uses Python instead. Above you can see the two parallel translations side-by-side.

Python Spark SQL Tutorial Code

Here is the resulting Python data loading code. The SQL code is identical to the Tutorial notebook, so copy and paste if you need it. I […]

Analytics, Programming, Spark, SQL

11 Comments

May 5, 2015 Written by Tyler Mitchell

This is my short video (14 min) showing how to build and launch the Apache Zeppelin notebook platform – a web UI for interactive query and analysis. This is all done running locally via OSX on a Macbook. In this video we focus on using the tutorial notebook that comes with Zeppelin and discuss each step – including interactive querying and charting – […]

Analytics, Hadoop, Programming, Spark, SQL, Visualization

8 Comments

May 1, 2015 Written by Tyler Mitchell

The Zeppelin project provides a powerful web-based notebook platform for data analysis and discovery. Behind the scenes it supports Spark distributed contexts as well as other language bindings on top of Spark. This post is a very simple introduction to show the first few steps to get started. You’ll find all you need to know […]

Analytics, Programming, Spark, Visualization

2 Comments

April 28, 2015 Written by Tyler Mitchell

As data volumes grow, so does your need to understand how to partition your data. Until you understand this distributed storage concept, you will be unable to choose the best approach for the job. This post gives an introductory explanation of partitioning and you will see why it is integral to the Hadoop Distributed File System (HDFS) increasingly […]

Analytics, Hadoop, Programming
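
As a rough illustration of the partitioning idea in the excerpt above, here is a minimal sketch of hash partitioning, one common scheme among several. It is not taken from the post and does not show how HDFS itself lays out data; the function names, the `country` key field, and the sample records are all hypothetical.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a string key to a partition number in the range [0, num_partitions)."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_records(records, key_field, num_partitions):
    """Group records into partitions so that equal keys always land together."""
    partitions = defaultdict(list)
    for record in records:
        index = partition_for(record[key_field], num_partitions)
        partitions[index].append(record)
    return partitions


# Hypothetical records, partitioned by country across 4 partitions.
records = [
    {"country": "CA", "city": "Vancouver"},
    {"country": "US", "city": "Seattle"},
    {"country": "CA", "city": "Ottawa"},
    {"country": "FR", "city": "Lyon"},
]
partitions = partition_records(records, "country", 4)
```

Because every record with the same key lands in the same partition, a query that filters or groups on that key only needs to touch one partition rather than scanning everything.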

Geography + Data

DIY Battery – Weekend Project – Aluminum + Bleach?

It’s all about the ecosystem – build and nurture yours

Learnings from TigerGraph and Expero webinar

4 Webinars This Week – GPU, 5G, graph analytics, cloud

Diving into #NoSQL from the SQL Empire …

VID: Solving Performance Problems on Hadoop

Storing Zeppelin Notebooks in AWS S3 Buckets

VirtualBox extension pack update on OS X

Zeppelin Notebook Quick Start on OSX v0.5.6

Spark Analysis of Global Place Names (GeoNames)

Serverspec checks settings on a Hadoop cluster

Hadoop Options for SQL Databases

“Big Data” off 2015 Hype Cycle?

Spatial Data Made Useful

Common Zeppelin Errors

Python Spark SQL – Zeppelin Tutorial – No Scala

Zeppelin Notebook Tutorial Walkthrough

Apache Zeppelin on OSX – Ultra Quick Start

Partitioned Data & Why It Matters