Spark Analysis on a Large File

GeoNames.org provides free gazetteer data, by country or for the whole world, as tab-separated text files. In this post I show you how to do some simple analysis using DataFrames in Spark. The global file is 280M compressed and 1.2G uncompressed. This size of file makes it difficult to […]
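
As a rough illustration of the kind of DataFrame analysis meant here, the following PySpark sketch reads the tab-separated dump with an explicit schema and counts places per country. It is not the post's own code. The column names follow the GeoNames readme for the main table, but the underscore spellings, the simplified types, the helper names (`geonames_schema_ddl`, `load_geonames`, `places_per_country`, `main`) and the local path `allCountries.txt` are all assumptions made for the example. It also assumes the archive has already been unzipped.

```python
# Columns of the GeoNames main table, in file order.
# Names are adapted from the GeoNames readme; the types are simplified assumptions.
GEONAMES_COLUMNS = [
    ("geonameid", "INT"),
    ("name", "STRING"),
    ("asciiname", "STRING"),
    ("alternatenames", "STRING"),
    ("latitude", "DOUBLE"),
    ("longitude", "DOUBLE"),
    ("feature_class", "STRING"),
    ("feature_code", "STRING"),
    ("country_code", "STRING"),
    ("cc2", "STRING"),
    ("admin1_code", "STRING"),
    ("admin2_code", "STRING"),
    ("admin3_code", "STRING"),
    ("admin4_code", "STRING"),
    ("population", "BIGINT"),
    ("elevation", "INT"),
    ("dem", "INT"),
    ("timezone", "STRING"),
    ("modification_date", "STRING"),
]


def geonames_schema_ddl():
    """Build a DDL schema string, which DataFrameReader.schema() accepts."""
    return ", ".join(f"{name} {dtype}" for name, dtype in GEONAMES_COLUMNS)


def load_geonames(spark, path):
    """Read the uncompressed, headerless, tab-separated file into a DataFrame."""
    return (
        spark.read
        .option("sep", "\t")
        .option("header", "false")
        .option("quote", "")  # empty string turns off quote handling
        .schema(geonames_schema_ddl())
        .csv(path)
    )


def places_per_country(df):
    """Count rows per country code, largest first."""
    return df.groupBy("country_code").count().orderBy("count", ascending=False)


def main(path="allCountries.txt"):  # hypothetical local path
    # Imported here so the helpers above can be inspected without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("geonames-sketch").getOrCreate()
    places_per_country(load_geonames(spark, path)).show(10)
    spark.stop()
```

Calling `main()` with Spark installed prints the ten country codes with the most entries. Supplying the schema up front, rather than asking Spark to infer it, avoids an extra pass over the 1.2G file.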