PySpark Cookbook
上QQ阅读APP看书,第一时间看更新

How to do it...

To install from the binaries, we only need four steps (see the following source code) as we do not need to compile the sources:

  1. Download the precompiled binaries from Spark's website.
  2. Unpack the archive.
  3. Move to the final destination.
  4. Create the necessary environmental variables.

The skeleton for our code looks as follows (see the Chapter01/installFromBinary.sh file):

#!/bin/bash
# Shell script for installing Spark from binaries

#
# PySpark Cookbook
# Author: Tomasz Drabas, Denny Lee
# Version: 0.1
# Date: 12/2/2017
_spark_binary="http://mirrors.ocf.berkeley.edu/apache/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz"
_spark_archive=$( echo "$_spark_binary" | awk -F '/' '{print $NF}' )
_spark_dir=$( echo "${_spark_archive%.*}" )
_spark_destination="/opt/spark"
...
checkOS
printHeader
downloadThePackage
unpack
moveTheBinaries
setSparkEnvironmentVariables
cleanUp