The Apache Spark distribution comes bundled with the spark-ec2 script, which makes it easy to launch Spark clusters on EC2. This recipe covers launching a Spark cluster on EC2 using this script.

  1. Log in to your Amazon AWS account.
  2. Click on Security Credentials under your account name in the top-right corner.
  3. Click on Access Keys, and then click on Create New Access Key.
  4. Note down the access key ID and secret access key
  5. Now go to Services | EC2
  6. Click on Key Pairs in left-hand menu under NETWORK & SECURITY
  7. Click on Create Key Pair and enter kp-spark as key-pair name
  8. Download the private key file and copy it to the /home/hduser/keypairs folder.
  9. Set permissions on key file to 600.
  10. Set the environment variables to reflect your access key ID and secret
    access key (replace the sample values with your own):

    $ echo "export AWS_ACCESS_KEY_ID=\"AKIAOD7M2LOWATFXFKQ\"" >> /home/hduser/.bashrc
    $ echo "export AWS_SECRET_ACCESS_KEY=\"+Xr4UroVYJxiLiY8DLT4DLT4D4sxc3ijZGMx1D3pfZ2q\"" >> /home/hduser/.bashrc
    $ echo 'export PATH=$PATH:/opt/infoobjects/spark/ec2' >> /home/hduser/.bashrc
  11. Launch the cluster with example values:
    $ spark-ec2 -k kp-spark -i /home/hduser/keypairs/kp-spark.pem
    --hadoop-major-version 2 -s 3 launch spark-cluster
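
The key-file and credential setup in steps 8 through 10 can be sketched as a single script. This is a minimal sketch, not the book's own code: the credential values are placeholders, the key directory is assumed to be $HOME/keypairs, and the touch line merely stands in for the key file you would actually download from the AWS console.

```shell
#!/usr/bin/env bash
# Sketch of steps 8-10: stage the key pair and export AWS credentials.
# All credential values below are placeholders, not real keys.
set -e

KEY_DIR="$HOME/keypairs"
mkdir -p "$KEY_DIR"

# Stand-in for the private key downloaded from the AWS console;
# in practice, move your downloaded kp-spark.pem here instead.
touch "$KEY_DIR/kp-spark.pem"

# ssh rejects private keys that are group- or world-readable.
chmod 600 "$KEY_DIR/kp-spark.pem"

# Credentials that spark-ec2 reads from the environment (placeholders).
export AWS_ACCESS_KEY_ID="AKIAEXAMPLEKEYID"
export AWS_SECRET_ACCESS_KEY="exampleSecretAccessKey"

echo "key permissions: $(stat -c '%a' "$KEY_DIR/kp-spark.pem")"
```

On Linux the final line should report 600, the mode ssh requires before it will use the key for the launch command above.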

For more details about this recipe, please read Spark Cookbook by Packt Publishing.
