To connect to S3 from Spark you need two environment variables that hold your AWS security credentials:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY

There are three ways to set them up.

1. In your .bashrc file, add the following two lines at the end. Replace the dummy values shown here with the real values from your AWS account.

export AWS_SECRET_ACCESS_KEY=ed+11LI1zsT62cPFRUmjXswWL7lEa9a5Ncm26VfC
export AWS_ACCESS_KEY_ID=AKIAJOEX7YHFQ5OYSLIQ

After updating .bashrc, source it so the changes take effect in your current shell.

$ source ~/.bashrc
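
You can confirm the variables are set by echoing one of them:

$ echo $AWS_ACCESS_KEY_ID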

2. You can export them on the command line. Note that variables set this way last only for the current shell session.

$ export AWS_SECRET_ACCESS_KEY=ed+11LI1zsT62cPFRUmjXswWL7lEa9a5Tcm25VfC
$ export AWS_ACCESS_KEY_ID=AKIAJOEX7YHFQ5OYSLIQ
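
If the variables are already exported in your shell, one alternative (a sketch, not part of the original steps) is to read them from inside the Spark shell with Scala's sys.env map, so you don't have to paste the secrets again:

scala> // pull the credentials from the environment rather than hard-coding them
scala> sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
scala> sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))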

3. You can set them directly in the Spark shell through the Hadoop configuration. Put your real key values inside the quotes:

scala> sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "<your access key id>")
scala> sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "<your secret access key>")
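
Whichever method you use, you can then read data from S3 with an s3n:// URI. The bucket and file below are hypothetical placeholders; substitute your own:

scala> val lines = sc.textFile("s3n://my-bucket/data/sample.txt")
scala> lines.count()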

Contributed by the Spark Training Class of February 2016
