Name: findspark
Owner: Bombora
Description: null
Created: 2016-07-01 18:44:10.0
Updated: 2016-07-01 18:44:11.0
Pushed: 2016-05-15 19:41:31.0
Homepage: null
Size: 15
Language: Python
PySpark isn't on sys.path by default, but that doesn't mean it can't be used as a regular library.
You can address this by either symlinking pyspark into your site-packages,
or adding pyspark to sys.path at runtime. findspark
does the latter.
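What findspark does under the hood can be sketched roughly as follows. This is a simplified illustration, not findspark's actual code; the Spark path is hypothetical, and a real Spark install also needs its bundled py4j sources on the path.

```python
import os
import sys

# Hypothetical Spark install location; findspark discovers this automatically.
spark_home = "/usr/local/opt/apache-spark/libexec"

# Prepend the PySpark Python sources so `import pyspark` resolves.
# (A full setup would also add the py4j zip under python/lib/.)
sys.path.insert(0, os.path.join(spark_home, "python"))

print(sys.path[0])
```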
To initialize PySpark, just call

import findspark
findspark.init()

import pyspark
pyspark.SparkContext(appName="myAppName")
Without any arguments, the SPARK_HOME environment variable will be used, and if that isn't set, other possible install locations will be checked. If you've installed Spark with
brew install apache-spark
on OS X, the location /usr/local/opt/apache-spark/libexec
will be searched.
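The lookup order described above can be sketched as follows. This is an illustration of the behavior, not findspark's actual implementation, and the candidate paths are examples:

```python
import os

def guess_spark_home():
    """Sketch of the lookup order: SPARK_HOME first, then known locations."""
    # 1. An explicit SPARK_HOME environment variable wins.
    env = os.environ.get("SPARK_HOME")
    if env:
        return env
    # 2. Otherwise, probe common install locations.
    candidates = [
        "/usr/local/opt/apache-spark/libexec",  # brew install apache-spark
        "/opt/spark",                           # a common manual install path
    ]
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None

os.environ["SPARK_HOME"] = "/path/to/spark_home"
print(guess_spark_home())
```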
Alternatively, you can specify a location with the spark_home
argument.
findspark.init('/path/to/spark_home')
To verify the automatically detected location, call
findspark.find()
Findspark can add a startup file to the current IPython profile so that the environment variables will be properly set and pyspark will be imported upon IPython startup. This file is created when edit_profile
is set to True.
ipython --profile=myprofile
findspark.init('/path/to/spark_home', edit_profile=True)
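A startup file generated this way would plausibly look like the snippet below. This is a hedged sketch, not the file findspark actually writes; the temp directory stands in for IPython's real startup directory (`~/.ipython/profile_<name>/startup/`), and the filename is hypothetical:

```python
import os
import tempfile

# Stand-in for IPython's profile startup directory.
startup_dir = os.path.join(tempfile.mkdtemp(), "startup")
os.makedirs(startup_dir)

# The kind of contents a findspark startup file might hold:
contents = (
    "import findspark\n"
    "findspark.init()\n"
    "import pyspark\n"
)
path = os.path.join(startup_dir, "findspark.py")  # hypothetical filename
with open(path, "w") as f:
    f.write(contents)

print(open(path).read().splitlines()[0])
```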
Findspark can also add to the .bashrc configuration file if it is present so that the environment variables will be properly set whenever a new shell is opened. This is enabled by setting the optional argument edit_rc
to true.
findspark.init('/path/to/spark_home', edit_rc=True)
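Appending exports to a shell rc file can be sketched as below. This is an assumption-laden illustration of the idea, not findspark's code: it writes to a temp file rather than the real ~/.bashrc, and the exact export lines are examples.

```python
import os
import tempfile

# Stand-in for ~/.bashrc; a real implementation would check it exists first.
rc_path = os.path.join(tempfile.mkdtemp(), ".bashrc")
spark_home = "/path/to/spark_home"  # hypothetical

# Example exports that make Spark usable from any new shell.
lines = [
    "export SPARK_HOME=%s" % spark_home,
    "export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH",
]
with open(rc_path, "a") as f:
    f.write("\n".join(lines) + "\n")

print(open(rc_path).read().count("export"))
```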
If changes are persisted, findspark will not need to be called again unless the Spark installation is moved.