Developer Guide¶
Analytics Zoo source code is available at GitHub:
git clone https://github.com/intel-analytics/analytics-zoo.git
By default, git clone will download the development version of Analytics Zoo. If you want a release version, you can use git checkout to switch to the corresponding release tag.
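For example, to work from a release tag instead of the development branch (the tag name below is illustrative; run git tag to list the actual releases):

```shell
git clone https://github.com/intel-analytics/analytics-zoo.git
cd analytics-zoo
git tag                  # list available release tags
git checkout v0.10.0     # example tag name; pick the release you need
```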
1. Python¶
1.1 Build¶
To generate a new whl package for pip install, you can run the following script:
bash analytics-zoo/pyzoo/dev/build.sh linux default false
Arguments:
The first argument is the platform to build for. Either ‘linux’ or ‘mac’.
The second argument is the analytics-zoo version to build for. ‘default’ means the default version for the current branch. You can also specify a different version if you wish, e.g., ‘0.6.0.dev1’.
You can also add other profiles to build the package, especially for the Spark and BigDL versions. For example, if pyspark==2.4.3 is a dependency, you need to add the profiles -Dspark.version=2.4.3 -Dbigdl.artifactId=bigdl-SPARK_2.4 -P spark_2.4 to build Analytics Zoo for Spark 2.4.3.
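Putting the pieces together, the full build command for Spark 2.4.3 would look like the following (a sketch based on the arguments described above; adjust the versions to match your environment):

```shell
# Build the pip package for Linux against Spark 2.4.3 and BigDL's
# Spark-2.4 artifact; profile arguments are passed after the three
# positional arguments
bash analytics-zoo/pyzoo/dev/build.sh linux default false \
    -Dspark.version=2.4.3 -Dbigdl.artifactId=bigdl-SPARK_2.4 -P spark_2.4
```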
After running the above command, you will find a whl file under the folder analytics-zoo/pyzoo/dist/. You can then directly pip install it to your local Python environment:
pip install analytics-zoo/pyzoo/dist/analytics_zoo-VERSION-py2.py3-none-PLATFORM_x86_64.whl
See here for more instructions on running Analytics Zoo after pip install.
1.2 IDE Setup¶
Any IDE that supports Python should be able to run Analytics Zoo. PyCharm works fine for us.
Before starting the IDE, you need to do the following preparations to successfully run an Analytics Zoo Python program in the IDE:
Build Analytics Zoo; see here for more instructions.
Prepare the Spark environment by either setting SPARK_HOME as an environment variable or running pip install pyspark. Note that the Spark version should match the one you built Analytics Zoo on.
Set BIGDL_CLASSPATH:
export BIGDL_CLASSPATH=analytics-zoo/dist/lib/analytics-zoo-*-jar-with-dependencies.jar
Prepare the BigDL Python environment by either downloading the BigDL source code from GitHub or running pip install bigdl. Note that the BigDL version should match the one you built Analytics Zoo on.
Add pyzoo and spark-analytics-zoo.conf to PYTHONPATH:
export PYTHONPATH=analytics-zoo/pyzoo:analytics-zoo/dist/conf/spark-analytics-zoo.conf:$PYTHONPATH
If you download BigDL from GitHub, you also need to add BigDL/pyspark to PYTHONPATH:
export PYTHONPATH=BigDL/pyspark:$PYTHONPATH
The above environment variables should be available when running or debugging code in the IDE.
In PyCharm, go to RUN -> Edit Configurations. In the “Run/Debug Configurations” panel, you can update the above environment variables in your configuration.
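For reference, the preparation steps above amount to exporting these variables before launching the IDE (the paths are illustrative; substitute the locations of your own checkouts):

```shell
# Illustrative paths -- adjust to where you cloned and built the projects
export SPARK_HOME=/path/to/spark            # or: pip install pyspark instead
export BIGDL_CLASSPATH=analytics-zoo/dist/lib/analytics-zoo-*-jar-with-dependencies.jar
export PYTHONPATH=analytics-zoo/pyzoo:analytics-zoo/dist/conf/spark-analytics-zoo.conf:$PYTHONPATH
export PYTHONPATH=BigDL/pyspark:$PYTHONPATH # only if BigDL was cloned from GitHub
```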
2. Scala¶
2.1 Build¶
Maven 3 is needed to build Analytics Zoo; you can download it from the Maven website.
After installing Maven 3, please set the environment variable MAVEN_OPTS as follows:
$ export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
Build using make-dist.sh
It is highly recommended that you build Analytics Zoo using the make-dist.sh script with Java 8.
You can build Analytics Zoo with the following commands:
$ bash make-dist.sh
After that, you can find a dist folder, which contains all the files needed to run an Analytics Zoo program. The files in dist include:
dist/lib/analytics-zoo-VERSION-jar-with-dependencies.jar: This jar package contains all dependencies except Spark classes.
dist/lib/analytics-zoo-VERSION-python-api.zip: This zip package contains all Python files of Analytics Zoo.
The instructions above will build Analytics Zoo with Spark 2.4.3. To build with another Spark version, for example Spark 2.2.0, you can use:
bash make-dist.sh -Dspark.version=2.2.0 -Dbigdl.artifactId=bigdl-SPARK_2.2
Build with JDK 11
Spark supports JDK 11 and Scala 2.12 starting from Spark 3.0. You can use -P spark_3.x to build with Spark 3 and Scala 2.12. Additionally, make-dist.sh uses Java 8 by default; to compile with Java 11, you need to specify the build options -Djava.version=11 -Djavac.version=11. You can then build with make-dist.sh.
It is recommended to download Oracle JDK 11 to avoid possible incompatibilities with Maven plugins. If you are running from the command line, update PATH and make sure your JAVA_HOME environment variable points to Java 11. If you are running from an IDE, make sure it is set to run Maven with your current JDK.
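A minimal sketch of the command-line setup (the install path is only an example; point JAVA_HOME at wherever your JDK 11 lives):

```shell
export JAVA_HOME=/usr/lib/jvm/jdk-11   # example path to a JDK 11 installation
export PATH=$JAVA_HOME/bin:$PATH       # make the JDK 11 tools take precedence
java -version                          # should report version 11
```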
Build with make-dist.sh
:
$ bash make-dist.sh -P spark_3.x -Djava.version=11 -Djavac.version=11
2.2 IDE Setup¶
Analytics Zoo uses Maven to organize the project. You should choose an IDE that supports Maven projects and the Scala language. IntelliJ IDEA works fine for us.
In IntelliJ, you can open Analytics Zoo project root directly, and the IDE will import the project automatically.
We set the scopes of Spark-related libraries to provided in the Maven pom.xml, which, however, will cause a problem in the IDE (throwing NoClassDefFoundError when you run applications). You can easily change the scopes using the all-in-one profile.
In IntelliJ, go to View -> Tool Windows -> Maven Projects. Then in the Maven Projects panel, under Profiles, check "all-in-one".