Analytics Zoo source code is available at GitHub:
git clone https://github.com/intel-analytics/analytics-zoo.git
git clone will download the development version of Analytics Zoo. If you want a release version, you can use git checkout to switch to the corresponding release tag.
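For example, to pin a release instead of the development branch (the tag name below is purely illustrative; list the real tags with git tag after cloning):

```shell
# Clone the repository, then switch from the development branch to a release tag.
# "v0.6.0" is a hypothetical example tag -- run `git tag` to see the actual ones.
git clone https://github.com/intel-analytics/analytics-zoo.git
cd analytics-zoo
git tag               # list available release tags
git checkout v0.6.0   # check out the chosen release
```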
To generate a new whl package for pip install, you can run the following script:
bash analytics-zoo/pyzoo/dev/build.sh linux default false
The first argument is the platform to build for, either ‘linux’ or ‘mac’.
The second argument is the analytics-zoo version to build for. ‘default’ means the default version for the current branch. You can also specify a different version if you wish, e.g., ‘0.6.0.dev1’.
You can also add other profiles to build the package for specific Spark and BigDL versions. For example, if pyspark==2.4.3 is a dependency, you need to add the profiles -Dspark.version=2.4.3 -Dbigdl.artifactId=bigdl-SPARK_2.4 -P spark_2.4+ to build Analytics Zoo for Spark 2.4.3.
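Putting these together, a sketch of the full build command for Spark 2.4.3 (this assumes build.sh forwards the extra -D/-P options to the underlying Maven build, as described above):

```shell
# Build the pip whl against Spark 2.4.3 / BigDL for Spark 2.4.
bash analytics-zoo/pyzoo/dev/build.sh linux default false \
    -Dspark.version=2.4.3 -Dbigdl.artifactId=bigdl-SPARK_2.4 -P spark_2.4+
```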
After running the above command, you will find a whl file under the folder analytics-zoo/pyzoo/dist/. You can then directly pip install it to your local Python environment:
pip install analytics-zoo/pyzoo/dist/analytics_zoo-VERSION-py2.py3-none-PLATFORM_x86_64.whl
See here for more instructions to run analytics-zoo after pip install.
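As a quick sanity check after pip install (assuming the package installed into the active Python environment; init_nncontext is Analytics Zoo's helper for creating a SparkContext):

```shell
# Verify the installation by importing Analytics Zoo's context helper.
python -c "from zoo.common.nncontext import init_nncontext; print('ok')"
```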
1.2 IDE Setup
Any IDE that supports Python should be able to run Analytics Zoo. PyCharm works fine for us.
You need to do the following preparations before starting the IDE to successfully run an Analytics Zoo Python program in the IDE:
Build Analytics Zoo; see here for more instructions.
Prepare the Spark environment by either setting SPARK_HOME as an environment variable or pip installing pyspark. Note that the Spark version should match the one you build Analytics Zoo on.
Prepare the BigDL Python environment by either downloading the BigDL source code from GitHub or pip installing bigdl. Note that the BigDL version should match the one you build Analytics Zoo on. If you download BigDL from GitHub, you also need to add its Python sources to the PYTHONPATH environment variable.
The above environment variables should be available when running or debugging code in the IDE.
In PyCharm, go to RUN -> Edit Configurations. In the “Run/Debug Configurations” panel, you can update the above environment variables in your configuration.
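For example, the environment for a run configuration might look like the sketch below (all paths are placeholders, and the BigDL source layout shown is an assumption; adjust everything to your machine):

```shell
# Placeholder paths -- point these at your actual Spark and BigDL locations.
export SPARK_HOME=/path/to/spark
export BIGDL_HOME=/path/to/BigDL          # only needed for a BigDL source checkout
# If using BigDL from source, make its Python sources importable (assumed layout):
export PYTHONPATH=${BIGDL_HOME}/pyspark:${PYTHONPATH}
```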
Maven 3 is needed to build Analytics Zoo; you can download it from the Maven website.
After installing Maven 3, please set the environment variable MAVEN_OPTS as follows:
$ export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
It is highly recommended that you build Analytics Zoo using the make-dist.sh script with Java 8.
You can build Analytics Zoo with the following commands:
$ bash make-dist.sh
After that, you can find a dist folder, which contains all the needed files to run an Analytics Zoo program. The files in dist include:
dist/lib/analytics-zoo-VERSION-jar-with-dependencies.jar: This jar package contains all dependencies except Spark classes.
dist/lib/analytics-zoo-VERSION-python-api.zip: This zip package contains all Python files of Analytics Zoo.
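These two artifacts are what a Spark job needs at runtime; as a rough sketch, a program could be submitted with them via the standard spark-submit flags (VERSION and the script name are placeholders):

```shell
# Hypothetical submission using the built artifacts from dist/lib/.
spark-submit \
    --jars dist/lib/analytics-zoo-VERSION-jar-with-dependencies.jar \
    --py-files dist/lib/analytics-zoo-VERSION-python-api.zip \
    my_program.py
```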
The instructions above will build Analytics Zoo with Spark 2.4.3. To build with other Spark versions, for example Spark 2.2.0, you can use
bash make-dist.sh -Dspark.version=2.2.0 -Dbigdl.artifactId=bigdl-SPARK_2.2
Build with JDK 11
Spark supports JDK 11 and Scala 2.12 starting from Spark 3.0. You can use -P spark_3.x to specify Spark 3 and Scala 2.12. Additionally, make-dist.sh uses Java 8 by default; to compile with Java 11, you need to specify the build options -Djava.version=11 -Djavac.version=11.
It’s recommended to download Oracle JDK 11, which avoids possible incompatibilities with Maven plugins. You should update PATH and make sure your JAVA_HOME environment variable is set to Java 11 if you’re running from the command line. If you’re running from an IDE, you need to make sure it is set to run Maven with your current JDK. You can then build with:
$ bash make-dist.sh -P spark_3.x -Djava.version=11 -Djavac.version=11
2.2 IDE Setup
Analytics Zoo uses Maven to organize the project. You should choose an IDE that supports Maven projects and the Scala language. IntelliJ IDEA works fine for us.
In IntelliJ, you can open Analytics Zoo project root directly, and the IDE will import the project automatically.
We set the scopes of Spark-related libraries to provided in the Maven pom.xml, which, however, will cause a problem in the IDE (throwing NoClassDefFoundError when you run applications). You can easily change the scopes using the all-in-one profile:
In IntelliJ, go to View -> Tool Windows -> Maven Projects. Then in the Maven Projects panel, under Profiles, click “all-in-one”.