Python User Guide
We recommend using conda to prepare the Python environment as follows:
conda create -n zoo python=3.7  # "zoo" is conda environment name, you can use any name you like.
conda activate zoo
You also need to install a JDK in the environment and set the JAVA_HOME environment variable to point to it. JDK 8 is highly recommended.
You may take the following commands as a reference for installing OpenJDK:
# For Ubuntu
sudo apt-get install openjdk-8-jre
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/

# For CentOS
su -c "yum install java-1.8.0-openjdk"
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-184.108.40.2062.b08-1.el7_9.x86_64/jre

export PATH=$PATH:$JAVA_HOME/bin
java -version  # Verify the version of JDK.
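As a complement to java -version, the same check can be made from Python before starting Analytics Zoo. The following is a minimal sketch, not part of Analytics Zoo: the helper name check_java_env is hypothetical, and it only inspects the JAVA_HOME and PATH variables described above.

```python
import os
import shutil


def check_java_env(environ=None):
    """Return a list of problems found in the Java setup (empty if none)."""
    environ = os.environ if environ is None else environ
    problems = []

    java_home = environ.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isdir(java_home):
        problems.append("JAVA_HOME does not point to a directory: " + java_home)

    # Search for the java executable on the PATH taken from the same mapping.
    if shutil.which("java", path=environ.get("PATH", "")) is None:
        problems.append("java executable not found on PATH")

    return problems


if __name__ == "__main__":
    for problem in check_java_env():
        print("WARNING:", problem)
```

An empty result only means that the variables look plausible; it does not verify the JDK version.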
1.1 Official Release
You can install the latest release version of Analytics Zoo as follows:
pip install analytics-zoo
Note: Installing Analytics Zoo will automatically install conda-pack==0.3.1 and its dependencies if they are not already present in your conda environment.
1.2 Nightly Build
You can install the latest nightly build of Analytics Zoo as follows:
pip install --pre --upgrade analytics-zoo
Alternatively, you can find the list of the nightly build versions here, and install a specific version as follows:
pip install analytics-zoo==version
Note: If you are using a custom Python Package Index URL, you may need to check whether the latest packages have been synced with PyPI.
Alternatively, you can add the option -i https://pypi.python.org/simple to the pip install command to use PyPI as the index URL.
Note: Installing Analytics Zoo from pip will automatically install pyspark. To avoid possible conflicts, we highly recommend unsetting the environment variable SPARK_HOME if it exists in your environment.
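In a shell, unset SPARK_HOME removes the variable. When that is not convenient (for example in a notebook that was started from an environment you do not control), the variable can also be dropped inside the Python process. This is a minimal sketch, assuming it runs before any Spark or Analytics Zoo context is created; the helper name drop_spark_home is hypothetical.

```python
import os


def drop_spark_home(environ=None):
    """Remove SPARK_HOME from the mapping and return its old value (or None)."""
    environ = os.environ if environ is None else environ
    return environ.pop("SPARK_HOME", None)


# Affects only this process and the child processes it starts,
# not the shell that launched Python.
previous = drop_spark_home()
if previous is not None:
    print("Unset SPARK_HOME (was: {})".format(previous))
```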
2.1 Interactive Shell
You may test if the installation is successful using the interactive Python shell as follows:
Type python in the command line to start a REPL.
Try to run the example code below to verify the installation:
import zoo
from zoo.orca import init_orca_context

print(zoo.__version__)  # Verify the version of analytics-zoo.
sc = init_orca_context()  # Initiation of analytics-zoo on the underlying cluster.
2.2 Jupyter Notebook
You can start the Jupyter notebook as you normally do using the following command and run Analytics Zoo programs directly in a Jupyter notebook:
jupyter notebook --notebook-dir=./ --ip=* --no-browser
2.3 Python Script
You can write Analytics Zoo programs directly in a Python file (e.g. script.py) and run it in the command line as a normal Python program:
python script.py
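As an illustration, a script.py could look like the sketch below. It is not taken from the Analytics Zoo examples: the arguments passed to init_orca_context (a local run with 2 cores and 2g of memory) are assumptions, and the returned sc is assumed to behave as a PySpark SparkContext. The imports are placed inside main() so that a missing installation is reported rather than raising at import time.

```python
# script.py: a hypothetical smoke test for an Analytics Zoo installation.


def main():
    """Return 0 if a tiny local job runs correctly, 1 otherwise."""
    try:
        import zoo
        from zoo.orca import init_orca_context, stop_orca_context
    except ImportError as err:
        print("analytics-zoo is not importable: {}".format(err))
        return 1

    print("analytics-zoo version:", zoo.__version__)

    # Assumed settings for a small local run; adjust to your machine.
    sc = init_orca_context(cluster_mode="local", cores=2, memory="2g")
    try:
        # Sum the integers 0..99 on the cluster as a trivial job.
        total = sc.parallelize(range(100)).sum()
        print("sum of 0..99:", total)
    finally:
        # Release the resources held by the context.
        stop_orca_context()
    return 0 if total == 4950 else 1


if __name__ == "__main__":
    main()
```

Stopping the context in a finally block keeps a failed job from leaving the local cluster running.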
3. Python Dependencies
We recommend using conda to manage your Python dependencies. Libraries installed in the current conda environment will be automatically distributed to the cluster when calling init_orca_context. You can also add extra dependencies as .egg files by specifying the extra_python_lib argument in init_orca_context.
For more details, please refer to Orca Context.
Analytics Zoo has been tested on Python 3.6 and 3.7 with the following library versions:
pyspark==2.4.6
ray==1.2.0
tensorflow==1.15.0 or >2.0
pytorch>=1.5.0
torchvision>=0.6.0
horovod==0.19.2
mxnet>=1.6.0
bayesian-optimization==1.1.0
dask==2.14.0
h5py==2.10.0
numpy==1.18.1
opencv-python==220.127.116.11
pandas==1.0.3
Pillow==7.1.1
protobuf==3.12.0
psutil==5.7.0
py4j==0.10.7
redis==3.4.1
scikit-learn==0.22.2.post1
scipy==1.4.1
tensorboard==1.15.0
tensorboardX>=2.1
tensorflow-datasets==3.2.0
tensorflow-estimator==1.15.1
tensorflow-gan==2.0.0
tensorflow-hub==0.8.0
tensorflow-metadata==0.21.1
tensorflow-probability==0.7.0
Theano==1.0.4
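To see how the current environment compares with the tested versions, the installed packages can be queried from Python. The sketch below is only an illustration: TESTED_VERSIONS copies a handful of the exact pins from the list above, the helper names are hypothetical, and on Python 3.6/3.7 it assumes the importlib_metadata backport is installed.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Python 3.6/3.7: assumes the importlib_metadata backport is available.
    from importlib_metadata import PackageNotFoundError, version

# A few exact pins copied from the list above; extend as needed.
TESTED_VERSIONS = {
    "pyspark": "2.4.6",
    "ray": "1.2.0",
    "numpy": "1.18.1",
    "pandas": "1.0.3",
    "py4j": "0.10.7",
}


def installed_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def compare_versions(expected, installed):
    """Map each package to (installed_version, tested_version, versions_match)."""
    return {
        name: (installed.get(name), wanted, installed.get(name) == wanted)
        for name, wanted in expected.items()
    }


installed = {name: installed_version(name) for name in TESTED_VERSIONS}
for name, (found, wanted, ok) in sorted(compare_versions(TESTED_VERSIONS, installed).items()):
    status = "ok" if ok else "differs"
    print("{:<10} installed={!s:<12} tested={:<10} {}".format(name, found, wanted, status))
```

A "differs" line is not necessarily a problem; it only indicates a combination outside the tested set.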