Background
For a recent client requirement I needed to propose a way to add spark2 as an interpreter to Zeppelin on HDP (Hortonworks Data Platform) 2.5.3.
The first hurdle is that HDP 2.5.3 ships with Zeppelin 0.6.0, which does not support spark2 (spark2 itself was only included as a technical preview). Upgrading the HDP version was not an option because of the effort and platform availability. In the end I found a solution in the HCC (Hortonworks Community Connection): install a standalone Zeppelin that does not affect the Ambari-managed Zeppelin delivered with HDP 2.5.3.
I want to share how I did it with you.
Preliminary steps
Stop the current Zeppelin (version 0.6.0, which comes with HDP 2.5.3):
su zeppelin
/usr/hdp/current/zeppelin-server/bin/zeppelin-daemon.sh stop
Deactivate the script that starts this version on system reboot
Zeppelin is started as an Ambari dependency in the script:
/usr/lib/hue/tools/start_deps.mf
To avoid modifying this file, a custom init script could be created that stops the default HDP Zeppelin and starts the newer version.
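Such an init script could look roughly like the sketch below. I have not wired this into the boot sequence, so treat it as a starting point only: the daemon paths are the ones used in this post, while the script name (zeppelin-switch.sh), the RUN_AS variable and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch of a custom init helper: stop the HDP-managed Zeppelin 0.6.0 and
# start the standalone 0.7.3 instead. Paths are the ones used in this post;
# RUN_AS and the function names are made up for illustration.
OLD_DAEMON=/usr/hdp/current/zeppelin-server/bin/zeppelin-daemon.sh
NEW_DAEMON=/opt/zeppelin/bin/zeppelin-daemon.sh
RUN_AS=zeppelin

# Run one command as the zeppelin user (expects to be called by root).
run_as_zeppelin() {
    su "$RUN_AS" -c "$1"
}

switch_zeppelin() {
    # The old daemon may already be stopped, so a failure here is not fatal.
    run_as_zeppelin "$OLD_DAEMON stop" || true
    run_as_zeppelin "$NEW_DAEMON start"
}

# Only act when called explicitly, e.g. "./zeppelin-switch.sh start".
case "${1:-}" in
    start) switch_zeppelin ;;
esac
```

Called as root with the argument start, it stops the old daemon first and then starts the new one, so the two versions never compete for the same port.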
Apache Zeppelin Installation
Download Zeppelin: https://zeppelin.apache.org/download.html
Copy the .tar file to the /tmp directory using WinSCP
Extract the .tar file into the target directory, e.g. /opt
tar -xvf zeppelin-0.7.3-bin-all.tar -C /opt
Create a symlink to the latest version (optional)
sudo ln -s /opt/zeppelin-0.7.3-bin-all /opt/zeppelin
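Change the ownership of the folder (point chown at the real directory, because by default chown -R does not descend into a symlink given on the command line)
chown -R zeppelin:zeppelin /opt/zeppelin-0.7.3-bin-all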
Zeppelin Configuration
First copy the "conf" directory from the existing Zeppelin installation to the new version:
sudo cp -rf /usr/hdp/current/zeppelin-server/conf/ /opt/zeppelin
For Zeppelin to work with both the spark and the spark2 client, SPARK_HOME has to be set per interpreter, so it needs to be commented out in the zeppelin-env.sh configuration file:
/opt/zeppelin/conf/zeppelin-env.sh
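If you prefer not to edit the file by hand, the change can be scripted. This is a minimal sketch, assuming SPARK_HOME is set in the file with a plain "export SPARK_HOME=..." line; the function name is hypothetical.

```shell
#!/bin/sh
# Comment out every active "export SPARK_HOME=..." line in the given file.
# Lines that are already commented out are left alone.
# A backup copy is kept next to the file with the suffix .bak.
comment_out_spark_home() {
    sed -i.bak 's/^\([[:space:]]*export[[:space:]][[:space:]]*SPARK_HOME=\)/# \1/' "$1"
}

# Usage (as a user allowed to write the file):
#   comment_out_spark_home /opt/zeppelin/conf/zeppelin-env.sh
```

The .bak copy makes it easy to go back if the interpreters fail to pick up their own SPARK_HOME later.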

According to the documentation, the variable ZEPPELIN_JAVA_OPTS changed in spark2 to ZEPPELIN_INTP_JAVA_OPTS. Since both versions are active, both variables are defined:
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=None -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default"
export ZEPPELIN_INTP_JAVA_OPTS="-Dhdp.version=None -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default"
Start Zeppelin 0.7.3
su zeppelin
/opt/zeppelin/bin/zeppelin-daemon.sh start
A pending issue here is to modify the startup scripts so that these changes persist across a system reboot.
Configuring the spark interpreters
Navigate to the interpreter settings page:

Scroll down to the spark interpreter and add the property:
SPARK_HOME = /usr/hdp/current/spark-client

Create a new interpreter with interpreter group spark and name it spark2
Add new interpreter

Interpreter name and group (leave all other values as default)

Add the property:
SPARK_HOME = /usr/hdp/current/spark2-client
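Both settings can also be checked from the command line through Zeppelin's REST API (GET /api/interpreter/setting). The sketch below rests on a few assumptions: Zeppelin listens on localhost:9995 (the usual HDP port, check zeppelin-site.xml), no login is required, and the response lists the properties as plain key/value pairs. The function names and the ZEPPELIN_URL variable are hypothetical.

```shell
#!/bin/sh
# Assumption: Zeppelin listens on localhost:9995; override ZEPPELIN_URL
# if your zeppelin-site.xml says otherwise.
ZEPPELIN_URL=${ZEPPELIN_URL:-http://localhost:9995}

# Read interpreter-settings JSON on stdin and print each SPARK_HOME entry.
extract_spark_homes() {
    grep -o '"SPARK_HOME"[[:space:]]*:[[:space:]]*"[^"]*"'
}

# Fetch all interpreter settings over the REST API and list SPARK_HOME values.
list_spark_homes() {
    curl -s "$ZEPPELIN_URL/api/interpreter/setting" | extract_spark_homes
}

# Usage on the Zeppelin host:
#   list_spark_homes
```

If everything is configured as described, the output should contain one entry for spark-client and one for spark2-client.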

Installation test
To test the installation, create a new notebook and verify the binding of the interpreters

Execute the following code in two different paragraphs:
%spark sc.version
%spark2 sc.version
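To cross-check what the two paragraphs report, you can ask each Spark client for its version directly on the node. A small sketch, assuming both clients live under the SPARK_HOME paths configured above; the function name is hypothetical.

```shell
#!/bin/sh
# Print the version banner of the Spark client installed under the given home.
# spark-submit writes the banner to stderr, hence the redirect.
print_spark_version() {
    "$1/bin/spark-submit" --version 2>&1
}

# Usage on the HDP node:
#   print_spark_version /usr/hdp/current/spark-client
#   print_spark_version /usr/hdp/current/spark2-client
```

The version shown for each client should match what sc.version returns in the corresponding notebook paragraph.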
