Installing Apache Zeppelin 0.7.3 in HDP 2.5.3 with Spark and Spark2 Interpreters

Background

A recent client requirement led me to propose a way to add Spark2 as an interpreter to Zeppelin on HDP (Hortonworks Data Platform) 2.5.3.
The first hurdle: HDP 2.5.3 ships with Zeppelin 0.6.0, which does not support Spark2 (Spark2 itself was only included as a technical preview). Upgrading HDP was not an option because of the effort involved and the platform's availability requirements. In the end I found a solution in the HCC (Hortonworks Community Connection): install a standalone Zeppelin that does not affect the Ambari-managed Zeppelin delivered with HDP 2.5.3.
Here is how I did it.

Preliminary steps

Stop the current Zeppelin (version 0.6.0, which ships with HDP 2.5.3):

su zeppelin
/usr/hdp/current/zeppelin-server/bin/zeppelin-daemon.sh stop

Deactivate the script that starts this version on system reboot.
Zeppelin is started as an Ambari dependency in the script

/usr/lib/hue/tools/start_deps.mf

To avoid modifying this file, a custom init script could be created that stops the default HDP Zeppelin and starts the newer version.
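A minimal sketch of what such a script could do, assuming the install paths used in this post. The function name `swap_zeppelin` and the variables `ZEPPELIN_OLD` and `ZEPPELIN_NEW` are hypothetical, and registering the script with the init system is left out.

```shell
#!/usr/bin/env bash
# Sketch: stop the HDP-managed Zeppelin and start the standalone one.
# Both paths are the ones used in this post; override them via the environment if yours differ.
ZEPPELIN_OLD="${ZEPPELIN_OLD:-/usr/hdp/current/zeppelin-server}"
ZEPPELIN_NEW="${ZEPPELIN_NEW:-/opt/zeppelin}"

swap_zeppelin() {
    # Stop the bundled 0.6.0 daemon; ignore the error if it is not running.
    "$ZEPPELIN_OLD/bin/zeppelin-daemon.sh" stop || true
    # Start the standalone 0.7.3 daemon.
    "$ZEPPELIN_NEW/bin/zeppelin-daemon.sh" start
}
```

The function should be called as the zeppelin user, like the manual commands in this post, and after the HDP services have been started.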

Apache Zeppelin Installation

Download Zeppelin: https://zeppelin.apache.org/download.html
Copy the .tar file to the /tmp directory using WinSCP
Extract the .tar file into the target directory, e.g. /opt

tar -xvf /tmp/zeppelin-0.7.3-bin-all.tar -C /opt

Create a symlink to the latest version (optional, but the remaining steps assume the /opt/zeppelin path)

sudo ln -s /opt/zeppelin-0.7.3-bin-all /opt/zeppelin

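Change the ownership of the folder. The real directory is used here because chown -R does not follow a symlink given on the command line:

sudo chown -R zeppelin:zeppelin /opt/zeppelin-0.7.3-bin-all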

Zeppelin Configuration

First copy the "conf" directory from the existing Zeppelin installation to the new version:

sudo cp -rf /usr/hdp/current/zeppelin-server/conf/ /opt/zeppelin

To make Zeppelin work with both the spark and the spark2 client, SPARK_HOME has to be set per interpreter rather than globally, so comment it out in the zeppelin-env.sh configuration file:
/opt/zeppelin/conf/zeppelin-env.sh

[Figure: editing zeppelin-env.sh]

According to the documentation, the variable ZEPPELIN_JAVA_OPTS is replaced by ZEPPELIN_INTP_JAVA_OPTS for spark2. Since both versions are active, both variables are defined:

export ZEPPELIN_JAVA_OPTS="-Dhdp.version=None -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default"

export ZEPPELIN_INTP_JAVA_OPTS="-Dhdp.version=None -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default"
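The same edit can be scripted instead of done by hand. The following is a sketch only: the function name `patch_zeppelin_env` and the variables `ENV_FILE` and `OPTS` are hypothetical, and it assumes SPARK_HOME is set on its own line in the file. It keeps a `.bak` copy and is meant to be run once, since a second run would append the exports again.

```shell
#!/usr/bin/env bash
# Sketch: comment out SPARK_HOME and append both JAVA_OPTS variables.
# ENV_FILE defaults to the path used in this post.
ENV_FILE="${ENV_FILE:-/opt/zeppelin/conf/zeppelin-env.sh}"
OPTS='-Dhdp.version=None -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default'

patch_zeppelin_env() {
    local file="${1:-$ENV_FILE}"
    # Prefix any active SPARK_HOME assignment (with or without "export") with '# '.
    sed -i.bak -E 's/^([[:space:]]*(export[[:space:]]+)?SPARK_HOME=)/# \1/' "$file"
    # Append both variables so the spark and spark2 interpreters get the same options.
    printf 'export ZEPPELIN_JAVA_OPTS="%s"\n' "$OPTS" >> "$file"
    printf 'export ZEPPELIN_INTP_JAVA_OPTS="%s"\n' "$OPTS" >> "$file"
}
```

Calling `patch_zeppelin_env` without arguments edits the default file; pass a path to work on a copy first.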

Start zeppelin 0.7.3

su zeppelin
/opt/zeppelin/bin/zeppelin-daemon.sh start
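
The daemon needs a moment before the web UI answers. A small retry helper can confirm that it came up; this is a sketch, `wait_for` is a hypothetical name, and the port 9995 in the usage comment is an assumption based on the copied HDP configuration (a stock Zeppelin listens on 8080).

```shell
#!/usr/bin/env bash
# Sketch: retry a command once per second until it succeeds or the attempts run out.
wait_for() {
    # Usage: wait_for <attempts> <command> [args...]
    local attempts="$1"
    shift
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        # Succeed as soon as the probe command returns 0.
        "$@" >/dev/null 2>&1 && return 0
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Example (port is an assumption, check zeppelin-site.xml):
# wait_for 30 curl -sf http://localhost:9995/ && echo "Zeppelin is up"
```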

A pending issue is to modify the startup scripts so that this change persists across a system reboot.

Configuring the spark interpreters

Navigate to the interpreter settings page:

[Figure: opening the Interpreter menu]

Scroll down to the spark interpreter and add the property:

SPARK_HOME = /usr/hdp/current/spark-client

[Figure: adding the SPARK_HOME property to the spark interpreter]

Create a new interpreter in the interpreter group spark and name it spark2.

Add a new interpreter:

[Figure: creating a new interpreter]

Set the interpreter name and group (leave all other values as default):

[Figure: setting the interpreter name and group]

Add the property:

SPARK_HOME = /usr/hdp/current/spark2-client

[Figure: adding the SPARK_HOME property to the spark2 interpreter]
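
Before saving the interpreters it can help to confirm that both SPARK_HOME values point at a usable Spark client. The sketch below only checks for an executable `bin/spark-submit` under each path; `check_spark_home` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: verify that a SPARK_HOME candidate contains an executable spark-submit.
check_spark_home() {
    local home="$1"
    if [ -x "$home/bin/spark-submit" ]; then
        echo "OK: $home"
    else
        echo "MISSING: $home/bin/spark-submit" >&2
        return 1
    fi
}

# Example with the two paths used in this post:
# for home in /usr/hdp/current/spark-client /usr/hdp/current/spark2-client; do
#     check_spark_home "$home"
# done
```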

Installation test

To test the installation, create a new notebook and verify the interpreter bindings:

[Figure: interpreter binding for the test notebook]

Execute the following code in two different paragraphs:

%spark
sc.version

%spark2
sc.version

[Figure: test notebook]
