Installing Apache Zeppelin 0.7.3 in HDP 2.5.3 with Spark and Spark2 Interpreters

Background

As part of a recent client requirement, I needed to propose a way to add Spark2 as an interpreter to Zeppelin in HDP (Hortonworks Data Platform) 2.5.3.
The first hurdle is that HDP 2.5.3 ships with Zeppelin 0.6.0, which does not support Spark2 (Spark2 itself was included only as a technical preview). Upgrading the HDP version was not an option because of the effort involved and the impact on platform availability. In the end I found a solution in the HCC (Hortonworks Community Connection): install a standalone Zeppelin that does not affect the Ambari-managed Zeppelin delivered with HDP 2.5.3.
Here is how I did it.

Preliminary steps

Stop the current Zeppelin (version 0.6.0, which ships with HDP 2.5.3):

su zeppelin
/usr/hdp/current/zeppelin-server/bin/zeppelin-daemon.sh stop

Deactivate the script that starts this version on system reboot.
Zeppelin is started as an Ambari dependency in the script:

/usr/lib/hue/tools/start_deps.mf

To avoid modifying this file, a custom init script could be created that stops the default HDP Zeppelin and starts the newer version.
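A custom script of that kind might look like the following minimal sketch. It assumes the new version ends up under /opt/zeppelin (as in the installation steps of this post), that both daemons run as the zeppelin user, and that the script is executed as root. The names `run`, `switch_zeppelin` and `DRY_RUN` are made up for illustration and are not part of Zeppelin or HDP.

```shell
#!/usr/bin/env bash
# Sketch: stop the HDP-managed Zeppelin 0.6.0 and start the standalone one.
# Paths come from this post; run, switch_zeppelin and DRY_RUN are illustrative.

OLD_DAEMON=/usr/hdp/current/zeppelin-server/bin/zeppelin-daemon.sh
NEW_DAEMON=/opt/zeppelin/bin/zeppelin-daemon.sh
ZEPPELIN_USER=zeppelin

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

switch_zeppelin() {
  # A failing stop (e.g. daemon already stopped) should not block the start.
  run su "$ZEPPELIN_USER" -c "$OLD_DAEMON stop" || true
  run su "$ZEPPELIN_USER" -c "$NEW_DAEMON start"
}
```

Calling `switch_zeppelin` with `DRY_RUN=1` set only prints the two commands, which is a cheap way to check the paths before hooking the script into whatever boot mechanism the system uses.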

Apache Zeppelin Installation

Download Zeppelin: https://zeppelin.apache.org/download.html
Copy the .tar file to the /tmp directory (for example with WinSCP)
Extract the .tar file into the target directory, e.g. /opt

tar -xvf zeppelin-0.7.3-bin-all.tar -C /opt

Create a symlink to the latest version (optional)

sudo ln -s /opt/zeppelin-0.7.3-bin-all /opt/zeppelin

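Change the ownership of the folder (point chown at the real directory, since chown -R does not descend into a symlinked directory by default)

chown -R zeppelin:zeppelin /opt/zeppelin-0.7.3-bin-all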

Zeppelin Configuration

First copy the "conf" directory from the existing Zeppelin installation to the new version:

sudo cp -rf /usr/hdp/current/zeppelin-server/conf/ /opt/zeppelin

For Zeppelin to work with both the spark and the spark2 client, SPARK_HOME has to be set per interpreter rather than globally, so it must be commented out in the zeppelin-env.sh configuration file:
/opt/zeppelin/conf/zeppelin-env.sh

(Screenshot: editing zeppelin-env.sh)
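Instead of editing the file by hand, the SPARK_HOME line can also be commented out with sed. The following is only a sketch: it assumes the variable is set with a plain `export SPARK_HOME=...` line, and `comment_out_spark_home` is an illustrative helper name, not something shipped with Zeppelin.

```shell
#!/usr/bin/env bash
# Sketch: comment out every active "export SPARK_HOME=..." line in a file.
# comment_out_spark_home is an illustrative helper, not part of Zeppelin.

comment_out_spark_home() {
  env_file="$1"
  # Keep a backup copy (<file>.bak) and prefix matching lines with "# ".
  # Lines that are already commented out do not match and stay untouched.
  sed -i.bak 's/^\([[:space:]]*export[[:space:]][[:space:]]*SPARK_HOME=\)/# \1/' "$env_file"
}
```

For this installation the call would be `comment_out_spark_home /opt/zeppelin/conf/zeppelin-env.sh`; the untouched original remains next to it as zeppelin-env.sh.bak.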

According to the documentation, ZEPPELIN_INTP_JAVA_OPTS takes the place of ZEPPELIN_JAVA_OPTS for spark2. Since both versions are active, both variables are defined:

export ZEPPELIN_JAVA_OPTS="-Dhdp.version=None -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default"

export ZEPPELIN_INTP_JAVA_OPTS="-Dhdp.version=None -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default"
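Because the two variables carry identical values, one way to keep them from drifting apart is to define the options once inside zeppelin-env.sh and reuse them. This is just a sketch; `COMMON_JAVA_OPTS` is a hypothetical helper variable and not a setting that Zeppelin itself reads.

```shell
# Sketch for zeppelin-env.sh: define the JVM options once and reuse them.
# COMMON_JAVA_OPTS is a made-up helper variable, not a Zeppelin setting.
COMMON_JAVA_OPTS="-Dhdp.version=None -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default"

export ZEPPELIN_JAVA_OPTS="$COMMON_JAVA_OPTS"
export ZEPPELIN_INTP_JAVA_OPTS="$COMMON_JAVA_OPTS"
```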

Start zeppelin 0.7.3

su zeppelin
/opt/zeppelin/bin/zeppelin-daemon.sh start

A pending issue here is to modify the startup scripts so that these changes survive a system reboot.
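Whether the daemon is started by hand or from a startup script, it helps to confirm that it actually came up. The sketch below polls an arbitrary check command until it succeeds; `wait_until` and its arguments are illustrative names, not part of Zeppelin.

```shell
#!/usr/bin/env bash
# Sketch: poll a check command until it succeeds or the attempts run out.
# wait_until is an illustrative helper, not part of Zeppelin.

wait_until() {
  attempts="$1"   # maximum number of tries
  delay="$2"      # seconds to sleep between tries
  shift 2         # the remaining arguments form the check command
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

A possible use is `wait_until 30 2 curl -sf http://localhost:9995/`, where port 9995 is an assumption based on the usual HDP setting; the actual value is whatever zeppelin.server.port is set to in the copied zeppelin-site.xml. The daemon script also accepts a `status` argument (`/opt/zeppelin/bin/zeppelin-daemon.sh status`) for a quick manual check.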

Configuring the spark interpreters

Navigate to the interpreter settings page:

(Screenshot: opening the interpreter menu)

Scroll down to the spark interpreter and add the property:

SPARK_HOME = /usr/hdp/current/spark-client

(Screenshot: adding the SPARK_HOME property to the spark interpreter)

Create a new interpreter with interpreter group spark and name it spark2:

Add a new interpreter

(Screenshot: creating a new interpreter)

Set the interpreter name and group (leave all other values as default)

(Screenshot: setting the interpreter name and group)

Add the property:

SPARK_HOME = /usr/hdp/current/spark2-client

(Screenshot: adding the SPARK_HOME property to the spark2 interpreter)
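Since a wrong SPARK_HOME only shows up once a paragraph is executed, it can save time to check both client directories on the Zeppelin host beforehand. The following is a minimal sketch that assumes a usable Spark client contains an executable bin/spark-submit; `check_spark_home` is an illustrative helper name.

```shell
#!/usr/bin/env bash
# Sketch: check that a SPARK_HOME candidate looks like a Spark client install.
# check_spark_home is an illustrative helper name.

check_spark_home() {
  spark_home="$1"
  if [ -x "$spark_home/bin/spark-submit" ]; then
    echo "OK: $spark_home"
  else
    echo "MISSING: $spark_home/bin/spark-submit" >&2
    return 1
  fi
}
```

For the two interpreters configured here, that would be `check_spark_home /usr/hdp/current/spark-client` and `check_spark_home /usr/hdp/current/spark2-client`.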

Installation test

To test the installation, create a new notebook and verify the binding of the interpreters.

(Screenshot: interpreter binding for the test notebook)

Execute the following code in two different paragraphs:

%spark
sc.version

%spark2
sc.version

(Screenshot: test notebook)

About Paul Hernandez

I'm an Electronic Engineer and Computer Science professional, specialized in Data Analysis and Business Intelligence Solutions. Also a father, swimmer and music lover.