Apache Zeppelin installation on Windows 10

Disclaimer: I am not a Windows or Microsoft fan, but I am a frequent Windows user, and it is the most common OS I have found in enterprise environments. Therefore, I decided to try Apache Zeppelin on my Windows 10 laptop and share my experience with you. The behavior should be similar on other operating systems.

Introduction

It is no secret that Apache Spark has become a reference as a powerful cluster computing framework, especially useful for machine learning applications and big data processing. Applications can be written in several languages, such as Java, Scala, Python or R. Apache Zeppelin is a web-based tool that, according to the official project website (Apache Zeppelin), tries to cover all of our needs:

Data ingestion
Data discovery
Data analytics
Data visualization and collaboration

The interpreter concept is what makes Zeppelin powerful, because you can theoretically plug in any language/data-processing-backend. It provides built-in Spark integration, and that is what I have tested first.

Apache Zeppelin Download

You can download the latest release from the Apache Zeppelin download page.

I downloaded the version 0.6.2 binary package with all interpreters.

As of this version, the Spark interpreter is compatible with Spark 2.0 and Scala 2.11.

According to the documentation, it supports Oracle JDK 1.7 (I guess it should also work with 1.8) on Mac OSX, Ubuntu 14.X, CentOS 6.X and Windows 7 Pro SP1 (and, according to my tests, also Windows 10 Home).

Too much bla bla bla, let’s get started.

Zeppelin Installation

After the download, open the archive (I used 7-Zip) and extract it to a suitable location (in my case directly on the C: drive, to avoid possible problems).

Set the JAVA_HOME system variable to your JDK installation folder (the folder that contains bin, not the bin folder itself).

Set the variable HADOOP_HOME to your Hadoop folder location. If you don’t have the HADOOP binaries you can download my binaries from here: Hadoop-2.7.1

My system variables

I am not really sure why Hadoop is needed if Zeppelin is supposed to be autonomous, but I guess Spark looks for winutils.exe when it runs on Windows. I wrote about it in my previous post: Apache Spark Installation on Windows 10

This is the error I found in the Zeppelin logs (ZEPPELIN_DIR\logs; there is one file for the server log and a separate file for each interpreter):

winutils.exe error
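Before starting Zeppelin, the two variables can be double-checked with a small script. This is only a sketch: the function name check_zeppelin_env is hypothetical, and it assumes that winutils.exe is expected in the bin subfolder of HADOOP_HOME.

```python
import os
from pathlib import Path


def check_zeppelin_env(env=None):
    """Return a list of human-readable problems found in the environment.

    Assumption for illustration: winutils.exe lives in HADOOP_HOME\\bin.
    """
    env = os.environ if env is None else env
    problems = []

    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not Path(java_home).is_dir():
        problems.append("JAVA_HOME does not point to an existing folder")

    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        problems.append("HADOOP_HOME is not set")
    elif not (Path(hadoop_home) / "bin" / "winutils.exe").is_file():
        problems.append("winutils.exe not found under HADOOP_HOME\\bin")

    return problems


if __name__ == "__main__":
    # Print one warning per problem; no output means both checks passed.
    for problem in check_zeppelin_env():
        print("WARNING:", problem)
```

Run it from a new command prompt, because a prompt that was already open does not see changes to the system variables.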

Zeppelin Configuration

There are several settings you can adjust. Basically, there are two main files in the ZEPPELIN_DIR\conf :

zeppelin-env
zeppelin-site.xml

In the first one you can configure some interpreter settings. The second one covers aspects related to the web server, for instance the Zeppelin server port (I am using 8080, but on your machine that port may already be used by another application).

If you don’t touch the zeppelin-env file, Zeppelin uses the built-in Spark version, which is the one used for the results posted in this entry.
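If you are not sure which port is configured, the value can be read from zeppelin-site.xml with a few lines of Python. This is a sketch under assumptions: the helper read_site_property is hypothetical, the sample XML is a shortened stand-in for the real file, and the property name zeppelin.server.port should be verified against your own copy of the file.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for conf\zeppelin-site.xml (assumed name/value layout).
SAMPLE_SITE_XML = """<configuration>
  <property>
    <name>zeppelin.server.port</name>
    <value>8080</value>
  </property>
</configuration>"""


def read_site_property(xml_text, name, default=None):
    """Return the <value> of the <property> whose <name> matches, else default."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return default


if __name__ == "__main__":
    print(read_site_property(SAMPLE_SITE_XML, "zeppelin.server.port"))
```

For a real installation, read the file content first (for example with open(...).read()) and pass it to the function.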

Start Zeppelin

Open a command prompt and start Zeppelin by executing Drive:\ZEPPELIN_DIR\bin\zeppelin.cmd

Start Zeppelin

Then open your favorite browser and navigate to localhost:8080 (or the port you set in zeppelin-site.xml).

You should see the start page. Verify that the indicator in the top-right corner of the window is green; otherwise your server is down or not running properly.
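If the page does not load at all, it helps to check whether anything is listening on the port. The sketch below uses a hypothetical helper, is_port_open, and only tells you that some process accepts TCP connections on that port, not that the process is Zeppelin.

```python
import socket


def is_port_open(host="localhost", port=8080, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unknown host, and similar failures.
        return False


if __name__ == "__main__":
    state = "open" if is_port_open() else "closed"
    print("Port 8080 on localhost is", state)
```

Running it before you start Zeppelin also shows whether another application already occupies the port.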

Zeppelin home

If you have not configured Hive, you need to set zeppelin.spark.useHiveContext to false before trying the tutorials included in the release. Apart from the config files, Zeppelin has an interpreter configuration page. You can find it by clicking on your user “anonymous” –> Interpreter

Go to interpreter settings

Scroll down to the bottom, where you’ll find the Spark config values:

Spark interpreter settings

Click the edit button and change the value to false in order to use the SQL context instead of the Hive context.

Press the Save button to persist the change:

Set zeppelin.spark.useHiveContext to false

Now let’s try the Zeppelin Tutorial

From the Notebook menu click on the Zeppelin Tutorial link:

Navigate to the Zeppelin Tutorial

The first time you open it, Zeppelin asks you to set the interpreter bindings:

Interpreter binding

Just scroll down and save them:

Save binding

Some notes are presented with different layouts. For more about the display system visit the documentation online.

Another possibly annoying error

I was getting the following error when I tried to run some notes in the Zeppelin Tutorial:

Spark warehouse URI error

I found a suggested solution in the following Stack Overflow question: link

It is a URI syntax exception raised while trying to find the spark-warehouse folder in the Zeppelin folder. I struggled a little bit with that. The folder had not been created in my Zeppelin directory, and I thought it was a permissions problem, so I created it manually and assigned 777 permissions.

spark-warehouse folder permission settings

It still failed. In the link above, a forum user suggested using triple slashes to define the proper path: file:///C:/zeppelin-0.6.2-bin-all/spark-warehouse

But I still didn’t know where to place this configuration. I couldn’t do it in the Spark shell, nor while creating a Spark session (Zeppelin does that for me), and conf/spark-defaults.conf didn’t seem to be a good idea for Zeppelin because I was using the built-in Spark version.

Finally, I remembered that it is possible to add additional Spark settings on the interpreter configuration page, so I navigated there and created it:

spark.sql.warehouse.dir
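The value for spark.sql.warehouse.dir can be derived from a regular Windows path. The following sketch shows one way to build the triple-slash form; the helper name windows_path_to_file_uri is hypothetical, and it assumes an absolute path with a drive letter.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def windows_path_to_file_uri(path):
    """Turn an absolute Windows path (with drive letter) into a file:/// URI."""
    # Backslashes become forward slashes: C:\dir\sub -> C:/dir/sub
    posix_style = PureWindowsPath(path).as_posix()
    # Percent-encode characters such as spaces, but keep '/' and the drive ':'.
    return "file:///" + quote(posix_style, safe="/:")


if __name__ == "__main__":
    print(windows_path_to_file_uri(r"C:\zeppelin-0.6.2-bin-all\spark-warehouse"))
```

PureWindowsPath is used so that the conversion behaves the same on any operating system.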

Just as additional info, you can verify the settings saved on this page in the file Drive:\ZEPPELIN_DIR\conf\interpreter.json

interpreter.json
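To check the saved values without opening the file by hand, something like the following could be used. It is a sketch under assumptions: the helper spark_interpreter_properties is hypothetical, the sample JSON is a shortened stand-in, and the layout (a top-level interpreterSettings map whose entries carry name and properties) is what I would expect in this release and may differ in other versions.

```python
import json

# Shortened stand-in for conf\interpreter.json (assumed layout, hypothetical id).
SAMPLE_INTERPRETER_JSON = """
{
  "interpreterSettings": {
    "example_spark_id": {
      "name": "spark",
      "properties": {
        "zeppelin.spark.useHiveContext": "false",
        "spark.sql.warehouse.dir": "file:///C:/zeppelin-0.6.2-bin-all/spark-warehouse"
      }
    },
    "example_md_id": {
      "name": "md",
      "properties": {}
    }
  }
}
"""


def spark_interpreter_properties(interpreter_json_text):
    """Collect the properties of every interpreter setting named 'spark'."""
    config = json.loads(interpreter_json_text)
    result = {}
    for setting in config.get("interpreterSettings", {}).values():
        if setting.get("name") == "spark":
            result.update(setting.get("properties", {}))
    return result


if __name__ == "__main__":
    for key, value in sorted(spark_interpreter_properties(SAMPLE_INTERPRETER_JSON).items()):
        print(key, "=", value)
```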

After these steps, I was able to run all of the notes from the Zeppelin tutorials.

Running the load data into table note

Note that the layout of the tutorial tells you more or less the order in which you have to execute the notes. The note “Load data into table” must be executed before you play the notes below it. I guess that is why it spans the whole width of the page: it must be executed before you visualize or analyze the data, while the notes below can be executed in parallel or in any order. This layout is not mandatory, but it helps to keep an execution order.

Visualizing data with Zeppelin

I hope this helps you on your way to learning Zeppelin!

23 responses to “Apache Zeppelin installation on Windows 10”

Installing Zeppelin On Windows 10 – Curated SQL

November 14, 2016 at 2:15 pm

[…] Paul Hernandez shows how to install Apache Zeppelin on Windows 10: […]

Alex Ridden (@SashaRibben)

November 29, 2016 at 4:00 pm

This is so useful! I was struggling to get this running on Windows 10 – thanks for the comprehensive breakdown

kay kim

January 31, 2017 at 6:58 am

Thanks a lot. I’ll try~~

Phillip

January 31, 2017 at 7:25 pm

Thanks Paul for the wonderful post.

1. Paul Hernandez
  
  January 31, 2017 at 7:58 pm
  
  You’re welcome ☺
  
Phillip

February 3, 2017 at 7:26 pm

Paul , i have installed the Windows version, however when the tutorials are run i get these messages. I have tried many fixes but nothing works.

Can you please advise.
Phillip

java.net.ConnectException: Connection refused: connect
    at java.net.DualStackPlainSocketImpl.connect0(Native Method)
    at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:79)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
    at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
    at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
    at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
    at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:189)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:163)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:328)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:105)

1. Paul Hernandez
  
  February 7, 2017 at 11:13 am
  
  Hi Phillip,
  I faced this problem when I tried to compile Zeppelin myself. In the end, I remember testing different binaries until it worked, but I cannot tell you the exact solution. Please try the binary package of the latest release and, if it does not work, try the previous one. If you find the cause and have time, please share it with us. You can also check whether Java is working properly in your environment.
  Kind regards, Paul.
  
2. kv
  
  March 21, 2017 at 9:52 pm
  
  I am also having the similar issue. Can you please share the solution?
  
3. umesh
  
  March 23, 2017 at 2:14 pm
  
  Hi Phillip,
  
  This error was appearing because SPARK_HOME was set in my machine’s environment variables.
  
  I removed all entries of SPARK_HOME from the environment variables,
  opened a new command prompt
  and started Zeppelin.
  
  A possible reason is that Zeppelin invokes its own Spark instance.
  
umesh

March 23, 2017 at 4:55 am

I am using the same 0.6.2 full release on Windows, but I am getting the error below. Any idea how to fix it?

F:\sparkSetup\zeppelin-0.7.0-bin-all\bin>zeppelin.cmd
Log dir doesn’t exist, create F:\sparkSetup\zeppelin-0.7.0-bin-all\logs
Pid dir doesn’t exist, create F:\sparkSetup\zeppelin-0.7.0-bin-all\run
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/F:/sparkSetup/zeppelin-0.7.0-bin-all/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/F:/sparkSetup/zeppelin-0.7.0-bin-all/lib/zeppelin-interpreter-0.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Mar 23, 2017 8:19:40 AM com.sun.jersey.api.core.PackagesResourceConfig init
INFO: Scanning for root resource and provider classes in the packages:
org.apache.zeppelin.rest
Mar 23, 2017 8:19:40 AM com.sun.jersey.api.core.ScanningResourceConfig logClasses
INFO: Root resource classes found:
class org.apache.zeppelin.rest.NotebookRestApi
class org.apache.zeppelin.rest.CredentialRestApi
class org.apache.zeppelin.rest.SecurityRestApi
class org.apache.zeppelin.rest.ConfigurationsRestApi
class org.apache.zeppelin.rest.InterpreterRestApi
class org.apache.zeppelin.rest.LoginRestApi
class org.apache.zeppelin.rest.ZeppelinRestApi
Mar 23, 2017 8:19:40 AM com.sun.jersey.api.core.ScanningResourceConfig init
INFO: No provider classes found.
Mar 23, 2017 8:19:40 AM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version ‘Jersey: 1.13 06/29/2012 05:14 PM’
Mar 23, 2017 8:19:41 AM com.sun.jersey.spi.inject.Errors processErrorMessages
WARNING: The following warnings have been detected with resource and/or provider classes:
WARNING: A HTTP GET method, public javax.ws.rs.core.Response org.apache.zeppelin.rest.InterpreterRestApi.listInterpreter(java.lang.String), should not consume any entity.
WARNING: A sub-resource method, public javax.ws.rs.core.Response org.apache.zeppelin.rest.NotebookRestApi.createNote(java.lang.String) throws java.io.IOException, with URI template, "/", is treated as a resource method
WARNING: A sub-resource method, public javax.ws.rs.core.Response org.apache.zeppelin.rest.NotebookRestApi.getNotebookList() throws java.io.IOException, with URI template, "/", is treated as a resource method

1. umesh
  
  March 23, 2017 at 2:15 pm
  
  I did some analysis on this, but it seems this warning has no impact on Zeppelin.
  Irrespective of this warning, I am able to start Zeppelin and run programs.
  
arun

May 24, 2017 at 7:52 am

I am getting the following error, please help
HTTP ERROR: 503
Problem accessing /. Reason:
Service Unavailable
Powered by Jetty://

1. arun
  
  May 24, 2017 at 10:24 am
  
  I am getting same error in Ubuntu also…….
  
  1. arun
    
    May 24, 2017 at 11:47 am
    
    I installed the old version mentioned in this post and it is working.
    The latest version available on the Zeppelin website is not working on Windows 10 and Ubuntu.
Paul Hernandez

May 24, 2017 at 1:18 pm

Hi arun, my post is a little bit old and I haven’t tried the latest version. If you have further findings please share them with us.
Kind Regards, Paul

Nassir

July 7, 2017 at 11:26 pm

I have had difficulties installing Zeppelin 0.7.2

Using the Zeppelin version of spark that comes with it, I can run spark code, but I am unable to run %pyspark code even after modifying python environment variables to point to where python is installed (python was installed using anaconda).

%python code works fine.

If anyone can help resolve this issue I would be grateful. (The odd thing is I have done the same installation on another windows 10 laptop and pyspark does execute.)

1. Harris
  
  February 26, 2018 at 5:46 am
  
  Hi, are you able to solve this pyspark issue? If yes, then can you please guide me.
  
Jay

October 23, 2017 at 9:14 am

Thank you~
I solved interpreter md find error.

Smitha Basavaraju

June 29, 2018 at 3:44 pm

i am getting prefix not found error while trying to run spark script with Glue development endpoint.. Any thoughts or help here

NGB

January 24, 2019 at 6:22 pm

Thank you for taking time to document this!

Ashley

February 16, 2019 at 7:26 am

Hi, it would be very helpful if you could do a youtube video on installation of zeppelin on windows 10. I am having difficulties in starting zeppelin using cmd. When typing bin\zeppelin.cmd, i am not getting anything on my cmd screen…what do i do? pls help….

1. Oliver Steadman
  
  March 8, 2019 at 3:13 pm
  
  Me too. I got no reaction from the .cmd file or the .sh file (I am on Windows but tried both out of desperation!). I then tried the steps in this article about configuring the JVM and Hadoop, but now all I get is this:
  
  # my command
  PS C:\Users\steadmano> C:\Users\\Zeppelin\zeppelin-0.8.1-bin-all\bin\zeppelin.cmd -Verbose
  # output I get
  The system cannot find the path specified.
  
2. Haejin Song
  
  June 23, 2020 at 6:12 am
  
  In that case, https://stackoverflow.com/questions/54666515/apache-zeppelin-zeppelin-cmd-shows-no-result would be able to help you
  