Apache Zeppelin installation on Windows 10

Disclaimer: I am not a Windows or Microsoft fan, but I am a frequent Windows user, and it is the operating system I encounter most often in enterprise environments. I therefore decided to try Apache Zeppelin on my Windows 10 laptop and share my experience with you. The behavior should be similar on other operating systems.

Introduction

It is no secret that Apache Spark has become a reference as a powerful cluster computing framework, especially useful for machine learning applications and big data processing. Applications can be written in several languages, such as Java, Scala, Python or R. Apache Zeppelin is a web-based tool that, according to the official project website (Apache Zeppelin), tries to cover all of our needs:

  • Data ingestion
  • Data discovery
  • Data analytics
  • Data visualization and collaboration

The interpreter concept is what makes Zeppelin powerful, because you can theoretically plug in any language or data-processing backend. It provides built-in Spark integration, and that is what I tested first.

Apache Zeppelin Download

You can download the latest release from this link: download

I downloaded the version 0.6.2 binary package with all interpreters.

As of this version, the Spark interpreter is compatible with Spark 2.0 and Scala 2.11.

According to the documentation, it supports Oracle JDK 1.7 (I guess it should also work with 1.8) on Mac OSX, Ubuntu 14.4, CentOS 6.X and Windows 7 Pro SP1 (and, according to my tests, also on Windows 10 Home).

Too much bla bla bla, let’s get started.

Zeppelin Installation

After downloading, open the archive (I used 7-Zip) and extract it to a suitable location (in my case directly on the C: drive, to avoid possible path problems).

Set the JAVA_HOME system variable to your JDK installation folder (the folder that contains bin, not the bin folder itself).

Set the HADOOP_HOME variable to your Hadoop folder location. If you don’t have the Hadoop binaries, you can download my binaries from here: Hadoop-2.7.1

My system variables

I am not really sure why Hadoop is needed if Zeppelin is supposed to be autonomous, but I guess Spark looks for winutils.exe when it runs on Windows. I wrote about it in a previous post: Apache Spark Installation on Windows 10

This is the error I found in the Zeppelin logs (in ZEPPELIN_DIR\logs there is one file for the server log and a separate file for each interpreter):

winutils.exe error
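
Before starting Zeppelin, the two variables above can be sanity-checked with a few lines of Python. This is only a minimal sketch, not part of Zeppelin: the function name check_zeppelin_env is made up for illustration, and it assumes that JAVA_HOME points at the JDK root (so java.exe sits under bin) and that winutils.exe sits under the bin folder of HADOOP_HOME.

```python
import os
from pathlib import Path


def check_zeppelin_env(env):
    """Return a list of problems found in the given environment mapping.

    Hypothetical helper for illustration; it only looks at JAVA_HOME and
    HADOOP_HOME and at the files assumed to live under their bin folders.
    """
    problems = []

    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not (Path(java_home) / "bin" / "java.exe").is_file():
        problems.append("JAVA_HOME does not contain bin\\java.exe")

    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        problems.append("HADOOP_HOME is not set")
    elif not (Path(hadoop_home) / "bin" / "winutils.exe").is_file():
        problems.append("HADOOP_HOME does not contain bin\\winutils.exe")

    return problems


if __name__ == "__main__":
    # Check the environment of the current process and report what is missing.
    for line in check_zeppelin_env(os.environ) or ["JAVA_HOME and HADOOP_HOME look fine"]:
        print(line)
```

Run it from the same command prompt you will use to start Zeppelin, because a prompt opened before you changed the system variables still sees the old values.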

Zeppelin Configuration

There are several settings you can adjust. The two main files are in ZEPPELIN_DIR\conf:

  • zeppelin-env
  • zeppelin-site.xml

In the first one you can configure some interpreter settings. The second one covers settings related to the web server, for instance the Zeppelin server port (I am using 8080, but on your machine that port may well be taken by another application).

If you don’t touch the zeppelin-env file, Zeppelin uses its built-in Spark version, which is what I used for the results in this post.
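
If you are unsure which port is configured, the value can be read out of zeppelin-site.xml with a short script. The sketch below is an illustration under assumptions: the helper read_property is made up, the sample XML is a hand-written fragment rather than a copy of the shipped file, and it assumes the usual name/value property layout with the port stored under zeppelin.server.port.

```python
import xml.etree.ElementTree as ET


def read_property(site_xml_text, name, default=None):
    """Return the <value> of the <property> whose <name> matches, or default."""
    root = ET.fromstring(site_xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return default


# Hand-written sample in the assumed layout; 8081 is just an example value.
SAMPLE = """<configuration>
  <property>
    <name>zeppelin.server.port</name>
    <value>8081</value>
    <description>Server port.</description>
  </property>
</configuration>"""

print(read_property(SAMPLE, "zeppelin.server.port"))  # prints 8081
```

To use it on your installation, pass the text of your own conf\zeppelin-site.xml instead of SAMPLE.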

Start Zeppelin

Open a command prompt and start Zeppelin by executing Drive:\ZEPPELIN_DIR\bin\zeppelin.cmd

Start Zeppelin

Then open your favorite browser and navigate to localhost:8080 (or the port you set in zeppelin-site.xml).

You should see the start page. Verify that the indicator in the top-right corner of the window is green; otherwise your server is down or not running properly.

Zeppelin home
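
If the page does not load, it helps to know whether anything is listening on the port at all. The following sketch is not a Zeppelin tool; the function name is_listening is made up, and it only tests that a TCP connection can be opened, which says nothing about whether the process behind the port is really Zeppelin.

```python
import socket


def is_listening(host="localhost", port=8080, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


if __name__ == "__main__":
    # 8080 is the port used in this post; change it if you configured another.
    print("port open" if is_listening() else "nothing is listening on that port")
```

If the port is closed although zeppelin.cmd is still running, the server log in ZEPPELIN_DIR\logs is the next place to look.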

If you have not configured Hive, you need to set zeppelin.spark.useHiveContext to false before trying the tutorials included in the release. Apart from the config files, Zeppelin has an interpreter configuration page. You can find it by clicking on your user “anonymous” and then on Interpreter.

Go to interpreter settings

Scroll down to the bottom, where you’ll find the Spark config values:

Spark interpreter settings

Press the edit button and change the value to false in order to use the SQL context instead of the Hive context.

Press the Save button to persist the change:

Set zeppelin.spark.useHiveContext to false

Now let’s try the Zeppelin Tutorial

From the Notebook menu click on the Zeppelin Tutorial link:

Navigate to the Zeppelin Tutorial

The first time you open it, Zeppelin asks you to set the interpreter bindings:

Interpreter binding

Just scroll down and save them:

Save binding

Some notes are presented with different layouts. For more about the display system visit the documentation online.

Another possibly annoying error

I was getting the following error when I tried to run some notes in the Zeppelin Tutorial:

Spark warehouse URI error

I found a suggested solution in the following Stack Overflow question: link

It is a URI syntax exception raised while trying to find the spark-warehouse folder in the Zeppelin folder. I struggled a little bit with that. The folder had not been created in my Zeppelin directory and I thought it was a permissions problem, so I created it manually and assigned 777 permissions.

spark-warehouse folder permission settings

It still failed. In the link above, a forum user suggested using a file URI with triple slashes to define the proper path: file:///C:/zeppelin-0.6.2-bin-all/spark-warehouse

But I still didn’t know where to place this configuration. I couldn’t do it in the Spark shell, nor while creating a Spark session (Zeppelin does that for me), and conf/spark-defaults.conf didn’t seem to be a good idea for Zeppelin because I was using the built-in Spark version.

Finally, I remembered that it is possible to add additional Spark settings on the interpreter configuration page, so I navigated there and created the property:

spark.sql.warehouse.dir
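
The value for spark.sql.warehouse.dir is simply the Windows folder path rewritten as a file URI with three slashes. As a small sketch of that rewriting (the helper name windows_path_to_file_uri is made up for illustration, and it assumes an absolute path that starts with a drive letter):

```python
from urllib.parse import quote


def windows_path_to_file_uri(path):
    """Turn an absolute Windows path such as C:\\dir\\sub into a file:/// URI."""
    forward = path.replace("\\", "/")
    # Keep "/" and the drive colon as they are; escape spaces and the like.
    return "file:///" + quote(forward, safe="/:")


print(windows_path_to_file_uri(r"C:\zeppelin-0.6.2-bin-all\spark-warehouse"))
# prints file:///C:/zeppelin-0.6.2-bin-all/spark-warehouse
```

A folder name containing spaces would come out with %20 in the URI, which is one more reason to extract Zeppelin to a simple path.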

As additional info, you can verify the settings saved on this page in the file Drive:\ZEPPELIN_DIR\conf\interpreter.json

interpreter.json
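
Instead of scrolling through interpreter.json by hand, a property can be looked up with a short script. This is a rough sketch: the helper find_property is made up, and the sample data below is a hand-written stand-in, not the real layout of the file. The search walks the whole JSON tree precisely so that it does not depend on where the property is nested.

```python
import json


def find_property(node, key):
    """Recursively collect every value stored under `key` in nested JSON data."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            else:
                found.extend(find_property(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_property(item, key))
    return found


# Hand-written stand-in for the file content; the nesting and the id are made up.
SAMPLE = json.loads("""
{
  "interpreterSettings": {
    "hypothetical-id": {
      "name": "spark",
      "properties": {
        "zeppelin.spark.useHiveContext": "false",
        "spark.sql.warehouse.dir": "file:///C:/zeppelin-0.6.2-bin-all/spark-warehouse"
      }
    }
  }
}
""")

print(find_property(SAMPLE, "spark.sql.warehouse.dir"))
```

For the real file, load it with json.load on an open file handle and pass the result to find_property; an empty list means the property was never saved.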

After these steps, I was able to run all of the notes from the Zeppelin tutorials.

Running the load data into table note

Note that the layout of the tutorial tells you more or less the order in which you have to execute the notes. The note “Load data into table” must be executed before you play the notes below it. I guess that is why it spans the whole width of the page: it must be executed before you visualize or analyze the data, while the notes below it can be executed in parallel or in any order. This layout is not mandatory, but it helps to keep an execution order.

Visualizing data with Zeppelin

I hope this helps you on your way to learning Zeppelin!

23 responses to “Apache Zeppelin installation on Windows 10”

  1. […] Paul Hernandez shows how to install Apache Zeppelin on Windows 10: […]

  2. This is so useful! I was struggling to get this running on Windows 10 – thanks for the comprehensive breakdown

  3. Thanks a lot. I’ll try~~

  4. Thanks Paul for the wonderful post.

  5. Paul , i have installed the Windows version, however when the tutorials are run i get these messages. I have tried many fixes but nothing works.

    Can you please advise.
    Phillip

    java.net.ConnectException: Connection refused: connect
        at java.net.DualStackPlainSocketImpl.connect0(Native Method)
        at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:79)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
        at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
        at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
        at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
        at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
        at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
        at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:189)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:163)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:328)
        at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:105)

    1. Hi Phillip,
      I faced this problem when I tried to compile Zeppelin by myself. At the end I remember I tested different binaries until it worked but I could not tell you exactly the solution. Please try with the binary package of the last release and if it does not work try with the previous one. If you find the error and have time please share it with us. You can always check if Java is working properly in your environment.
      Kind regards, Paul.

    2. I am also having the similar issue. Can you please share the solution?

    3. Hi Phillip,

      This error was appearing because SPARK_HOME was set in my machine’s environment variables.

      Removed all entries of SPARK_HOME from the environment variables.
      Opened a new command prompt.
      Started Zeppelin.

      A possible reason is that Zeppelin invokes its own Spark instance.

  6. i am using the same 0.6.2 full release on windows , but getting below error any idea how to fix it

    F:\sparkSetup\zeppelin-0.7.0-bin-all\bin>zeppelin.cmd
    Log dir doesn’t exist, create F:\sparkSetup\zeppelin-0.7.0-bin-all\logs
    Pid dir doesn’t exist, create F:\sparkSetup\zeppelin-0.7.0-bin-all\run
    Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/F:/sparkSetup/zeppelin-0.7.0-bin-all/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/F:/sparkSetup/zeppelin-0.7.0-bin-all/lib/zeppelin-interpreter-0.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    Mar 23, 2017 8:19:40 AM com.sun.jersey.api.core.PackagesResourceConfig init
    INFO: Scanning for root resource and provider classes in the packages:
    org.apache.zeppelin.rest
    Mar 23, 2017 8:19:40 AM com.sun.jersey.api.core.ScanningResourceConfig logClasses
    INFO: Root resource classes found:
    class org.apache.zeppelin.rest.NotebookRestApi
    class org.apache.zeppelin.rest.CredentialRestApi
    class org.apache.zeppelin.rest.SecurityRestApi
    class org.apache.zeppelin.rest.ConfigurationsRestApi
    class org.apache.zeppelin.rest.InterpreterRestApi
    class org.apache.zeppelin.rest.LoginRestApi
    class org.apache.zeppelin.rest.ZeppelinRestApi
    Mar 23, 2017 8:19:40 AM com.sun.jersey.api.core.ScanningResourceConfig init
    INFO: No provider classes found.
    Mar 23, 2017 8:19:40 AM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
    INFO: Initiating Jersey application, version ‘Jersey: 1.13 06/29/2012 05:14 PM’
    Mar 23, 2017 8:19:41 AM com.sun.jersey.spi.inject.Errors processErrorMessages
    WARNING: The following warnings have been detected with resource and/or provider classes:
    WARNING: A HTTP GET method, public javax.ws.rs.core.Response org.apache.zeppelin.rest.InterpreterRestApi.listInterpreter(java.lang.String), should not consume any entity.
    WARNING: A sub-resource method, public javax.ws.rs.core.Response org.apache.zeppelin.rest.NotebookRestApi.createNote(java.lang.String) throws java.io.IOException, with URI template, “/”, is treated as a resource method
    WARNING: A sub-resource method, public javax.ws.rs.core.Response org.apache.zeppelin.rest.NotebookRestApi.getNotebookList() throws java.io.IOException, with URI template, “/”, is treated as a resource method

    1. I did some analysis on this, and it seems these warnings have no impact on Zeppelin.
      Irrespective of the warnings, I am able to run Zeppelin and run programs.

  7. I am getting the following error, please help
    HTTP ERROR: 503
    Problem accessing /. Reason:
    Service Unavailable
    Powered by Jetty://

    1. I am getting same error in Ubuntu also…….

      1. I installed the old version mentioned in this post and it is working.
        The latest version available on the Zeppelin website is not working on Windows 10 or Ubuntu.

  8. Hi arun, my post is a little bit old and I haven’t tried the latest version. If you have further findings please share them with us.
    Kind Regards, Paul

  9. I have had difficulties installing Zeppelin 0.7.2

    Using the Zeppelin version of spark that comes with it, I can run spark code, but I am unable to run %pyspark code even after modifying python environment variables to point to where python is installed (python was installed using anaconda).

    %python code works fine.

    If anyone can help resolve this issue I would be grateful. (The odd thing is I have done the same installation on another windows 10 laptop and pyspark does execute.)

    1. Hi, are you able to solve this pyspark issue? If yes, then can you please guide me.

  10. Thank you~
    I solved interpreter md find error.

  11. Smitha Basavaraju

    i am getting prefix not found error while trying to run spark script with Glue development endpoint.. Any thoughts or help here

  12. Thank you for taking time to document this!

  13. Hi, it would be very helpful if you could do a youtube video on installation of zeppelin on windows 10. I am having difficulties in starting zeppelin using cmd. When typing bin\zeppelin.cmd, i am not getting anything on my cmd screen…what do i do? pls help….

    1. Me too. I got no action from either the .cmd file or the .sh file (I am on Windows but tried both out of desperation!). I then tried the steps in this article about configuring the JVM and Hadoop, but now all I get is this:

      # my command
      PS C:\Users\steadmano> C:\Users\\Zeppelin\zeppelin-0.8.1-bin-all\bin\zeppelin.cmd -Verbose
      # output I get
      The system cannot find the path specified.
