I haven't reviewed a big data toolset for a few months; instead, I have been boring you with traffic and stock data. Well, fear no more! Today I review Apache Zeppelin.
It's worth noting that the project is currently under incubation with Apache, which means it's going to be a little rough around the edges. The project looks very healthy, with 110 contributors on GitHub, which gives me faith that it's here for the long term.
WHAT IS ZEPPELIN?
Do you work in data analysis? Do you have multiple data environments, from simple to complex to big data? Do you work in a team? Do you like sharing? This is a great platform for analysts to share, document and report data. It supports a plethora of systems and languages: Python, R, Spark, Hadoop, Postgres, ElasticSearch, JDBC, flat files, Ignite, Flink, Cassandra, HBase (and I'm sure many more to come).
In their words: "A web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more."
HOW TO INSTALL IT:
I'm installing Zeppelin on an Ubuntu 14.04 VM host, so you may need to adjust any code snippets based on your operating system of choice.
Download the package, unpack it and remove the tgz file. At the time of this post, the latest stable version is 0.5.6. You might want to check whether there's a newer version if you're following my steps.
wget http://www-us.apache.org/dist/incubator/zeppelin/0.5.6-incubating/zeppelin-0.5.6-incubating.tgz
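The download, unpack and clean-up steps might look something like the sketch below. The `unpack_and_clean` helper is a made-up name for illustration, and the unpacked directory name is an assumption based on the archive name, so check what `tar` actually creates on your machine.

```shell
# Version and URL are taken from the wget command in this post.
ZEPPELIN_VERSION="0.5.6-incubating"
ZEPPELIN_TGZ="zeppelin-${ZEPPELIN_VERSION}.tgz"
ZEPPELIN_URL="http://www-us.apache.org/dist/incubator/zeppelin/${ZEPPELIN_VERSION}/${ZEPPELIN_TGZ}"

# Hypothetical helper: unpack a .tgz and delete it only if unpacking succeeded.
unpack_and_clean() {
  tar -xzf "$1" && rm -- "$1"
}

# Typical use (needs network access, so it is left commented out here):
#   wget "$ZEPPELIN_URL" && unpack_and_clean "$ZEPPELIN_TGZ"
#   cd "zeppelin-${ZEPPELIN_VERSION}"   # assumed directory name
```

Chaining with `&&` means the archive is only removed once `tar` has finished cleanly, so a failed unpack doesn't leave you with nothing.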
There are a bunch of dependencies; make sure you have the following:
I had an old version of Maven, so I had to uninstall it and run the following to get it all up to date:
wget http://www.eu.apache.org/dist/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
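If you're not sure whether your Maven is too old, a quick version comparison can help. This is only a sketch: `version_at_least` and `maven_version` are made-up helper names, the parsing assumes the first line of `mvn -version` looks like "Apache Maven 3.3.3 (...)", and 3.3.3 is simply the version I installed, not an official minimum.

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Relies on `sort -V` (version sort), which GNU coreutils on Ubuntu provides.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: pull the version number out of `mvn -version` output,
# assuming the first line looks like "Apache Maven 3.3.3 (...)".
maven_version() {
  mvn -version 2>/dev/null | awk 'NR == 1 { print $3 }'
}

# Example: warn if the installed Maven is older than the 3.3.3 used in this post.
#   if ! version_at_least "$(maven_version)" "3.3.3"; then
#     echo "Maven is older than 3.3.3 (or not installed)"
#   fi
```

Version sort matters here because a plain string comparison would rank 3.10.0 below 3.3.3.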
Configure Maven to get some more memory:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=1024m"
Build the package; this takes some time! (Make sure you're in the zeppelin dir.)
mvn clean install -DskipTests
Start it up!
bin/zeppelin-daemon.sh start
EXPLORING ZEPPELIN
In your browser, go to "192.168.1.21:8080", replacing my IP with your own host IP.
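If the page doesn't load straight away, you can check from the command line whether the web UI is answering yet. A minimal sketch, assuming `curl` is installed; `zeppelin_url` and `zeppelin_is_up` are made-up helper names, and port 8080 is the one from the address in this post.

```shell
# Hypothetical helper: build the Zeppelin URL for a host; the port defaults to 8080.
zeppelin_url() {
  printf 'http://%s:%s/\n' "$1" "${2:-8080}"
}

# Hypothetical helper: succeeds if the web UI answers within 5 seconds.
zeppelin_is_up() {
  curl --silent --fail --output /dev/null --max-time 5 "$(zeppelin_url "$@")"
}

# Example (192.168.1.21 is my VM; swap in your own host IP):
#   zeppelin_is_up 192.168.1.21 && echo "Zeppelin is up" || echo "not reachable yet"
```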
So where to from here? I have a couple of databases (Postgres and MySQL) set up on the same VM, so we will now connect to the Postgres database.
In your browser, select the "Interpreter" tab and click the create button. (An interpreter is like setting up a new connection to a new environment.)
Make sure you select the correct "interpreter" in the drop-down; I selected psql for Postgres. Add in the usual connection details. You will see that my JDBC URL is localhost, as it's installed on the same VM. One thing I'm not too keen on is the fact that once you save the "interpreter", it's happy to show your password to the world.
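For the connection URL itself, the PostgreSQL JDBC format is jdbc:postgresql://host:port/database. The sketch below just assembles that string and refuses to build one without a database name; `postgres_jdbc_url` is a made-up helper, and the host, port and database are the values from my own setup.

```shell
# Hypothetical helper: build a PostgreSQL JDBC URL from host, port and database.
# It fails loudly when the database name is missing.
postgres_jdbc_url() {
  if [ -z "${3:-}" ]; then
    echo "postgres_jdbc_url: database name is required" >&2
    return 1
  fi
  printf 'jdbc:postgresql://%s:%s/%s\n' "$1" "$2" "$3"
}

# Prints: jdbc:postgresql://192.168.1.21:5432/needles
postgres_jdbc_url 192.168.1.21 5432 needles
```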
One thing that took me a while to get right with Postgres was the connection URL: if you don't specify the database, you're gonna have a bad time. My connection URL ended up being "jdbc:postgresql://192.168.1.21:5432/needles".
RUNNING OUR FIRST QUERY:
You need to specify the interpreter on the first line; in my example I used %psql, followed by my query.
GRAPHS
Below I ran a simple aggregate to try out the graphing options.
The first result I got didn't show this! Both axes were showing numerical values, even though you can see from my query that I am grouping by a date. To fix this I selected the "Settings" tab and re-arranged the values and keys (similar to Excel pivot charts) to reflect what I wanted. Displayed below are more graphing options using the same query:
It's important to note that, at the time of this blog post, each notebook can only have one interpreter; it's a known limitation in Zeppelin.
OTHER COOL THINGS
THINGS I WOULD LOVE
CONCLUSION!
This is cool! $0 cost, connects to almost everything and it's only going to get better. Analytical teams need shared platforms that enable storytelling and data discovery.
Author: New Zealand big data nerd, facial hair sculptor and classic car fanatic. Owner of needles.io, freelance big data consultant, ex Activision.