Introduction to Elasticsearch on Azure

Introduction to elasticsearch on Microsoft Azure, for the Microsoft Azure Meetup Group
Chris Morley (@depahelix)
Microsoft NERD Center, Cambridge, MA
February 20, 2014
Agenda – 10 minute lightning sections
• elasticsearch general introduction
• create an elasticsearch node in Azure
• add some data and search it
• create a web front end for it
• create a Windows 8 front end for it
• scale out with Azure plugin for elasticsearch
• If there is time – look at other plugins, plus a bit of Q & A
Here to…
• show you how to get set up to use elasticsearch at a very basic level
• give a general, high-level overview of practical plumbing
• get you rolling so you can start evaluating ES for real, for whatever
use you have in mind
• explain why the new plugin for Azure is a good start, but needs work
• convince you that elasticsearch is generally pretty cool
• give you a “sense” of what’s going on
Not here to…
• “sell you” on using elasticsearch
• convince you that you must use elasticsearch, or else
• it’s really like any other technology – there are alternatives out there
• demonstrate the true power and flexibility and capability of
elasticsearch (not enough time to go through all that)
• Not a “big data” demo, by any stretch.
• drill down into minute details or get bogged down on specifics
• I am just a user, not a sales agent
What is elasticsearch?
• In short, it can be thought of as “search engine software”
• It gives you a realistic way to run your own search engine
service (like a Bing or a Google), but with, say, private, sensitive, or
confidential data/documents that you don’t want on the public web
• great extra capability for your company, enterprise, app, startup, client
• elasticsearch is an open-source, distributed search server built on
top of Lucene; it is written in Java and exposes a REST API
• Apache Lucene is a widely used open-source search library, arguably one of
the best search engines available, and it holds its own even when compared
against the most expensive commercial alternatives
• very fast search
Where did elasticsearch come from?
• Originally there was a search project called Compass, which was
primarily worked on by Shay Banon (@kimchy)
• Compass also relied on Lucene, but was not distributed
• kimchy decided to write elasticsearch to be distributed from the
get-go, so you could say it was built with the cloud in mind
• Add more servers and they play together nicely, and they know how
to work together to split up the workload (search queries can be
resource intensive and expensive in terms of memory and disk)
Why do I know so much about elasticsearch?
(didn’t it just come out?)
• I help support an implementation for work
• We bought a company which was an early adopter/beta site, and it
was set up a while ago, with help from elasticsearch people
• We built a new implementation somewhat based on that earlier one
• I maintain and add on to that implementation
• I attended the elasticsearch two-day training in NYC this past September
• (which I highly recommend)
• I worked on a Solr project for about 9 months a couple of years ago
elasticsearch is an advanced distributed app
• It has some very cool properties and abilities when it comes to
operations that involve lots of nodes
• It scales extremely gracefully
• It has its own optimized binary protocol and makes its own “internal
• …as long as you know what you are doing when it comes to
• It is open source
What elasticsearch is Not (1 of 3)
• It is NOT safe as a primary persistent data store
• Meaning – you should not trust it as a “system of record”
• Always be prepared to reload from scratch, in case of data corruption
• “Don't let yourself get attached to anything you are not willing to walk out on
in 30 seconds flat if you feel the heat around the corner.” -Neil McCauley
• Although Neil’s to-a-fault discipline doesn’t apply to everything in life,
elasticsearch is one of those things where that philosophy actually works
well: always be ready to drop and reload your data if something goes
horribly wrong in the future
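One way to stay ready for a drop-and-reload is to keep a small script that rebuilds the index from the real system of record. The sketch below only builds the newline-delimited body that the `_bulk` endpoint expects; the index name, type name, and records are hypothetical, and sending the body (after deleting the old index) is left to whatever HTTP client you prefer.

```python
import json


def bulk_body(index, doc_type, docs):
    """Build a newline-delimited body for the _bulk endpoint.

    docs is an iterable of (doc_id, source_dict) pairs read from the
    real system of record (a database, files, and so on).
    """
    lines = []
    for doc_id, source in docs:
        # Action line: tells elasticsearch where to index the next line.
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
        lines.append(json.dumps(action, sort_keys=True))
        # Source line: the document itself.
        lines.append(json.dumps(source, sort_keys=True))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Example with made-up records standing in for rows from the primary store:
# bulk_body("people", "person", [(1, {"name": "James Taylor"})])
```

Because the body is rebuilt from the primary store every time, losing the index costs reload time rather than data.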
What elasticsearch is Not (2 of 3)
• It is NOT secure (at this time)
• Even though it is a nice wrapper around Lucene, elasticsearch itself still
needs to be wrapped and hardened against direct traffic from the Internet,
at least in basic ways, usually with a proxy
• Security has not been a focus, and that has been a design decision
• You have been warned!
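As a rough illustration of the kind of rule such a proxy might enforce, here is a minimal sketch of a request filter. It assumes the public front end only ever needs to search a named index, and the function name and allowlist are hypothetical. It does not inspect request bodies, so treat it as a first filter, not as a complete hardening.

```python
from urllib.parse import urlsplit

# Assumption: the public front end only ever needs to run searches.
ALLOWED_METHODS = {"GET", "POST"}


def is_allowed(method, url_path):
    """Return True only for search requests against a named index."""
    if method.upper() not in ALLOWED_METHODS:
        return False
    path = urlsplit(url_path).path
    segments = [s for s in path.split("/") if s]
    # Require /<index>/_search or /<index>/<type>/_search and reject
    # everything else, including cluster-level endpoints.
    if len(segments) not in (2, 3) or segments[-1] != "_search":
        return False
    # Nothing before _search may be an underscore-prefixed API name.
    return not any(s.startswith("_") for s in segments[:-1])
```

A proxy in front of port 9200 would call a check like this for every incoming request and return an error for anything that fails it.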
What elasticsearch is Not (3 of 3)
• It’s not extremely well documented
• There is a lot of documentation, but it is sometimes difficult to
parse/read the sentences due to grammatical errors, etc.
• Plus there is a lot of jargon when you start talking about analyzers,
etc. You have to do a lot of research to make use of what
documentation there is.
• If you want to really learn a lot, go to a two-day seminar (it’s $1,800)
Why do you need elasticsearch if you have Solr working already?
• OK, so elasticsearch is very much going to be an alternative to Solr
• It is distributed from the get-go. ES is distributed at its core. (“shard”)
• SolrCloud gets Solr to act more like elasticsearch
• Solr is more XML based, but can serve JSON too
• elasticsearch is more JSON based, with configuration in simple .yml files
• Short answer: there may be no compelling reason to do an expensive
migration off of Solr to elasticsearch, but if you are starting a brand
new project, consider elasticsearch. It’s cooler and it does more
Each machine is a node in the cluster
• You’ve heard this terminology before if you have used Hadoop,
ZooKeeper, or any number of other distributed systems
• Nodes can have “types” (master, data, client, and tribe)
• Data nodes need disk and memory
• Client nodes need memory
• Master nodes need stability and to not be “stressed out” or “upset”
• The Tribe node (if you create one) is sort of a MasterMaster node
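In configuration terms, the master, data, and client roles come down to two boolean settings in elasticsearch.yml, `node.master` and `node.data`. The sketch below renders those lines for each role; the role names and the helper are only for illustration, and the tribe node is left out because it is configured differently.

```python
# Assumed mapping from the roles named above to the two boolean settings.
ROLE_SETTINGS = {
    "master": {"node.master": True, "node.data": False},
    "data": {"node.master": False, "node.data": True},
    "client": {"node.master": False, "node.data": False},
}


def yml_lines(role):
    """Render the elasticsearch.yml lines for one role."""
    settings = ROLE_SETTINGS[role]
    return ["%s: %s" % (key, str(value).lower())
            for key, value in sorted(settings.items())]


# Example: yml_lines("client") gives the two lines for a client-only node.
```

A node that sets neither value defaults to being both master-eligible and a data node, which is what makes a one-node cluster possible.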
The simplest cluster: one node
• It’s the master
• It’s the data node
• It’s the client node
• There is no tribe node
• Let’s set one of these up in Azure…
Let’s make a Linux box in Azure
• Log in to the Windows Azure management console
• If you don’t already have a subscription, Google for “try Azure 90 days”
• Go to Virtual Machines and click on New
• From Gallery > CentOS > OpenLogic > A1 > Small > East, w/password
• Open up endpoint security on port 9200 (elastic/9200/9200)
• SSH to the machine using Cygwin (or PuTTY, or whatever you like best)
Components to install
• elasticsearch 0.90.x, currently 0.90.10 – available from
• There is an elasticsearch 1.0 version and you are welcome to try that
instead if you prefer
Components to Install (only getting underlined for this demonstration)
• wget, curl
• elasticsearch plugins: Azure plugin, elasticsearch-service wrapper, and
many more to check out
• other things to get, as you need: mysql connector, etc.
Install and Run Elasticsearch
• (+switch subscriptions)
• create an elasticsearch node in Azure
Before I began
• Created an Extra Small VM
• Installed Node.js
• Installed the cross-platform CLI tool for Azure
elasticsearch starts up
• the node gets a random name from a list
• it is started in the foreground right now for our simple demo
purposes, but normally you would want to install it as a service
• go to
• http://azure-elasticsearch-cluster?
• make sure there is a response.
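The same check can be scripted. The sketch below assumes the node answers on its root URL with a JSON document that includes a `name` and a `version.number`; the helper names are made up for illustration, and `localhost` in the comment is a placeholder for your own VM's address.

```python
import json
import urllib.request


def describe_node(raw_json):
    """Pull the node name and version number out of the root response."""
    info = json.loads(raw_json)
    version = info.get("version", {}).get("number", "unknown")
    return "%s (elasticsearch %s)" % (info.get("name", "unnamed"), version)


def check_node(base_url, timeout=5):
    """GET the root URL and describe the node that answers."""
    with urllib.request.urlopen(base_url, timeout=timeout) as response:
        return describe_node(response.read().decode("utf-8"))


# Example (needs a running node; the address is a placeholder):
# print(check_node("http://localhost:9200/"))
```

If the call times out instead of answering, the endpoint on port 9200 is the first thing to recheck.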
Add some data and search it (experiment)
• Let’s load some data in and do a search on that data.
• Run Experiment
• Show scripts, show output
• Show output files JamesTaylor.txt vs. TaylorJames.txt
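For anyone following along without the demo scripts, here is a minimal sketch of the two REST calls involved: storing one JSON document and running a simple query-string search. These are not the scripts from the talk; the index name, type name, and document are hypothetical, and the functions only build the requests so that you can send them with `urllib.request.urlopen` once a node is running.

```python
import json
import urllib.parse
import urllib.request


def index_request(base_url, index, doc_type, doc_id, source):
    """Build a PUT request that stores one JSON document."""
    url = "%s/%s/%s/%s" % (base_url.rstrip("/"), index, doc_type, doc_id)
    data = json.dumps(source).encode("utf-8")
    return urllib.request.Request(url, data=data, method="PUT")


def search_request(base_url, index, text):
    """Build a GET request for a simple query-string search."""
    query = urllib.parse.urlencode({"q": text})
    url = "%s/%s/_search?%s" % (base_url.rstrip("/"), index, query)
    return urllib.request.Request(url, method="GET")


def hit_sources(raw_json):
    """Return the _source of every hit in a search response."""
    response = json.loads(raw_json)
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Example (needs a running node; the address is a placeholder):
# req = index_request("http://localhost:9200", "people", "person", 1,
#                     {"name": "James Taylor"})
# urllib.request.urlopen(req)
```

Keeping request building separate from sending makes it easy to print the URL and body and compare them with what you would type into curl.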
Front ends
• create an Azure Websites web front end for it (1 html file, 1 js file)
• create a Windows 8 front end for it
Add a simple web front end using jQuery
• Go here:
• Download the Zip file
• Grab elasticsearch.jquery.js
• Make an index.html file, write it up
Create an Azure Website
FTP index.html and elasticsearch.jquery.js up to the site wwwroot
Make a simple Windows 8 app and hook it up
• Run VS2013 as Admin
• File > New > Project > Windows Store
• “Blank App”, App1, c:\users\chris\documents\visual studio
• Project > Manage NuGet Packages, add JSON.NET, MicroMVVM
• Add a TextBox and a Button. Add a StackPanel.
• Double-click the button.
• (Open Desktop/presentation/App1)
Demo Azure Plugin
• ./
• Configure Endpoints manually: 9200 load balanced.
• Still to do:
automate endpoints being opened
attach listeners/pollers to process that can spin up new nodes
keep the running list of nodes somewhere (like an Azure Table, maybe)
differentiate the nodes (master, client, data, tribe)
use Chef or Puppet?
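One of the to-do items, a poller that decides when to spin up new nodes, could start from the cluster health endpoint (`_cluster/health`), which reports the cluster status and node counts. The sketch below is only an assumption about how such a poller might look: the scaling rule and function names are hypothetical, and actually creating a VM is out of scope.

```python
import json


def cluster_summary(raw_json):
    """Summarize a _cluster/health response."""
    health = json.loads(raw_json)
    return {
        "status": health["status"],
        "nodes": health["number_of_nodes"],
        "data_nodes": health["number_of_data_nodes"],
    }


def needs_more_nodes(summary, wanted_data_nodes):
    """Hypothetical rule: scale out while the cluster is red or short of data nodes."""
    return summary["status"] == "red" or summary["data_nodes"] < wanted_data_nodes
```

A poller would fetch the health document on a timer, pass it through `cluster_summary`, and hand the decision to whatever provisioning tool ends up creating the VM.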
Links and stuff
Includes links and stuff for this talk. Check the space in case I fix errata
or give more talks.
• Chris Morley
• Twitter: @depahelix