HYBRIS – CLOUD - BIGDATA
V1.0, 19/11/2014, Yassine MEJRI

Agenda
- Cloud
- Windows Azure
- Deploying Hybris on Windows Azure
- Elasticsearch
- Kibana
- Use cases: analytics, machine learning

CLOUD: Cloud Computing
- A standardised IT capability (services, software or infrastructure) delivered via internet technologies in a pay-per-use, self-service way.
- Cloud services are shared services, under virtualised management, accessible over the internet.
- A style of computing where massively scalable IT-related capabilities are provided "as a service" using internet technologies to multiple external customers.

CLOUD: History
- 1960: John McCarthy's concept: "Computation may someday be organized as a public utility."
- 1999: Salesforce.com: "Pioneered the concept of delivering enterprise applications via a simple website"
- 2000: Microsoft; 2001: IBM: "Expanded the SaaS concept through web services"
- 2005: Amazon: "Launch of Amazon Web Services"
- 2007: Google and IBM: "Start researching Cloud Computing"
- 2008: Gartner Research: "Start using Cloud Computing in many organizations"

CLOUD: Cloud computing providers
http://www.cloudscreener.com/

WINDOWS AZURE: Layers

WINDOWS AZURE: Cloud service model

WINDOWS AZURE: Geo-location
Datacenters: West US, East US, South Central US

WINDOWS AZURE: Building and running apps

Windows Azure Blob Storage

WINDOWS AZURE BLOB STORAGE: Architecture
Azure Blob storage is a service for storing large amounts of unstructured data, such as text or binary data, that can be accessed from anywhere in the world via HTTP or HTTPS.
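Since blobs are reachable over plain HTTP or HTTPS, every blob has a URL of the form `<account>.blob.core.windows.net/<container>/<blob>`. The following is a minimal Java sketch of how such a URL is assembled. `BlobUrlSketch` and `blobUri` are hypothetical names used only for illustration; they are not part of the Azure SDK. The sample values match the media URL quoted further on in this deck.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Illustrative helper (an assumption for this sketch, not an Azure SDK class). */
final class BlobUrlSketch {

    /** Builds scheme://<account>.blob.core.windows.net/<container>/<blobPath>. */
    static URI blobUri(String account, String container, String blobPath, boolean https) {
        String scheme = https ? "https" : "http";
        String host = account + ".blob.core.windows.net";
        try {
            // The multi-argument constructor quotes illegal characters in the path.
            return new URI(scheme, host, "/" + container + "/" + blobPath, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid blob address", e);
        }
    }

    public static void main(String[] args) {
        // Account, container and blob path are sample values.
        System.out.println(blobUri("hybrisazure", "hybris",
                "sys_master/root/h3e/hd7/8796157378590.jpg", false));
    }
}
```

Whether the URL can be fetched anonymously depends on the container's public access setting; a private container additionally needs credentials or a signed URL.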
Common uses of Blob storage include:
- Serving images or documents directly to a browser
- Storing files for distributed access
- Streaming video and audio
- Performing secure backup and disaster recovery

WINDOWS AZURE BLOB STORAGE: Java API

Connection string:

    public static final String storageConnectionString =
        "DefaultEndpointsProtocol=http;" +
        "AccountName=your_storage_account;" +
        "AccountKey=your_storage_account_key";

Create container:

    // Parse the connection string and get a client for the Blob service
    CloudStorageAccount storageAccount = CloudStorageAccount.parse(storageConnectionString);
    CloudBlobClient blobClient = storageAccount.createCloudBlobClient();
    // Container names must be lower case
    CloudBlobContainer container = blobClient.getContainerReference("images");
    container.createIfNotExists();

Change permissions:

    // Make the container and its blobs publicly readable
    BlobContainerPermissions containerPermissions = new BlobContainerPermissions();
    containerPermissions.setPublicAccess(BlobContainerPublicAccessType.CONTAINER);
    container.uploadPermissions(containerPermissions);

Upload blob:

    final String filePath = "C:\\myimages\\myimage.jpg";
    CloudBlockBlob blob = container.getBlockBlobReference("myimage.jpg");
    File source = new File(filePath);
    blob.upload(new FileInputStream(source), source.length());

Download blob:

    // Download every blob of the container to a local folder
    for (ListBlobItem blobItem : container.listBlobs()) {
        if (blobItem instanceof CloudBlob) {
            CloudBlob blob = (CloudBlob) blobItem;
            blob.download(new FileOutputStream("C:\\mydownloads\\" + blob.getName()));
        }
    }

WINDOWS AZURE BLOB STORAGE: Tables (NoSQL)

WINDOWS AZURE BLOB STORAGE: Queue

CLOUD: Windows Azure Management Console

CLOUD: Azure SDK (PowerShell, Node.js, Java …)

Windows Azure SDK, PowerShell:

    Import-AzurePublishSettingsFile -PublishSettingsFile "full path to downloaded file"
    New-AzureAffinityGroup -Name pslab-group -Location "East US"
    New-AzureQuickVM -ImageName $VMImage -Windows -Name $myVMName -ServiceName $myVMName -AdminUsername $myAdminName -Password $myAdminPwd -AffinityGroup pslab-grou
    Stop-AzureVM -Name $myVMName -ServiceName $myVMName
    Start-AzureVM -Name $myVMName -ServiceName $myVMName
    Restart-AzureVM -Name $myVMName -ServiceName $myVMName

HYBRIS: Deploy Hybris
Use case: deploying Hybris on Windows Azure

DEPLOY HYBRIS: Architecture (auto-scalable, horizontal and vertical)
[Diagram. Components shown: HTTP/HTTPS traffic; VIP: Windows Azure Load Balancer (Failover, Round Robin, Performance); Cloud Service F.O with nodes N1, N2 … Ni; Cloud Service B.O with nodes N1, N2 … Ni; CDN; Azure Blob Storage (media, files, attachments, orders.pdf…); Azure SQL Server.]

HYBRIS-CLOUD: Azure cloud extension
Windows Azure Blob provides a simple web services interface that can be used to store and retrieve any amount of data. You can configure a specific MediaFolder to store the binary data of a Media item directly in Windows Azure Blob.
To configure your folder to use Windows Azure Blob you need:
- A Windows Azure account
- Properly created access keys
For more details read http://www.windowsazure.com/en-us/develop/net/how-to-guides/blob-storage/.

https://wiki.hybris.com/display/release5/Using+Windows+Azure+Blob+Media+Storage+Strategy

1. Import the extension: azurecloud
2. Configure blob storage in local.properties. Global settings:

    media.globalSettings.accountKey=
    media.globalSettings.accountName=
    media.globalSettings.connection=UseDevelopmentStorage\=True
    media.globalSettings.endPointProtocol=http
    media.globalSettings.local.cache=true
    media.globalSettings.public.base.url=http://127.0.0.1:10000/devstoreaccount1
    media.globalSettings.secured=true
    media.globalSettings.storage.strategy=windowsAzureBlobStorageStrategy
    media.globalSettings.url.strategy=windowsAzureBlobURLStrategy

3. How to create a new blob storage folder:

    ……..
    media.folder.invoices.accountKey=
    media.folder.invoices.accountName=
    media.folder.invoices.connection=UseDevelopmentStorage\=True
    media.folder.invoices.endPointProtocol=http
    media.folder.invoices.local.cache=true
    media.folder.invoices.public.base.url=http://127.0.0.1:10000/devstoreaccount1
    media.folder.invoices.secured=true
    media.folder.invoices.storage.strategy=windowsAzureBlobStorageStrategy
    media.folder.invoices.url.strategy=windowsAzureBlobURLStrategy
    ……..

4. Storing media files:

    final MediaModel media = modelService.create(MediaModel.class);
    media.setCatalogVersion(catalogVersionService.getCatalogVersion("productCatalog", "Staged"));
    // Attach the media to the folder configured for Azure Blob storage
    final MediaFolderModel folder = mediaService.getFolder("invoices");
    media.setFolder(folder);
    mediaService.save(media);

HYBRIS-CLOUD: Secure media access
You can enable secure media access for a specific Media folder by setting the following property to true in your local.properties file:

    media.folder.<folderName>.secured=true

With this setting, only secure URLs are rendered for the Media items stored in that folder, and access to these media is filtered exclusively by the SecureMediaFilter.

Managing permissions:
Use the MediaPermissionService.
Using hMC: you can grant or deny access to a Media item for a given principal by opening the Media item and going to the Security tab.
Using ImpEx: below is an example of an ImpEx import script that grants access to the Media item with code 1017895.jpg to the editor principal:

    INSERT_UPDATE media; code[unique=true]; catalogVersion(catalog(id),version)[unique=true]; permittedPrincipals(uid);
    ;1017895.jpg; clothescatalog:Staged;editor;

HYBRIS-CLOUD: Azure cloud extension
Example media URL: http://hybrisazure.blob.core.windows.net/hybris/sys_master/root/h3e/hd7/8796157378590.jpg

Initialize or update Hybris:
Keep in mind that a prefix containing the tenant ID is added automatically to the container name. If the custom container is named myContainer, the final container name is sys-master-myContainer. The pattern is sys-<tenantID>-<containerName>.
To control whether Windows Azure storage is cleaned on a fresh initialization, use the following global property:

    media.globalSettings.windowsAzureBlobStorageStrategy.cleanOnInit={true or false}

DEPLOY HYBRIS: Azure Cloud Service?
[Diagram. Components shown: HTTP/HTTPS traffic; VIP: Windows Azure Load Balancer (Failover, Round Robin, Performance); Cloud Service F.O with nodes N1, N2 … Ni; Cloud Service B.O with nodes N1, N2 … Ni; CDN; Azure Blob Storage (media, files, attachments, orders.pdf…); Azure SQL Server.]

DEPLOY HYBRIS: AzureRunMe

DEPLOY HYBRIS: Packaging and deploying Hybris
- Service Definition (*.csdef)
- Service Configuration (*.cscfg)
- Encrypted(Zipped(Code + *.csdef)) == *.cspkg

DEPLOY HYBRIS: DevOps with Azure PowerShell cmdlets

    # Import the Azure module
    $env:PSModulePath=$env:PSModulePath+";"+"C:\Program Files (x86)\Microsoft SDKs\Windows Azure\PowerShell"
    Import-Module Azure

    # Connection
    Import-AzurePublishSettingsFile $pubsettings
    Select-AzureSubscription -SubscriptionName $selectedsubscription
    Set-AzureSubscription -CurrentStorageAccount $storageAccountName -SubscriptionName $selectedsubscription

    # Create a new deployment
    $opstat = New-AzureDeployment -Slot $slot -Package $packageLocation -Configuration $cloudConfigLocation -label $deploymentLabel -ServiceName $serviceName

    # Upgrade an existing deployment (the backtick continues the command on the next line)
    $setdeployment = Set-AzureDeployment -Upgrade -Slot $slot -Package $packageLocation `
        -Configuration $cloudConfigLocation -label $deploymentLabel -ServiceName $serviceName -Force

    # Swap the staging and production deployments
    Move-AzureDeployment -ServiceName $serviceName

HYBRIS-CLOUD: AzureRunMe
Demo: AzureRunMe and Windows Azure Emulator

ELASTICSEARCH: Elasticsearch
https://github.com/elasticsearch
1. Java
2. Apache Lucene
3. Plug and play
4. Document oriented
5. Scalable
6. Clustering
7. Lucene
8. Sharding and replication
9. REST/JSON client
10. Apache 2 license

ELASTICSEARCH: SQL vs ES

ELASTICSEARCH: Architecture
[Diagram. A cluster of four Elasticsearch instances (Node 1 is the master, then Node 2, Node 3 and Node 4). A document to index is routed to a shard, and shards are replicated between nodes. Legend: primary shard, replica shard; a and b are indices, with shards a0, a1 and b0 to b4 spread over the nodes.]

ELASTICSEARCH: Mapping field types
Core types: String, Integer, Long, Double, Boolean, Date, Binary …

IP type:

    "address" : { "type" : "ip", "store" : "yes" }

    { "name" : "Tom PC", "address" : "192.168.2.123" }

Geo point type:

    "location" : { "type" : "geo_point" }

Attachment type:

    "my_attachment" : { "type" : "attachment" }

Token count type: the token_count field type stores index information about how many words the given field has, instead of storing and indexing the text provided to the field.

    "address_count" : { "type" : "token_count", "store" : "yes" }

Object types: JSON documents are hierarchical in nature, allowing them to define inner "objects" within the actual JSON.

    "tweet" : {
      "properties" : {
        "person" : {
          "type" : "object",
          "properties" : {
            "name" : {
              "type" : "object",
              "properties" : {
                "first_name" : { "type" : "string" },
                "last_name" : { "type" : "string" }
              }
            },
            "sid" : { "type" : "string", "index" : "not_analyzed" }
          }
        },
        "message" : { "type" : "string" }
      }
    }

Nested types: the nested type works like the object type, except that an array of objects is flattened, while an array of nested objects allows each object to be queried independently. For example, consider this mapping:

    {
      "type1" : {
        "properties" : {
          "users" : {
            "type" : "nested",
            "properties" : {
              "first" : { "type" : "string" },
              "last" : { "type" : "string" }
            }
          }
        }
      }
    }

Array types: JSON documents can define an array (list) of fields or objects.

    "Product" : [
      {
        "id" : 12,
        "title" : "iphone",
        "categories" : [1, 3, 5, 7],
        "tag" : ["iphone4", "iphone5", "iphone6"],
        "author" : [
          { "firstname" : "Francois", "lastname" : "francoisg", "id" : 18 },
          { "firstname" : "Gregory", "lastname" : "gregquat", "id" : "2" }
        ]
      }
    ]

ELASTICSEARCH: Relational vs denormalized
Example of a denormalized mapping, where the phrases are nested inside the translation document:

    "translation" : {
      "_routing" : { "required" : true, "path" : "project_id" },
      "_id" : { "path" : "id" },
      "_all" : { "enabled" : "false" },
      "dynamic" : "strict",
      "properties" : {
        "id" : { "type" : "string", "index" : "not_analyzed" },
        "public_id" : { "type" : "integer", "index" : "not_analyzed" },
        "project_id" : { "type" : "string", "index" : "not_analyzed" },
        "title_na" : { "type" : "string", "index" : "not_analyzed" },
        "title" : { "type" : "string", "index" : "analyzed", "analyzer" : "trans_standard" },
        "title_cs" : { "type" : "string", "index" : "analyzed", "analyzer" : "trans_cs" },
        "description" : { "type" : "string", "index" : "analyzed", "analyzer" : "trans_standard" },
        "description_cs" : { "type" : "string", "index" : "analyzed", "analyzer" : "trans_cs" },
        "resource_file_id" : { "type" : "integer", "index" : "not_analyzed" },
        "created_at" : { "type" : "long", "index" : "not_analyzed" },
        "updated_at" : { "type" : "long", "index" : "not_analyzed" },
        "any_empty" : { "type" : "boolean", "index" : "not_analyzed" },
        "all_empty" : { "type" : "boolean", "index" : "not_analyzed" },
        "status" : { "type" : "string", "index" : "not_analyzed" },
        "phrases" : {
          "_id" : { "path" : "id" },
          "type" : "nested",
          "properties" : {
            "id" : { "type" : "string", "index" : "not_analyzed" },
            "iso2_lang" : { "type" : "string", "index" : "not_analyzed" },
            "content" : { "type" : "string", "index" : "analyzed", "analyzer" : "trans_standard" },
            "content_cs" : { "type" : "string", "index" : "analyzed", "analyzer" : "trans_cs" },
            "created_at" : { "type" : "long", "index" : "not_analyzed" },
            "updated_at" : { "type" : "long", "index" : "not_analyzed" },
            "status" : { "type" : "string", "index" : "not_analyzed" }
          }
        }
      }
    }

ELASTICSEARCH: CRUD

Insert data (data.json uses the bulk format: one action line followed by one document line):

    $ cat data.json
    { "index" : { "_index" : "requests", "_type" : "request", "_id" : 33 } }
    { "client" : "client1", "country" : "FR", "id" : 1, "ip" : "100.1.1.3", "password" : "test", "sensor" : "test", "session" : "EFRFR34344", "success" : "OK", "timestamp" : "1414183085848", "username" : "test" }
    $ curl -XPOST http://localhost:9200/_bulk --data-binary @data.json

Update:

    $ curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{ "doc" : { "name" : "new_name" } }'

Delete:

    $ curl -XDELETE 'http://localhost:9200/twitter/tweet/1'

ELASTICSEARCH: Query DSL

    $ curl -XPOST 'http://localhost:9200/_search?q=<YOUR_QUERY>'

    Type        Example
    Terms       Apple iphone
    Phrases     "apple iphone"
    Proximity   "apple iphone"~5
    Fuzzy       Apple~5
    Wildcards   App*
    Boosting    Apple^10 safari
    Range       [2014/05/01 TO 2014/05/30]
    Boolean     apple AND NOT iphone
    Fields      Title:iphone^5

ELASTICSEARCH: Query DSL

    $ curl -XPOST 'http://localhost:9200/requests/_search?pretty' -d '{
      "query": {
        "filtered": {
          "query": {
            "bool": {
              "should": [
                { "query_string": { "query": "marketing.cars >100" } },
                { "query_string": { "query": "marketing.music > 100" } },
                { "query_string": { "query": "marketing.electronics > 00" } },
                { "query_string": { "query": "marketing.fashion > 100" } }
              ]
            }
          },
          "filter": {
            "bool": {
              "must": [
                { "match_all": {} },
                { "exists": { "field": "location" } }
              ]
            }
          }
        }
      },
      "fields": [ "location", "remoteAddr" ],
      "size": 1000
    }'

ELASTICSEARCH: Multi Search API

    SearchRequestBuilder requestOne = node.client().prepareSearch()
            .setQuery(QueryBuilders.matchQuery("name", "test1")).setSize(1);
    SearchRequestBuilder requestTwo = node.client().prepareSearch()
            .setQuery(QueryBuilders.matchQuery("name", "test2")).setSize(1);

    // Send both searches in a single round trip
    MultiSearchResponse sr = node.client().prepareMultiSearch()
            .add(requestOne)
            .add(requestTwo)
            .execute().actionGet();

    // Each individual response is available from MultiSearchResponse#getResponses()
    long nbHits = 0;
    for (MultiSearchResponse.Item item : sr.getResponses()) {
        SearchResponse response = item.getResponse();
        nbHits += response.getHits().getTotalHits();
    }

ELASTICSEARCH: Bulk API
The bulk API makes it possible to perform many index/delete operations in a single API call. This can greatly increase the indexing speed.
Example:

    $ cat requests
    { "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
    { "field1" : "value1" }
    $ curl -s -XPOST localhost:9200/_bulk --data-binary @requests; echo
    {"took":7,"items":[{"create":{"_index":"test","_type":"type1","_id":"1","_version":1}}]}

ELASTICSEARCH: Aggregations
The following snippet captures the basic structure of aggregations:

    "aggregations" : {
        "<aggregation_name>" : {
            "<aggregation_type>" : {
                <aggregation_body>
            }
            [,"aggregations" : { [<sub_aggregation>]+ } ]?
        }
        [,"<aggregation_name_2>" : { ... } ]*
    }

ELASTICSEARCH: Aggregations: min, max, avg
Metric aggregations: min, max, sum and avg.

On a field:

    {
      "aggs": {
        "min_year": {
          "min": { "field": "year" }
        }
      }
    }

On a field, with a script applied to each value:

    {
      "aggs": {
        "min_year": {
          "min": {
            "field": "year",
            "script": "_value - mod",
            "params": { "mod" : 1000 }
          }
        }
      }
    }

Examples:

    "aggs" : { "avg_price" : { "avg" : { "field" : "price" } } }
    "aggs" : { "min_price" : { "min" : { "field" : "price" } } }
    "aggs" : { "max_price" : { "max" : { "field" : "price" } } }
    "aggs" : { "sum_price" : { "sum" : { "field" : "price" } } }

ELASTICSEARCH: Aggregations: the terms aggregation

Request:

    {
      "aggs": {
        "availability": {
          "terms": { "field": "copies" }
        }
      }
    }

Response (excerpt):

    "availability": {
      "buckets": [
        { "key": 0, "doc_count": 2 },
        { "key": 1, "doc_count": 1 },
        { "key": 6, "doc_count": 1 }
        …

ELASTICSEARCH: Aggregations: the range aggregation

Request:

    {
      "aggs": {
        "years": {
          "range": {
            "field": "year",
            "keyed": true,
            "ranges": [
              { "to" : 1850 },
              { "from": 1851, "to": 1900 },
              { "from": 1901, "to": 1950 },
              { "from": 1951, "to": 2000 },
              { "from": 2001 }
            ]
          }
        }
      }
    }

Response (excerpt):

    "years": {
      "buckets": {
        "*-1850.0": { "to": 1850, "doc_count": 0 },
        "1851.0-1900.0": { "from": 1851, "to": 1900, "doc_count": 1 },
        "1901.0-1950.0": { "from": 1901, "to": 1950, …

ELASTICSEARCH: Aggregations: histogram aggregation

Request:

    {
      "aggs" : {
        "prices" : {
          "histogram" : {
            "field" : "price",
            "interval" : 50,
            "keyed" : true
          }
        }
      }
    }

Response (excerpt):

    {
      "aggregations": {
        "prices": {
          "buckets": {
            "0": { "key": 0, "doc_count": 2 },
            "50": { "key": 50, "doc_count": 4 },
            "150": { "key": 150, "doc_count": 3 }
            ……

ELASTICSEARCH: Facet

Request:

    {
      "facets": {
        "department": {
          "terms": { "field": "department_name" }
        }
      },
      "query": {
        "constant_score": {
          "boost": 1.5,
          "filter": {
            "term": { "department_name": "Books" }
          }
        }
      }
    }

ELASTICSEARCH: Use case: faceting using Elasticsearch aggregations

Facets component: facets help users to narrow down or filter a search result; a facet is built based on the search
context.
Sort order: the sort order affects the search results component. It defines the order in which the results are listed on the page; for instance, a user may sort from lowest to highest price or by product rating.
Pagination of results: the pagination component allows a user to navigate back and forth through the search results; it also determines the number of records that the ES query should return.
Search result: restricted to the number of records that should be displayed on the landing page; this is typically configurable based on your application needs.

ELASTICSEARCH: Use case: faceting using Elasticsearch aggregations

    # Get search results and facets for the men's category, filtered by facet selection Brand = "diesel"
    $ curl -XGET 'http://localhost:9200/products/_search?pretty=true' -d '{
      "from" : 0, "size" : 5,
      "query": {
        "filtered": {
          "filter": { "term": { "Brand": "diesel" } },
          "query": { "term" : { "categories" : "men" } }
        }
      },
      "aggs" : {
        "offerprice" : {
          "range" : {
            "field" : "offerprice",
            "keyed" : true,
            "ranges" : [ { "to" : 5 }, { "from" : 5, "to" : 10 }, { "from" : 10, "to" : 20 }, { "from" : 20, "to" : 30 } ]
          }
        },
        "size" : { "terms" : { "field" : "size", "order": { "_count" : "asc" } } },
        "Deals" : { "terms" : { "field" : "offers", "order": { "_count" : "asc" } } },
        "Brand" : { "terms" : { "field" : "Brand", "order": { "_count" : "desc" } } }
      },
      "sort" : [ { "offerprice" : { "order" : "asc", "mode" : "avg", "ignore_unmapped" : true, "missing" : "_last" } }, "_score" ]
    }'

    # Get search results and facets for the men's category, filtered by facet selection Brand = "diesel" and Size = "small"
    $ curl -XGET 'http://localhost:9200/products/_search?pretty=true' -d '{
      "from" : 0, "size" : 5,
      "query": {
        "filtered": {
          "filter": { "and": [ { "term": { "Brand": "diesel" } }, { "term": { "size": "small" } } ] },
          "query": { "term" : { "categories" : "men" } }
        }
      },
      "aggs" : {
        "offerprice" : {
          "range" : {
            "field" : "offerprice",
            "keyed" : true,
            "ranges" : [ { "to" : 5 }, { "from" : 5, "to" : 10 }, { "from" : 10, "to" : 20 }, { "from" : 20, "to" : 30 } ]
          }
        },
        "size" : { "terms" : { "field" : "size", "order": { "_count" : "asc" } } },
        "Deals" : { "terms" : { "field" : "offers", "order": { "_count" : "asc" } } },
        "Brand" : { "terms" : { "field" : "Brand", "order": { "_count" : "desc" } } }
      },
      "sort" : [ { "offerprice" : { "order" : "asc", "mode" : "avg", "ignore_unmapped" : true, "missing" : "_last" } }, "_score" ]
    }'

ELASTICSEARCH: ES clients
- Python: https://github.com/elasticsearch/elasticsearch-dsl-py
- Ruby: https://github.com/printercu/elastics-rb
- Javascript: https://github.com/fullscale/elastic.js
- Scala: https://github.com/sksamuel/elastic4s
- Clojure: https://github.com/clojurewerkz/elastisch
- Nodejs: https://github.com/phillro/node-elasticsearch-client
- Spring-data-elasticsearch: https://github.com/spring-projects/spring-data-elasticsearch

KIBANA: Introduction
https://github.com/elasticsearch/kibana
- Angular JS
- Responsive design with Bootstrap
- Nodejs platform
- Open source

KIBANA: Dashboard

KIBANA: Queries and filters
Query, filtering.

KIBANA: Panels: bettermap
The bettermap panel reads a field that contains the coordinates in GeoJSON format. GeoJSON is [longitude, latitude] in an array. This is different from most implementations, which use latitude, longitude.

KIBANA: Terms panel
A table, bar chart or pie chart based on the results of an Elasticsearch terms facet.

KIBANA: Histogram panel
The histogram panel allows for the display of time charts. It includes several modes and transformations to display event counts, mean, min, max and total of numeric fields, and derivatives of counter fields.

KIBANA: Map
The map panel translates 2-letter country or state codes into shaded regions on a map. Currently available maps are world, usa and europe.

KIBANA: Table
The table panel contains a sortable, pageable view of documents. It can be arranged into defined columns and offers several interactions, such as performing ad hoc terms aggregations.
KIBANA: Text
The text panel is used for displaying static text formatted as markdown, sanitized HTML or plain text.

KIBANA: Trends
A stock-ticker style representation of how queries are moving over time. For example, if the time is 1:10pm, your time picker is set to "Last 10m", and the "Time Ago" parameter is set to "1h", the panel shows how much the query results have changed compared with 12:00-12:10pm.

USE CASE 1: MARKETING
Analytics dashboard (demo)

MAKE SENSE OF YOUR DATA: Example: recommender engine
[Diagram. IN: {HTTP requests}; OUT: {recommendations}. Components shown: Hybris cluster; requests collector, labelled {insert data}; Elasticsearch, MongoDB or Apache Solr; Apache Mahout, Apache Spark MLlib or H2O… on Node 0 … Node N, labelled {data analysis}.]

MAKE SENSE OF YOUR DATA: Example: Amazon

MAKE SENSE OF YOUR DATA: Example: Zalando

MAKE SENSE OF YOUR DATA: Example: Netflix

Q&A
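As a footnote to the Trends panel example, the arithmetic behind the comparison window can be sketched in Java. This is only an illustration and not Kibana code. `TrendsWindowSketch` and `comparisonWindow` are hypothetical names, and the sketch assumes the panel compares the time-picker window with a window of the same length shifted back by "Time Ago".

```java
import java.time.Duration;
import java.time.LocalTime;

/** Illustrative helper (an assumption for this sketch, not part of Kibana). */
final class TrendsWindowSketch {

    /**
     * Returns {start, end} of the earlier window that the current window
     * [now - span, now] is compared against.
     */
    static LocalTime[] comparisonWindow(LocalTime now, Duration span, Duration timeAgo) {
        LocalTime currentStart = now.minus(span);
        // Shift both bounds of the current window back by "Time Ago".
        return new LocalTime[] { currentStart.minus(timeAgo), now.minus(timeAgo) };
    }

    public static void main(String[] args) {
        // 1:10pm, time picker "Last 10m", "Time Ago" = 1h
        LocalTime[] window = comparisonWindow(
                LocalTime.of(13, 10), Duration.ofMinutes(10), Duration.ofHours(1));
        System.out.println(window[0] + " to " + window[1]); // prints 12:00 to 12:10
    }
}
```

`LocalTime` wraps around midnight, so a real implementation would work on full timestamps; times of day are used here only to mirror the slide's example.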