Sandish Kumar H N
Phone: +91 9008990742 | Email: sanysandish@gmail.com | Skype: sandishhadoop

SUMMARY
Implemented and delivered secure cloud solutions for organizations such as Positive Bioscience and PointCross.com. More than 2 years of relevant experience in Big Data technologies and cloud & SaaS platforms.

KEY SKILLS
Big Data Technologies: Hadoop 1.x, HBase, Hive, MongoDB, Pig, MapReduce, Flume, Oozie, Cassandra, ZooKeeper, HDFS, Neo4j, Mahout.
Cloud & SaaS Platforms: strong in Amazon Web Services; working knowledge of Windows Azure.
Amazon Web Services: Amazon S3, Elastic MapReduce (EMR), EC2, AWS SDK for Java, DynamoDB.
Programming Languages: Java, JavaScript, JSP, PHP, C, shell scripting, C++, OOP, C#.
Social Media APIs: Twitter API, LinkedIn API, Facebook API.
Web UI: Ext JS, JavaScript, HTML, JSP, Ajax, PHP, jQuery, Python (basics).
Other: RESTful web services, Servlets, Spring Framework, Git.
Learning: Informatica, Pentaho, sentiment analysis, R, YARN, MapReduce 2.x.

PROFESSIONAL EXPERIENCE
Positive Bioscience (Feb 2013 - present) - Cloud Developer
- Designed and developed software for bioinformatics and Next Generation Sequencing (NGS) on the Hadoop MapReduce framework and MongoDB, using Amazon S3, Amazon EC2, and Amazon Elastic MapReduce (EMR).
- Developed a Hadoop MapReduce program to perform custom quality checks on genomic data (see the quality-check sketch after this section). Novel features included handling of file-format/sequencing-machine errors, automatic detection of the baseline PHRED score, and platform independence (Illumina, 454 Roche, Complete Genomics, and ABI SOLiD input formats).
- Developed a Hadoop MapReduce program to perform sequence alignment on sequencing data. The program implements algorithms such as the Burrows-Wheeler Transform (BWT), the Ferragina-Manzini Index (FMI), and the Smith-Waterman dynamic programming algorithm, using the Hadoop distributed cache.
- Configured and ran all MapReduce programs on a 20-30 node cluster (Amazon EC2 spot instances) with Apache Hadoop 1.4.0 to handle 600 GB/sample of NGS genomics data.
- Configured a 20-30 node (Amazon EC2 spot instance) Hadoop cluster to transfer data between Amazon S3 and HDFS in both directions, and to feed input and output directly to the Hadoop MapReduce framework.
- Ran all Hadoop MapReduce programs on Amazon Elastic MapReduce, using Amazon S3 for input and output.
- Developed Java RESTful web services to upload data from local storage to Amazon S3, list S3 objects, and perform file manipulation operations (see the S3 sketch after this section).
- Developed MapReduce programs to perform quality checks, sequence alignment, SNP calling, and SV/CNV detection on single-end/paired-end NGS data.
- Designed and migrated an RDBMS (SQL) database to a NoSQL MongoDB database.

PointCross.com (a product-based company) (Dec 2011 - Feb 2013) - Software Engineer
- Implemented the Java HBase MapReduce paradigm to load raw seismic and sensor data into an HBase database on a 4-node Hadoop cluster (see the HBase sketch after this section).
- Developed a RESTful web service application to fetch values from HBase to the UI in XML and JSON response formats.
- Wrote separate Hive and Pig scripts to aggregate multiple drilling-parameter values for downstream processing on a 4-node cluster.
- Wrote Java MapReduce programs to port indexed data from HBase to the Solr search engine.
- Implemented cross-connection settings between Hive and HBase.
- Installed and configured Hadoop 1.x, Hive, Pig, and HBase on a 4-node local cluster.
- Customized OpenLayers maps to display KML information stored in the HBase database, such as oil-well geo-location data.
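A minimal sketch of the quality-check idea from the Positive Bioscience work above: a Hadoop 1.x mapper that keeps a read only if its mean PHRED score clears a threshold. The tab-separated record layout, the cutoff, and the fixed +33 offset are illustrative assumptions; the actual program auto-detected the baseline score and handled multiple vendor formats.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Simplified QC mapper: input is assumed to be one
    // "id<TAB>bases<TAB>qualities" record per line.
    public class QualityCheckMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {

        private static final int MIN_MEAN_PHRED = 20; // assumed cutoff
        private static final int PHRED_OFFSET = 33;   // assumed fixed baseline

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length < 3) {
                return; // malformed record: drop it, as the QC step would
            }
            String qualities = fields[2];
            if (qualities.isEmpty()) {
                return;
            }
            long sum = 0;
            for (int i = 0; i < qualities.length(); i++) {
                sum += qualities.charAt(i) - PHRED_OFFSET;
            }
            double mean = (double) sum / qualities.length();
            if (mean >= MIN_MEAN_PHRED) {
                context.write(value, NullWritable.get()); // retain the good read
            }
        }
    }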
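A sketch of the S3 operations a RESTful upload/listing service like the one above could wrap, using the era-appropriate AWS SDK for Java v1. The bucket name, key, file path, and credentials are placeholders, not the production configuration.

    import java.io.File;
    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.model.ObjectListing;
    import com.amazonaws.services.s3.model.S3ObjectSummary;

    public class S3Transfer {
        public static void main(String[] args) {
            AmazonS3 s3 = new AmazonS3Client(
                    new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));

            // Upload a local file to S3 (the REST endpoint wraps this call).
            s3.putObject("my-genomics-bucket", "samples/sample1.fastq",
                    new File("/data/sample1.fastq"));

            // List objects under a prefix, e.g. to show uploaded samples.
            ObjectListing listing = s3.listObjects("my-genomics-bucket", "samples/");
            for (S3ObjectSummary summary : listing.getObjectSummaries()) {
                System.out.println(summary.getKey()
                        + " (" + summary.getSize() + " bytes)");
            }
        }
    }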
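A sketch of how a raw sensor reading could be written into HBase with the classic HTable client from the HBase 0.9x era used alongside Hadoop 1.x. The table name, column family, and row-key scheme are assumptions for illustration, not the production schema.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SensorWriter {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "sensor_data");

            // Row key: wellId + timestamp, so readings for a well sort together.
            Put put = new Put(Bytes.toBytes("well-42#20130101120000"));
            put.add(Bytes.toBytes("d"), Bytes.toBytes("depth"),
                    Bytes.toBytes("2350.5"));
            put.add(Bytes.toBytes("d"), Bytes.toBytes("pressure"),
                    Bytes.toBytes("8812.3"));

            table.put(put);
            table.close();
        }
    }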
PROJECTS
Whole Genome Sequencing Quality Check (QC): The entire program was developed in the Hadoop MapReduce paradigm. Performed custom quality checks on genomic data to retain good-quality reads. Novel features of the software: capability to handle sequencing-machine errors; handling of more than 600 GB of raw NGS data; automatic detection of the baseline PHRED score; platform independence across input formats (Illumina, 454 Roche, Complete Genomics, ABI SOLiD). Also included a Pig script to remove bad quality scores from large files and a Java REST web service to access Amazon S3 and Elastic MapReduce. Link: http://www.positivebioscience.com

Whole Genome Sequencing Sequence Alignment (Mapping): Developed a Hadoop MapReduce program to perform sequence alignment on NGS data. The program implements algorithms such as the Burrows-Wheeler Transform (BWT), the Ferragina-Manzini Index (FMI), and the Smith-Waterman dynamic programming algorithm, using the Hadoop distributed cache (see the distributed-cache sketch below). Includes a Java program to check the output correlation with MongoDB. Link: http://www.positivebioscience.com

DDSR (Drilling Data Search and Repository): This project provides analytics for oil and gas exploration data. The DDSR repository is built on HBase, Hadoop, and its subprojects. Data from thousands of wells across the globe is ingested by MapReduce jobs and stored in HBase and Hive data stores; analytics for advanced analysis are built on top of this data. Link: http://www.pointcross.com/products/eNp/ddsr.html

Seismic Data Server & Repository (SDSR): SDSR solves the problem of delivering, on demand, precisely cropped SEG-Y files for instant loading at geophysical interpretation workstations anywhere in the network. Based on Hadoop file storage, HBase, and MapReduce technology, the Seismic Data Server brings fault-tolerant, petabyte-scale storage capability to the industry. It currently supports post-stack traces, with pre-stack support to be released shortly. Link: http://www.pointcross.com/products/eNp/seismicDataRepository.html

Social Media Sentiment Analysis (positive/negative reviews, updated every 4 hours): This project produces positive/negative sentiment for searched keywords, based on Twitter tweets, Facebook statuses, and similar sources. It is implemented with social media APIs and Big Data technologies: Flume streams social media data into HDFS at regular intervals, and Cassandra and MongoDB store the analyzed sentiment. Apache Oozie drives the workflow, which runs every 4 hours (see the sentiment-scoring sketch below). This is a personal-interest project. Link: currently seeking an Amazon Web Services cluster to host it.

EDUCATION
Bachelor of Engineering in Computer Science and Engineering, Visvesvaraya Technological University (VTU), Bangalore, Karnataka, India, 2011.
Diploma in Computer Science and Engineering, University of KAE, Bangalore, Karnataka, India, 2008.

REFERENCE LINKS
LinkedIn: in.linkedin.com/pub/sandish-kumar/2a/b91/23b/
Blogs: www.hadoopsandy.blogspot.com, https://www.sanysandish.jux.com
Twitter: @sandishsany
Skype: sandishhadoop
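The sequence-alignment project above relies on the Hadoop distributed cache to ship the reference index to every node. A compressed sketch of that Hadoop 1.x pattern follows; the index path, file format, and the alignment step itself are placeholders, and input/output path setup is omitted.

    import java.io.IOException;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;

    public class AlignmentJob {

        public static class AlignMapper
                extends Mapper<LongWritable, Text, Text, Text> {
            private Path indexFile; // local copy of the cached reference index

            @Override
            protected void setup(Context context) throws IOException {
                // Each node gets one local copy; load it once per mapper,
                // not once per record.
                Path[] cached = DistributedCache.getLocalCacheFiles(
                        context.getConfiguration());
                indexFile = cached[0]; // the real job deserializes the BWT/FM-index here
            }

            @Override
            protected void map(LongWritable key, Text read, Context context)
                    throws IOException, InterruptedException {
                // Placeholder for the actual alignment against the cached index.
                context.write(read, new Text("aligned-with:" + indexFile.getName()));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Ship the prebuilt index (already on HDFS) to every task node.
            DistributedCache.addCacheFile(new URI("/ref/genome.fmindex"), conf);
            Job job = new Job(conf, "sequence-alignment");
            job.setJarByClass(AlignmentJob.class);
            job.setMapperClass(AlignMapper.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            // Input/output paths would be configured here before
            // job.waitForCompletion(true).
        }
    }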
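A deliberately simplified sketch of the sentiment-scoring step in the social media project above: score a piece of text by counting hits against positive/negative word lists. The word lists are token examples; in the real pipeline, Flume-collected tweets from HDFS would be fed through a classifier of this kind before the results land in Cassandra/MongoDB.

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    public class SentimentScorer {
        private static final Set<String> POSITIVE = new HashSet<String>(
                Arrays.asList("good", "great", "love", "awesome"));
        private static final Set<String> NEGATIVE = new HashSet<String>(
                Arrays.asList("bad", "poor", "hate", "awful"));

        // Net score: > 0 positive, < 0 negative, 0 neutral.
        public static int score(String tweet) {
            int score = 0;
            for (String token : tweet.toLowerCase().split("\\W+")) {
                if (POSITIVE.contains(token)) score++;
                if (NEGATIVE.contains(token)) score--;
            }
            return score;
        }

        public static void main(String[] args) {
            System.out.println(score("Love the new release, great work!")); // 2
            System.out.println(score("Bad update, I hate the new UI"));     // -2
        }
    }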