ResumeSandish

advertisement
Sandish Kumar H N
Phone: +919008990742
Email: [email protected]
Skype: sandishhadoop
SUMMARY
 Successfully Implemented and delivered secure cloud solutions for major organizations such as Positive
Bioscience and PointCross.com.
 More than 2 years relevant experience in Big Data Technologies and Cloud & SaaS Platform.
KEY SKILLS
 Big Data Technologies: Hadoop.1.x, Hbase, Hive, MongoDB, Pig, MapReduce, Flume, Oziee,
Cassandra, ZooKeeper, HDFS, Neo4j, Mahout.
 Cloud & SaaS Platforms: Strong Amazon Web Services, able to handle Windows Azure.
 Amazon Web Services: Amazon S3, Elastic MapReduce, EC2, Amazon, Java aws SDK , Dynamo DB.
 Programing Languages: Java, JavaScript, JSP, PHP, C, Shell scripting C++,OOPS, C#.
 Social Media API's: Twitter API, LinkedIn API, Facebook API.
 Web UI : Extjs, Java Script, html, jsp, Ajax, PHP, Jquery, Python (Basics).
 Other: RESTfull Web services, Servlets, Spring framework, git.
 Learning: Informatica, Pentaho, Sentiment Analysis, R, Yarn, MapReduce.2.X.
PROFESSIONAL EXPERIENCE
Positive Bioscience
(Feb 2013-present)
- Cloud Developer
 Design and development of software for Bioinformatics, Next Generation Sequencing (NGS) in Hadoop
MapReduce framework, MangoDB using Amazon S3, Amazon EC2, Amazon Elastic MapReduce(EMR).
 Developed Hadoop MapReduce program to perform custom Quality Check on genomic data. Novel features
of the program included capability to handle file-format/sequencing-machine errors, automatic detection of
base-line PHRED score and being platform agnostic (Illumina, 454 Roche, Complete Genomics, ABI Solid
input format data).
 Developed a Hadoop MapReduce program to perform sequence alignment on sequencing data. The MapReduce
program implements algorithms such as Borrows-Wheeler Transform (BWT), Ferragina-Manzini Index (FMI),
Smith-Waterman dynamic programming algorithm using Hadoop distributed cache.
 Configured and ran all MapReduce programs on 20-30 node cluster (Amazon EC2 spot instances) with Apache
Hadoop-1.4.0 to handle 600GB/sample of NGS genomics data.
 Configured a 20-30 node (Amazon EC2 spot Instance) Hadoop cluster to transfer the data from Amazon S3 to
HDFS and HDFS to Amazon S3 and also to direct input and output to the Hadoop MapReduce framework.
 Successfully ran all Hadoop MapReduce programs on Amazon Elastic MapReduce framework by using
Amazon S3 for Input and Output.
 Developed java RESTfull webservices to upload data from local to Amazon S3, listing S3 objects and file
manipulation operations.
 Developed MapReduce programs to perform Quality Check, Sequence Alignment, SNP calling, SV/CNV
detection on single-end/paired-end NGS data
 Designed and transmitted a RDBMS(SQL) Database to NOSQL MangoDB Database.
PointCross.com(A product based company)
(Dec 2011-Feb 2013)
Software Engineer
 Implemented Java Hbase MapReduce paradigm to load raw seismic and sensor data onto Hbase database
on a 4 node Hadoop cluster.
 Developed RESTfull web service application to fetch values from Hbase to UI in XML and JSON Response
format.
 Scripted separately in Hive and Pig to aggregate multiple drilling parameter values for downstream
processing on a 4 node cluster.
 Programmed in Java MapReduce to port Indexed data from Hbase to Solr search engine
 Implemented cross-connection settings between Hive and Hbase.


Installed and configured Hadoop 1.X, Hive, pig, Hbase on a 4 node local cluster.
Customized OpenLayers maps to display KML information stored in Habse database to display oil well
Information such as geo-location data.
PROJECTS
 Whole Genome Sequencing Quality Check (QC):
Entire program was developed in Hadoop MapReduce paradigm. Performed custom Quality Check on Genomic
data to retain good quality reads.
Implemented novel features in the software:
Capability to handle sequencing machine errors.
Capable of handling more than 600GB of raw NGS data.
Automatic detection of baseline PHRED score.
Being platform agnostic and able to handle various file formats i.e. from Illumina, 454Roche, Complete
Genomics, ABI Solid input format data.
Pig Script to Remove the bad quality scores in large files.
Java REST Web Service to access Amazon S3 and Elastic MapReduce.
Link: "http://www.positiveBioscience.com".
 Whole Genome Sequencing Sequence Alignment(Mapping):
Developed a Hadoop MapReduce program to perform sequence alignment on NGS data.
The MapReduce program implements algorithms such as BorrowsWheeler Transform (BWT),
Ferragina Manzini Index (FMI), SmithWaterman dynamic programming algorithm using Hadoop distributed cache.
Java program to check the output correlation with MongoDB.
Link: "http://www.positivebioscience.com".
 DDSR (Drilling Data Search and Repository):
This project aims to provide analytics for Oil and gas exploration data. This DDSR repository builded by using
HBase, Hadoop and its sub projects. We are collecting thousands of wells data from across the globe. This data is
stored in by using MapReduce jobs and stored in HBase and Hive data store. On top of this data we are building
analytics for advanced analysis.
Link: “http://www.pointcross.com/products/eNp/ddsr.html ”.
 Seismic Data Server & Repository(SDSR):
Seismic Data Server & Repository solves the problem of delivering, on demand, precisely
cropped SEGY files for instant loading at geophysical interpretation workstations anywhere in the network. Based
on Hadoop file storage, Hbase™ and MapReduce technology, the Seismic Data Server brings faulttolerant
petabyte scale store capability to the industry. Seismic Data Server supports poststack
traces now with prestack support to be released shortly.
Link: ”http://www.pointcross.com/products/eNp/seismicDataRepository.html”.
 Social Media Sentiment Analysis Update on each 4 hours (Positive Negative Review's):
Social Media Sentiment Analysis project gives positive negative reviews on our favorite words search, it is based
on Twitter tweets, Facebook status's etc... it has been implemented using Social media API's with BigData
Technologies like Flume to get social media stream data regularly into HDFS and Cassandra, MongoDB to store
sentiment analyzed data. and in this project i am using apache oozie to develop work flow and it will be ruing on
each 4 hours. this is project is done with my self interest.
Link: "Trying to get a cluster in Amazon Web services"
EDUCATION:
 Bachelor of Engineering in Computer Science and Engineering, University of VTU
Bangalore, Karnataka, India, 2011.
 Diploma in Computer Science and engineering University of KAE,
Bangalore, Karnataka, India, 2008
REFERENCE LINKS:
 LinkedIn:in.linkedin.com/pub/sandish-kumar/2a/b91/23b/
 Blogs : www.hadoopsandy.blogspot.com,
https://www.sanysandish.jux.com
 Twitter: @sandishsany
 Skype: sandishhadoop
Download