tech-compiler.com

Committed to Delivering…





We are leaders in the Hadoop ecosystem.
We support, maintain, monitor, and provide services over
Hadoop, whether you run Apache Hadoop, the Facebook
distribution, or the Cloudera distribution in your own data center, or
on a cluster of machines on Amazon EC2, Rackspace, etc.
We provide a scalable end-to-end solution: one that
scales to large data sets (terabytes or petabytes).
Low-cost solution: based on the open-source framework
currently used by Google, Yahoo, and Facebook.
Solutions optimized to minimize SLA times and maximize
performance.
– Project Initiation
• Project Planning
• Requirement Collection
• POC using Hadoop technology
– Team Building
• Highly skilled Hadoop experts
• Dedicated team for project
– Agile Methodology
• Small Iterations
• Easy to accommodate changing requirements
– Support
• Long-term relationship to support the developed product
• Scope to change based on business/technical needs
Our combined experience has led to the adoption of a
unique methodology that ensures quality work. We:
• Evaluate the available hardware and understand the client's requirements.
• Peek through the data.
• Analyze the data and prototype using M/R code, then show the results to our clients.
• Iterate and continuously improve, developing a better understanding of the data.
• Develop various tasks in parallel:
◦ Data Collection
◦ Data Storage in HDFS
◦ M/R analytics jobs.
◦ Scheduler to run and coordinate the M/R jobs (see the driver sketch after this list).
◦ Transform output into OLAP cubes (dimension and fact tables).
◦ Provide a custom interface to retrieve the M/R output.
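To make the coordination concrete, here is a minimal sketch, in Java against the standard Hadoop MapReduce API, of a driver that chains two dependent jobs: an analytics pass whose output feeds a cube-building pass. The identity Mapper/Reducer classes and the /data/* paths are placeholders for illustration, not production code.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PipelineDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Job 1: analytics pass over data already collected into HDFS.
        Job analytics = Job.getInstance(conf, "analytics");
        analytics.setJarByClass(PipelineDriver.class);
        analytics.setMapperClass(Mapper.class);   // identity mapper: stand-in for real analytics
        analytics.setReducerClass(Reducer.class); // identity reducer: stand-in
        analytics.setOutputKeyClass(LongWritable.class);
        analytics.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(analytics, new Path("/data/raw"));
        FileOutputFormat.setOutputPath(analytics, new Path("/data/analytics"));
        if (!analytics.waitForCompletion(true)) System.exit(1); // don't build cubes from a failed pass

        // Job 2: transform the analytics output into OLAP-cube rows.
        Job cube = Job.getInstance(conf, "cube-build");
        cube.setJarByClass(PipelineDriver.class);
        cube.setMapperClass(Mapper.class);        // stand-in for the cube transform
        cube.setReducerClass(Reducer.class);
        cube.setOutputKeyClass(LongWritable.class);
        cube.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(cube, new Path("/data/analytics"));
        FileOutputFormat.setOutputPath(cube, new Path("/data/cube"));
        System.exit(cube.waitForCompletion(true) ? 0 : 1);
    }
}
```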




We are experts in time-series data; in other
words, the data we receive is time-stamped.
We have ample experience in writing efficient,
fast, and robust Map/Reduce code that
implements ETL functions.
We have hardened Hadoop to enterprise
standards, providing features like high
availability, data collection, and data merging.
Writing Map/Reduce is not enough. We wrote
layers on top of Hadoop that use Hive and Pig
to transform data into OLAP cubes for easy UI
consumption (see the sketch below).
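As an illustration of such a layer, here is a minimal sketch of a backend class that lets the UI query a cube table through Hive's JDBC driver. The HiveServer2 endpoint, credentials, and the daily_cube table are assumptions for the example, not details of an actual deployment.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CubeQuery {
    public static void main(String[] args) throws Exception {
        // Register the HiveServer2 JDBC driver.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://hadoop-master:10000/default", "hive", ""); // assumed endpoint
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 // daily_cube is a hypothetical cube table produced by the M/R jobs
                 "SELECT day, SUM(payable_amount) FROM daily_cube GROUP BY day")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
            }
        }
    }
}
```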

Below is a brief overview of our client engagements.
[Architecture diagram: an External News Collector feeds data through a DFS Client into the Hadoop cluster; Map/Reduce stages (filtering and term-frequency collection, training-set building, categorization index) process the training data; results are exposed through a Hive interface and a Thrift service to a web UI that displays the Map/Reduce output.]




We were asked to analyze a client's sales data and
extract valuable information from it.
The data came as 9-tuples:
<OrderID, EmailID, MobileNum, ProductID,
PayableAmount, DeliveryCharges,
ModeofPayment, OrderStatus, OrderSite>
We were asked to provide information such as unique
subscriber counts (by email address) and per-day
transaction amounts.
We deployed the Hadoop cluster on three
machines:
◦ Deployed our collector to pump data from the DB into HDFS.
◦ Wrote M/R jobs to generate OLAP cubes (one such job is sketched after the cube schema below).
◦ Provided a Hive interface to extract the results and show them in the UI.
[OLAP cube schema: input fields OrderID, EmailID, MobileNum, Mode of Payment, Order Status, Order Site; granularity levels Email ID and Day; actual measures Number of Customers, Payable Amount, Delivery Charges; forecast measures Payable Amount and Number of Customers; rollups Total Aggregated Amount and Forecast Aggregated Amount.]
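Here is a minimal sketch of one such M/R job: counting transactions per unique subscriber, keyed on EmailID (the second field of the 9-tuple). The comma-delimited record layout and the input/output paths are assumptions for illustration, not the client's actual format.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class UniqueSubscribers {
    // Map: one order record in, (emailID, 1) out; duplicate emails collapse in the shuffle.
    public static class EmailMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text email = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(","); // assumed comma-delimited 9-tuple
            if (fields.length >= 2) {
                email.set(fields[1].trim()); // EmailID is the 2nd field
                ctx.write(email, ONE);
            }
        }
    }

    // Reduce: each distinct email arrives once as a key; sum its transaction count.
    public static class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text email, Iterable<IntWritable> ones, Context ctx)
                throws IOException, InterruptedException {
            int n = 0;
            for (IntWritable one : ones) n += one.get();
            ctx.write(email, new IntWritable(n)); // transactions per subscriber
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "unique-subscribers");
        job.setJarByClass(UniqueSubscribers.class);
        job.setMapperClass(EmailMapper.class);
        job.setCombinerClass(CountReducer.class); // pre-aggregate on the map side
        job.setReducerClass(CountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The number of output rows gives the unique subscriber count; summing the second column gives the total number of transactions.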







We delivered an end-to-end reporting solution to
Guavus.
The data was provided by Sprint (a Tier-1 network
operator), and we had to develop a reporting engine
to analyze it and generate OLAP cubes.
We were asked to evaluate petabytes of
data and provide an ETL solution.
We deployed the Hadoop cluster on 10 Linux
machines.
We wrote our own collector, which read binary data
and pushed it into the Hadoop cluster (sketched below).
We wrote M/R jobs that run every day for 4 hours.
The idea was to provide analytics on
streaming data.
We generate OLAP cubes and store the results in
Infinity DB (a column database) and Hive.
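A minimal sketch of the collector idea, using the Hadoop FileSystem API to stream a local binary file into HDFS; the NameNode address and target directory are assumptions for the example, not the actual deployment's configuration.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsCollector {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://hadoop-master:9000"); // assumed NameNode address
        String name = Paths.get(args[0]).getFileName().toString();
        try (FileSystem fs = FileSystem.get(conf);
             InputStream in = Files.newInputStream(Paths.get(args[0]));
             FSDataOutputStream out = fs.create(new Path("/data/incoming/" + name))) {
            IOUtils.copyBytes(in, out, 4096); // stream the local binary file into HDFS
        }
    }
}
```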
[Architecture diagrams: multiple Data Collectors feed the Distributed Storage Framework (Hadoop/HDFS); Report Generation Tasks run on the Map/Reduce framework under a Monitor/Overall Scheduler; a Query Engine over Infinity DB, Hive, and Pig serves the Reporting UI/Web Interface; the deployment view adds the Rubix Framework between Infinity DB/Hive/Pig and the UI Display.]



For HT we are developing a syndication
clustering algorithm.
We had a large collection of old news documents
and were asked to cluster them; manual
clustering was nearly impossible.
We implemented a clustering Map/Reduce
algorithm using cosine similarity and
clustered the documents (the similarity core is sketched below).
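A minimal sketch of that similarity core: each news document becomes a term-frequency vector over a shared vocabulary, and documents whose vectors point in nearly the same direction are treated as the same story. The dense int[] representation and the sample vectors are simplifications for illustration.

```java
public class CosineSimilarity {
    /** Cosine of the angle between two term-frequency vectors; 1.0 means identical direction. */
    public static double cosine(int[] a, int[] b) {
        long dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot   += (long) a[i] * b[i];
            normA += (long) a[i] * a[i];
            normB += (long) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0.0; // an empty document matches nothing
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        int[] story1 = {3, 0, 1, 2}; // term frequencies over a shared vocabulary
        int[] story2 = {2, 0, 1, 3};
        System.out.println(cosine(story1, story2)); // ~0.93: likely the same story
    }
}
```

In the Map/Reduce version, the map step emits each document's vector and the reduce step applies this measure between candidate pairs to find the minimum-distance pairs, as the pipeline diagram below shows.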
[Clustering pipeline diagram: in the MAP step, a list of XML news files is transformed into integer vectors, one vector per news file (V1…VN); in the REDUCE step, cosine similarity is applied between vectors, minimum-distance pairs are found, and lists of closely related stories are created, all on the Hadoop platform; a C-Bayes classification stage then categorizes the documents.]





Office Locations:
India
A-82, Sector 57,
Noida, UP, 201301
Japan
2-8-6-405,Higashi Tabata
Kita-ku,Tokyo,Japan
General Inquiries
info@techcompiler.com
Sales Inquiries
sales@techcompiler.com