tech-compiler.com

Committed to Delivering…





We are leaders in the Hadoop ecosystem.
We support, maintain, monitor, and provide services over
Hadoop, whether you run Apache Hadoop, the Facebook
distribution, or the Cloudera distribution in your own data center, or
on a cluster of machines on Amazon EC2, Rackspace, etc.
We provide a scalable end-to-end solution: one that
scales to large data sets (terabytes or petabytes).
Low-cost solution: based on the open-source framework
currently used by Google, Yahoo, and Facebook.
Solutions optimized to minimize SLA times and maximize
performance.
– Project Initiation
• Project Planning
• Requirement Collection
• POC using Hadoop technology
– Team Building
• Highly skilled Hadoop experts
• Dedicated team for project
– Agile Methodology
• Small Iterations
• Easy to accommodate changing requirements
– Support
• Long-term relationship to support the developed product
• Scope to change based on business/technical needs
Our combined experience has led to the adoption of a
unique methodology that ensures quality work. We:
• Evaluate the available hardware and understand the client's requirements.
• Peek through the data.
• Analyze the data and prototype using M/R code, then show the results to our clients.
• Iterate and continuously improve, developing a better understanding of the data.
• Develop various tasks in parallel:
◦ Data Collection
◦ Data Storage in HDFS
◦ M/R analytics jobs.
◦ Scheduler to run and coordinate the M/R jobs (see the driver sketch after this list).
◦ Transform output into OLAP cubes (dimension and fact tables).
◦ Provide a custom interface to retrieve the M/R output.
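To make the coordination concrete, here is a minimal sketch, in Java against the standard Hadoop MapReduce API, of a driver that chains two dependent jobs: an analytics pass whose output feeds a cube-building pass. The identity Mapper/Reducer classes and the /data/* paths are placeholders for illustration, not production code.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PipelineDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Job 1: analytics pass over data already collected into HDFS.
        Job analytics = Job.getInstance(conf, "analytics");
        analytics.setJarByClass(PipelineDriver.class);
        analytics.setMapperClass(Mapper.class);   // identity mapper: stand-in for real analytics
        analytics.setReducerClass(Reducer.class); // identity reducer: stand-in
        analytics.setOutputKeyClass(LongWritable.class);
        analytics.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(analytics, new Path("/data/raw"));
        FileOutputFormat.setOutputPath(analytics, new Path("/data/analytics"));
        if (!analytics.waitForCompletion(true)) System.exit(1); // don't build cubes from a failed pass

        // Job 2: transform the analytics output into OLAP-cube rows.
        Job cube = Job.getInstance(conf, "cube-build");
        cube.setJarByClass(PipelineDriver.class);
        cube.setMapperClass(Mapper.class);        // stand-in for the cube transform
        cube.setReducerClass(Reducer.class);
        cube.setOutputKeyClass(LongWritable.class);
        cube.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(cube, new Path("/data/analytics"));
        FileOutputFormat.setOutputPath(cube, new Path("/data/cube"));
        System.exit(cube.waitForCompletion(true) ? 0 : 1);
    }
}
```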




We are experts in time-series data; in other
words, the data we receive is time-stamped.
We have ample experience in writing efficient,
fast, and robust Map/Reduce code that
implements ETL functions.
We have hardened Hadoop to enterprise
standards, providing features like high
availability, data collection, and data merging.
Writing Map/Reduce is not enough. We wrote
layers on top of Hadoop that use Hive and Pig
to transform data into OLAP cubes for easy UI
consumption (see the sketch below).
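As an illustration of such a layer, here is a minimal sketch of a backend class that lets the UI query a cube table through Hive's JDBC driver. The HiveServer2 endpoint, credentials, and the daily_cube table are assumptions for the example, not details of an actual deployment.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CubeQuery {
    public static void main(String[] args) throws Exception {
        // Register the HiveServer2 JDBC driver.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://hadoop-master:10000/default", "hive", ""); // assumed endpoint
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 // daily_cube is a hypothetical cube table produced by the M/R jobs
                 "SELECT day, SUM(payable_amount) FROM daily_cube GROUP BY day")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
            }
        }
    }
}
```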

Below is a brief overview of our client engagements.
[Architecture diagram: an External News Collector feeds data through a DFS Client into the Hadoop cluster; Map/Reduce stages (filtering and term-frequency collection, training-set building, categorization index) process the training data; results are exposed through a Hive interface and a Thrift service to a web UI that displays the Map/Reduce output.]




We were asked to analyze a client's sales data and
extract valuable information from it.
The data came as 9-tuples:
<OrderID, EmailID, MobileNum, ProductID,
PayableAmount, DeliveryCharges,
ModeofPayment, OrderStatus, OrderSite>
We were asked to provide information such as unique
subscriber counts (by email address) and per-day
transaction amounts.
We deployed the Hadoop cluster on three
machines:
◦ Deployed our collector to pump data from the DB into HDFS.
◦ Wrote M/R jobs to generate OLAP cubes (one such job is sketched after the cube schema below).
◦ Provided a Hive interface to extract the results and show them in the UI.
[OLAP cube schema: input fields OrderID, EmailID, MobileNum, Mode of Payment, Order Status, Order Site; granularity levels Email ID and Day; actual measures Number of Customers, Payable Amount, Delivery Charges; forecast measures Payable Amount and Number of Customers; rollups Total Aggregated Amount and Forecast Aggregated Amount.]
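Here is a minimal sketch of one such M/R job: counting transactions per unique subscriber, keyed on EmailID (the second field of the 9-tuple). The comma-delimited record layout and the input/output paths are assumptions for illustration, not the client's actual format.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class UniqueSubscribers {
    // Map: one order record in, (emailID, 1) out; duplicate emails collapse in the shuffle.
    public static class EmailMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text email = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(","); // assumed comma-delimited 9-tuple
            if (fields.length >= 2) {
                email.set(fields[1].trim()); // EmailID is the 2nd field
                ctx.write(email, ONE);
            }
        }
    }

    // Reduce: each distinct email arrives once as a key; sum its transaction count.
    public static class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text email, Iterable<IntWritable> ones, Context ctx)
                throws IOException, InterruptedException {
            int n = 0;
            for (IntWritable one : ones) n += one.get();
            ctx.write(email, new IntWritable(n)); // transactions per subscriber
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "unique-subscribers");
        job.setJarByClass(UniqueSubscribers.class);
        job.setMapperClass(EmailMapper.class);
        job.setCombinerClass(CountReducer.class); // pre-aggregate on the map side
        job.setReducerClass(CountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The number of output rows gives the unique subscriber count; summing the second column gives the total number of transactions.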







We delivered an end-to-end reporting solution to
Guavus.
The data was provided by Sprint (a Tier-1 network
operator), and we had to develop a reporting engine
to analyze it and generate OLAP cubes.
We were asked to evaluate petabytes of
data and provide an ETL solution.
We deployed the Hadoop cluster on 10 Linux
machines.
We wrote our own collector, which read binary data
and pushed it into the Hadoop cluster (sketched below).
We wrote M/R jobs that run every day for 4 hours.
The idea was to provide analytics on
streaming data.
We generate OLAP cubes and store the results in
Infinity DB (a column database) and Hive.
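A minimal sketch of the collector idea, using the Hadoop FileSystem API to stream a local binary file into HDFS; the NameNode address and target directory are assumptions for the example, not the actual deployment's configuration.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsCollector {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://hadoop-master:9000"); // assumed NameNode address
        String name = Paths.get(args[0]).getFileName().toString();
        try (FileSystem fs = FileSystem.get(conf);
             InputStream in = Files.newInputStream(Paths.get(args[0]));
             FSDataOutputStream out = fs.create(new Path("/data/incoming/" + name))) {
            IOUtils.copyBytes(in, out, 4096); // stream the local binary file into HDFS
        }
    }
}
```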
[Architecture diagrams: multiple Data Collectors feed the Distributed Storage Framework (Hadoop/HDFS); Report Generation Tasks run on the Map/Reduce framework under a Monitor/Overall Scheduler; a Query Engine over Infinity DB, Hive, and Pig serves the Reporting UI/Web Interface; the deployment view adds the Rubix Framework between Infinity DB/Hive/Pig and the UI Display.]



For HT we are developing a syndication
clustering algorithm.
We had a large collection of old news documents
and were asked to cluster them; manual
clustering was nearly impossible.
We implemented a clustering Map/Reduce
algorithm using cosine similarity and
clustered the documents (the similarity core is sketched below).
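A minimal sketch of that similarity core: each news document becomes a term-frequency vector over a shared vocabulary, and documents whose vectors point in nearly the same direction are treated as the same story. The dense int[] representation and the sample vectors are simplifications for illustration.

```java
public class CosineSimilarity {
    /** Cosine of the angle between two term-frequency vectors; 1.0 means identical direction. */
    public static double cosine(int[] a, int[] b) {
        long dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot   += (long) a[i] * b[i];
            normA += (long) a[i] * a[i];
            normB += (long) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0.0; // an empty document matches nothing
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        int[] story1 = {3, 0, 1, 2}; // term frequencies over a shared vocabulary
        int[] story2 = {2, 0, 1, 3};
        System.out.println(cosine(story1, story2)); // ~0.93: likely the same story
    }
}
```

In the Map/Reduce version, the map step emits each document's vector and the reduce step applies this measure between candidate pairs to find the minimum-distance pairs, as the pipeline diagram below shows.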
[Clustering pipeline diagram: in the MAP step, a list of XML news files is transformed into integer vectors, one vector per news file (V1…VN); in the REDUCE step, cosine similarity is applied between vectors, minimum-distance pairs are found, and lists of closely related stories are created, all on the Hadoop platform; a C-Bayes classification stage then categorizes the documents.]





Office Locations:
India
A-82, Sector 57,
Noida, UP, 201301
Japan
2-8-6-405,Higashi Tabata
Kita-ku,Tokyo,Japan
General Inquiries
info@techcompiler.com
Sales Inquiries
sales@techcompiler.com