MineFleet®*: An Overview of a Widely Adopted Distributed Vehicle Performance Data Mining System

Hillol Kargupta** (hillol@agnik.com), Kakali Sarkar (kakali@agnik.com), Michael Gilligan (mgilligan@agnik.com)
Agnik, LLC, 8840 Stanford Blvd., Columbia, MD 21045, USA

ABSTRACT
This paper describes the MineFleet® distributed vehicle performance data mining system designed for commercial fleets. MineFleet analyzes high throughput data streams onboard the vehicle, generates the analytics, sends them to the remote server over wide-area wireless networks, and offers them to fleet managers through stand-alone and web-based user interfaces. The paper describes the overall architecture of the system and the business needs, and shares experience from successful large-scale commercial deployments. MineFleet is probably one of the first commercially successful distributed data stream mining systems. This patented technology has been adopted, productized, and commercially offered by many large companies in the mobile resource management and GPS fleet tracking industry. This paper offers an overview of the system and a detailed analysis of what made it work.

possibility. Several years of research on distributed data mining [1, 2, 3, 5, 7] and data stream mining have produced a reasonably powerful collection of algorithms and system architectures that can be used for developing several interesting classes of distributed applications for lightweight wireless applications. In fact, an increasing number of such systems [4, 6] are being reported in the literature. Some commercial systems are also starting to appear.

This paper reports the development of MineFleet®, a novel mobile and distributed data mining application for monitoring vehicle data streams in real-time.
MineFleet is designed for monitoring commercial vehicle fleets using onboard embedded data stream mining systems and other remote modules connected through wireless networks in a distributed environment. MineFleet is data stream mining software for modeling, benchmarking, and monitoring vehicle health, emissions, driver behavior, fuel consumption, and fleet characteristics.

Consider a nationwide grocery delivery system which operates a large fleet of trucks. Regular maintenance of the vehicles in such fleets is an important part of supply chain management, and commercial fleet management companies are normally responsible for maintaining the fleet. Fleet maintenance companies usually spend a good deal of time and labor collecting vehicle performance data, studying the data offline, and estimating the condition of the vehicle, primarily through manual effort. Fleet management companies are also usually interested in studying driving characteristics for a variety of reasons (e.g. policy enforcement, insurance, Department of Transportation regulations). Monitoring fuel consumption and vehicle emissions, and identifying how vehicle parameters can be optimized to get better fuel economy, are some additional reasons that support ample return on investment (ROI) for systems like MineFleet.

Categories and Subject Descriptors
H.1.0 [Models and Principles]: General; H.4.m [Information Systems Applications]: Miscellaneous

General Terms
Algorithms, Experimentation, Design, Performance.

Keywords
Vehicle data stream mining, distributed data mining, telematics.

1. INTRODUCTION
The wireless and mobile computing/communication industry is producing a growing variety of devices that process different types of data using limited computing and storage resources with varying levels of connectivity through wireless communication networks.
The rich source of data from the ubiquitous components of businesses, mechanical devices, and our daily lives offers the exciting possibility of a new generation of data intensive applications for distributed and mobile environments. Mining distributed data streams in a ubiquitous environment is one such

MineFleet is widely adopted in the mobile resource management and fleet management industry. Similar applications also arise in monitoring the health of airplanes and space vehicles [9, 10, 11]. There is a strong need for real-time on-board monitoring and mining of data (e.g. flight systems performance data, weather data, radar data about other planes).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. KDD’10, July 25–28, 2010, Washington, DC, USA. Copyright 2010 ACM 978-1-4503-0055-1/10/07...$10.00.

* Protected by Patented Technology. ** Also affiliated with the CSEE department, University of Maryland, Baltimore County.

transmission of the results to the server over the wireless network. This problem is even more serious for long distance trucks and off-road equipment that make use of satellite-based wireless communication networks instead of land-based cellular networks. Note that satellite-based wireless communication networks are considerably more expensive than their land-based counterparts.

The main unique characteristics of the MineFleet system that distinguish it from traditional data mining systems are as follows:
1. Distributed mining of the multiple mobile data sources with little centralization of the data.
2.
Onboard data stream management and mining using embedded computing devices.
3. Designed to pay careful attention to the following important resource constraints:
   a. Minimize data communication over the wide-area wireless network.
   b. Minimize onboard data storage and the footprint of the data stream mining software.
4. Process high throughput data streams using resource-constrained embedded computing environments.
5. Respect privacy constraints of the data, whenever necessary.

Section 2 presents the business motivation. Section 3 compares MineFleet with existing vehicle telematics systems. Section 4 offers an overview of the MineFleet system architecture. Section 5 describes the Predictive Health Monitoring capabilities of the MineFleet system. Section 6 discusses the Fuel Consumption Analysis module. Section 7 offers an overview of the Driver Behavior Monitoring module. Section 8 discusses the Emissions Monitoring capabilities of the MineFleet system. Section 9 discusses how MineFleet penetrated the market and achieved widespread adoption. It also offers a perspective on how the business ecosystem evolved and shares some of the experiences in placing a new distributed data mining product in a new vertical not quite familiar with data mining technology. Finally, Section 10 concludes the paper.

Commercial fleets usually comprise a large number of vehicles. These fleets are usually segmented into groups of vehicles of the same type. Fleets also have drivers, and the overall efficiency of the fleet depends on driver behavior. Therefore, analyzing the vehicle performance data onboard the vehicle is not enough on its own. Comparing and contrasting the performance of different vehicles and drivers is also very important. Moreover, the embedded devices placed onboard the vehicles are often inexpensive and resource-constrained.
For example, typical GPS tracking devices that are deployed in large commercial fleets would be based on 8-bit microcontrollers or 32-bit processors with limited storage. As a result, the onboard devices cannot be used for long term storage of the analytics. Results of the relatively short-term analysis should be sent to and aggregated at the server for long term modeling, trend analysis, and outlier detection. MineFleet’s business case relies upon these observations. It is based on distributed data mining technology that is driven by the following capabilities:
1. Access to a large number of vehicle subsystem parameters that are not publicly available.
2. Onboard analysis of the vehicle data streams using advanced data stream mining algorithms capable of supporting high throughput streams (e.g. one tuple of data every 10 ms).
3. Aggregation and comparative analytics of the vehicles connected over a distributed bandwidth-constrained wireless network.

2. BUSINESS MOTIVATION
The success of MineFleet is fundamentally based on a strong business case. There are approximately 25 million commercial vehicles in North America alone and about 250 million passenger vehicles in the US alone. So the total market size is fairly substantial. Vehicles generate ample data. However, accessing the data, particularly the data specific to different manufacturers, is a nontrivial issue from the know-how perspective since many of these parameters are not publicly available. Once the data is accessible, many useful things can be done with advanced data mining techniques (e.g. detecting potential health problems, optimizing fuel economy, achieving proper driving performance, and reducing emissions). This also implies that any investment to achieve extensive access to the vehicle performance data from different manufacturers is likely to create natural protection by limiting quick market entry by other competitive entities.

The following section discusses related work.

3.
RELATED BUSINESS PRACTICES
MineFleet is a real-time distributed vehicle performance monitoring system. To the best of our knowledge this is the first distributed data mining system for commercial fleet monitoring. However, it builds on the existing work on vehicle telematics. Existing vehicle telematics systems collect vehicle performance data and offer them to the fleet managers or vehicle owners. OnStar (www.onstar.com) for General Motors vehicles and Sync (http://www.fordvehicles.com/technology/sync/) from Ford are examples of such telematics systems.

There are many other challenges. First of all, vehicles generate high throughput data streams. Monitoring hundreds of different sub-system parameters over a couple of hours may easily generate several megabytes of data, and transmitting this data over the wireless network is a non-trivial challenge. One of the main reasons is that most fleet owners do not appear to be willing to pay for a wireless data plan of more than 5 MB per month or so. Moreover, most of the fleets that opt for advanced vehicle performance data mining capabilities also require tracking and navigation related capabilities. As a result the data transmission is further constrained. This requires exploring the other option: onboard analysis of the vehicle performance data and

moving vehicle using an on-board computing device, identifies the emerging patterns, and if necessary reports these patterns to a remote control center over low-bandwidth wireless network connection.

There are some major differences between MineFleet and traditional telematics systems. Some of them are listed below:
1. Advanced data analytics: MineFleet is powered by advanced distributed data mining and statistical analysis algorithms. Most telematics systems are designed for in-car infotainment and security applications based on relatively simple data management operations.
2.
Onboard data mining: MineFleet offers dramatic reduction of wireless communication by performing data analysis onboard the vehicle. Unlike most conventional telematics systems, MineFleet sends the results of the onboard analysis to the server over the wireless network, not the raw data. As mentioned earlier, a device that monitors hundreds of vehicle performance parameters may easily collect about 10 MB of raw data in a few hours. Sending this raw data over the wireless network for advanced data mining at the server is very expensive. Most MineFleet customers would not pay for a data plan beyond 5 MB per month. Therefore, analyzing data onboard the vehicle and sending the resulting analytics instead of raw data is imperative. One full MineFleet update takes about 1 KB. If a vehicle runs for about 8 hours a day and sends an update once an hour, then in 30 days the vehicle would need about 240 KB of wireless data communication in order to send the MineFleet analytics to the server. This dramatic reduction in communication cost is a unique feature of the MineFleet technology which enabled more powerful data analysis and mining at a low cost.

MineFleet also offers different distributed data mining capabilities for detecting fleet-level patterns across the different vehicles in the fleet.

This section presents a brief overview of the architecture of the system and the functionalities of its different modules. The current implementation of MineFleet analyzes and monitors only the data generated by the vehicle's on-board diagnostic system and sometimes the Global Positioning System (GPS). MineFleet Onboard is designed for embedded in-vehicle computing devices, tablet PCs, and cell-phones. The overall conceptual process diagram of the system is shown in Figure 1. The MineFleet system comprises several important components that are briefly described in the following sections.
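The communication budget quoted in the onboard data mining item above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function and variable names are hypothetical, and it assumes that one update is 1 KB and that the 5 MB monthly plan is counted as 5 × 1024 KB.

```python
def monthly_update_traffic_kb(update_size_kb, hours_per_day, updates_per_hour, days):
    """Return the total size, in KB, of the analytics updates sent in one billing period."""
    return update_size_kb * hours_per_day * updates_per_hour * days


# Figures quoted in the text: about 1 KB per update, 8 hours of operation a day,
# one update an hour, over 30 days.
traffic_kb = monthly_update_traffic_kb(1, 8, 1, 30)

# Assumption: the 5 MB monthly data plan is counted as 5 * 1024 KB.
plan_kb = 5 * 1024
share_of_plan = traffic_kb / plan_kb

print(traffic_kb)               # 240
print(round(share_of_plan, 3))  # 0.047
```

Under these assumptions the analytics use less than 5% of the plan, whereas roughly 10 MB of raw data from a few hours of driving would already exceed it.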
Figure 1. MineFleet architecture. © Copyright, Agnik, LLC.

4.1 Onboard Hardware
The MineFleet Onboard module comprises the computing device that hosts the software to analyze the vehicle-performance data and the interface that connects the computing device with the vehicle data bus. Figure 2 shows the MineFleet Onboard Data Mining platform (MF-DMP101) device that hosts the MineFleet Onboard software. MineFleet also runs on many different types of embedded devices, in-vehicle tablet PCs, laptops, cell-phones and other types of handheld devices. Several other hardware platforms (e.g. DMP-201 from Agnik and other third-party vendors) are also currently available for running MineFleet Onboard.

Figure 2. MineFleet Data Mining Platform (MF-DMP101) that hosts the MineFleet Onboard software. © Copyright, Agnik, LLC.

4.2 Onboard Data Stream Mining Module
This module manages the incoming data streams from the vehicle, analyzes the data using various statistical and data stream mining algorithms, and manages the transmission of the resulting analytics to the remote server. This module also triggers actions whenever unusual activities are observed. It connects to the MineFleet Server located at a data center through a wireless network. The system allows the fleet managers to monitor and analyze vehicle performance, driver behavior, emissions quality,

3. Not a GPS-based tracking/navigation system: Unlike most conventional telematics devices, MineFleet is primarily focused on vehicle performance data analysis, not tracking and navigation.

These unique aspects distinguish MineFleet from the conventional tracking/navigation and telematics services. The following section offers an overview of the MineFleet architecture.

4. MINEFLEET: AN OVERVIEW
MineFleet® is a mobile and distributed data stream mining environment where the resource-constrained "small" computing devices need to perform various non-trivial data management and mining tasks on-board a vehicle in real-time.
MineFleet analyzes the data produced by the various sensors present in most modern vehicles. It continuously monitors data streams generated by a

and fuel consumption characteristics remotely without necessarily downloading all the data to the remote central monitoring station over the expensive wireless connection.

4.6 Return on Investment
MineFleet offers ROI on many different fronts. For example, the driver behavior monitoring analytics offer direct ROI by reducing idling, which lowers emissions and fuel consumption, and by reducing hard braking, which leads to less frequent brake shoe replacement. Wireless emissions monitoring eliminates the need to send the vehicle to the smog test center, saving around $200 per vehicle. Fuel consumption analysis improves gas mileage by identifying sub-optimal conditions of vehicle systems such as the O2 sensor. Several case studies have been generated from the data of many fleets that have been running MineFleet. It appears that MineFleet offers a reduction of at least about 4-5% in the fleet's monthly operating costs. This is a significant ROI for the commercial fleet monitoring and mobile workforce management vertical. Detailed ROI analysis and ROI calculators are also available.

4.7 Algorithmic Challenges
In order to monitor the vehicle data streams using the on-board data management and mining module we need continuous computation of several statistics. For example, the MineFleet Onboard system has a module that continuously monitors the

4.3 MineFleet Server
Figure 3. MineFleet Server. © Copyright, Agnik, LLC.

The MineFleet Server is in charge of receiving all the analytics from different vehicles, managing those analytics, and further processing them as appropriate. The MineFleet Server supports the following main operations: (i) interacting with the on-board module for remote management, monitoring, and mining of vehicle data streams and (ii) managing interaction with the MineFleet Web Services.
It also offers a whole range of fleet-management-related services that are not directly related to the main focus of this paper. The Server is connected with a relational database management system where it stores the analytics received from the vehicles in the fleet. All onboard diagnostics, provisioning, and updates are performed over-the-air. Using an easy-to-use web-based interface, members of the support team from Agnik and its resellers perform these over-the-air operations.

4.4 MineFleet Web Services
This module offers a web-browser-based interface for the MineFleet analytics. It also offers a rich class of API functions for accessing the MineFleet analytics which in turn can be integrated with third-party applications. Figure 4 shows one of the interfaces of the MineFleet Web Services. MineFleet is currently offered by many vendors that have already integrated their web-based mobile resource management products with the MineFleet Web Services.

Figure 4. User interface of the MineFleet Web Services. © Copyright, Agnik, LLC.

4.5 Privacy Management Module
This module plays an important role in the implementation of the privacy policies. It manages the specific policies regarding what can be monitored and what cannot be. It also allows the fleet manager to create an environment where the MineFleet technology can be used for saving money and sharing benefits without violating the privacy of the drivers.

spectral signature of the data which requires computation of covariance and correlation matrices on a regular basis. The onboard driving behavior characterization module requires frequent computation of similarity/distance matrices for data clustering and monitoring the operating regimes. Since the data are usually high dimensional, computation of the correlation matrices or distance (e.g. inner product, Euclidean) matrices is difficult to perform using their conventional algorithmic implementations.
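The continuous computation of covariance and correlation mentioned under Algorithmic Challenges can be maintained incrementally, without storing the stream. The following is a minimal sketch for a single pair of parameters using a standard one-pass (Welford-style) update. It is an illustration only and is not presented as the algorithm MineFleet uses; the class name and the sample values are hypothetical.

```python
import math


class RunningCorrelation:
    """One-pass covariance and correlation for a pair of streamed parameters."""

    def __init__(self):
        self.count = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0       # running sum of squared deviations of x
        self.m2_y = 0.0       # running sum of squared deviations of y
        self.co_moment = 0.0  # running sum of co-deviations of x and y

    def update(self, x, y):
        """Fold one new observation (x, y) into the running statistics."""
        self.count += 1
        dx = x - self.mean_x  # deviation from the previous mean of x
        dy = y - self.mean_y  # deviation from the previous mean of y
        self.mean_x += dx / self.count
        self.mean_y += dy / self.count
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.co_moment += dx * (y - self.mean_y)

    def covariance(self):
        """Population covariance of the observations seen so far."""
        return self.co_moment / self.count if self.count else 0.0

    def correlation(self):
        """Pearson correlation, or 0.0 if either parameter has been constant."""
        denominator = math.sqrt(self.m2_x * self.m2_y)
        return self.co_moment / denominator if denominator > 0 else 0.0


# Hypothetical samples of two perfectly linearly related parameters.
tracker = RunningCorrelation()
for x, y in [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]:
    tracker.update(x, y)

print(tracker.covariance())   # 2.5
print(tracker.correlation())  # 1.0
```

Each update costs constant time and memory per pair of parameters, so tracking every pair of n parameters this way still costs on the order of n² work per observation, which is the cost the text describes as difficult for high dimensional data.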
The incoming data sampling rate supported by the vehicle data bus limits the amount of time we get for processing the observed data. This usually means that we have only a few seconds to quickly analyze the data using the on-board hardware (e.g. the MF-DMP101 device). If our algorithms take more time than what we have in hand, we cannot catch up with the incoming data rate.

collecting diagnostic trouble codes and malfunction indicator light data, and analyzing a large number of parameters available through the diagnostic data port.

Typically, sensors in the vehicle subsystems generate two types of data: the observed operating conditions, which are relatively independent variables, and the dependent features, which change behavior in response to changes in the operating condition variables. Examples of operating condition variables in conventional automobiles include the following: Barometric Pressure, Calculated Engine Load (%), Engine Coolant Temperature (°F), Engine Speed (RPM), Engine Torque, Intake Air Temperature (IAT) (°F), Mass Air Flow Sensor 1 (MAF) (lbs/min), Start Up Engine Coolant Temp. (°F), Start Up Intake Air Temperature (°F), Throttle Position Sensor (%), Throttle Position Sensor (degree), Vehicle Speed (Miles/Hour), and Odometer (Miles).

To keep up with the incoming data rate, we need to address the following issues:
1. We need fast "light-weight" techniques for computing and monitoring the correlation, covariance, inner product, and distance matrices that are frequently used in data stream mining applications.
2. We need algorithms that will do something useful when the running time is constrained. In other words, we allow the data mining algorithm to run for a fixed amount of time and expect it to return some meaningful information. For example, we give the correlation matrix computation algorithm a certain number of CPU cycles for identifying the coefficients with magnitude greater than 0.7.
If that time is not sufficient for computing all the correlation coefficients in the matrix then the algorithm should at least identify the portions of the matrix that may contain significant coefficients.

There are also many other features that depend on the operating conditions. Examples from the fuel sub-system include Air Fuel Ratio, Fuel Level Sensor (%), Fuel System Status Bank 1 [Categ. Attrib.], Oxygen Sensor Bank 1 Sensor 1 (mV), Oxygen Sensor Bank 1 Sensor 2 (mV), Oxygen Sensor Bank 2 Sensor 1 (mV), Oxygen Sensor Bank 2 Sensor 2 (mV), Long Term Fuel Trim Bank 1 (%), Short Term Fuel Trim Bank 1 (%), Idle Air Control Motor Position, Injector Pulse Width #1 (msec), and Manifold Absolute Pressure (Hg).

In order to illustrate the idea, consider the problem of monitoring the correlation matrices a little more closely. Given an $m \times n$ data matrix $U$ with $m$ observations and $n$ features, the correlation matrix is computed by $U^{T}U$, assuming that the columns of $U$ are normalized to have zero mean and unit variance. A straightforward approach to computing the correlation matrix using matrix multiplication takes $O(mn^{2})$ multiplications, which is computationally very expensive. MineFleet deploys fast probabilistic algorithms to detect changes in the correlation matrices. These algorithms are based on an observation about the sum of squared values of the elements of the correlation matrix that lie above the diagonal, $C = \sum_{1 \le j_1 < j_2 \le n} \mathrm{Corr}^{2}(j_1, j_2)$, where $\mathrm{Corr}(j_1, j_2) = \sum_{i=1}^{m} u_{i,j_1} u_{i,j_2}$ represents the correlation coefficient between the $j_1$-th and $j_2$-th columns of the data matrix $U$. Using this observation, one can design a divide-and-conquer algorithm that searches the space of correlation coefficients for significantly changed correlation coefficients. More discussion on some of these algorithms can be found elsewhere [5].

Since operating conditions for a complex vehicle can be diverse, segmenting the distribution of values can be effective.
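Returning to the correlation-matrix discussion above, the quantities involved can be made concrete with a small sketch. The code below computes the exact pairwise coefficients and their above-diagonal sum of squares, and adds a simple budgeted scan that reports which pairs it did not reach. It is not the probabilistic divide-and-conquer algorithm of [5]; the function names, the pair-count budget (standing in for a CPU-cycle budget), and the toy data are hypothetical. One further assumption: columns are centered and scaled to unit Euclidean norm so that the plain inner product equals the Pearson correlation coefficient.

```python
import math


def unit_columns(rows):
    """Return the columns of `rows`, each centered and scaled to unit Euclidean norm."""
    m, n = len(rows), len(rows[0])
    columns = []
    for j in range(n):
        column = [rows[i][j] for i in range(m)]
        mean = sum(column) / m
        centered = [value - mean for value in column]
        norm = math.sqrt(sum(value * value for value in centered))
        # A constant column has no defined correlation; it is mapped to zeros here.
        columns.append([value / norm for value in centered] if norm > 0 else [0.0] * m)
    return columns


def corr(columns, j1, j2):
    """Correlation of columns j1 and j2 as an inner product of normalized columns."""
    return sum(a * b for a, b in zip(columns[j1], columns[j2]))


def sum_squared_corr(columns):
    """Sum of squared correlation coefficients above the diagonal (the quantity C)."""
    n = len(columns)
    return sum(corr(columns, j1, j2) ** 2 for j1 in range(n) for j2 in range(j1 + 1, n))


def scan_with_budget(columns, threshold=0.7, budget=None):
    """Evaluate at most `budget` pairs; return significant pairs and unreached pairs."""
    n = len(columns)
    significant, unchecked = [], []
    evaluated = 0
    for j1 in range(n):
        for j2 in range(j1 + 1, n):
            if budget is not None and evaluated >= budget:
                unchecked.append((j1, j2))
                continue
            evaluated += 1
            if abs(corr(columns, j1, j2)) > threshold:
                significant.append((j1, j2))
    return significant, unchecked


# Toy data: feature 1 is a linear function of feature 0; feature 2 is uncorrelated with both.
rows = [[1.0, 3.0, 1.0], [2.0, 5.0, -1.0], [3.0, 7.0, -1.0], [4.0, 9.0, 1.0]]
columns = unit_columns(rows)
total = sum_squared_corr(columns)
significant, unchecked = scan_with_budget(columns, threshold=0.7, budget=1)
```

With a budget of one pair, the scan finds the strongly correlated pair (0, 1) and reports the two pairs it could not evaluate. A change in `total` between two windows of data signals that at least one coefficient has moved, without saying which one.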
Once the data is segmented into different regimes, a model should be developed for each regime.

MineFleet is powered by many such advanced stream mining algorithms designed to run in a resource-constrained environment. MineFleet makes use of distributed data mining algorithms that rely upon such advanced onboard data analysis techniques and aggregation of the resulting analytics at the server. This paper does not discuss the algorithmic issues. Rather it focuses on the functional capabilities and business case analysis. The following section describes one of the key capabilities of MineFleet: Predictive Health Monitoring.

5. PREDICTIVE HEALTH MONITORING
This section provides insight into the vehicle health monitoring module of the MineFleet system. Predictive vehicle health monitoring is very important in many fleets since breakdown of a vehicle on the road is often very expensive because of the downtime, unpredictability, and often increased cost. MineFleet performs a large number of health tests onboard the vehicle, and if any of the tests fail MineFleet reports that to the server along with its recommended severity level. Figure 5 shows one such example. Predictive health monitoring in cars usually involves processing a multitude of information available from the diagnostic data bus and possibly correlating that with maintenance data. This includes

Figure 5. Example of a vehicle health test designed based on the domain knowledge and statistical analysis of data. © Copyright, Agnik, LLC.

MineFleet also assigns a health score to each vehicle by aggregating the results of the health tests performed over a certain period of time. Figure 6 shows the interface for identifying the vehicles with poor health scores using a heat-map-like interface. Red zones represent vehicles with poor health scores. The user can click on those regions to find more information about those troubled vehicles.
A tabular view of the vehicle health scores is also available in MineFleet.

Figure 6. The vehicle health score visualization interface in MineFleet. The color coded heatmap interface allows the fleet manager to quickly identify the vehicles with poor health scores and drill down to find out the reason behind them. © Copyright, Agnik, LLC.

6. FUEL CONSUMPTION ANALYSIS
The MineFleet fuel consumption analysis module offers many unique capabilities to compute the fuel economy of a vehicle/fleet, perform trend analysis of various kinds, and correlate that with various vehicle and driver performance parameters.

Typical vehicle fuel subsystems are high dimensional, and modeling the data onboard the vehicle requires feature selection based on domain knowledge and representation construction using various techniques such as eigen analysis and other orthogonal transformations. For example, consider Figure 8, which shows the variation of mass air flow with respect to engine speed and engine load. The relationship is fairly non-linear. A comprehensive analysis of the fuel subsystem would typically require including many additional parameters. As a result, near-orthogonal, similarity-preserving transformations are often very useful. Figure 9 shows the transformation of the data in Figure 8 into the eigenspace.

Figure 8. Variation of Mass Air Flow with respect to Engine Speed and Engine Load.

Figure 10 shows an example of how MineFleet offers an ROI by linking vehicle health condition with performance parameters such as fuel economy. The figure shows the user interface that lists the top five ways to improve the fuel economy, as identified by various onboard data mining techniques. It also offers a fuel savings calculator driven by predictive models learnt from the data collected from that vehicle. The resulting ROI is direct, simple to understand, and simple to act on. This allows the fleet manager to decide what to do when a particular vehicle health condition arises.
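As a rough illustration of the eigen analysis mentioned in the fuel consumption discussion above, the sketch below finds the dominant eigenvector of a small covariance matrix by power iteration and projects the observations onto it. This is a generic principal-component sketch in plain Python, not a description of MineFleet's implementation; the function names and the two-feature toy data are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Return the population covariance matrix of `rows` and the column means."""
    m, n = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / m for j in range(n)]
    centered = [[row[j] - means[j] for j in range(n)] for row in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / m for b in range(n)] for a in range(n)]
    return cov, means


def dominant_eigenpair(matrix, iterations=200):
    """Approximate the largest eigenvalue and its eigenvector by power iteration."""
    n = len(matrix)
    vector = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        product = [sum(matrix[a][b] * vector[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(value * value for value in product))
        if norm == 0.0:
            break  # zero matrix: keep the starting vector
        vector = [value / norm for value in product]
    # Rayleigh quotient of the (unit-length) vector gives the eigenvalue estimate.
    value = sum(
        vector[a] * sum(matrix[a][b] * vector[b] for b in range(n)) for a in range(n)
    )
    return value, vector


def project(rows, means, vector):
    """Coordinates of each centered observation along `vector`."""
    return [sum((row[j] - means[j]) * vector[j] for j in range(len(vector))) for row in rows]


# Toy data: two features that vary together along the direction (1, 2).
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
cov, means = covariance_matrix(rows)
eigenvalue, eigenvector = dominant_eigenpair(cov)
scores = project(rows, means, eigenvector)
```

For this toy data all of the variance lies along one direction, so a single coordinate per observation (`scores`) preserves the pairwise distances between observations. Real fuel subsystem data would need several components, which could be obtained by deflating the covariance matrix and repeating the iteration.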
Using MineFleet the fleet manager can quantify how much money the organization is likely to save by fixing the health condition. Similar analysis is also performed for driver behavior, which is discussed in the following section.

Figure 10. Correlation of vehicle health events with fuel economy. The fuel savings calculator quantifies the effect on fuel economy. © Copyright, Agnik, LLC.

Figure 9. Modeling through advanced engine analysis. © Copyright, Agnik, LLC.

Figure 7 shows the interface for a module that analyzes the vehicle maintenance data and links that with the vehicle diagnostic data. The goal is to detect unusual patterns in the vehicle maintenance operations and identify their reasons.

Figure 7. Predictive vehicle maintenance data analysis module. © Copyright, Agnik, LLC.

the main capabilities of the MineFleet driver behavior monitoring module with short-term return on investment are listed below:
1) Identify the speeding, braking, and idling characteristics of the driver and use that for driver retraining policy execution.
2) Assign performance measures to the drivers based on various characteristics and identify outlier drivers.
3) Identify unusual maintenance operations caused by suboptimal driver performance.
Figure 12 shows the MineFleet interface for quantifying the effect of various driving characteristics on fuel economy. For example, the fuel savings calculator shows the effect of idling on fuel economy and quantifies the saving. The following section discusses the emissions monitoring capabilities of MineFleet.

Figure 11. Predictive fuel consumption analysis module of MineFleet. © Copyright, Agnik, LLC.

Figure 11 shows the MineFleet fuel subsystem benchmarking module where the distributions of a vehicle can be compared with those of other vehicles. The module can also be used to optimize the fuel economy by changing the policy parameters prior to designing a policy.
For example, one may vary the speeding policy and find out the optimal fuel economy based on the predictive models learnt from that vehicle.

7. DRIVER BEHAVIOR MONITORING
MineFleet allows the fleet owner to monitor both the short-term and long-term behaviors of the drivers in a fleet. MineFleet Onboard monitors the driving related data characterized by speed, acceleration, braking, idling and several other parameters. It also correlates the information with vehicle performance parameters (e.g. fuel economy) and fleet maintenance parameters. Some of

Figure 12. Correlation of driver behavior with fuel economy. © Copyright, Agnik, LLC.

bigger picture correlating emission data with data collected from the different facets of fleet operations.

8. EMISSIONS MONITORING
Greenhouse gas (GHG) emissions that contribute to climate change are a global problem. Although future concentrations, damages and costs are unknown, it is widely recognized that major emissions reduction efforts are needed. Of the four primary GHGs under scrutiny, carbon dioxide (CO2), and the need to lower carbon emissions in general, is of paramount concern. It is estimated that transportation activities are responsible for approximately 25% to 30% of total U.S. GHG emissions, with the on-highway commercial truck market accounting for over 45% of transportation GHG. However, the transportation sector emissions remain almost entirely unaddressed with respect to GHG and CO2 reduction. The guidelines provided by the Intergovernmental Panel on Climate Change (IPCC) for calculating carbon emissions offer estimates only for certain common types of fuel; estimates are not even available for novel fuel blends and gaseous fuels such as CNG and LNG. Indeed, these and other references have documented the uncertainty in model-based theoretical carbon emissions calculations (see footnote 3) and the need for a standardized, consistent method of accurately characterizing CO2 emissions.
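To make the phrase "model-based theoretical carbon emissions calculations" concrete, the sketch below shows the kind of fuel-based estimate such models produce: carbon content per gallon, times the fraction oxidized, times the CO2-to-carbon mass ratio 44/12. The carbon contents and the 0.99 oxidation fraction are taken to be those of the EPA fact sheet cited in footnote 3 and should be checked against it; the function and constant names are hypothetical. As the text notes, such figures carry uncertainty and are not available for every fuel, which is the argument for direct measurement.

```python
# Mass ratio of CO2 to carbon: molecular weight 44 over atomic weight 12.
CARBON_TO_CO2 = 44.0 / 12.0

# Assumed values (to be verified against the EPA fact sheet cited in footnote 3).
OXIDATION_FRACTION = 0.99
CARBON_GRAMS_PER_GALLON = {"gasoline": 2421.0, "diesel": 2778.0}


def co2_grams(fuel, gallons):
    """Model-based CO2 estimate, in grams, for burning `gallons` of `fuel`."""
    carbon_grams = CARBON_GRAMS_PER_GALLON[fuel] * gallons
    return carbon_grams * OXIDATION_FRACTION * CARBON_TO_CO2


gasoline_per_gallon = co2_grams("gasoline", 1.0)
diesel_per_gallon = co2_grams("diesel", 1.0)

print(round(gasoline_per_gallon))  # 8788
print(round(diesel_per_gallon))    # 10084
```

The lookup table has entries only for gasoline and diesel, so a request for CNG, LNG, or a novel blend raises a `KeyError`, which mirrors the gap in model-based estimates described above.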
Moreover, correlating various vehicle performance and traffic parameters may open up new insights resulting in better techniques for controlling emissions. For example, it is widely known that vehicle speed, engine load and state of repair/maintenance play important roles in governing emissions. Mining the emissions data along with the traffic patterns in a metropolitan area, vehicle performance (load, rpm, and vehicle oxygen sensor characteristics) and the driving behavior may provide useful information to design speed limits, traffic signals and fleet maintenance policies. Such advanced analysis of emission data will be possible only when we can directly and accurately measure emissions in the vehicle.

MineFleet offers some of these possibilities. For example, MineFleet can be used for wireless emissions tests. It can measure the emissions data in real-time and correlate that with the vehicle performance and traffic data using advanced statistical and machine learning-based techniques such as clustering, predictive modeling, correlation analysis and eigen analysis. These analytics can be used to offer a new generation of decision support tools to develop fleet and greenhouse gas emissions management policies.

The emissions offset trading market and the demand for cleaner transportation systems are driving several market incentives. Figure 14 shows the web portal of one such carbon offset trading company. The MineFleet technology offers a verifiable methodology to quantify the greenhouse and air-pollution emissions of a vehicle in real-time. As a result, this allows accurate computation of the carbon offsets and reductions in a commercial fleet, which lays the foundation of the business of carbon trading.

Figure 13. Emissions monitoring web-page in MineFleet. © Copyright, Agnik, LLC.

MineFleet computes emissions in real-time onboard the vehicle. It also performs various other tests such as the wireless emissions test required by motor vehicle administrations.
Figure 13 shows the emissions monitoring web-page of the MineFleet Web Service. Vehicle emission characteristics depend on different vehicle and driver-related parameters. Vehicle health is often a function of the type of vehicle, maintenance policies, and operating policies (e.g. the delivery schedule of a supply truck). Driver behavior is also correlated with traffic conditions and driver training programs in a commercial fleet. Therefore, the next generation of decision-support tools for emissions management will have to look at the bigger picture, correlating emission data with data collected from the different facets of fleet operations.

3 EPA OTAQ Publication, no. EPA 420-F-05-001, “Average Carbon Dioxide Emissions Resulting from Gasoline and Diesel Fuel,” February, 2005, notes the following: “These calculations and the supporting data have associated variation and uncertainty. EPA may use other values in certain circumstances, and in some cases it may be appropriate to use a range of values.”

Figure 14. Web portal of a carbon trading company.

9. MINEFLEET IN BUSINESS: ENGINEERING COMPLEX ECO-SYSTEM

The basic tenets of the value proposition in any business often depend upon the following NABC cornerstones:

1. N: What is the customer/market need?
2. A: What is our specific approach to satisfy that need?
3. B: What are the benefits that the customers and their affiliates will get from the approach?
4. C: What is the competition or alternative to the approach?

Note that the NABC (Need, Approach, Benefit, and Competition) tenet depends upon the behavior of the customers, their affiliates, the business offering the products/services, and the competition (underscored terms in the itemized list). This essentially means that the value proposition of any product or service depends on the collective behavior of the entire business eco-system, comprising the provider, consumer, competition, and others. Moreover, Agnik, as an early stage company, focused not just on value creation but on sustainable value creation, where the business relationships among Agnik and its go-to-market partners for the MineFleet product would be able to withstand the challenges faced by many early stage technology companies. The MineFleet system penetrated the market by evolving rules of engagement that aided sustainable relationships among the different players of the Mobile Resource Management vertical.

Initial MineFleet product placement faced several challenges. Some of those are listed below:

1. Quantification of ROI.
2. Lack of familiarity with the data mining technology in the target vertical.
3. Lack of large marketing infrastructure.
4. Lack of adequate support infrastructure.

Each of these topics is discussed further below.

Today, the MineFleet system offers many market-tested features that offer direct short-term ROI, and enough case studies exist to back up the claims. However, this was not the case when MineFleet was initially introduced to a select group of potential clients in the early stage. Active collaboration between the MineFleet team and other organizations that were willing to explore the technology resulted in the development of many useful features in MineFleet with immediate ROI.

The initial experimental versions of the MineFleet system were full of features that required advanced knowledge of data analysis and modeling techniques. The interface looked like that of the traditional data mining systems that are commercially available. This approach did not work. Advanced visualization and analytic tools often had to be either replaced or backed up by simple text-based actionable intelligence. One of the main reasons was that typical fleet management executives are usually not very familiar with statistics and data mining technology. The user interface had to be non-threatening and relatively easy to understand. Once the vertical became familiar with the role of data mining technology to some extent, advanced analysis and visualization techniques could be introduced.

Another major challenge was the lack of large marketing infrastructure, which is probably common for many early stage companies. MineFleet addressed this problem by going to market only through its resellers and channel partners. Alliances with fairly large companies with large marketing infrastructure helped in gaining market share.

MineFleet product design choices also highly influenced the evolution of the business eco-system and its sustainability. Figure 15 shows the conceptual depiction of the MineFleet product in 2003. It had a PDA, a Bluetooth GPS module, and the vehicle diagnostic port adapter. This conceptual model evolved a lot over time in order to support a sustainable relationship with MineFleet go-to-market channel partners and resellers. For example, the PDA-based approach was not adopted because of its high cost. On the other hand, the Bluetooth GPS module was dropped mainly to build relationships with the many other vendors that offer a GPS tracking solution.

Figure 15. Early conceptualization of the MineFleet Onboard system. © Copyright, Agnik, LLC.

The above go-to-market approach also helped the support scenario. The need for large support infrastructure was avoided by training the support teams of the go-to-market channel partners and resellers. This relieved the Agnik team of the need to develop an extensive on-the-ground installation and support team for MineFleet.

MineFleet is widely adopted by many companies in the Machine-to-Machine and GPS tracking verticals. For sample names of such clients please visit the Agnik web-site. MineFleet is already integrated with several large vehicle-onboard hardware manufacturers. MineFleet-powered third-party solutions are currently being deployed through many of Agnik’s channel partners, each with more than a hundred thousand vehicles in their respective rosters. Examples of some of those clients are listed at the Agnik website. A detailed report (footnote 4) analyzing MineFleet’s technical and business approaches is available from Frost & Sullivan. A copy of the detailed report is available upon request. MineFleet is available in the software-as-a-service model. The following section concludes this paper.

4 http://finance.yahoo.com/news/Agnik-Enhances-Mobileprnews-3142016515.html?x=0&.v=1

10. CONCLUSIONS

This paper offered an overview of the MineFleet system and the business case behind it. It described the architecture, main functionalities, and how these features are useful in solving the everyday problems in commercial fleet management. The paper also shared some of the experiences in placing a new distributed data mining technology-based product in a vertical that was not very familiar with advanced decision support systems. The paper identified some of the engagement rules that evolved over time, resulting in successful partnerships between the existing products from the mobile resource management companies and the MineFleet system.

MineFleet is probably the first commercially successful, widely adopted distributed data mining system for a new vertical where data mining systems were not used before. The development of MineFleet and its adoption in the mobile resource management and fleet management industry came through long-term interactions with the leading companies in that vertical. It required adopting a different architecture for the data mining system. Unlike the traditional centralized data mining systems commonly used in most applications today, MineFleet adopted distributed data mining technology, where data must be analyzed in a distributed manner and then aggregated at the server for comparative analysis.

11. ACKNOWLEDGMENTS

We thank Agnik for supporting the work and this publication. We would also like to thank the large number of developers involved with this project at Agnik. We particularly thank the following individuals for their contributions to the development of the MineFleet system: Nick Lenzi, Derek Johnson, Subhash Paruchuru, Robert Gilligan, Barnali Sinha, Parag Namjoshi, Thiraphat Pongsudhiraks, Jacob Graham, Kamalika Das, Michael Beck, Padma Sethu, Brian Bende, Martin D. Klein, James Dull and Patrick T. Joyce. We would also like to thank all our channel partners for marketing the MineFleet product.

12. REFERENCES

[1] S. Datta, K. Bhaduri, C. Giannella, R. Wolff, H. Kargupta. (2006). Distributed Data Mining in Peer-to-Peer Networks. (Invited submission to the IEEE Internet Computing special issue on Distributed Data Mining), Volume 10, Number 4, pp. 18-26.

[2] H. Kargupta and K. Sivakumar. (2004). Existential Pleasures of Distributed Data Mining. Data Mining: Next Generation Challenges and Future Directions. Editors: H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha. AAAI/MIT Press.

[3] H. Kargupta, B. Park, D. Hershberger, and E. Johnson. (1999). Collective Data Mining: A New Perspective Toward Distributed Data Mining. Advances in Distributed and Parallel Knowledge Discovery, Eds: Hillol Kargupta and Philip Chan. MIT/AAAI Press.

[4] H. Kargupta, R. Bhargava, K. Liu, M. Powers, P. Blair, S. Bushra, J. Dull, K. Sarkar, M. Klein, M. Vasa, and D. Handy. (2004). VEDAS: A Mobile and Distributed Data Stream Mining System for Real-Time Vehicle Monitoring. Proceedings of the SIAM International Data Mining Conference, Orlando.

[5] H. Kargupta, V. Puttagunta, M. Klein, K. Sarkar. (2006). Onboard Vehicle Data Stream Monitoring using MineFleet and Fast Resource Constrained Monitoring of Correlation Matrices. Next Generation Computing. Invited submission for special issue on learning from data streams, volume 25, no. 1, pp. 5-32, 2007.

[6] B. Park and H. Kargupta. (2002). Distributed Data Mining: Algorithms, Systems, and Applications. Data Mining Handbook. Editor: Nong Ye.

[7] S. Krishnaswamy, S. Loke, A. Rakotonirainy, O. Horovitz, and M. Gaber. (2005). Towards Situation-awareness and Ubiquitous Data Mining for Road Safety: Rationale and Architecture for a Compelling Application. Proceedings of Conference on Intelligent Vehicles and Road Infrastructure (IVRI’05), held at the University of Melbourne, pp. 16-17 February 2005.

[8] S. Pittie, H. Kargupta, and B. Park. (2003). Dependency Detection in MobiMine: A Systems Perspective. Information Sciences Journal. Volume 155, Issues 3-4, pp. 227-243, Elsevier.

[9] A. N. Srivastava, W. Buntine. (1995). Predicting Engine Parameters using the Optical Spectrum. Proceedings of the AIAA Electrochemical Conference.

[10] A. N. Srivastava, J. Stroeve. (2003). Onboard Detection of Snow, Ice, Clouds, and Other Processes. Proceedings of the ICML 2003 Workshop on Machine Learning Technologies for Autonomous Space Sciences. International Conference on Machine Learning.

[11] H. Dutta, H. Kargupta, and A. Joshi. (2005). Orthogonal Decision Trees for Resource-Constrained Physiological Data Stream Monitoring using Mobile Devices. Proceedings of the High Performance Computing Conference.

[12] S. Pirttikangas, J. Riekki, J. Kaartinen, J. Miettinen, S. Nissila, and J. Roning. (2001). Genie of the Net: A New Approach for a Context-Aware Health Club. Workshop Title: Ubiquitous Data Mining for Mobile and Distributed Environments. Joint 12th European Conference on Machine Learning (ECML'01) and 5th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'01). September 3-7, 2001, Freiburg, Germany.