www.globalsoftsolutions.in gsstrichy@gmail.com

EFFICIENT FILTERING ALGORITHMS FOR LOCATION-AWARE PUBLISH/SUBSCRIBE

ABSTRACT
Location-based services have been widely adopted in many systems. Existing works employ a pull model, or user-initiated model, in which a user issues a query to a server that replies with location-aware answers. To provide users with instant replies, a push model, or server-initiated model, is becoming an inevitable computing model in next-generation location-based services. In the push model, subscribers register spatio-textual subscriptions to capture their interests, and publishers post spatio-textual messages. This calls for a high-performance location-aware publish/subscribe system that delivers publishers' messages to relevant subscribers. The research challenges that arise in designing a location-aware publish/subscribe system are studied. An R-tree-based index is proposed that integrates textual descriptions into R-tree nodes, together with an efficient filtering algorithm and effective pruning techniques that achieve high performance. The method supports both conjunctive queries and ranking queries.

INTRODUCTION
Location-based services (LBS) have attracted significant attention from both industrial and academic communities. Many LBS services such as Foursquare and Google Maps have been widely accepted because they provide users with location-aware experiences. Existing LBS systems employ a pull model, or user-initiated model, in which a user issues a query to a server that responds with location-aware answers. For example, if a mobile user wants to find seafood restaurants nearby, she issues the query “seafood restaurant” to an LBS system, which returns answers based on her location and the keywords.

PROBLEM DEFINITION
To provide users with instant replies, a push model, or server-initiated model, is becoming an inevitable computing model in next-generation location-based services.
In the push model, subscribers register spatio-textual subscriptions to capture their interests, and publishers post spatio-textual messages. This calls for a high-performance location-aware publish/subscribe system to deliver messages to relevant subscribers. This computing model brings new experiences to mobile users and helps them retrieve information without explicitly issuing a query. One big challenge in a publish/subscribe system is achieving high performance: the system should support tens of millions of subscribers and deliver each message to the relevant subscribers within milliseconds. Since messages and subscriptions contain both location information and textual descriptions, delivering messages to relevant subscribers is costly. This calls for an efficient filtering technique to support location-aware publish/subscribe services.

EXISTING SYSTEM
There are many real-world applications of location-aware publish/subscribe services. The first is Groupon, in which customers register their interests with locations and keywords. For each Groupon message, the system provider sends the message to the customers who may be interested in it by evaluating the spatial proximity and textual relevance between their subscriptions and the message. The second is location-aware AdSense, which extends traditional AdSense to support location-aware services. Advertisers register their location-based advertisements in the system, and the system pushes relevant advertisements to mobile users based on their locations and the content they are browsing. The third is tweet delivery. To receive feedback on their products in a specific area from Twitter, market analysts register their interests. For each tweet, the system pushes the tweet to the relevant analysts whose spatio-textual subscriptions match it.
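The matching condition behind these examples can be sketched as a naive filter. This is a minimal Java sketch under stated assumptions: the class names `Message`, `Subscription` and `NaiveDelivery` are hypothetical, a subscription region is modeled as an axis-aligned rectangle, and conjunctive semantics is taken to mean that the message must lie inside the region and contain every subscription keyword. It checks every subscription, which is exactly the per-message cost that an index is meant to avoid.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

// Hypothetical message model: a point location plus the set of tokens in its text.
final class Message {
    final double x, y;
    final Set<String> tokens;

    Message(double x, double y, Set<String> tokens) {
        this.x = x;
        this.y = y;
        this.tokens = tokens;
    }
}

// Hypothetical subscription model: an axis-aligned rectangle plus required keywords.
final class Subscription {
    final String id;
    final double minX, minY, maxX, maxY;
    final Set<String> keywords;

    Subscription(String id, double minX, double minY, double maxX, double maxY, Set<String> keywords) {
        this.id = id;
        this.minX = minX;
        this.minY = minY;
        this.maxX = maxX;
        this.maxY = maxY;
        this.keywords = keywords;
    }

    // Conjunctive semantics: spatial containment AND every keyword present in the message.
    boolean matches(Message m) {
        boolean inside = m.x >= minX && m.x <= maxX && m.y >= minY && m.y <= maxY;
        return inside && m.tokens.containsAll(keywords); // cheap spatial test runs first
    }
}

final class NaiveDelivery {
    // Baseline filter: examines every subscription, so cost grows linearly with their number.
    static List<String> deliver(Message m, Collection<Subscription> subscriptions) {
        List<String> matched = new ArrayList<>();
        for (Subscription s : subscriptions) {
            if (s.matches(m)) {
                matched.add(s.id);
            }
        }
        return matched;
    }
}
```

For example, a message at (5, 5) containing "fresh seafood restaurant" is delivered to a subscription covering [0, 10] x [0, 10] with keywords {seafood, restaurant}, but not to one over the same area that requires "pizza", nor to one whose rectangle lies elsewhere.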
PROPOSED SYSTEM
To address this challenge, a token-based R-tree index structure (Rt-tree) is proposed, which integrates each R-tree node with a set of tokens selected from the subscriptions. Using the Rt-tree, a filter-and-verification framework is developed to deliver each message efficiently. To reduce the number of tokens associated with Rt-tree nodes, a few high-quality representative tokens are selected from the subscriptions and associated with the nodes.

Advantages
This technique not only reduces index size but also improves performance.
The method supports both conjunctive queries and ranking queries.
Efficient filtering algorithms are used.
Effective pruning techniques improve performance.
Dynamic updates are supported efficiently.

HARDWARE REQUIREMENTS
Processor : Any processor above 500 MHz
RAM : 128 MB
Hard Disk : 10 GB
Compact Disk : 650 MB
Input device : Standard keyboard and mouse
Output device : VGA and high-resolution monitor

SOFTWARE SPECIFICATION
Operating System : Windows family
Techniques : JDK 1.5 or higher
Database : MySQL 5.0

PROBABILISTIC RANGE QUERY OVER UNCERTAIN MOVING OBJECTS IN CONSTRAINED TWO-DIMENSIONAL SPACE

ABSTRACT
Probabilistic range query (PRQ) over uncertain moving objects has attracted much attention in recent years. Most existing works focus on the PRQ for objects moving freely in two-dimensional (2D) space. In contrast, this work studies the PRQ over objects moving in a constrained 2D space, where objects are forbidden to be located in some specific areas; this query is called the constrained space probabilistic range query (CSPRQ). Its unique properties are analyzed, showing that processing the CSPRQ with a straightforward solution is infeasible.
The key idea of the solution is a strategy called pre-approximation, which reduces the initial problem to a highly simplified version and thereby makes the remaining steps easy to tackle. The strategy itself is simple and easy to implement. Furthermore, motivated by a cost analysis, the optimizations are based on two main insights: (i) the number of effective subdivisions is no more than 1; and (ii) an entity with a larger span is more likely to subdivide a single region. Extensive experiments under various settings demonstrate the effectiveness and efficiency of the proposed approaches and highlight an additional finding: the pre-computation-based method suffers from non-trivial preprocessing time, which is an important indication for future research.

INTRODUCTION
Range queries over moving objects have been the subject of much attention because they find applications in domains such as the digital battlefield, mobile workforce management, and the transportation industry. Usually, only discrete location information about a moving object o is stored on the database server, for reasons such as the limited battery power of mobile devices and limited network bandwidth. Although the recorded location of o can be obtained by accessing the database, its current location is usually uncertain. For example, a common location update policy called dead reckoning updates the recorded location lr when the deviation between lr and the actual location of o exceeds a given distance threshold t. Before the next update, the exact location of o is unknown, except that it lies in a circle with center lr and radius t. To capture this location uncertainty, incorporating uncertainty into moving-object data has been proposed.
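The dead-reckoning policy just described can be sketched as a small tracker. This is an illustrative Java sketch under assumptions: the class name `DeadReckoningTracker` is hypothetical, positions live in a Euclidean plane, and the device and server sides are merged into one object for brevity. `observe` plays the device, sending an update only when the deviation from lr exceeds t, and `possiblyAt` plays the server, which only knows that the object lies in the circle with center lr and radius t.

```java
// Hypothetical tracker illustrating the dead-reckoning update policy in a Euclidean plane.
final class DeadReckoningTracker {
    private final double threshold; // distance threshold t
    private double recordedX;       // recorded location l_r (x) held by the server
    private double recordedY;       // recorded location l_r (y) held by the server
    private int updatesSent;

    DeadReckoningTracker(double startX, double startY, double threshold) {
        this.recordedX = startX;
        this.recordedY = startY;
        this.threshold = threshold;
    }

    // Device side: report the actual position; an update is sent only if it deviates more than t.
    boolean observe(double actualX, double actualY) {
        double deviation = Math.hypot(actualX - recordedX, actualY - recordedY);
        if (deviation > threshold) {
            recordedX = actualX;
            recordedY = actualY;
            updatesSent++;
            return true;
        }
        return false;
    }

    // Server side: between updates the object can be anywhere in the circle (l_r, t).
    boolean possiblyAt(double x, double y) {
        return Math.hypot(x - recordedX, y - recordedY) <= threshold;
    }

    double recordedX() { return recordedX; }
    double recordedY() { return recordedY; }
    int updatesSent() { return updatesSent; }
}
```

With t = 5 and lr = (0, 0), moving to (3, 0) sends nothing, while moving to (6, 0) triggers an update, after which (0, 0) is no longer inside the uncertainty circle. How probability mass is distributed inside the circle is a separate modeling choice not fixed here.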
Since then, the probabilistic range query (PRQ), a variant of the traditional range query, has attracted much attention in the data management community.

EXISTING SYSTEM
Little effort has been made on the PRQ over objects moving in a constrained 2D space, where objects are forbidden to be located in some specific areas.

PROPOSED SYSTEM
This work considers objects moving in a constrained 2D space where they are forbidden to be located in some specific areas, called restricted areas, and dubs the corresponding query the Constrained Space Probabilistic Range Query (CSPRQ). The CSPRQ has many applications, as objects moving in a constrained 2D space are common in the real world. For example, tanks on a digital battlefield usually cannot move through lakes, forests, and the like, so the areas occupied by those obstacles can naturally be regarded as restricted areas. The key idea of the proposed solution is a strategy called pre-approximation, which reduces the initial problem to a highly simplified version and makes the remaining steps easy to tackle. The optimizations are based on two main insights: (i) the number of effective subdivisions is no more than 1, which is used to improve the power of pruning restricted areas; and (ii) an entity with a larger span is more likely to subdivide a single region, which motivates sorting the entities to be processed according to their spans. In addition to these main insights, two other facts are exploited through two mechanisms: postpone processing and lazy update.

Advantages
The strategy itself is simple and easy to implement.
To operate on different entities in a unified and efficient manner, a label-based data structure is developed.
Owing to the pre-approximation and the label-based data structure, computing the appearance probability is simple.
To improve I/O efficiency, a twin index is adopted.

HARDWARE REQUIREMENTS
Processor : Any processor above 500 MHz
RAM : 128 MB
Hard Disk : 10 GB
Compact Disk : 650 MB
Input device : Standard keyboard and mouse
Output device : VGA and high-resolution monitor

SOFTWARE SPECIFICATION
Operating System : Windows family
Techniques : JDK 1.5 or higher
Database : MySQL 5.0

CROWDOP: QUERY OPTIMIZATION FOR DECLARATIVE CROWDSOURCING SYSTEMS

ABSTRACT
The query optimization problem is studied in declarative crowdsourcing systems. Declarative crowdsourcing is designed to hide the complexities of dealing with the crowd and relieve the user of that burden. The user is only required to submit an SQL-like query, and the system takes responsibility for compiling the query, generating the execution plan, and evaluating it in the crowdsourcing marketplace. A given query can have many alternative execution plans, and the difference in crowdsourcing cost between the best and the worst plans may be several orders of magnitude. Therefore, as in relational database systems, query optimization is important to crowdsourcing systems that provide declarative query interfaces. CROWDOP, a cost-based query optimization approach for declarative crowdsourcing systems, is proposed. CROWDOP considers both cost and latency in its optimization objectives and generates query plans that provide a good balance between the two. CROWDOP includes efficient algorithms for optimizing three types of queries: selection queries, join queries, and complex selection-join queries.

INTRODUCTION
Crowdsourcing has attracted growing interest in recent years as an effective tool for harnessing human intelligence to solve problems that computers cannot perform well, such as document translation, handwriting recognition, audio transcription, and photo tagging.
Various solutions have been proposed for performing common database operations over crowdsourced data, such as selection (filtering), join, sort/rank, and count. Recent crowdsourcing systems, such as CrowdDB, Qurk, and Deco, provide an SQL-like query language as a declarative interface to the crowd. Such an interface is designed to encapsulate the complexities of dealing with the crowd and to offer an interface that is familiar to most database users. Consequently, for a given query, a declarative system must first compile the query, generate the execution plan, post human intelligence tasks (HITs) to the crowd according to the plan, collect the answers, handle errors, and resolve inconsistencies in the results.

PROBLEM DEFINITION
While declarative querying improves the usability of the system, it requires the system to optimize each query and provide a “near-optimal” execution plan for it. Since a declarative crowdsourcing query can be evaluated in many ways, the choice of execution plan has a significant impact on overall performance, which includes the number of questions asked, the types and difficulties of the questions, and the monetary cost incurred.

PROBLEM SOLUTION
It is therefore important to design an efficient crowdsourcing query optimizer that considers all potentially good query plans and selects the “best” plan based on a cost model and optimization objectives. To address this challenge, a novel optimization approach, CROWDOP, is proposed for finding the most efficient query plan for answering a query.

EXISTING SYSTEM
While declarative querying improves the usability of the system, it requires the system to optimize each query and provide a “near-optimal” execution plan for it.
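Choosing among alternative plans with a cost model can be illustrated on a selection query with several crowd filters. This is a minimal sketch, not CROWDOP's actual cost model: the names `PlanEstimate` and `CrowdSelectionPlanner` are hypothetical, every crowd question is assumed to have the same price, filter selectivities (pass rates) are assumed known, and only two candidate plans are compared. The sequential plan asks one filter per round, most selective first, so later filters see fewer items; the parallel plan asks all questions in a single round. The chooser applies the objective of minimizing latency, counted in rounds, under a user-defined cost budget.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical estimate of a plan's monetary cost and latency (number of crowdsourcing rounds).
final class PlanEstimate {
    final String name;
    final double cost;
    final int rounds;

    PlanEstimate(String name, double cost, int rounds) {
        this.name = name;
        this.cost = cost;
        this.rounds = rounds;
    }
}

final class CrowdSelectionPlanner {
    // One round per filter; with equal prices, asking the most selective filter first
    // (lowest pass rate) minimizes the expected number of questions.
    static PlanEstimate sequential(int items, double[] selectivities, double pricePerQuestion) {
        double[] order = selectivities.clone();
        Arrays.sort(order);
        double surviving = items;
        double cost = 0;
        for (double s : order) {
            cost += surviving * pricePerQuestion; // one question per surviving item
            surviving *= s;                       // expected items passing this filter
        }
        return new PlanEstimate("sequential", cost, order.length);
    }

    // All filters asked for all items at once: one round, but no pruning between filters.
    static PlanEstimate parallel(int items, double[] selectivities, double pricePerQuestion) {
        return new PlanEstimate("parallel", items * selectivities.length * pricePerQuestion, 1);
    }

    // Lowest-latency plan within budget (ties broken by cost); null if no plan is affordable.
    static PlanEstimate choose(List<PlanEstimate> plans, double budget) {
        PlanEstimate best = null;
        for (PlanEstimate p : plans) {
            if (p.cost > budget) {
                continue;
            }
            if (best == null || p.rounds < best.rounds
                    || (p.rounds == best.rounds && p.cost < best.cost)) {
                best = p;
            }
        }
        return best;
    }
}
```

With 100 items, selectivities 0.5 and 0.1, and a price of 1 per question, the sequential plan costs an expected 100 + 100 * 0.1 = 110 over two rounds, while the parallel plan costs 200 in one round; a budget of 150 therefore selects the sequential plan and a budget of 250 the parallel one.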
Disadvantages
Since a declarative crowdsourcing query can be evaluated in many ways, the choice of execution plan has a significant impact on overall performance, which includes the number of questions asked, the types and difficulties of the questions, and the monetary cost incurred.

PROPOSED SYSTEM
The query optimization objective is to minimize latency under a user-defined cost budget. Efficient algorithms are developed for optimizing selection, join, and complex queries. As in traditional databases, optimization mechanisms in crowdsourcing systems can be broadly classified into rule-based and cost-based. A rule-based optimizer simply applies a set of rules, instead of estimating costs, to determine the best query plan. CROWDOP considers three commonly used operators in crowdsourcing systems: FILL solicits the crowd to fill in missing values in databases; SELECT asks the crowd to filter items satisfying certain constraints; and JOIN leverages the crowd to match items according to some criteria.

Advantages
Cost-based query optimization.
Considers cost-latency tradeoffs and supports multiple crowdsourcing operators.

HARDWARE REQUIREMENTS
Processor : Any processor above 500 MHz
RAM : 128 MB
Hard Disk : 10 GB
Compact Disk : 650 MB
Input device : Standard keyboard and mouse
Output device : VGA and high-resolution monitor

SOFTWARE SPECIFICATION
Operating System : Windows family
Programming Language : JDK 1.5 or higher
Database : MySQL 5.0

DISEASE INFERENCE FROM HEALTH-RELATED QUESTIONS VIA SPARSE DEEP LEARNING

ABSTRACT
Automatic disease inference is important for bridging the gap between what online health seekers with unusual symptoms need and what busy human doctors with biased expertise can offer.
However, inferring diseases accurately and efficiently is non-trivial, especially for community-based health services, due to the vocabulary gap, incomplete information, correlated medical concepts, and limited high-quality training samples. A user study on the information needs of health seekers, as expressed in their questions, is reported, and the questions that ask for possible diseases of manifested symptoms are selected for further analysis. Next, a novel deep learning scheme is proposed to infer possible diseases from the questions of health seekers. The scheme comprises two key components. The first globally mines discriminant medical signatures from raw features. The second treats the raw features and their signatures as input nodes in one layer and hidden nodes in the subsequent layer, respectively, and learns the inter-relations between these two layers via pre-training with pseudo-labeled data. The hidden nodes then serve as raw features for more abstract signature mining. By incrementally and alternately repeating these two components, the scheme builds a sparsely connected deep architecture with three hidden layers. Overall, it fits specific tasks well after fine-tuning.

AIM
To build a disease inference scheme that automatically infers the possible diseases of given questions in community-based health services.

INTRODUCTION
The greying of society, escalating healthcare costs, and burgeoning computer technologies are together driving more consumers to spend more time online exploring health information. One survey shows that 59 percent of U.S. adults had used the Internet as a diagnostic tool in 2012. Another survey reports that in 2013 the average U.S. consumer spent close to 52 hours online per year looking for wellness information, while visiting doctors only three times per year.
These findings have heightened the importance of online health resources as springboards for patient-doctor communication. The prevailing online health resources can be roughly grouped into two categories. One is reputable portals run by official sectors, renowned organizations, or other professional health providers. They disseminate up-to-date health information by releasing the most accurate, well-structured, and formally presented health knowledge on various topics.

SCOPE OF WORK
It is highly desirable to develop automatic and comprehensive wellness systems that can instantly answer health seekers' all-round questions and alleviate doctors' workload. The biggest stumbling block for automatic health systems is disease inference. According to the user study, health seekers frequently ask for: (1) supplemental cues about their diagnosed diseases; (2) preventive information about diseases they are concerned about; and (3) possible diseases behind their manifested symptoms.

CHALLENGES
It is expensive to construct ground truth for various diseases. These factors limit the disease inference performance achievable by general shallow learning methods. Disease inference is a reasoning process based on the given question, and the task is non-trivial for the following reasons. First, the vocabulary gap between diverse health seekers makes the data more inconsistent than other formats of health data. For example, “shortness of breath” and “breathless” were used by different health seekers to refer to the same concept, “dyspnea”. Second, health seekers describe their problems in short questions, containing 14.5 terms per question on average. This incompleteness hinders effective similarity estimation based on shared contexts.
Third, medical attributes such as age, gender, and symptoms are highly correlated and often appear together as compact patterns that signal health problems. For example, “tight chest”, “wheezing”, and “dyspnea” frequently co-occur to hint at “asthma”.

EXISTING SYSTEM
Little research has been dedicated to disease inference in community-based health services. Information extraction from medical text is the basis for higher-order analytics such as representation, classification, and clustering. SVMs have been used to recognize medication-related entities in hospital discharge summaries and to classify these atomic elements into predefined categories, such as treatments and conditions. Machine learning techniques have also been used to assist health professionals in diagnosing diseases.

Disadvantages
These approaches are not applicable to online health data. In terms of data properties, the two kinds of data differ in structure, quality, and number of training samples. In terms of techniques, most previous efforts cannot take advantage of data types beyond the targeted ones, and hence are neither scalable nor generalizable.

PROPOSED SYSTEM
The aim is to build a disease inference scheme that automatically infers the possible diseases of given questions in community-based health services. The information needs of health seekers are analyzed and categorized. As a byproduct, questions that require disease inference are differentiated from other kinds. It is worth emphasizing that large-scale data often leads to an explosion of the feature space under n-gram representations, especially for inconsistent community-generated data.
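The feature-space explosion under n-gram representations, and the vocabulary gap listed among the challenges, can be made concrete with a small feature extractor. This is a hedged Java sketch: the class name `NGramFeatures` and the tokenization rule (lower-casing and splitting on non-alphanumeric characters) are assumptions for illustration, not the scheme's actual preprocessing.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical extractor of word n-gram features from a health seeker's question.
final class NGramFeatures {
    // Returns all distinct word n-grams of length 1..maxN, in first-seen order.
    static Set<String> extract(String question, int maxN) {
        List<String> tokens = new ArrayList<>();
        for (String t : question.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t); // split() can yield an empty leading token
            }
        }
        Set<String> features = new LinkedHashSet<>();
        for (int n = 1; n <= maxN; n++) {
            for (int i = 0; i + n <= tokens.size(); i++) {
                features.add(String.join(" ", tokens.subList(i, i + n)));
            }
        }
        return features;
    }
}
```

The five-word question "Shortness of breath and wheezing?" already yields 5, 9 and 12 distinct features for n up to 1, 2 and 3. Meanwhile "breathless" shares no feature with "shortness of breath", which is why surface n-grams alone cannot bridge the vocabulary gap.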
Unlike conventional sporadic efforts, which generally focus on a single disease or a few diseases using hospital-generated records with structured fields, the proposed scheme benefits from the volume of unstructured community-generated data and can handle various kinds of diseases effectively. This work investigates and categorizes the information needs of health seekers in community-based health services and mines the signatures of the data they generate. It proposes a sparsely connected deep learning scheme to infer various kinds of diseases. The scheme is pre-trained with pseudo-labeled data and further strengthened by fine-tuning with data labeled by online doctors. The model comprises two components. The first globally mines latent medical signatures. The second uses the raw features and the signatures as input nodes in one layer and hidden nodes in the subsequent layer, respectively.

Advantages
Unlike conventional deep learning algorithms, the number of hidden nodes in each layer of the proposed model is determined automatically, and the connections between two adjacent layers are sparse, which makes the model faster.
The model is generalizable and scalable.
Fine-tuning with a small set of labeled disease samples fits the proposed model to specific disease inference tasks.

HARDWARE REQUIREMENTS
Processor : Any processor above 500 MHz
RAM : 128 MB
Hard Disk : 10 GB
Compact Disk : 650 MB
Input device : Standard keyboard and mouse
Output device : VGA and high-resolution monitor

SOFTWARE SPECIFICATION
Operating System : Windows family
Pages developed using : Java Swing
Techniques : JDK 1.5 or higher
Database : MySQL 5.0

REAL-TIME CITY-SCALE TAXI RIDESHARING

ABSTRACT
A taxi-sharing system is proposed and developed that accepts taxi passengers' real-time ride requests and schedules proper taxis to pick them up via ridesharing, subject to time, capacity, and monetary constraints. The monetary constraints provide incentives for both passengers and taxi drivers: passengers will not pay more than without ridesharing and are compensated if their travel time is lengthened due to ridesharing, and taxi drivers are paid for all the detour distance due to ridesharing. While such a system has significant social and environmental benefits, e.g., reducing energy consumption and satisfying people's commuting needs, real-time taxi-sharing has not yet been well studied. To this end, a taxi-sharing system based on a mobile-cloud architecture is devised. Taxi riders and taxi drivers use the taxi-sharing service through a smartphone app. The cloud first quickly finds candidate taxis for a ride request using a taxi searching algorithm supported by a spatio-temporal index. A scheduling process is then performed in the cloud to select a taxi that satisfies the request with the minimum increase in travel distance. The proposed system has demonstrated its efficiency, effectiveness, and scalability. For example, when the ratio of ride requests to taxis is 6, the system serves three times as many taxi riders as when no ridesharing is performed, while saving 11 percent in total travel distance and 7 percent in taxi fare per rider.

INTRODUCTION
Taxis are an important transportation mode between public and private transportation, delivering millions of passengers to different locations in urban areas.
However, taxi demand is usually much higher than the number of taxis during peak hours in major cities, so many people spend a long time at the roadside before getting a taxi. Increasing the number of taxis seems an obvious solution, but it brings negative effects, e.g., additional road traffic, more energy consumption, and lower income for taxi drivers (considering that taxi demand would be lower than the number of taxis during off-peak hours).

PROPOSED SYSTEM
A taxi-sharing system is proposed that accepts passengers' real-time ride requests and schedules proper taxis to pick them up via taxi-sharing, subject to time, capacity, and monetary constraints. Taxi drivers independently determine when to join and leave the service. Passengers submit real-time ride requests, each consisting of the origin and destination of the trip and time windows constraining when the passengers want to be picked up and dropped off. On receiving a new request, the cloud first searches for the taxi that minimizes the increase in travel distance for the request and satisfies both the new request and the trips of existing passengers already assigned to the taxi, subject to time, capacity, and monetary constraints. The cloud then asks the existing passengers assigned to the taxi whether they agree to pick up the new passenger, given the possible decrease in fare and increase in travel time. Only upon unanimous agreement are the updated schedules given to the corresponding taxi driver and passengers.

Advantages
The system saves energy and eases traffic congestion while enhancing the capacity of commuting by taxi.
It reduces the taxi fare of riders and increases the profit of taxi drivers.
Real-time taxi-sharing has not been well explored, although ridesharing based on private cars, often known as carpooling or recurring ridesharing, has been studied for years to deal with people's routine commutes, e.g., from home to work.
The proposed ridesharing model considers more practical constraints, including time windows, capacity, and monetary constraints for taxi trips.
Efficient searching and scheduling algorithms are capable of allocating the “right” taxi among tens of thousands of taxis for a query in milliseconds.
The cloud integrates multiple important components, including taxi indexing, searching, and scheduling. Specifically, a spatio-temporal indexing structure, a taxi searching algorithm, and a scheduling algorithm are proposed. Supported by the index, the two algorithms quickly serve a large number of real-time ride requests while reducing the travel distance of taxis compared with the case without taxi-sharing.
Incentives are provided not only for passengers but also for taxi drivers: passengers will not pay more than without ridesharing and are compensated if their travel time is lengthened due to ridesharing, and taxi drivers are paid for all the detour distance due to ridesharing. The monetary constraints make the modeling of the taxi ridesharing problem more realistic.

HARDWARE REQUIREMENTS
Processor : Any processor above 500 MHz
RAM : 128 MB
Hard Disk : 10 GB
Compact Disk : 650 MB
Input device : Standard keyboard and mouse
Output device : VGA and high-resolution monitor

SOFTWARE SPECIFICATION
Operating System : Windows family
Techniques : JDK 1.5 or higher, Android APK
Tools : Eclipse
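The scheduling step described above, which selects a taxi that can absorb a new request with the minimum increase in travel distance, can be sketched for a single taxi as an insertion search. This is a simplified Java sketch, not the system's published algorithm: the class names `Stop`, `Insertion` and `InsertionScheduler` are hypothetical, distances are Euclidean rather than road-network distances, and only the capacity constraint is checked. The time-window and monetary constraints, the spatio-temporal index, and the passenger agreement step are all omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stop in a taxi's schedule; loadChange is +k at a pick-up and -k at a drop-off.
final class Stop {
    final double x, y;
    final int loadChange;

    Stop(double x, double y, int loadChange) {
        this.x = x;
        this.y = y;
        this.loadChange = loadChange;
    }
}

// Result of inserting one request: positions in the new schedule and the extra distance.
final class Insertion {
    final int pickupIndex, dropoffIndex;
    final double addedDistance;
    final List<Stop> schedule;

    Insertion(int pickupIndex, int dropoffIndex, double addedDistance, List<Stop> schedule) {
        this.pickupIndex = pickupIndex;
        this.dropoffIndex = dropoffIndex;
        this.addedDistance = addedDistance;
        this.schedule = schedule;
    }
}

final class InsertionScheduler {
    // Length of the route that starts at the taxi's position and visits the stops in order.
    static double routeLength(double startX, double startY, List<Stop> stops) {
        double total = 0, cx = startX, cy = startY;
        for (Stop s : stops) {
            total += Math.hypot(s.x - cx, s.y - cy);
            cx = s.x;
            cy = s.y;
        }
        return total;
    }

    // Capacity constraint: the number of passengers on board never exceeds the seats.
    static boolean respectsCapacity(int onboard, int capacity, List<Stop> stops) {
        int load = onboard;
        for (Stop s : stops) {
            load += s.loadChange;
            if (load > capacity) {
                return false;
            }
        }
        return true;
    }

    // Tries every pick-up position i and every later drop-off position j; keeps the
    // capacity-feasible schedule with the smallest increase in travel distance, or null.
    static Insertion bestInsertion(double taxiX, double taxiY, int onboard, int capacity,
                                   List<Stop> schedule, Stop pickup, Stop dropoff) {
        double base = routeLength(taxiX, taxiY, schedule);
        Insertion best = null;
        for (int i = 0; i <= schedule.size(); i++) {
            for (int j = i + 1; j <= schedule.size() + 1; j++) {
                List<Stop> candidate = new ArrayList<>(schedule); // original list is untouched
                candidate.add(i, pickup);
                candidate.add(j, dropoff);
                if (!respectsCapacity(onboard, capacity, candidate)) {
                    continue;
                }
                double added = routeLength(taxiX, taxiY, candidate) - base;
                if (best == null || added < best.addedDistance) {
                    best = new Insertion(i, j, added, candidate);
                }
            }
        }
        return best;
    }
}
```

For a taxi at (0, 0) already serving a rider from (1, 0) to (10, 0), a new ride from (2, 0) to (9, 0) fits entirely along the way (added distance 0) when the taxi has two seats; with one seat the rides cannot overlap, and the cheapest feasible schedule appends the new ride, adding 15. Running this search over each candidate taxi returned by the searching step and keeping the smallest added distance would give the overall choice; a fuller version would also reject schedules that violate time windows or monetary constraints.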