Synopsis Details for Titles 1-5

www.globalsoftsolutions.in
gsstrichy@gmail.com
EFFICIENT FILTERING ALGORITHMS FOR LOCATION-AWARE PUBLISH/SUBSCRIBE
ABSTRACT
Location-based services have been widely adopted in many systems. Existing
works employ a pull model or user-initiated model, where a user issues a query to
a server which replies with location-aware answers. To provide users with instant
replies, a push model or server-initiated model is becoming an inevitable
computing model in the next-generation location-based services. In the push
model, subscribers register spatio-textual subscriptions to capture their interests,
and publishers post spatio-textual messages. This calls for a high-performance
location-aware publish/subscribe system to deliver publishers’ messages to
relevant subscribers. The research challenges that arise in designing a
location-aware publish/subscribe system are studied. An R-tree based index is
proposed by integrating textual descriptions into R-tree nodes. An efficient
filtering algorithm and effective pruning techniques are proposed to achieve high
performance. This method can support both conjunctive queries and ranking
queries.
INTRODUCTION
Location-based services (LBS) have attracted significant attention from both
industrial and academic communities. Many LBS such as Foursquare and
Google Maps have been widely accepted because they can provide users with
location-aware experiences. Existing LBS systems employ a pull model or
user-initiated model, where a user issues a query to a server which responds with
location-aware answers. For example, if a mobile user wants to find seafood
restaurants nearby, she issues a query “seafood restaurant” to an LBS system,
which returns answers based on the user’s location and keywords.
PROBLEM DEFINITION
To provide users with instant replies, a push model or server-initiated model is
becoming an inevitable computing model in next-generation location-based
services. In the push model, subscribers register spatio-textual subscriptions to
capture their interests, and publishers post spatio-textual messages. This calls for a
high-performance location-aware publish/subscribe system to deliver messages to
relevant subscribers. This computing model brings new user experiences to mobile
users, and can help users retrieve information without explicitly issuing a query.
One big challenge in a publish/subscribe system is to achieve high performance. A
publish/subscribe system should support tens of millions of subscribers and deliver
messages to relevant subscribers in milliseconds. Since messages and subscriptions
contain both location information and textual description, it is rather costly to
deliver messages to relevant subscribers. This calls for an efficient filtering
technique to support location-aware publish/subscribe services.
EXISTING SYSTEM
 There are many real-world applications using location-aware
publish/subscribe services.
 The first one is Groupon, in which customers register their interests with
locations and keywords. For each Groupon message, the system provider
sends the message to the customers who may be potentially interested in the
message by evaluating the spatial proximity and textual relevancy between
subscriptions and the message.
 The second one is location-aware AdSense, which extends traditional
AdSense to support location-aware services. The advertisers register their
location-based advertisements in the system. The system pushes relevant
advertisements to mobile users based on their locations and the content they are
browsing.
 The third one is tweet delivery. To receive feedback on their products in a
specific area from Twitter, market analysts register their interests. For each
tweet, the system pushes the tweet to relevant analysts whose spatio-textual
subscriptions match the tweet.
PROPOSED SYSTEM
 To address the challenge, a token-based R-tree index structure, called the Rt-tree, is
proposed by integrating each R-tree node with a set of tokens selected from subscriptions.
 Using the Rt-tree, a filter-and-verification framework is developed to
efficiently deliver a message.
 To reduce the number of tokens associated with Rt-tree nodes, some
high-quality representative tokens are selected from subscriptions and associated
with Rt-tree nodes.
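The filter-and-verification idea above can be illustrated with a minimal Python sketch. It is not the Rt-tree itself: it uses a single level of nodes rather than a full tree, and it assumes (the synopsis does not spell this out) that a subscription is a rectangle plus a non-empty keyword set, that a message is a point plus keywords, and that a conjunctive match means the point lies in the rectangle and every subscription keyword appears in the message. Choosing the rarest keyword as the representative token is also an assumption made for illustration, and all names (`Subscription`, `Node`, `build_node`, `match`) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    sid: str
    region: tuple      # (xmin, ymin, xmax, ymax)
    tokens: frozenset  # non-empty keyword set


@dataclass
class Node:
    subs: list    # subscriptions stored under this node
    mbr: tuple    # minimum bounding rectangle of their regions
    rep: dict     # sid -> representative token of that subscription
    tokens: set   # union of the representative tokens


def contains(rect, point):
    xmin, ymin, xmax, ymax = rect
    x, y = point
    return xmin <= x <= xmax and ymin <= y <= ymax


def build_node(subs, freq):
    """Group subscriptions into one node; keep one (rarest) token per subscription."""
    mbr = (min(s.region[0] for s in subs), min(s.region[1] for s in subs),
           max(s.region[2] for s in subs), max(s.region[3] for s in subs))
    rep = {s.sid: min(sorted(s.tokens), key=lambda t: freq[t]) for s in subs}
    return Node(list(subs), mbr, rep, set(rep.values()))


def match(nodes, location, msg_tokens):
    """Return ids of subscriptions that match the message."""
    msg_tokens = set(msg_tokens)
    results = []
    for node in nodes:
        # Filter: skip the whole node on a spatial or token mismatch.
        if not contains(node.mbr, location) or node.tokens.isdisjoint(msg_tokens):
            continue
        for s in node.subs:
            # Filter: a matching subscription must have its representative token.
            if node.rep[s.sid] not in msg_tokens:
                continue
            # Verification: full spatial and textual check.
            if contains(s.region, location) and s.tokens <= msg_tokens:
                results.append(s.sid)
    return results


subs = [
    Subscription("s1", (0, 0, 10, 10), frozenset({"seafood", "restaurant"})),
    Subscription("s2", (0, 0, 5, 5), frozenset({"coffee"})),
    Subscription("s3", (20, 20, 30, 30), frozenset({"seafood"})),
]
freq = Counter(t for s in subs for t in s.tokens)
nodes = [build_node(subs[:2], freq), build_node(subs[2:], freq)]
print(match(nodes, (3, 3), {"fresh", "seafood", "restaurant"}))
```

Because a representative token always belongs to its subscription, a subscription that truly matches is never pruned by either filter; the filters only save verification work.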
Advantages
 This technique not only reduces index sizes but also improves the
performance.
 The method supports both conjunctive queries and ranking queries.
 Efficient filtering algorithms are used.
 Effective pruning techniques improve the performance.
 Dynamic updates are supported efficiently.
HARDWARE REQUIREMENTS
Processor : Any processor above 500 MHz.
RAM : 128 MB.
Hard Disk : 10 GB.
Compact Disk : 650 MB.
Input device : Standard keyboard and mouse.
Output device : VGA and high-resolution monitor.
SOFTWARE SPECIFICATION
Operating System : Windows family.
Techniques : JDK 1.5 or higher
Database : MySQL 5.0
PROBABILISTIC RANGE QUERY OVER UNCERTAIN MOVING
OBJECTS IN CONSTRAINED TWO-DIMENSIONAL SPACE
ABSTRACT
Probabilistic range query (PRQ) over uncertain moving objects has attracted much
attention in recent years. Most existing works focus on the PRQ for objects
moving freely in two-dimensional (2D) space. In contrast, this work studies the
PRQ over objects moving in a constrained 2D space where objects are forbidden to
be located in some specific areas; this query is called the constrained space probabilistic
range query (CSPRQ). Its unique properties are analyzed, showing that processing
the CSPRQ with a straightforward solution is infeasible. The key idea of the solution
is to use a strategy called pre-approximation that reduces the initial problem to a
highly simplified version, which makes the remaining steps easy to tackle. In
particular, this strategy itself is simple and easy to implement. Furthermore,
motivated by the cost analysis, the optimizations are mainly based on two insights: (i)
the number of effective subdivisions is no more than 1; and (ii) an entity with a
larger span is more likely to subdivide a single region. The effectiveness and
efficiency of the proposed approaches are demonstrated through extensive
experiments under various experimental settings. An extra finding is also highlighted:
the pre-computation based method suffers a non-trivial preprocessing time, which
offers an important indication for future research.
INTRODUCTION
Range query for moving objects has been the subject of much attention as it
finds applications in various domains such as the digital battlefield, mobile
workforce management, and the transportation industry. Usually, for a moving
object o, only discrete location information is stored on the database server, for
various reasons such as the limited battery power of mobile devices and the
limited network bandwidth. Although the recorded location of o can be obtained by
accessing the database, its current location is usually uncertain.
For example, a common location update policy called dead reckoning is to update
the recorded location lr when the deviation between lr and the actual location of o
is larger than a given distance threshold t. Before the next update, the specific
location of o is uncertain, except that it is known to lie in a circle with center lr
and radius t. To capture the location uncertainty, the idea of incorporating
uncertainty into moving objects data has been proposed. Since then, the
probabilistic range query (PRQ), a variant of the traditional range query, has
attracted much attention in the data management community.
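To make the dead-reckoning uncertainty concrete, the following Python sketch estimates the probability that an object appears inside a rectangular query range. It assumes, for illustration only, that the object's location is uniformly distributed over the circle with center lr and radius t (the synopsis does not fix a distribution), and it approximates areas by sampling a regular grid. The optional `restricted` rectangles stand in for areas where the object cannot be located; the function names and the rectangle representation are hypothetical, and this brute-force estimate is not the pre-approximation method of the paper.

```python
def in_rect(rect, x, y):
    xmin, ymin, xmax, ymax = rect
    return xmin <= x <= xmax and ymin <= y <= ymax


def appearance_probability(lr, t, query, restricted=(), n=400):
    """Estimate P(object in query) for a uniform location in circle (lr, t).

    Grid points inside the circle and outside every restricted rectangle
    form the feasible region; the result is the share of them in `query`.
    """
    cx, cy = lr
    step = 2 * t / n
    feasible = inside = 0
    for i in range(n):
        x = cx - t + (i + 0.5) * step
        for j in range(n):
            y = cy - t + (j + 0.5) * step
            if (x - cx) ** 2 + (y - cy) ** 2 > t * t:
                continue  # outside the uncertainty circle
            if any(in_rect(r, x, y) for r in restricted):
                continue  # the object cannot be located here
            feasible += 1
            if in_rect(query, x, y):
                inside += 1
    return inside / feasible if feasible else 0.0


# Query range covering the right half of the uncertainty circle.
p_free = appearance_probability((0.0, 0.0), 1.0, (0.0, -2.0, 2.0, 2.0))
# Same query, but the left half of the circle is a restricted area.
p_constrained = appearance_probability(
    (0.0, 0.0), 1.0, (0.0, -2.0, 2.0, 2.0),
    restricted=[(-2.0, -2.0, 0.0, 2.0)],
)
print(round(p_free, 3), round(p_constrained, 3))
```

With no restricted area the estimate is close to one half; once the left half of the circle is ruled out, all remaining probability mass falls inside the query range.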
EXISTING SYSTEM
 Little effort has been made on the PRQ over objects moving in a constrained 2D
space where objects are forbidden to be located in some specific areas.
PROPOSED SYSTEM
 The proposed work studies the PRQ over objects moving in a constrained 2D
space where objects are forbidden to be located in some specific areas. Such
areas are termed restricted areas, and the query is dubbed the Constrained
Space Probabilistic Range Query (CSPRQ).
 The CSPRQ can also find many applications, as objects moving in a
constrained 2D space are common in the real world. For example, the tanks
in the digital battlefield usually cannot run in lakes, forests and the like; the
areas occupied by those obstacles can be naturally regarded as restricted
areas.
 The key idea of the proposed solution is to use a strategy called
pre-approximation that reduces the initial problem to a highly simplified
version, which makes the remaining steps easy to tackle.
 The optimizations are mainly based on two insights: (i) the number of
effective subdivisions is no more than 1; this insight is used to improve
the power of pruning restricted areas; and
 (ii) an entity with a larger span is more likely to subdivide a single region;
this insight motivates sorting the entities to be processed according to their
spans.
 In addition to the main insights above, two other facts are identified and
exploited.
 Specifically, two mechanisms are developed: postpone processing and lazy
update.
Advantages
 The strategy itself is simple and easy to implement.
 To operate different entities in a unified and efficient manner, a label-based
data structure is developed.
 Owing to the pre-approximation and the label-based data structure, it is
simple to compute the appearance probability.
 To improve the I/O efficiency, a twin index is adopted.
HARDWARE REQUIREMENTS
Processor : Any processor above 500 MHz.
RAM : 128 MB.
Hard Disk : 10 GB.
Compact Disk : 650 MB.
Input device : Standard keyboard and mouse.
Output device : VGA and high-resolution monitor.
SOFTWARE SPECIFICATION
Operating System : Windows family.
Techniques : JDK 1.5 or higher
Database : MySQL 5.0
CROWDOP: QUERY OPTIMIZATION FOR DECLARATIVE CROWDSOURCING
SYSTEMS
ABSTRACT
The query optimization problem is studied in declarative crowdsourcing systems.
Declarative crowdsourcing is designed to hide the complexities and relieve the
user of the burden of dealing with the crowd. The user is only required to submit
an SQL-like query, and the system takes the responsibility of compiling the
query, generating the execution plan and evaluating it in the crowdsourcing
marketplace. A given query can have many alternative execution plans and the
difference in crowdsourcing cost between the best and the worst plans may be
several orders of magnitude. Therefore, as in relational database systems,
query optimization is important to crowdsourcing systems that provide
declarative query interfaces. CROWDOP, a cost-based query optimization
approach, is proposed for declarative crowdsourcing systems. CROWDOP
considers both cost and latency in the query optimization objectives and
generates query plans that provide a good balance between the cost and
latency. Efficient algorithms are developed in CROWDOP for optimizing three types of
queries: selection queries, join queries and complex selection-join queries.
INTRODUCTION
Crowdsourcing has attracted growing interest in recent years as an effective
tool for harnessing human intelligence to solve problems that computers
cannot solve well, such as document translation, handwriting recognition,
audio transcription and photo tagging. Various solutions have been proposed
for performing common database operations over crowdsourced data, such as
selection (filtering), join, sort/rank and count.
Recent crowdsourcing systems, such as CrowdDB, Qurk and Deco, provide an
SQL-like query language as a declarative interface to the crowd. An SQL-like
declarative interface is designed to encapsulate the complexities of dealing with
the crowd and provide the crowdsourcing system with an interface that is familiar to
most database users. Consequently, for a given query, a declarative system
must first compile the query, generate the execution plan, post the human
intelligence tasks (HITs) to the crowd according to the plan, collect the answers,
handle errors and resolve the inconsistencies in the results.
PROBLEM DEFINITION
While declarative querying improves the usability of the system, it requires the
system to have the capability to optimize and provide a “near optimal” query
execution plan for each query. Since a declarative crowdsourcing query can be
evaluated in many ways, the choice of execution plan has a significant impact
on overall performance, which includes the number of questions being asked,
the types/difficulties of the questions and the monetary cost incurred.
PROBLEM SOLUTION
It is therefore important to design an efficient crowdsourcing query optimizer
that is able to consider all potentially good query plans and select the “best”
plan based on a cost model and optimization objectives. To address this
challenge, a novel optimization approach, CROWDOP, is proposed for finding the
most efficient query plan for answering a query.
EXISTING SYSTEM
 While declarative querying improves the usability of the system, it
requires the system to have the capability to optimize and provide a “near
optimal” query execution plan for each query.
Disadvantages
 Since a declarative crowdsourcing query can be evaluated in many ways,
the choice of execution plan has a significant impact on overall
performance, which includes the number of questions being asked, the
types/difficulties of the questions and the monetary cost incurred.
PROPOSED SYSTEM
 The query optimization objective is to minimize the latency under a user-defined
cost budget.
 Efficient algorithms are developed for optimizing selection, join and complex
queries.
 As in traditional databases, optimization mechanisms in crowdsourcing
systems can be broadly classified into rule-based and cost-based. A rule-based
optimizer simply applies a set of rules instead of estimating the
cost to determine the best query plan.
 CROWDOP considers three commonly used operators in crowdsourcing
systems: FILL solicits the crowd to fill in missing values in databases;
SELECT asks the crowd to filter items satisfying certain constraints; and
JOIN leverages the crowd to match items according to some criteria.
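The objective of minimizing latency under a cost budget can be sketched in Python for a selection query with several SELECT conditions. The cost model here is an assumption made for illustration, not CROWDOP's actual model: each crowd question costs a fixed price, conditions asked in the same batch take one round of latency, and items that fail a batch are dropped before the next one (selectivities are treated as independent, and the condition order is fixed). All function names, condition names and numbers are hypothetical.

```python
from itertools import combinations


def enumerate_plans(conditions):
    """Yield every split of the ordered conditions into consecutive batches."""
    k = len(conditions)
    for r in range(k):
        for cuts in combinations(range(1, k), r):
            bounds = (0, *cuts, k)
            yield [conditions[a:b] for a, b in zip(bounds, bounds[1:])]


def estimate(plan, n_items, price):
    """Return (monetary cost, latency in rounds) of a plan."""
    cost, items = 0.0, float(n_items)
    for batch in plan:
        # Every surviving item is asked every condition of this batch.
        cost += items * len(batch) * price
        for _, selectivity in batch:
            items *= selectivity
    return cost, len(plan)


def choose_plan(conditions, n_items, price, budget):
    """Pick the lowest-latency plan within budget; break ties by cost."""
    best = None
    for plan in enumerate_plans(conditions):
        cost, latency = estimate(plan, n_items, price)
        if cost > budget:
            continue
        if best is None or (latency, cost) < (best[0], best[1]):
            best = (latency, cost, plan)
    return best  # None when no plan fits the budget


# Two hypothetical conditions with assumed selectivities.
conditions = [("is_red", 0.1), ("is_suv", 0.5)]
print(choose_plan(conditions, n_items=100, price=0.01, budget=2.5))
print(choose_plan(conditions, n_items=100, price=0.01, budget=1.5))
```

With a generous budget the optimizer asks both conditions in one round; with a tighter budget it falls back to two sequential rounds, which cost less because fewer items reach the second condition.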
Advantages
 Cost-based query optimization.
 Considers cost-latency tradeoffs.
 Supports multiple crowdsourcing operators.
HARDWARE REQUIREMENTS
Processor : Any processor above 500 MHz.
RAM : 128 MB.
Hard Disk : 10 GB.
Compact Disk : 650 MB.
Input device : Standard keyboard and mouse.
Output device : VGA and high-resolution monitor.
SOFTWARE SPECIFICATION
Operating System : Windows family.
Programming Language : JDK 1.5 or higher
Database : MySQL 5.0
DISEASE INFERENCE FROM HEALTH-RELATED QUESTIONS VIA
SPARSE DEEP LEARNING
ABSTRACT
Automatic disease inference is of importance to bridge the gap between what
online health seekers with unusual symptoms need and what busy human doctors
with biased expertise can offer. However, accurately and efficiently inferring
diseases is non-trivial, especially for community-based health services due to the
vocabulary gap, incomplete information, correlated medical concepts, and limited
high-quality training samples. First, a user study on the information needs of
health seekers is reported in terms of questions, and the questions that ask for possible
diseases of their manifested symptoms are selected for further analysis. Next, a novel
deep learning scheme is proposed to infer the possible diseases given the questions of health
seekers. The proposed scheme comprises two key components. The first
globally mines the discriminant medical signatures from raw features. The second
deems the raw features and their signatures as input nodes in one layer and hidden
nodes in the subsequent layer, respectively. Meanwhile, it learns the inter-relations
between these two layers via pre-training with pseudo-labeled data. Following that,
the hidden nodes serve as raw features for the more abstract signature mining.
By incrementally and alternately repeating these two components, the scheme
builds a sparsely connected deep architecture with three hidden layers. Overall, it
fits specific tasks well with fine-tuning.
AIM
 To build a disease inference scheme that is able to automatically infer the
possible diseases of the given questions in community-based health services.
INTRODUCTION
The greying of society, escalating costs of healthcare and burgeoning computer
technologies are together driving more consumers to spend more time online
exploring health information. One survey shows that 59 percent of U.S. adults
explored the internet as a diagnostic tool in 2012. Another survey reports that in 2013 the
average U.S. consumer spent close to 52 hours annually online to find wellness
knowledge, while visiting doctors only three times per year. These
findings have heightened the importance of online health resources as springboards
to facilitate patient-doctor communication.
The current prevailing online health resources can be roughly divided into two
categories. One is the reputable portals run by official sectors, renowned
organizations, or other professional health providers. They disseminate up-to-date
health information by releasing the most accurate, well-structured, and
formally presented health knowledge on various topics.
SCOPE OF WORK
It is highly desirable to develop automatic and comprehensive wellness systems
that can instantly answer all-round questions of health seekers and alleviate the
doctors’ workload. The biggest stumbling block of an automatic health system is
disease inference. According to the user study, health seekers frequently ask for: (1)
supplemental cues of their diagnosed diseases; (2) preventive information of their
concerned diseases; and (3) possible diseases of their manifested signals.
CHALLENGES
Disease inference is a reasoning consequence based on the given question. This task
is non-trivial for the following reasons.
First, the vocabulary gap between diverse health seekers makes the data more
inconsistent, as compared to other formats of health data. For example, “shortness
of breath” and “breathless” were used by different health seekers to refer to the
same concept, “dyspnea”.
Second, health seekers describe their problems in short questions, containing 14.5
terms per question on average. The incompleteness hinders the effective similarity
estimation based on shared contexts.
Third, medical attributes such as age, gender and symptoms are highly correlated
and commonly appear as compact patterns to signal the health problems. For
example, “tight chest”, “wheezing”, and “dyspnea” frequently co-occur to hint at
“asthma”.
Finally, it is expensive to construct the ground truth for various diseases. These factors
limit the disease inference performance that can be obtained by general shallow
learning methods.
EXISTING SYSTEM
 Little research has been dedicated to disease inference in community-based
health services.
 Information extraction from medical text is the basis for other higher-order
analytics, such as representation, classification, and clustering. SVM has been used
to recognize the medication-related entities in hospital discharge summaries
and to classify these atomic elements into pre-defined categories, such as
treatments and conditions.
 Machine learning techniques have been used to assist health professionals in the
diagnosis of diseases.
Disadvantages
 These approaches are not applicable to online health data.
 In terms of data properties, they differ in data structure,
quality and number of training samples.
 In terms of techniques, most of the previous efforts are unable to take
advantage of other data types beyond the targeted ones, and hence are not
scalable or generalizable.
PROPOSED SYSTEM
 To build a disease inference scheme that is able to automatically infer the
possible diseases of the given questions in community-based health services.
 Analyze and categorize the information needs of health seekers.
 As a byproduct, questions that require disease
inference are differentiated from other kinds.
 It is worth emphasizing that large-scale data often leads to an explosion of the
feature space in light of n-gram representations, especially for
inconsistent community-generated data.
 Distinguished from the conventional sporadic efforts that generally focus on
only a single or a few diseases based on hospital-generated records with
structured fields, the proposed scheme benefits from the volume of unstructured
community-generated data and is capable of handling various kinds of
diseases effectively.
 It investigates and categorizes the information needs of health seekers in the
community-based health services and mines the signatures of their generated
data.
 It proposes a sparsely connected deep learning scheme to infer various kinds
of diseases. This scheme is pre-trained with pseudo-labeled data and further
strengthened by fine-tuning with data labeled by online doctors.
 This scheme builds a novel deep learning model, comprising two
components.
 The first globally mines the latent medical signatures.
 The raw features and signatures respectively serve as input nodes in one
layer and hidden nodes in the subsequent layer.
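The layer structure described above, where each signature is a hidden node connected only to the raw features it was mined from, can be sketched in Python as a sparse forward pass. This is only an illustration of sparse connectivity: the signature mining, pre-training and fine-tuning steps are not shown, and the signature names, member features, weights, bias and sigmoid activation are all assumptions chosen for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sparse_layer(inputs, connections, bias=-1.0):
    """Forward pass through one sparsely connected layer.

    inputs:      dict mapping an input node to its activation
    connections: dict mapping a hidden node to {input node: weight};
                 inputs not listed for a hidden node have no edge to it
    """
    return {
        hidden: sigmoid(bias + sum(w * inputs.get(node, 0.0)
                                   for node, w in edges.items()))
        for hidden, edges in connections.items()
    }


# Layer 1: raw features -> signatures (hypothetical groupings).
layer1 = {
    "respiratory": {"tight chest": 1.0, "wheezing": 1.0, "dyspnea": 1.0},
    "digestive": {"nausea": 1.0, "vomiting": 1.0},
}
# Layer 2: signatures act as raw features for a more abstract signature.
layer2 = {"asthma-like": {"respiratory": 2.0}}

question = {"wheezing": 1.0, "dyspnea": 1.0}  # features found in a question
signatures = sparse_layer(question, layer1)
abstract = sparse_layer(signatures, layer2)
edge_count = sum(len(edges) for edges in layer1.values())
print(signatures, abstract, edge_count)
```

Here the first layer has 5 edges, whereas a fully connected layer over the same 5 raw features and 2 hidden nodes would have 10.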
Advantages
 Different from conventional deep learning algorithms, the number of hidden
nodes in each layer of the proposed model is automatically determined and the
connections between two adjacent layers are sparse, which makes it faster.
 This model is generalizable and scalable.
 Fine-tuning with a small set of labeled disease samples fits the proposed model
to specific disease inference.
HARDWARE REQUIREMENTS
Processor : Any processor above 500 MHz.
RAM : 128 MB.
Hard Disk : 10 GB.
Compact Disk : 650 MB.
Input device : Standard keyboard and mouse.
Output device : VGA and high-resolution monitor.
SOFTWARE SPECIFICATION
Operating System : Windows family.
Pages developed using : Java Swing
Techniques : JDK 1.5 or higher
Database : MySQL 5.0
REAL-TIME CITY-SCALE TAXI RIDESHARING
ABSTRACT
A taxi-sharing system is proposed and developed that accepts taxi passengers’ real-time
ride requests and schedules proper taxis to pick them up via ridesharing,
subject to time, capacity, and monetary constraints. The monetary constraints
provide incentives for both passengers and taxi drivers: passengers will not pay
more compared with no ridesharing and get compensated if their travel time is
lengthened due to ridesharing; taxi drivers will make money for all the detour
distance due to ridesharing. While such a system is of significant social and
environmental benefit, e.g., saving energy and meeting people’s
commuting needs, real-time taxi-sharing has not been well studied yet. To this end, a
taxi-sharing system based on a mobile-cloud architecture is devised. Taxi riders and taxi
drivers use the taxi-sharing service provided by the system via a smartphone app.
The cloud first finds candidate taxis quickly for a taxi ride request using a taxi
searching algorithm supported by a spatio-temporal index. A scheduling process is
then performed in the cloud to select a taxi that satisfies the request with minimum
increase in travel distance. The proposed system demonstrates its efficiency,
effectiveness and scalability. For example, when the ratio of the number of ride
requests to the number of taxis is 6, the proposed system serves three times as many
taxi riders as when no ridesharing is performed, while saving 11 percent in
total travel distance and 7 percent in taxi fare per rider.
INTRODUCTION
Taxis are an important transportation mode between public and private
transportation, delivering millions of passengers to different locations in urban
areas. However, taxi demand is usually much higher than the number of taxis in
peak hours of major cities, so many people spend a long time on
roadsides before getting a taxi. Increasing the number of taxis seems an obvious
solution, but it brings some negative effects, e.g., causing additional traffic on the
road surface and more energy consumption, and decreasing taxi drivers’ income
(considering that the demand for taxis would be lower than the number of taxis during
off-peak hours).
PROPOSED SYSTEM
 A taxi-sharing system is proposed that accepts taxi passengers’ real-time ride
requests and schedules proper taxis to pick them up via taxi-sharing,
subject to time, capacity, and monetary constraints.
 Taxi drivers independently determine when to join and leave the service.
Passengers submit real-time ride requests. Each ride request consists of the
origin and destination of the trip and time windows constraining when the
passengers want to be picked up and dropped off.
 On receiving a new request, the cloud first searches for the taxi which
minimizes the increase in travel distance for the ride request and satisfies both
the new request and the trips of existing passengers who are already
assigned to the taxi, subject to time, capacity, and monetary constraints.
 Then the existing passengers assigned to the taxi are asked by the
cloud whether they agree to pick up the new passenger given the possible
decrease in fare and increase in travel time.
 Only with unanimous agreement are the updated schedules given
to the corresponding taxi drivers and passengers.
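The scheduling step, choosing the taxi whose route grows least when the new pickup and drop-off are inserted, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: distances are straight-line rather than road-network distances, only the capacity constraint is checked (time windows and monetary constraints are omitted), and the existing order of stops is kept. The taxi representation and the names `best_insertion` and `schedule` are hypothetical.

```python
import math


def route_length(start, stops):
    """Total straight-line length of the route start -> stops in order."""
    total, prev = 0.0, start
    for point, _ in stops:
        total += math.dist(prev, point)
        prev = point
    return total


def within_capacity(onboard, capacity, stops):
    """Check that the passenger count never exceeds the taxi's capacity."""
    load = onboard
    for _, delta in stops:
        load += delta  # +1 for a pickup, -1 for a drop-off
        if load > capacity:
            return False
    return True


def best_insertion(taxi, origin, destination):
    """Cheapest way to insert the new pickup and drop-off into one taxi's schedule."""
    base = route_length(taxi["position"], taxi["stops"])
    n = len(taxi["stops"])
    best = None
    for i in range(n + 1):          # pickup position
        for j in range(i, n + 1):   # drop-off comes after the pickup
            stops = list(taxi["stops"])
            stops.insert(i, (origin, +1))
            stops.insert(j + 1, (destination, -1))
            if not within_capacity(taxi["onboard"], taxi["capacity"], stops):
                continue
            increase = route_length(taxi["position"], stops) - base
            if best is None or increase < best[0]:
                best = (increase, stops)
    return best


def schedule(taxis, origin, destination):
    """Return (taxi id, distance increase, new stops) with minimum increase."""
    best = None
    for taxi_id, taxi in taxis.items():
        candidate = best_insertion(taxi, origin, destination)
        if candidate and (best is None or candidate[0] < best[1]):
            best = (taxi_id, candidate[0], candidate[1])
    return best


taxis = {
    "A": {"position": (0, 0), "onboard": 1, "capacity": 4,
          "stops": [((10, 0), -1)]},
    "B": {"position": (0, 5), "onboard": 0, "capacity": 4, "stops": []},
}
print(schedule(taxis, origin=(2, 0), destination=(8, 0)))
```

In this example taxi A already passes both the origin and the destination on its way to its existing drop-off, so serving the request adds no distance and A is selected over the empty taxi B.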
Advantages
 This system reduces energy consumption and eases traffic congestion while
enhancing the capacity of commuting by taxis.
 It reduces the taxi fare of taxi riders and increases the profit of taxi drivers.
 Real-time taxi-sharing has not been well explored, though ridesharing based
on private cars, often known as carpooling or recurring ridesharing, was
studied for years to deal with people’s routine commutes, e.g., from home to
work.
 The proposed ridesharing model considers more practical constraints, which
include time windows, capacity, and monetary constraints for taxi trips.
 Efficient searching and scheduling algorithms are capable of allocating
the “right” taxi among tens of thousands of taxis for a query in milliseconds.
 The cloud integrates multiple important components including taxi indexing,
searching, and scheduling. Specifically, a spatio-temporal indexing
structure, a taxi searching algorithm, and a scheduling algorithm are proposed. Supported
by the index, the two algorithms quickly serve a large number of real-time
ride requests while reducing the travel distance of taxis compared with the
case without taxi-sharing.
 Incentives are provided not only for passengers but also for taxi drivers:
passengers will not pay more compared with no ridesharing and get
compensated if their travel time is lengthened due to ridesharing; taxi drivers
will make money for all the reroute distance due to ridesharing. The
monetary constraints make modeling of the taxi ridesharing problem more
realistic.
HARDWARE REQUIREMENTS
Processor : Any processor above 500 MHz.
RAM : 128 MB.
Hard Disk : 10 GB.
Compact Disk : 650 MB.
Input device : Standard keyboard and mouse.
Output device : VGA and high-resolution monitor.
SOFTWARE SPECIFICATION
Operating System : Windows family.
Techniques : JDK 1.5 or higher, Android APK
Tools : Eclipse