Geographical Web Search Engines and Geographical Information Retrieval (GIR)

Christopher Jones
Cardiff University
Edinburgh Euro GeoInf 2007
Where is Geo-information?
Personal knowledge (in our heads)
– of landscape, of where things, people and services are located, where things happened…
Documents (various media)
– Lists of where facilities, resources, structures are located
– Textual descriptions of geographic phenomena
– Images and videos of geographic space
Maps
GIS and the Web
A GIS is typically:
– Isolated
– Supports an individual organisation
– Accessed privately
– Small range of topics
– Structured data / geo-coded locations
– Finds answers
– Complicated to use
The World Wide Web is:
– Globally networked
– Supports everyone on the Internet
– Accessed publicly
– Vast range of topics
– Unstructured free text / images
– Finds documents
– Easy to use
WWW as a source of geo-information
• Geographic context embedded in natural language descriptions
• Web queries depend on exact match of text terms
• No intelligent interpretation of spatial relationships (“near”, “west” etc.)
• Place names ambiguous and confused with names of organisations, people, buildings and streets
• No geo-relevance ranking
Current motivation of GIR:
Find geo-specific resources on the Web (mostly documents and images)
find web resources about Something related_to Somewhere
related_to = in, near, within X km, north_of, etc.
• Resolve ambiguity of names (many places have the same name)
• Interpret the query's spatial relationships (e.g. near, north) to produce a query footprint
• Find documents geographically associated with the region of the query footprint
• Relevance rank geographically by place and subject
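The "Something related_to Somewhere" query form can be illustrated with a small parser. This is only a sketch: the regular expression, the `GeoQuery` type and the `parse_geo_query` function are hypothetical names introduced here, and the relation vocabulary is limited to a few of the operators named above.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical relation vocabulary, limited to operators named on the slide.
RELATION_PATTERN = re.compile(
    r"^(?P<subject>.+?)\s+"
    r"(?P<relation>within\s+\d+(?:\.\d+)?\s*km\s+of|north\s+of|near|in)\s+"
    r"(?P<place>.+)$",
    re.IGNORECASE,
)


class GeoQuery(NamedTuple):
    subject: str   # the "Something"
    relation: str  # the "related_to" operator
    place: str     # the "Somewhere" (still an ambiguous name at this stage)


def parse_geo_query(text: str) -> Optional[GeoQuery]:
    """Split a query into subject, spatial relation and place name.

    Returns None when no known spatial relation is found.
    """
    match = RELATION_PATTERN.match(text.strip())
    if match is None:
        return None
    # Collapse internal whitespace so "north   of" becomes "north of".
    relation = re.sub(r"\s+", " ", match.group("relation").lower())
    return GeoQuery(match.group("subject"), relation, match.group("place"))
```

For example, `parse_geo_query("restaurants in Cardiff")` yields the triple `("restaurants", "in", "Cardiff")`. The place name would still need to be disambiguated and turned into a query footprint, which this sketch does not attempt.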
GIR, GIS and The Web
[Figure: diagram relating Geo-knowledge, World Knowledge, GIS, GIR and The Web]
Geographical Search Engines
• Google etc. have “local” versions.
– Based on business (yellow pages) directories.
Geographical Search Engines
• SPIRIT research prototype: general geo-web search
• Structured user interface: dropdown menu of spatial relationships
Geographical search engines
• SPIRIT results listed as URLs, plus symbols on map
User interface screenshots from Ross Purves et al., University of Zurich
Anatomy of a Geographical Search Engine
[Figure: architecture diagram. Labels: User Interface, Query disambiguation, Query footprint, Broker, Search Request + Query footprint, Search Engine, Unranked Results, Relevance Ranking, Ranked Results, Indexes (Textual, Spatial), Text Indexing, Spatial Indexing, Geotagging, Document Footprints, Place Ontology, Web Resources]
Geo-Tagging
= Geo-parsing + Geo-coding
Geo-parsing
– Recognising genuine geographic references (place names, addresses, post codes, phone codes), ignoring non-geographic uses.
Geo-coding
– Attaching a unique quantitative location (footprint) to each geographic reference
Geo-Parsing : true & false references
Some types of false geographic reference
• Personal names: Smedes York
• Business names: Dorchester Hotel, York Properties, ...
• Street names: Oxford Street, London Road, ...
• Common words that are also places: urban, institute, land, battle, derby, over, well, ...
Geo-Parsing: distinguishing between false and true geo-references
Look for patterns and context
• Personal names (Jack London, Mr York): <First_name> <Location>; <Title> <Location>
• Business names (Paris Hotel): <Business_type> <Location> (or vice versa)
• Street names (Oxford Street): <Location> <Road_type>
• Detect spatial prepositions (in, near, south of, outside etc.): “he lived in Over”
Genuine occurrences can be used to train machine learning
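The context patterns above can be sketched as a small rule-based classifier. All word lists and the `classify_place_mentions` function are hypothetical and tiny. A real geo-parser would need far larger resources, multi-word names, and probably a trained model, as the slide suggests.

```python
from typing import List, Tuple

# Tiny illustrative word lists (assumptions, not a real gazetteer).
GAZETTEER = {"London", "York", "Paris", "Oxford", "Over"}
FIRST_NAMES = {"Jack", "Smedes"}
TITLES = {"Mr", "Mrs", "Dr"}
BUSINESS_TYPES = {"Hotel", "Properties"}
ROAD_TYPES = {"Street", "Road", "Avenue"}
SPATIAL_PREPOSITIONS = {"in", "near", "outside"}


def classify_place_mentions(text: str) -> List[Tuple[str, str]]:
    """Label each gazetteer word in the text using its neighbouring tokens."""
    tokens = text.replace(",", " ").replace(".", " ").split()
    results = []
    for i, token in enumerate(tokens):
        if token not in GAZETTEER:
            continue
        prev_tok = tokens[i - 1] if i > 0 else ""
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
        if prev_tok in FIRST_NAMES or prev_tok in TITLES:
            label = "false: personal name"      # <First_name>/<Title> <Location>
        elif next_tok in BUSINESS_TYPES or prev_tok in BUSINESS_TYPES:
            label = "false: business name"      # <Location> <Business_type> or reverse
        elif next_tok in ROAD_TYPES:
            label = "false: street name"        # <Location> <Road_type>
        elif prev_tok.lower() in SPATIAL_PREPOSITIONS:
            label = "genuine"                   # spatial preposition before the name
        else:
            label = "undecided"
        results.append((token, label))
    return results
```

With these lists, "he lived in Over" labels "Over" as genuine, while "Jack London" and "Oxford Street" are rejected. The false-reference rules are checked before the preposition rule so that "in London Road" is still treated as a street name.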
Geo-coding (grounding) the genuine geo-references
• Many different places with the same name (referent ambiguity): Newport, Cambridge, Springfield, ...
• Use context to decide (references to parent or nearby places)
• Or choose the most important one (by population or place type hierarchy)
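The two grounding strategies above (context first, then importance) can be sketched as follows. The `Candidate` type, the `ground_place` function and the `importance` values are assumptions made for illustration. The importance numbers are arbitrary stand-ins for a population or place-type rank, not real figures.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Candidate:
    name: str
    parent: str        # name of a containing place
    importance: float  # arbitrary stand-in for population or place-type rank


def ground_place(name: str, candidates: List[Candidate],
                 context_places: Set[str]) -> Optional[Candidate]:
    """Pick one referent for an ambiguous place name.

    Prefer candidates whose parent place is also mentioned in the document;
    otherwise fall back to the most important candidate overall.
    """
    matches = [c for c in candidates if c.name == name]
    if not matches:
        return None
    supported = [c for c in matches if c.parent in context_places]
    pool = supported or matches
    return max(pool, key=lambda c: c.importance)


# Hypothetical gazetteer entries with made-up importance scores.
CANDIDATES = [
    Candidate("Cambridge", "England", 2.0),
    Candidate("Cambridge", "Massachusetts", 1.0),
]
```

A document that also mentions "Massachusetts" grounds "Cambridge" to the second entry. With no supporting context, the higher importance score decides.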
Indexing Web Resources
Standard text index is an inverted file:

Text term     List of resources containing term
apple         Doc79, Doc89, Doc822, …
Cardiff       Doc2, Doc19, Doc37, …
door          Doc16, Doc49, Doc112, …
hotel         Doc1, Doc2, Doc23, …
in            Doc4, Doc7, Doc19, …
London        Doc20, Doc35, Doc150, …
pub           Doc9, Doc11, Doc100, …
restaurant    Doc19, Doc22, Doc37, …
…             …

Query: Restaurants in Cardiff
Find documents that contain all terms
• Works literally for “in” but won’t find contained places.
• Doesn’t work in general for “near”, “X km from”, “north_of” etc.
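A minimal sketch of an inverted file and an all-terms query shows the limitation described above. The documents, identifiers and function names are hypothetical, and tokenisation is plain whitespace splitting with no stemming.

```python
from collections import defaultdict
from typing import Dict, Set


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased term to the set of documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all_terms(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return documents that contain every term of the query."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


# Hypothetical three-document collection.
DOCS = {
    "Doc1": "a restaurant in cardiff",
    "Doc2": "a restaurant in wales",
    "Doc3": "a hotel in cardiff",
}
INDEX = build_inverted_index(DOCS)
```

Here `search_all_terms(INDEX, "restaurant in wales")` returns only `Doc2`. `Doc1` is missed even though Cardiff is in Wales, because the text index knows nothing about containment.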
Why Spatial Indexing?
Query: “Hotels outside and within 30 km of Glasgow”
Need to find documents referring to hotels that are in places other than Glasgow
Query: “Castles in Wales”
Need to find documents that refer to names of places in Wales (perhaps without mentioning “Wales”)
• In both cases, conventional text indexing would require the query to contain the names of all places in Wales, or of all places outside Glasgow but within 30 km of it
Spatial indexing of resources
• Use dominant geographic references of documents to create document footprints (point, polygon, bounding rectangle, ...)
• Use footprints to index documents
• Convert query to a query footprint
• Match query footprint to document footprints
[Figure: spatial query and result]
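The footprint-matching step can be sketched with bounding rectangles. The `BBox` type, the `match_footprints` function and the document identifiers are hypothetical, and the coordinates are rough approximations for illustration only. The linear scan shown here stands in for a proper spatial index (for example a grid or an R-tree).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding rectangle in degrees (longitude, latitude)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def intersects(self, other: "BBox") -> bool:
        # Two rectangles overlap when they overlap on both axes.
        return (self.min_lon <= other.max_lon and other.min_lon <= self.max_lon
                and self.min_lat <= other.max_lat and other.min_lat <= self.max_lat)


def match_footprints(query_fp: BBox, doc_fps: Dict[str, BBox]) -> List[str]:
    """Return ids of documents whose footprint intersects the query footprint."""
    return sorted(doc_id for doc_id, fp in doc_fps.items()
                  if query_fp.intersects(fp))


# Approximate, illustrative footprints (not authoritative boundaries).
WALES = BBox(-5.4, 51.3, -2.6, 53.5)
DOC_FOOTPRINTS = {
    "castle_page_cardiff": BBox(-3.3, 51.4, -3.1, 51.6),
    "castle_page_glasgow": BBox(-4.4, 55.8, -4.1, 55.9),
}
```

With these values, a “Castles in Wales” query footprint matches the Cardiff page without the page ever mentioning “Wales”, and skips the Glasgow page.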
Geographical Relevance Ranking
Example: airports near Leicester
(the further away, the lower the spatial score)
• Determine “distance” between query footprint (Q) and document footprint (D)
• Depends on query spatial operator (in, outside, X km from, north_of etc.)
• This gives the spatial score
Figure from Marc van Kreveld, University of Utrecht
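For a “near” query, one way to turn distance into a spatial score is a decay function over great-circle distance between point footprints. The slide only states that the score falls with distance. The exponential form, the `scale_km` parameter and the function names below are assumptions made for this sketch.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def near_score(distance_km: float, scale_km: float = 50.0) -> float:
    """Assumed decay: 1.0 at distance 0, falling towards 0 as distance grows."""
    return math.exp(-distance_km / scale_km)
```

Other operators would need different scoring functions: “in” might score by overlap with the query footprint, and “north_of” would also have to take direction into account.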
Combining textual and spatial scores
• Textual scores: BM25
• Spatial scores: by spatial footprint analysis
[Figure: plot of normalized BM25 score against spatial score, both from 0 to 1, showing the footprints of documents and the query / ideal footprint]
Figure from Marc van Kreveld, University of Utrecht
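One possible reading of the score plane is to rank documents by how close they lie to the ideal point, where both the normalized textual score and the spatial score equal 1. The slide does not give a combination formula, so the normalization by maximum score, the Euclidean distance and the function names below are all assumptions.

```python
import math
from typing import List, Tuple


def normalize(scores: List[float]) -> List[float]:
    """Scale raw scores into [0, 1] by dividing by the largest score."""
    top = max(scores, default=0.0)
    return [s / top if top > 0 else 0.0 for s in scores]


def combined_score(text_score: float, spatial_score: float) -> float:
    """Closeness to the ideal point (1, 1), scaled so that 1.0 is ideal."""
    distance = math.hypot(1.0 - text_score, 1.0 - spatial_score)
    return 1.0 - distance / math.sqrt(2.0)


def rank(docs: List[Tuple[str, float, float]]) -> List[str]:
    """docs holds (doc_id, raw_bm25, spatial_score); returns ids, best first."""
    text_scores = normalize([raw_bm25 for _, raw_bm25, _ in docs])
    scored = [(combined_score(t, spatial), doc_id)
              for t, (doc_id, _, spatial) in zip(text_scores, docs)]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]
```

A weighted sum of the two scores would be an equally plausible choice. The distance form penalises documents that are strong on one axis but weak on the other.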
Place Ontology
Encodes knowledge of terminology and structure of geographic space
• alternative names, languages
• place types (political, topographic, social, ...)
• footprint (point, MBR, polygon)
• spatial relationships and attributes: containment, adjacency, overlap
• imprecise (vernacular) places (“Midlands”, “south of France”, “Scottish borders”, “Pennines”, “Highlands”, ...)
Derive from gazetteers, thesauri, maps & the web
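The kinds of knowledge listed above could be held in a record structure along the following lines. This is a sketch only: the `Place` and `PlaceOntology` classes, their field names and the identifiers are hypothetical, and footprints are left unset rather than filled with invented coordinates.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Place:
    place_id: str
    preferred_name: str
    # Language code -> alternative names in that language.
    alternative_names: Dict[str, List[str]] = field(default_factory=dict)
    place_type: str = "unknown"          # e.g. political, topographic, social
    # Minimum bounding rectangle: (min_lon, min_lat, max_lon, max_lat).
    footprint: Optional[Tuple[float, float, float, float]] = None
    contained_in: Optional[str] = None   # place_id of the parent place
    adjacent_to: List[str] = field(default_factory=list)
    vernacular: bool = False             # imprecise place such as "Midlands"


class PlaceOntology:
    def __init__(self) -> None:
        self._places: Dict[str, Place] = {}

    def add(self, place: Place) -> None:
        self._places[place.place_id] = place

    def lookup(self, name: str) -> List[Place]:
        """Find places by preferred or alternative name, ignoring case."""
        wanted = name.casefold()
        hits = []
        for place in self._places.values():
            names = [place.preferred_name]
            for alternatives in place.alternative_names.values():
                names.extend(alternatives)
            if any(wanted == n.casefold() for n in names):
                hits.append(place)
        return hits

    def ancestors(self, place_id: str) -> List[str]:
        """Follow containment links upwards from a place."""
        chain = []
        current = self._places[place_id].contained_in
        while current is not None and current in self._places:
            chain.append(current)
            current = self._places[current].contained_in
        return chain


ONTOLOGY = PlaceOntology()
ONTOLOGY.add(Place("uk", "United Kingdom", place_type="political"))
ONTOLOGY.add(Place("wales", "Wales", {"cy": ["Cymru"]}, "political",
                   contained_in="uk"))
ONTOLOGY.add(Place("cardiff", "Cardiff", {"cy": ["Caerdydd"]}, "political",
                   contained_in="wales"))
```

A lookup for "Caerdydd" finds the Cardiff entry through its Welsh alternative name, and the containment chain for Cardiff runs through Wales to the United Kingdom.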
Roles of Place Ontology
[Figure: roles of the place ontology. Labels: User Interface, Query Disambiguation, Metadata Extraction, Geo-Tagging, document footprints, Web collection, Spatial Index, Search Component, Query Expansion (query footprint), Relevance Ranking]
Mining text on the web for vernacular place name knowledge
• Objective: estimate spatial extent of vague place
• Documents that refer to vague places may also refer to more precise places inside them.
• Places that occur frequently in association with a target named place may have a higher chance of being inside it
• Analyse frequency of occurrence of co-located places
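The co-occurrence idea above can be sketched as a frequency count followed by a crude extent estimate. Everything here is an assumption made for illustration: the function names, the substring matching, the `min_count` threshold, and the use of a bounding rectangle around frequently co-occurring places.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set, Tuple


def cooccurrence_counts(docs: Iterable[str], target: str,
                        known_places: Set[str]) -> Counter:
    """Count, per known place, the documents that mention it with the target."""
    counts: Counter = Counter()
    target_lower = target.lower()
    for text in docs:
        lowered = text.lower()
        if target_lower not in lowered:
            continue  # only documents about the vague place are of interest
        for place in known_places:
            if place.lower() in lowered:
                counts[place] += 1
    return counts


def estimate_extent(counts: Counter, coords: Dict[str, Tuple[float, float]],
                    min_count: int = 2
                    ) -> Optional[Tuple[float, float, float, float]]:
    """Bounding rectangle (min_lon, min_lat, max_lon, max_lat) of places
    that co-occur with the target at least min_count times."""
    points = [coords[place] for place, n in counts.items()
              if n >= min_count and place in coords]
    if not points:
        return None
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return (min(lons), min(lats), max(lons), max(lats))
```

A frequency threshold filters out places that are mentioned only in passing. A bounding rectangle is a blunt estimate, and a density surface over the co-occurring points would give a more graded picture of a vague region.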
Places mentioned in documents retrieved
by queries on the “Cotswolds”
Figure from Ross Purves et al., University of Zurich
GIR and GIS
• GIR currently dominated by web search
– Unstructured results in multiple documents
• Sometimes a single focused result is wanted:
– Hotels within 1 kilometre of the British Museum in London
– Where are pre-sixteenth century dwellings in USA?
– Which areas of East Anglia would be flooded if sea level rose by 1 metre?
Bringing GIR and GIS together
[Figure: two diagrams relating Geo-knowledge, World Knowledge, GIS, GIR and The Web]
GeoInformation Services
Encode Geo-information in Web Services (Geo-services)
• Parse natural language queries
• Interpret geo-terminology of queries
• Identify the relevant geo-services to match geo and non-geo concepts
• Compose appropriate chain of services
EU - TRIPOD Project
• Improve accessibility of images on web
• Focus on geographical context
• Enhance captions / metadata for archival images
• Automatically generate captions for images from location / orientation-aware cameras
• Web harvesting to enrich metadata
• Interpret (vague) spatial natural language
• Toponym ontology of places and landmarks (including vernacular places)
• Use 3D landscape models to determine what is in camera view
• Prototype image search engine
http://tripod.shef.ac.uk/index.html
Future of GIR?
• Improve “conventional GIR” components:
– Geo-tagging, spatio-textual indexing and geo-relevance ranking
• Place ontologies with world-wide coverage
• Understanding of spatial natural language
• Integrate time & space (temporal language)
• Open GeoInformation Web services
• Adapt GIR to personal needs & location
More Information
• See www.geo-spirit.org for information on SPIRIT project and downloads of articles and project deliverables.
[N.B. Prototype search engine (with link from SPIRIT web site) is no longer functional]
• TRIPOD: www.ProjectTripod.org