Scientific Web understanding and measurement

Web Science PhD call 2014
Scientific Web understanding and measurement
To date, the MOD has funded many PhDs across a number of themes. The focus of the
current call for PhD level research, commencing in January 2015, is Scientific Web
understanding and measurement. Our intention is to fund three, interacting PhDs. We
encourage applications from UK institutions with leading academics, research groups or
research centres in the subject areas identified below. The programme seeks to fully fund
PhD projects (a maximum of two per organisation) where applicants can demonstrate how
they are leading the thinking in the areas of interest. Evidence of the international standing of
the research of the academic groups identified in the proposal should be provided, including
evidence of significant research income and of their contribution to the UK and
international research landscape. The benefit that MOD would obtain through funding
research at their particular institution should also be described. We also intend to build upon
the investment in world class expertise by the research councils within the specified areas;
significant alignment with, or contribution from, other sources of studentship support from the
host university (e.g. from EPSRC, institutional funds, other funding sources) is highly
desirable. Applicants need to refer to the assessment criteria (defined later in this
document) so that they fully understand how a proposal’s quality, relevance and value to
MOD will be judged.
Key Dates:
Closing date for applications: 24 October 2014;
Funding decision by 14 November 2014;
The projects will need to commence by the end of January 2015.
Scientific Web understanding and measurement
The Web has become a pervasive utility in modern life in which emergent technology and
social interactions have shaped its use and value. The Web is an area in which new
technology has led scientific models of understanding and appropriate principles of
measurement. There is a need to strengthen our mathematical and statistical understanding
of the Web to ensure that our derived estimates are precise and have reduced bias. Dstl proposes
to fund PhDs in Web Science to help build the necessary skills and knowledge bases to
realise this vision.
We are, therefore, requesting proposals for doctoral level research programmes focusing on
the following areas:
1. A ‘geology map’ of the Web
An assumption underlying the selection of appropriate scientific method is that the
researcher understands the distribution of their data. In statistics, this motivates an
‘exploratory data analysis’ (EDA) as a first step in modelling and forecasting by
understanding the spread (moments) of the data and their internal associations. In
considering the Web, the assumption is not satisfied. What is needed is a complete
(spanning) and quantitative estimate of the ‘information yield potential’ from compiled
DSTL/PUB84073
categories of Web content. The Web may be considered to be an immense (yet finite)
universe of information in many forms. Within this complexity, it is expected that an
appropriately devised (and evidenced) conceptual categorisation of Web content will identify
sub-populations of information content that offer homogeneous properties.
Such a categorisation has been previously developed in government statistical offices to structure
diverse economic activity for estimation purposes (such as producer price index (PPI) and
gross domestic product (GDP) estimation). A standard taxonomy (which spans all economic
activity) is the North American Industry Classification System (NAICS) (recognising that
economic activity is only one component of Defence interest). Other taxonomies (e.g.
politics, security, societal, etc.) need to be developed to the same depth and completeness.
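As a rough illustration of what "homogeneous" sub-populations could mean in practice, the sketch below computes the share of total variance in a score that is explained by category membership (often called eta squared). The category names and scores are hypothetical and are not taken from this call. The measure itself is only one possible choice, assumed here for illustration.

```python
from statistics import fmean


def variance_explained(groups):
    """Share of total variance explained by category membership (eta squared).

    groups: dict mapping a category name to a list of numeric scores.
    Values near 1 suggest categories that are internally homogeneous.
    """
    values = [v for scores in groups.values() for v in scores]
    grand_mean = fmean(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    between_ss = sum(
        len(scores) * (fmean(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    return between_ss / total_ss if total_ss else 0.0


# Hypothetical relevance scores for three illustrative content categories.
scores = {
    "news": [0.80, 0.82, 0.78],
    "forums": [0.30, 0.35, 0.25],
    "catalogues": [0.10, 0.12, 0.08],
}
eta_squared = variance_explained(scores)
```

A value close to 1 would indicate that most of the variation lies between categories rather than within them, which is the property a useful categorisation would be expected to show.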
What does this offer as Defence benefit? An analogy may be drawn from mineral
exploration. If we are interested in mining for diamonds, we look in geological surveys (of
surface and sub-surface) for rock types that have a propensity to be associated with the
discovery of diamonds and focus our drilling activities there. By mapping out Web taxonomy
categories (rock types) and quantifying their information yield potential (strength of
association of minerals with bearing rock types) we provide the necessary understanding of
what sub-populations of Web content jointly offer the richest yield potential for a given
question. Different questions will map onto different Web sub-populations and the dynamic
nature of Web content will require a cost (and processing) efficient method of maintaining
those ‘yield’ estimates as the Web evolves. The output of this research provides the
necessary and sufficient foundation for the next PhD studentship: a sampling theory for the
Web.
Subjects of interest include:
Novel mapping of the Web landscape and its conceptual populations of content;
Methods and measures to estimate Web information content homogeneity;
Methods and measures to estimate ‘information yield potential’ against questions;
Evidenced methods to efficiently evolve the mapping of the Web landscape model.
2. A sampling theory for the Web
Statistical sampling methodology has developed to allow the properties of large populations
to be estimated (with minimised error and bias) using a smaller, representative sample
drawn from that population. Reasons for using a small subset may include
collection time, costs or processing constraints; sampling is particularly useful where the
population is dynamic, heterogeneous and incompletely known. Sampling
methodology has been advanced particularly in government statistics related to the
estimation of economic or social activity to inform the shaping of policy in these areas. The
aim of this research activity is to innovate an evidenced sampling methodology for the
detection, estimation, modelling and prediction of world events drawn from Web content.
This work will yield a scientific model of how to configure the collection and automatic
processing of diverse Web information to answer posed questions (such as the opinion of a
nation’s population).
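For orientation, classical stratified sampling with proportional allocation can be sketched as follows. This is the textbook technique, not the novel methodology the call is seeking. The strata names, sizes and rates are hypothetical, and the full population is simulated here only so that the sketch can run.

```python
import random


def stratified_estimate(strata, sample_size, rng):
    """Estimate a population mean from proportionally allocated stratum samples.

    strata: dict mapping a stratum name to the list of values in that stratum.
    sample_size: total number of units to draw across all strata.
    rng: a random.Random instance, passed in for reproducibility.
    """
    total = sum(len(values) for values in strata.values())
    estimate = 0.0
    for values in strata.values():
        weight = len(values) / total
        # Proportional allocation, with at least one unit per stratum.
        n = min(len(values), max(1, round(sample_size * weight)))
        sample = rng.sample(values, n)
        estimate += weight * (sum(sample) / n)
    return estimate


rng = random.Random(0)
# Hypothetical strata: (number of pages, probability a page is relevant).
rates = {"news": (600, 0.7), "forums": (300, 0.4), "catalogues": (100, 0.1)}
population = {
    name: [1 if rng.random() < p else 0 for _ in range(size)]
    for name, (size, p) in rates.items()
}
estimate = stratified_estimate(population, sample_size=100, rng=rng)
```

The sketch assumes the strata and their sizes are known in advance. On the Web that assumption does not hold, which is why the call asks for new theory.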
The continued rise and centrality of the Web offers many opportunities to measure the status
and dynamics of societies and events across the world. We may consider the information
pages on the Web to be a very large population (comprising many dynamic,
heterogeneous and incompletely mapped sub-populations) from which estimates may be
made. However, the volume, diversity and contradictory nature of information offers new
theoretical challenges. Novel statistical methodology is required to intelligently sample the
Web for information, to ensure that measurements and derived statistics are founded on
science rather than selection bias and intuition.
This research activity is potentially disruptive in seeking to design new statistical
methodology that scientifically handles the multimedia sampling of information on the Web,
through understanding the propensity for the information sub-populations on the Web to
yield relevant information to the questions posed.
Subjects of interest include:
Novel statistical sampling theory based on evidence of Web content understanding;
Methods of sampling that handle numerical and categorical information;
Efficient and appropriate sampling of very large and dynamic graphs.
3. Tracking in information space
Tracking in physical space is a mature and numerical research domain. Residual research in
the tracking domain is often directed to tracking targets through extreme manoeuvres and
novel configurations of radar sensors (e.g. bi-static). By carrying the tracking model into the
Web, we may ask how we might conceptually track the evolution of national activity (such as
economic sector industries) through the landscape of the Web and its associated databases.
In the Web landscape we have (perhaps) the usual geospatial and temporal parameters
associated with Web content. However, time parameters can now have many flavours (e.g.
time of the actual event, the time of its reporting, the time of its publication, the time of
retrieval, etc.). Additionally, geospatial information may have many flavours of precision
(coordinates, place names, regions, nations) and carry ambiguity in spelling and language. In
addition to the traditional space and time, we now have the informational dimensions to
process. These numeric and categorical dimensions may include topic, semantics and
quality and may require a hybrid arithmetic to operate upon these mixed variable states.
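One existing way to compare states that mix numeric and categorical variables is a Gower-style distance, sketched below. It is offered only as a familiar starting point and is not the "hybrid arithmetic" that the call envisages. The field names, values and numeric range are hypothetical.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower-style distance between two mixed-variable states.

    a, b: dicts with the same keys, holding numeric or categorical values.
    numeric_ranges: dict mapping each numeric key to its range (max - min).
    Keys absent from numeric_ranges are treated as categorical.
    Returns a value between 0 (identical) and 1 (maximally different).
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            span = numeric_ranges[key]
            # Range-normalised absolute difference for numeric fields.
            total += abs(a[key] - b[key]) / span if span else 0.0
        else:
            # Simple mismatch indicator for categorical fields.
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


# Hypothetical states for one industrial sector at two points in time.
state_early = {"output": 10.0, "topic": "steel", "quality": "high"}
state_late = {"output": 20.0, "topic": "steel", "quality": "low"}
distance = gower_distance(state_early, state_late, {"output": 100.0})
```

Each variable contributes equally in this sketch. Weighting the variables, or handling ordered categories such as quality grades, would be design choices for the research itself.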
This research is seeking to formulate the mixed numerical-categorical states associated with
sector industries, the graph associations between states and their sequential association
(tracking) through time, drawing on the landscape of Web content. While some tracking
concepts may be drawn from existing physical tracking techniques (e.g. time latent
Multi-Hypothesis Tracking), this call seeks new directions of research that will assist in
understanding the dynamics of industrial sector capability and strategy across both individual
and multiple nations.
Subjects of interest include:
Novel factor formulation of mixed-variable states to represent capability and strategy;
Methods of evolving those states in time, drawing from Web content;
Methods and measures to create a graph of states and estimate its dynamics;
Visualisation of the graph evolution and anomalies therein.
Assessment criteria
PhD proposals will be reviewed under the following assessment criteria and all applications
must provide the necessary information requested in the application form.
Assessment criteria used to judge the proposal
All applications will be judged for technical relevance and quality (under the criteria shown
in the following table) prior to being considered further according to the academic/research
groups or research centre and linkages criteria.
Assessment Area: Scientific Quality and Innovation
The proposal will be judged on the following:
The novelty of the proposed work in relation to the context, and the timeliness.
Whether the proposed work is ambitious, adventurous, and transformative.
The pathway to impact for the proposed research.
How complete and realistic the proposed approach is.
Assessment Area: Academic Staff, Resources and Management
The proposal will be judged on the following:
The CV(s).
Whether the team’s expertise aligns with the topic of the call.
The balance of skills of the team.
The time and commitment proposed.
Whether requirements for government furnished equipment or information (GFE, GFI) are
realistic and whether any work involving human participation is being reasonably proposed.
Assessment criteria used to judge the academic/research groups or research centre and the
value to Dstl.
Only technically strong proposals will be considered for funding. The academic/research
groups or research centre and linkages criteria will be used to further assess the quality of
the application(s). The benefit of funding multiple proposals at a research group/centre and
the contributions offered outside the Dstl funding will be judged for single and multiple
applications from each group/centre.
Assessment Area: Academic/Research Groups or Research Centre
The proposal will be judged on the following:
The evidence provided of the international standing of the research of the group or centre,
including evidence of significant research income and their contribution to the UK and
international research landscape.
The benefit MOD would obtain through funding research at the particular institution.
The relevance of the broader research in the centre to MOD.
Assessment Area: Linkages
The proposal will be judged on the following:
The benefits of funding multiple projects (a maximum of 2 per organisation) at the particular
group or centre.
The benefits associated with any wider linkages.
The value of linkages to Dstl. Applicants are encouraged to provide options that include
significant alignment/contribution of other studentship support from the host university (e.g.
from EPSRC, institutional funds, other funding sources).
Further Information and the process
In addition to the PhD proposal(s) submitted by the research group/centre, the applicant
must provide details of how the group/centre can contribute to leading the thinking on the
specific theme(s) proposed and how further engagement can be fostered between the
research group/centre and the MOD.
The intention is to fully fund three PhDs in 2014. The deadline for applications following the
conference is the 24th October 2014. Successful applicants will be informed by the 14th
November 2014. The projects will need to commence by the end of January 2015. Further
terms and conditions will be made available on request.