HEAP, David, page 10 of 20
Online Dynamic Iberian Linguistic Atlas
(Attachment # 1: summary)
This collaborative project brings together a team of scholars from three continents to transform a unique
legacy of linguistic data into a cutting-edge research tool that is poised to become a standard point of
reference for researchers in Iberian dialectology and language cartography for generations to come.
The University of Western Ontario is currently home to a resource which is like no other in the
world: copies of the dialect survey notebooks from the Linguistic Atlas of the Iberian Peninsula (the
Atlas Lingüístico de la Península Ibérica or ALPI). The original ALPI fieldwork transcriptions, made in
the 1930s but largely ignored in Spain until quite recently, have been gathered here at UWO’s
Theoretical and Applied Linguistic Laboratory, where we have the only complete collection of field data
from this project (the holdings in Spain are divided between three different locations, not all of which are
accessible to scholars). Since 2001 we have been electronically publishing the raw data from this long-awaited and unique resource for Iberian Romance dialectology, which provides researchers with an
invaluable snapshot of how Spanish, Portuguese and Catalan were spoken in the Iberian Peninsula at a
specific historical moment. This project has already attracted a great deal of scholarly interest from the
international scientific community: our site www.alpi.ca receives some 150 unique visitors daily and has
hundreds of registered users who access thousands of pages of data from dozens of different countries.
In its current form, however, the ALPI database provides only raw data: scanned facsimiles of the
original fieldworkers' notebook pages, a format which only begins to tap the real potential of the ALPI
data. To move beyond the static hand-transcribed data of the original notebooks to online databases where
linguistic forms can be searched electronically and used to create maps automatically according to
individual researchers' interests and needs, we need to collaborate with international teams of scholars.
The raw data can only be read rather painstakingly by specialists trained in the particularly
detailed phonetic alphabet used in the ALPI fieldwork. In order to make the data more widely useable,
the linguistic forms have to be retranscribed digitally and coded into databases which can then be
accessed by a broad range of researchers, students and eventually the general public. This massive data
transfer (over 36 000 pages of hand-transcribed field notes) cannot be done automatically but requires
coordinated long-term efforts by research teams. At UWO we are developing the tools for web-based
retranscription of the ALPI phonetic alphabet as well as discrete coding tools in order to tag grammatical
and lexical variants.
To test and refine these different web-based tools, we need to work closely with specialists in the
different Iberian dialects represented in the ALPI data. The highly qualified personnel needed for this
work are not to be found in sufficient numbers in any one centre: instead, the required linguistic
expertise is spread through a number of different universities. As scholars in Spain and elsewhere realize
the potential goldmine of linguistic data which the ALPI represents, existing research teams in
dialectology can make available their specialized knowledge to help transfer the original data for their
respective regions into digital format. As these teams join a network of scholars working on this project, their own
ongoing research will in turn benefit from improved access to, and readability of, the ALPI data.
The Principal Investigator at UWO (Heap) will facilitate and coordinate the contributions of two
research teams in Spain (in Barcelona and Madrid) which are already leaders in the electronic
publication of language variation data. The team will also build towards future collaborations with other
regions in the Iberian Peninsula where there is growing interest in using ALPI data to publish regional
linguistic atlases. Our co-applicant at the Université de Montréal, Enrique Pato, also brings his experience
in creating ALPI databases and in online cartography. Since we already have the required expertise in
data-coding and relational database infrastructure, these two researchers at Canadian institutions are
well-positioned to take a leadership role in developing the network of researchers and the web-based
tools needed for data entry and data management in the next stages of the ALPI scholarship. The
Canadian team also forms a link between Spanish colleagues working with us on the ALPI databases and
the expertise in automatic cartography from the VARILEX project based at Sophia University (Tokyo),
which will be instrumental in creating dynamic linguistic maps on the internet from our data.
2. Online Dynamic Iberian Linguistic Atlas
(Attachment # 2: detailed description)
UWO is already home to an unparalleled data resource for scholarship in Hispanic language variation
which provides a unique opportunity for international collaboration to create a new interactive
linguistic atlas online.
Background: from dialect survey to internet database
Linguistic atlases are a standard research tool for studying language variation, and most European
languages have at least one which covers their entire national territory. Spain and Portugal are, however,
exceptions in that no complete and uniform survey of data from all the Iberian Romance languages and
dialects has ever been published. The Linguistic Atlas of the Iberian Peninsula (Atlas Lingüístico de la
Península Ibérica or ALPI) was proposed by philologist Ramón Menéndez Pidal in the early twentieth
century and the dialect surveys were carried out under the direction of phonetician Tomás Navarro
Tomás beginning in the 1930s. Just when most of the surveys were finished, the Spanish Civil War cut
the project short and the ALPI materials, along with Navarro Tomás, went into exile in the U.S. In the
1950s the materials returned to Madrid and the surveys were completed, but only a single volume was
ever published (ALPI, I. Madrid: CSIC, 1962), out of what would have been at least ten volumes if the
publication had been completed. Most of this invaluable data remained unpublished and neglected for
decades until uncovered by a Canadian scholar in 2001 (cf. Heap 2002).
The ALPI data cover a network of 527 survey points from across the Iberian Peninsula with two
field notebooks at each point (Notebook I Phonetics and Grammar, Notebook II Vocabulary). This
unique collection of transcribed linguistic forms, with precious data on how Spanish, Catalan and
Portuguese dialects were spoken at a specific historical moment, has no parallel in the world of Iberian
dialectology; the ALPI remains the only complete language survey of this linguistic area. The most
complete collection of these materials in existence today consists of the copied notebooks held at the
University of Western Ontario’s Theoretical and Applied Linguistic Laboratory (TALL).
While the original field notebooks in Spain are housed in three different archival locations, and
not all of them are accessible to researchers, the ALPI data collection at Western began appearing as an
Internet publication in 2002, in the form of scanned notebook images. Funded by SSHRC since 2003, the
site www.alpi.ca has already made more than 70% of the original ALPI data available to the international
research community. The site now averages more than 150 visitors per day, up from just 87 visitors daily
during the preceding 12 months; the site’s total bandwidth for last year was over 20Gb, more than four
times the figure for the preceding year. More significantly, this internet traffic comes from more than 30
different countries: mostly Canada, U.S. and Spain, but also a variety of other Spanish-speaking
countries, many different parts of Europe and east Asia, in particular Japan, evidence of the increasing
international scholarly interest in the ALPI data. The online ALPI database has several hundred
registered users, who continue to download thousands of pages of data for their research. Thanks to the
internet, our relational databases and web interface will soon make all of the original survey data (some
36 000 pages in all) freely available, thus avoiding the practical limitations (i.e. production costs) that
prevented the print publication of the ALPI from being completed decades ago. Despite this success,
there are still serious limitations to how these data can be accessed and analysed in their current form
(i.e. the current database can neither be searched nor mapped automatically), limitations which cannot be
overcome by a single research team working at one institution.
Concrete objectives
This application seeks support for the initial phase of the Online Dynamic Iberian Linguistic Atlas
project (henceforth ODILA), which will improve and expand the online delivery of ALPI data by
building an international team of collaborators working in electronic dialectology. We will provide the
scholarly community with a representative portion of the ALPI data from two language areas (Catalan
and Castilian) in digitized format (a substantial achievement in itself), as well as establishing solid
foundations for future work on the ALPI data, by setting up tools, protocols and guidelines for the
coordination of research teams at different centres. The web tools and common resources which we
perfect and implement will then serve as ‘proof of concept’ for demonstrating the possibilities of
dialectology research on the internet, and recruiting further research teams to participate in future
ODILA collaborations.
The specific and feasible outcomes of this project include:
• a representative sample (from at least 50 data points or about 10% of the whole ALPI survey) coded
for variables selected from different sections of the ALPI questionnaire (phonetics, grammar,
vocabulary);
• development and testing of web-tools and working protocols corresponding to each of these
sections, including discrete coding tools for grammatical and lexical variables, as well as phonetic
transcription routines using Unicode character sets (where appropriate, orthographic (i.e. spelling)
transcriptions will also be used for lexical variables);
• recruitment of at least two more regional collaborators in other regions of the Iberian Peninsula,
with resources and expertise to continue work on the ODILA in their respective areas.
In this way, a large-scale scientific undertaking can be reduced to a series of manageable subprojects,
each of which can be addressed and completed individually in an efficient and focussed manner.
Proposed research: Interactive databases for online cartography
While the internet publication of the raw ALPI field materials has met a clear demand on the part of
researchers (witness the number of visits to the www.alpi.ca site as well as the number of pages accessed
and downloaded), in their current form the scanned original notebook pages still cannot exploit the full
potential of these invaluable data. The notebook images can be accessed using a list-driven or map-driven search, but once located each page must be read individually to find the linguistic forms relevant
for a given area of research. Furthermore, these pages of raw data can only be read rather painstakingly
by specialists trained in the particularly detailed phonetic alphabet used in the original fieldwork
transcriptions; the ALPI’s original director (Navarro Tomás) chose not to use the International Phonetic
Alphabet but rather his own adaptation of a traditional dialectologists’ phonetic alphabet favoured in
Spain, to which he added a large number of extra symbols (diacritics) to represent fine phonetic nuances.
The result is that the phonetic transcriptions are so detailed that the data are difficult for some scholars
and many students of linguistics to access (we have already determined experimentally that optical
character recognition software is not a viable option for accessing these data, given the type of phonetic
transcription used, the different fieldworkers’ individual handwriting styles and the number of additional
diacritic symbols employed in the notebook transcriptions).
In order to make the ALPI data more widely accessible, the linguistic forms have to be
retranscribed digitally and coded into databases in a form which can be accessed and used by a broad
range of researchers, students and eventually the general public. This ambitious project involves the
transfer of information from handwritten phonetic transcriptions in the original ALPI notebooks to
specially-designed relational databases accessible online. Such a massive data transfer (over 36 000
pages of hand-transcribed field notes) cannot be accomplished at any one institution but rather requires
long-term efforts by coordinated teams of researchers. International collaboration is the only
conceivable way a project of this scale can be conducted: the size of the task necessarily entails a multi-centre approach which brings together the specific regional expertise of different teams in a coordinated
web-based network of researchers.
At the UWO, we have been developing web-based tools for retranscribing the ALPI phonetic
alphabet using Unicode character sets (the modern standard for internet usage) to replace the hand-transcribed
phonetic notation system. By simplifying the Navarro alphabet to a semi-phonemic inventory
of symbols, we will create a transcription system which is automatically translatable from the traditional
dialectologists’ phonetic alphabet to the International Phonetic Alphabet, thus increasing the range of
users who can access the ALPI data. In addition, some forms (for example, lexical items) will also be
transcribed orthographically (i.e. in everyday spelling) where possible, making this part of the ALPI data
accessible not only to specialists but also to students and to the general public. The discrete tagging and
coding tools we are developing in order to code grammatical and lexical variants in our database also
produce data in a format which can be accessed without knowledge of phonetic transcription systems.
Crucially, none of the original transcriptions will be lost: all of the scanned images with the detailed
phonetic notations will remain online for researchers to access should they wish to consult the original
data directly. Since the searchable database with simplified Unicode transcriptions will be linked to the
original scanned images, the online digital atlas will facilitate locating specific pages of the original
ALPI notebook data, for those who want to see the full details at a given survey point.
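The automatic translatability described above can be pictured as a lookup table applied symbol by symbol. What follows is a minimal sketch in Python, offered only as an illustration and not as the project's actual tool. The symbol pairs in SYMBOL_TO_IPA and the function name to_ipa are assumptions made for this example; they are not the ALPI inventory. The sketch normalizes input to Unicode NFC so that precomposed and combining-character spellings of the same symbol are treated alike, and it matches the longest symbol first so that a base letter is never separated from its diacritic.

```python
import unicodedata

# Hypothetical symbol pairs for illustration only; the real inventory
# would come from the project's simplified transcription alphabet.
SYMBOL_TO_IPA = {
    "\u0109": "t\u0283",  # c with circumflex -> IPA affricate
    "\u0161": "\u0283",   # s with caron -> IPA esh
    "r\u0304": "r",       # r + combining macron (no precomposed form)
    "r": "\u027e",        # plain r -> IPA tap
}


def _nfc(text: str) -> str:
    """Return the NFC-normalized form of text."""
    return unicodedata.normalize("NFC", text)


# Normalize the keys once and try longer symbols before shorter ones.
_TABLE = {_nfc(key): value for key, value in SYMBOL_TO_IPA.items()}
_KEYS = sorted(_TABLE, key=len, reverse=True)


def to_ipa(form: str) -> str:
    """Translate a transcribed form to IPA; unmapped characters pass through."""
    text = _nfc(form)
    out = []
    i = 0
    while i < len(text):
        for key in _KEYS:
            if text.startswith(key, i):
                out.append(_TABLE[key])
                i += len(key)
                break
        else:
            # No table entry starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

Because the table is plain data, coordinators could extend or correct it without touching the matching logic, and the same table read in the other direction would document which distinctions the simplified alphabet keeps.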
The specific online tools we envisage involve web-forms that display a given scanned page of
data (i.e. a certain part of the ALPI questionnaire for a particular survey point) and present each data-enterer with various options: providing a phonetic transcription using the new Unicode character-set,
providing an orthographic transcription where appropriate, and choosing between different coding options
for grammatical or lexical items. Depending on each team’s focus at a given time, the data-enterer may
choose one or all of these options, while also having the choice of providing a meta-comment or other
flag for their team coordinator: all of these entries populate different but linked tables within our
relational database. In addition, each data entry must be time- and date-stamped for a particular logged-on individual, allowing us to track trends and maintain data integrity across and between group members.
Team coordinators (initially Heap and Pato, although as the other research groups become more involved
our co-investigators will take on this role as well) are responsible for setting access permissions for each
data-enterer, i.e. which data-points they can ‘see’ and edit, checking individual data-entries, and
establishing the discrete choices between grammatical and lexical variants. Since we cannot know in
advance all of the variants which may come up, the team coordinators must also be able to create and
modify the table entries corresponding to the linguistic variants the data-enterers choose. The whole
process is one which requires thorough checking and testing at every stage, as the type of raw data found
in the ALPI notebooks will always present us with surprises and new empirical challenges which have to
be dealt with in principled and consistent ways in order to create data-tables which can then be searched
and used by researchers.
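The linked tables, per-enterer permissions and time-stamped entries described above could be laid out roughly as follows. This is a hedged sketch only: it uses Python's built-in sqlite3 module with an in-memory database purely for illustration, and every table name, column name and function name (open_db, add_entry) is an assumption introduced here, not the project's actual schema.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: all table and column names are assumptions.
SCHEMA = """
CREATE TABLE survey_point (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE enterer (id INTEGER PRIMARY KEY, login TEXT NOT NULL UNIQUE);
CREATE TABLE permission (
    enterer_id INTEGER NOT NULL REFERENCES enterer(id),
    point_id   INTEGER NOT NULL REFERENCES survey_point(id),
    PRIMARY KEY (enterer_id, point_id)
);
CREATE TABLE variant (
    code TEXT PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE entry (
    id INTEGER PRIMARY KEY,
    point_id INTEGER NOT NULL REFERENCES survey_point(id),
    question TEXT NOT NULL,
    phonetic TEXT,
    orthographic TEXT,
    variant_code TEXT REFERENCES variant(code),
    comment TEXT,
    enterer_id INTEGER NOT NULL REFERENCES enterer(id),
    entered_at TEXT NOT NULL
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign-key checks enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_entry(conn, enterer_id, point_id, question, phonetic=None,
              orthographic=None, variant_code=None, comment=None):
    """Store one data entry if the enterer may edit this survey point."""
    allowed = conn.execute(
        "SELECT 1 FROM permission WHERE enterer_id = ? AND point_id = ?",
        (enterer_id, point_id),
    ).fetchone()
    if allowed is None:
        raise PermissionError("enterer may not edit this survey point")
    # Every entry carries the enterer's id and a UTC date-and-time stamp.
    stamp = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO entry (point_id, question, phonetic, orthographic,"
        " variant_code, comment, enterer_id, entered_at)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (point_id, question, phonetic, orthographic, variant_code,
         comment, enterer_id, stamp),
    )
    return cur.lastrowid
```

Keeping the variant codes in their own table reflects the point made above that coordinators must be able to add and modify variants as new ones turn up, while the foreign-key reference keeps data-enterers from recording a code that has not yet been defined.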
This development process, already begun by a team at Western as part of the post-doctoral
research conducted by Pato under Heap’s supervision, will continue to its natural fruition under the
research collaboration proposed here. Although this Canadian team is in the forefront of developing the
internet tools and procedures for this project, we have reached a stage where international collaboration
is essential for our work to progress further in a productive manner. In order to test and perfect these
different web-based tools, we need to work closely with researchers experienced in the different Iberian
dialect areas represented in the ALPI data. The highly qualified personnel needed for this work are not to
be found in sufficient numbers in any one centre: instead, the relevant linguistic expertise is spread
through a number of different universities. As scholars in Spain and elsewhere start to realize the
potential goldmine of linguistic data which the ALPI represents, existing research teams in dialectology
can contribute their specialized knowledge to help transfer the original data for their respective regions
into digital formats, as part of a network of scholars whose ongoing research can benefit from improved
access and readability of the ALPI data.
The overall architecture of our relational databases allows us to manage and coordinate the
different yet linked data-sets from each of the regional projects and, crucially, to maintain data-integrity
across multiple research sites through our server at the UWO. Our initial aims are moderate and
achievable: the period covered by the support sought in this application envisages working with a
representative subset of data from just two of the major language regions of the Iberian Peninsula,
Castilian and Catalan. The role of the PI at UWO (Heap) is to facilitate and coordinate the contributions
of two research teams in Spain which are already leaders in the electronic publication of language
variation data (Perea in Barcelona and Fernández-Ordóñez in Madrid) while building towards future
collaborations with other regions in the Iberian Peninsula where there is now a growing interest in using
ALPI data to publish regional linguistic atlases. Successfully coordinated work in these two areas
will allow us to work towards further collaboration with teams of scholars in other regions (Andalusia,
Asturias, Galicia and Portugal, for example). Since there have already been overtures from scholars in
each of these regions who are interested in working with the ALPI data, the support of SSHRC’s
International Opportunities Fund at this point will allow us to move quickly to establish collaborations in
regions covering more of the Iberian Peninsula in the near future, and to obtain the necessary resource
commitments from their respective regional governments.
Some of the early reviews of and studies based on the one published volume of the ALPI (1962),
while positive about the importance and usefulness of the data (Catalán 1964: 307, 1975: 97), stress that
the same data could just as conveniently be presented as lists of forms without mapping each variable, as
was done for example with the Survey of English Dialects (Orton et al. 1962-1971). While those
observations predate the use of databases in dialect research, they clearly foreshadow current and future
trends in this field: the future of dialectology lies not in print-based paper atlases but rather in open
searchable databases where individual forms and classes of forms can be searched according to
researchers’ interests and needs (Kretzschmar 1999), without reading them from predetermined maps.
Of course, the traditional view of a linguistic atlas necessarily involves language data presented
on maps: as useful as a searchable database might be, our project would be incomplete if we did not also
provide a means to display the ALPI data in a way that shows spatial relations between different
linguistic forms and (physical or human) geographic features. The beauty of a linguistic atlas in database
format is that the results of a search can be either displayed as custom maps or analysed statistically and
presented in tables and other formats. In other words, rather than being stored as static maps in hard-to-consult (and costly to publish) printed volumes, linguistic data retrieved from such a database can be
used to generate dynamic maps “on the fly” (Ruiz Tinoco 2002:6), or exported in tabular format, or both.
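To make the "map or table, or both" idea concrete, the sketch below takes one set of search results and produces two outputs from it: a set of point features suitable for plotting, and a table for export. It is an illustration under stated assumptions, not the project's mapping interface. The function names (to_geojson, to_csv), the shape of the inputs, and the choice of GeoJSON as the map format are all assumptions made for this example; the proposal itself names no exchange format.

```python
import csv
import io


def to_geojson(results, coords):
    """Build a GeoJSON FeatureCollection from search results.

    results: list of (point_id, form) pairs returned by a search.
    coords:  dict mapping point_id to a (latitude, longitude) pair.
    """
    features = []
    for point_id, form in results:
        lat, lon = coords[point_id]
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered longitude first, then latitude.
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"point_id": point_id, "form": form},
        })
    return {"type": "FeatureCollection", "features": features}


def to_csv(results, coords):
    """Return the same search results as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["point_id", "lat", "lon", "form"])
    for point_id, form in results:
        lat, lon = coords[point_id]
        writer.writerow([point_id, lat, lon, form])
    return buf.getvalue()
```

The design point is that neither output is stored: both are derived on request from the same query result and the same table of survey-point coordinates, so a correction to the underlying data is reflected in every map and table generated afterwards.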
Within the field of Spanish language variation, there already exists a successful prototype of this
approach: the VARILEX project in Tokyo, which has studied lexical variation from more than 40 cities
across the Spanish-speaking world since 1993. Since 1999, the VARILEX data have been published
online (see http://gamp.c.u-tokyo.ac.jp/~ueda/varilex/index.html), and these results can now also be
mapped on demand (via http://lingua.cc.sophia.ac.jp/varilex/php-atlas/lista3.php). The proposed ODILA
research team includes Ruiz Tinoco from Sophia University, who brings his expertise in the design and
implementation of PHP-driven web interfaces used by the VARILEX project to this project.
Coordinated collaboration via internet databases will not only facilitate researchers from different
institutions working together, it also has the advantage of allowing cumulative and progressive
contributions to the ongoing project. Data-entry in any format necessarily entails some degree of human
error, but working with databases allows us to make corrections more easily: instead of having to wait
for each printed volume before discovering the (inevitable) errata, as we would with a traditional ‘hard-copy’ linguistic atlas, our relational databases can be updated as errors are detected, and data-management protocols (authenticated user log-ins with specific localized authorizations, all entries
time- and date-stamped) allow us to track and correct systematic errors or even divergences in data-entry
choices, if and when they should occur. As a result, the data which end-users can access via the internet
will potentially be more coherent and reliable than any single printing of an atlas could ever hope to be.
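One way such divergences might be surfaced is to group entries by the item they describe and flag any item that has been coded in more than one way. The sketch below is a minimal illustration of that idea; the function name find_divergences and the dictionary keys it expects (point_id, question, variant_code, enterer, entered_at) are assumptions introduced here, and ISO-format time stamps are assumed so that sorting them as text puts them in chronological order.

```python
from collections import defaultdict


def find_divergences(entries):
    """Return items whose entries disagree on the variant code.

    entries: iterable of dicts with the keys point_id, question,
    variant_code, enterer and entered_at (an ISO-format time stamp).
    The result maps (point_id, question) to that item's entries,
    oldest first, for every item with more than one distinct code.
    """
    by_item = defaultdict(list)
    for entry in entries:
        by_item[(entry["point_id"], entry["question"])].append(entry)

    flagged = {}
    for item, group in by_item.items():
        codes = {entry["variant_code"] for entry in group}
        if len(codes) > 1:
            # Sort by time stamp so a coordinator can see who coded what, and when.
            flagged[item] = sorted(group, key=lambda entry: entry["entered_at"])
    return flagged
```

A coordinator reviewing the flagged items could then decide whether a disagreement is a slip by one data-enterer or a sign that the coding criteria themselves need to be clarified.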
Methodology and timelines
An important goal of the ODILA is to create methods which allow researchers to work as much as
possible in a decentralized manner, with investigators each leading their own team at their home
institution, and the regional teams working together via a centralized database on the ALPI server
(www.alpi.ca) at the University of Western Ontario. The PI (Heap) will coordinate the different team
efforts, assign the raw materials to each team and manage the databases: fortunately, he can draw on the
infrastructure resources and technical support of the CFI-funded Theoretical and Applied Linguistics Lab
at UWO for this project. The database design and webforms we are developing will continue to be based
on open-source software (SQL for the databases and PHP for the interactive webforms): this decision not
only helps in keeping costs under control by avoiding expensive updates, it ensures that the tools we
develop can be freely adapted by other researchers.
While most of the testing of prototype transcription and coding tools will naturally occur online,
to ensure an efficient and productive start to this collaborative work, initial stages will require more
direct personal contact than is possible via the internet. The most reliable way to ensure regular
and accurate feedback in the development and refinement of the common toolset and procedural
protocols (including web interfaces for data entry) is through in-person meetings and intensive work-sessions with co-investigators in each of the target regions. Heap will be on sabbatical leave from the
UWO, devoting himself full-time to research, from January to June 2007: he will be based at the
Universitat de Barcelona where he will work closely with Perea’s research team. He will also consult
regularly with Fernández-Ordóñez’ research team in Madrid to ensure that the phonetic transcription and
grammatical coding tools developed in Barcelona for Catalan-speaking areas are equally applicable in
Castilian-speaking areas. Heap’s expertise in the electronic publishing of dialect data and his intimate
knowledge of the ALPI data will be complemented by Perea’s experience in computerised dialectology
and dialectometry (Viaplana & Perea 2003; Perea 2004) and by Fernández-Ordóñez’ track-record in
internet delivery of language variation data (Fernández-Ordóñez 2004; Fernández-Ordóñez & Pato 2005).
Heap will also maintain regular contact with Pato, whose reduced course-load (see accompanying letter
from the Université de Montréal) will allow him to devote substantial time to testing the transcription
and coding tools as well as involving students at Université de Montréal in data-entry procedures. As
enough verified data become available in our database, we will be able to implement PHP webforms for
mapping, based on Ruiz Tinoco’s VARILEX model. The necessary web-formatted maps, with tabular
coordinates for all the ALPI points, have already been prepared for this stage of the project.
The methods and protocols developed in these consultations will then serve as demonstration
models in order to recruit future ODILA collaborators. By visiting similar teams in different regions of
the Iberian Peninsula, Heap will be able to show the advantages of working on such a coordinated
project and recruit other researchers to collaborate with further stages of the ODILA: having established
methodologies and preliminary results, we will be positioned to attract further institutional and regional
government support for each team. Canadian team members (Heap from UWO and Pato from Université
de Montréal) will meet in person at the beginning of the project (November-December 2006) to discuss
the implementation and refinement of work protocols, and will meet again following Heap’s return from
Spain (July 2007) as well as remaining in regular e-mail contact during the intervening period. Heap and
Perea will attend an international symposium on linguistic cartography (at the invitation of the National
Institute of the Japanese Language in Tokyo) in August 2007, which will be preceded by consultations
with Pato and Ruiz Tinoco on the PHP webforms for mapping from the database. By the fall of 2007
working versions of both the preliminary databases and the dynamic map-generation interface will be
available online to the scholarly community. As PI, Heap will not only coordinate the work of the
different team members, but also check the uniformity of the phonetic transcription and coding
criteria and ensure consistent use of the web-forms for entering the data.
OUTCOMES
Once completed, this first stage of ODILA will give the international scientific community access to a
substantial sub-sample of the ALPI data, a unique resource for Spanish and Catalan dialectology. Unlike
the static maps of the traditional printed linguistic atlas, our dynamic internet atlas will facilitate the
exploration of hypotheses anticipated neither by the original fieldworkers nor by those who are creating
the database. The data will be delivered to the scientific community in less time than by a paper
publication, and will be easier to edit and to rearrange from different points of view for different
purposes. The ODILA will thus represent not only a new way to disseminate research results, but also a
new way to collaborate as a team in linguistic research:
The key feature of the Web site is that it is an interactive resource. It is abundantly cross-linked
in addition to allowing the user to ask several different kinds of questions of the database. When
we have more data, it will be possible to ask questions across several different projects at once....
The Web is the research tool of the future, and we have it now. (Kretzschmar 1999: 283)
Such a vision, of course, requires a data structure built on relational databases in which the information
is distributed in different interconnected tables, with user-authentication and data-checks between tables.
The dissemination of research results over the Internet has another characteristic that
distinguishes it from traditional printed linguistic atlases: its easy access, not only for the scientific
community but also for the public in general: “we need to accept as central to our purpose the goal of
informing the public, not just the scholarly community, about the facts of language variation, especially
as that information can affect education and public policy” (Kretzschmar 1999: 283). The lay public’s
natural interest in spoken language—be it regional speech or the speech of a particular town—has rarely
if ever been satisfied by academic dialectologists, as our work is too often difficult for non-specialists to
access. With a dynamic linguistic atlas online we will take an important step towards the preservation
and dissemination of real data from Iberian Romance varieties, in this way giving to all interested people
an overall view of this fundamental element of their cultural heritage: their ways of speaking.
Results of this research will of course be presented at appropriate scholarly venues, including
conferences such as New Ways of Analysing Variation (annual), Linguistic Symposium on Romance
Languages (annual), the International Congress on Methods in Dialectology (Leeds, 2008) and the
International Society for Dialectology and Geolinguistics, as well as journal articles in, for example,
Language Variation and Change, Revista de filología española or the Revue de linguistique romane. The
ALPI data which will be made available by the ODILA will shed crucial new light on some recurring
issues in Spanish and Catalan linguistics currently of interest to researchers, including variable pronoun
usage, the distribution of different verb forms, and of course patterns of lexical and phonetic variants. In
addition, the ODILA will provide a novel teaching resource: future students and teachers of Hispanic
linguistics or of language variation will be able to access this goldmine of data via the internet, and
extract either tabular data or custom maps which can be easily embedded in their research projects or in
their classroom materials.
In terms of timing, this proposal follows and builds on a series of international scholarly meetings
involving the proposed team members: at the International Conference on Methods in Dialectology XII
(Moncton, August 2005), we established our presence in the field of dialectology with a Workshop on
New Methods in Iberian Dialectology. This was followed up by a Workshop on the Automatic
Processing of Variation in Iberian Languages (Tokyo, July 2006). During the period of this award, the
International Conference of Historical Linguistics (August 2007 in Montréal) occurs at a point in this
project which will allow for Canadian and overseas team members to meet and discuss results with
scholars keenly interested in the data we will be making available. As other regional research groups in
Spain (for example in Santiago de Compostela, Oviedo and Sevilla) move towards regional projects
based on the ALPI data, we can establish Canadian researchers as leaders in the field and place our team
advantageously with respect to these other emerging initiatives.
REFERENCES
(Attachment # 3)
ALPI 1962. Atlas Lingüístico de la Península Ibérica. Madrid: Consejo Superior de Investigaciones
Científicas.
Catalán, D. 1964. El ALPI y la estructuración dialectal de los dominios lingüísticos de la Ibero-romania.
Archiv für das Studium der neueren Sprachen und Literaturen 201, 307-311.
Catalán, D. 1975. De Nájera a Salobreña, notas lingüísticas e históricas sobre un reino en estado latente.
Studia Hispanica in Honorem R. Lapesa, III, Madrid, Seminario Menéndez Pidal y Gredos, 97-121.
Fernández-Ordóñez, I & Pato, E. 2005. L’espagnol rural de la Péninsule Ibérique étudié dans une perspective
grammaticale: le nouvel apport du Corpus Oral et Sonore de l’Espagnol rural (COSER). International
Conference on New Methods in Dialectology XII. Moncton, New Brunswick, Canadá.
Fernández-Ordóñez, I. 2004. Nuevas perspectivas en el estudio de la variación dialectal del español: El Corpus
Oral y Sonoro del Español Rural (COSER). XXIV Congrès International de Linguistique et de Philologie
Romanes. Aberystwyth, Gales.
Heap, David. 2002. “Segunda noticia histórica del ALPI a los cuarenta años de la publicación de su
primer tomo” Revista de filología española. LXXXII:1-2, 5-19.
Kretzschmar, William. 1999. “The Future of Dialectology.” In Katie Wells & Clive Upton, eds.
Proceedings of the Harold Orton Centennial Conference. Leeds Studies in English XXX, 271-287.
Orton, H. et al. 1962-1971. Survey of English Dialects, Leeds, Published for the University of Leeds by
E. J. Arnold. 4 volumes.
Pato, E. 2004. La sustitución de cantara-cantase por cantaría y cantaba (en el castellano septentrional
peninsular). Madrid: Universidad Autónoma de Madrid.
Perea, M.P. 2004. Mapes electrònics i mapes sonors. Dialectologia i recursos informàtics. Barcelona:
Promociones y Publicaciones Universitarias. 135-152.
Perea, M.P., & J. Viaplana. 2003. Textos orals dialectals del català sincronitzats. Una selecció. In PPU Universitat de Barcelona. 1-167.
DESCRIPTION OF TEAM
(Attachment # 4: description of team)
All of the researchers in this international team (Heap at Western, Fernández-Ordóñez in Madrid, Pato
in Montréal, Perea in Barcelona, Ruiz Tinoco in Tokyo) have complementary expertise in the electronic
publishing of linguistic data in the field of Iberian Romance language variation. All have also participated
successfully in collaborative research teams and have international experience.
Principal Investigator: David Heap is the researcher who uncovered the ALPI notebooks in different
archives in Spain after decades of neglect and has been publishing them online (www.alpi.ca). He is a
former director of the Theoretical and Applied Linguistics Lab at UWO, and has experience in managing
project grants and teams of research assistants. Apart from his work in linguistic geography, his research
deals primarily with formal approaches to morphosyntactic variation in Romance pronominal paradigms.
He brings to this project an intimate knowledge of the ALPI materials and issues relating to digitizing
these data, as well as an overall vision of the ODILA project’s long-term direction.
Co-Investigators:
Inés Fernández-Ordóñez is a medievalist as well as a dialectologist, and has experience with
publishing texts in both areas, as well as other scholarly works. For more than a decade she has been
conducting sociolinguistic interviews of contemporary rural Spanish, a selection of which are now
available online (Corpus Oral y Sonoro del Español Rural, COSER http://www.uam.es/coser), and she is
among the first scholars to use ALPI data to shed light on contemporary issues in dialect variation. Her
research on Spanish dialects looks at non-standard pronoun systems, from both synchronic and
diachronic perspectives.
Enrique Pato’s doctoral thesis (Pato 2004) was the first modern work to exploit part of the potential of
the ALPI data for linguistic research. He has worked on different research teams in Spain and
Guatemala, and most recently as a postdoctoral fellow under Heap’s supervision at UWO’s Theoretical
and Applied Linguistics Lab, where he took a leadership role in organizing a special session on Iberian
Dialectology at the XII International Conference on Methods in Dialectology (Moncton 2005). His
published research on historical and contemporary dialectology deals with non-standard verb forms,
conditional structures, adverbs and other aspects of morphosyntactic variation across Spanish dialects.
Maria Pilar Perea has studied variation in Catalan dialects both from contemporary fieldwork data and
from historical data, using philologists’ fieldwork notes from the 19th and early 20th centuries to create a
database of linguistic forms that can be mapped and displayed automatically. Her experience with large-scale data entry from dialect survey materials as well as electronic linguistic cartography will be
invaluable to this project. She has published widely in the area of morphosyntactic variation in Catalan,
and participates in a research team which conducts surveys of modern Catalan dialects.
Antonio Ruiz Tinoco is general coordinator of the international Varilex project, which gathers lexical
data from more than 50 cities in over 20 countries across the Spanish speaking world, and publishes this
data online. Among other aspects of this project, he is responsible for the database structure and
management (using open-source MySQL software) and the user interface which uses PHP scripts to
produce automatic maps (http://lingua.cc.sophia.ac.jp/varilex/php-atlas/lista3.php). His expertise with
database management and automatic web-based linguistic cartography will be crucial contributions to
this project. In addition to studying lexical and morphosyntactic variation in Spanish, his research deals
with applications of computing to linguistics.
ROLE OF STUDENTS
(Attachment # 5: Role of Students)
Students form an integral part of the collaborative research team proposed here, and their involvement is
crucial to the overall success of the ODILA project. They will be closely involved in many aspects of the
research activities, including:
• planning and implementing the relational database structure underlying the storage and delivery
  of ALPI data via the internet;
• developing and refining the phonetic transcription and morphological tagging systems used to
  encode the ALPI data;
• entering data from the ALPI notebooks using the web-tools and interface protocols;
• helping develop and test the PHP scripts to create automatic maps ‘on the fly’ from the ALPI’s
  SQL relational databases;
• analyzing and developing presentations based on data from the ODILA project.
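To make the database and mapping tasks above more concrete, the following is a minimal sketch of how survey responses might be stored relationally and then retrieved as map-ready points. It is an illustration only: it uses Python with SQLite rather than the PHP and SQL server software named in this proposal, and every table name, column name, and data value (locality, question, response, "Locality A", "variant-x" and so on) is a hypothetical placeholder, not the project's actual schema or ALPI data.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the ODILA project's actual design.
SCHEMA = """
CREATE TABLE locality (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    latitude  REAL NOT NULL,
    longitude REAL NOT NULL
);
CREATE TABLE question (
    id       INTEGER PRIMARY KEY,
    headword TEXT NOT NULL
);
CREATE TABLE response (
    locality_id   INTEGER NOT NULL REFERENCES locality(id),
    question_id   INTEGER NOT NULL REFERENCES question(id),
    transcription TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database filled with placeholder rows (not real ALPI data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO locality VALUES (?, ?, ?, ?)",
        [
            (1, "Locality A", 40.0, -4.0),
            (2, "Locality B", 41.0, -3.0),
            (3, "Locality C", 42.0, -2.0),
        ],
    )
    conn.execute("INSERT INTO question VALUES (1, 'item-1')")
    conn.executemany(
        "INSERT INTO response VALUES (?, ?, ?)",
        [(1, 1, "variant-x"), (2, 1, "variant-y"), (3, 1, "variant-x")],
    )
    conn.commit()
    return conn


def map_points(conn, question_id):
    """Group the localities answering one questionnaire item by recorded variant.

    Returns {variant: [(locality name, latitude, longitude), ...]}, the kind of
    structure a map-drawing script could plot with one symbol per variant.
    """
    rows = conn.execute(
        """
        SELECT r.transcription, l.name, l.latitude, l.longitude
        FROM response AS r
        JOIN locality AS l ON l.id = r.locality_id
        WHERE r.question_id = ?
        ORDER BY l.id
        """,
        (question_id,),
    ).fetchall()
    points = {}
    for transcription, name, lat, lon in rows:
        points.setdefault(transcription, []).append((name, lat, lon))
    return points
```

Calling map_points(build_demo_db(), 1) groups the three placeholder localities under their two placeholder variants. Keeping localities, questionnaire items, and responses in separate tables means that a new notebook entry adds only one response row, and the same query can serve either a tabular export or a map.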
The continuous feedback which research assistants provide to the rest of the team regarding their
experiences (difficulties and successes) with the web-tools and data-entry protocols will be particularly
important as we test the system with a view to recruiting more research teams from different regions. By
interacting in this way not only with the Canadian team members but also with our international co-investigators, students will gain valuable exposure to top scholars in the field from around the world, as
well as useful experience in linguistic research which will be applicable to their own research interests.
Since the PI for this project (Heap) will be on sabbatical leave in Spain for a substantial part of the
period covered by this award, he will depend crucially on graduate student Research Assistants to ensure
that everything is running smoothly on the server at UWO, and to maintain contact with technical
support services at this institution should the need arise.
It is anticipated that two or more students will develop scholarly papers based on their work with the
ODILA project (as has happened in the past with students working on ALPI data), leading to
presentations at refereed international conferences, either alone or as co-authors with the Canadian
researchers (Heap and Pato). The International Conference of Historical Linguistics (August 2007 in
Montréal) is one particularly promising venue for such presentations, given its geographical proximity
and the number of high-calibre scholars from different countries who will be in attendance.
STUDENT APPLICANT POOL
University of Western Ontario: since 2003, the ALPI project has employed more than 15 graduate
students as Research Assistants, and some forty undergraduates through UWO’s Work-Study Program,
making it a major employer of student assistants in our discipline at this institution. UWO’s offerings in
linguistics continue to grow: French has both MA and PhD students specializing in linguistics, the
Spanish program (recently expanded, adding a PhD to the existing MA) attracts an increasing number of
students specifically interested in linguistics, and there is a proposal to introduce a ‘standalone’ two-year MA program in linguistics in the near future. With this increasing pool of qualified students
available for assistantships, we anticipate hiring one Ph.D. student and one M.A. specializing in
linguistics, at the standard SSHRC stipend amounts.
Université de Montréal: co-investigator Pato will be just beginning to develop student interest and
expertise in the area of Spanish linguistics at this institution. As he is in the first year of his position
there, it is anticipated that he will be able to hire at most a small number of (undergraduate or graduate)
students on an hourly basis, in order to begin developing a group of potential Research Assistants.
BUDGET JUSTIFICATION
(Attachment # 5: Budget Justification)
Personnel:
Graduate Students: our budget allows for amounts equivalent to one Ph.D. stipend ($15 000) and one
M.A. stipend ($12 000), because it is our hope to recruit graduate students through UWO’s growing
linguistics program to work alongside us on this project in a collaborative fashion.
The amounts for undergraduate students are to be paid on an hourly salary, either at UWO or at
Université de Montréal: average hourly wage of $18 (including benefits) for 500 hours, total $9000.
Travel:
Canadian team members:
Travel to Spain for Pato and Heap (two return flights, each $1200), and subsistence for two stays of 15
days each (30 days @ UWO rate of $125 / day), total $6150.
International team members:
Travel to Canada from Spain for Perea and Fernández-Ordóñez (two return flights, each $1200), travel
from Tokyo for Ruiz Tinoco (return flight $2000), subsistence for three stays of 10 days each (30 days
@ UWO rate of $125 / day), total $8150.
Students:
Travel to conference and team meeting in Montréal or other North American conference (2 trips at $500
each) + subsistence for 2 stays of 7 days (14 days @ $125 / day), total $2750.
Professional / Technical consulting:
For support in design and implementation of relational databases and web-forms, 100 hours @ $52 / hour
(UWO Information Technology Services consulting rate): $5200.
Other supplies and related expenses:
While every effort will be made to communicate via e-mail where possible, considerable postage, courier
and telephone expenses will be incurred during team members’ travel. Modest amounts of computer
consumables (paper) are needed, and at least one network laser printer cartridge will also be required
for this project: total $2000.
Computer equipment:
Very little is required: the Theoretical and Applied Linguistics Lab at UWO, like our co-applicants’
facilities at their institutions, has adequate computer equipment and software for this project. The only
new requirement is a portable (notebook) computer for the PI to use in demonstrating the project during
site visits ($1800) and related new software licenses ($500).
All funds are budgeted for one year only (2006-2007), total: $62550.
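As a cross-check on the arithmetic, the line items above can be recomputed from their stated rates and quantities. The short sketch below does this in Python. The category labels are our own shorthand, and the supplies figure is entered as the stated lump sum because no breakdown is given.

```python
# Each entry recomputes one budget line from the rates and quantities stated
# in the justification. Labels are shorthand chosen for this sketch.
BUDGET = {
    "PhD stipend": 15_000,
    "MA stipend": 12_000,
    "Undergraduate wages": 18 * 500,                            # $18/hour for 500 hours
    "Travel, Canadian team": 2 * 1_200 + 30 * 125,              # 2 flights + 30 days
    "Travel, international team": 2 * 1_200 + 2_000 + 30 * 125, # 3 flights + 30 days
    "Travel, students": 2 * 500 + 2 * 7 * 125,                  # 2 trips + 14 days
    "Technical consulting": 100 * 52,                           # 100 hours at $52/hour
    "Supplies": 2_000,                                          # stated lump sum
    "Notebook computer": 1_800,
    "Software licenses": 500,
}


def total(budget):
    """Sum all line items, in dollars."""
    return sum(budget.values())


if __name__ == "__main__":
    for item, amount in BUDGET.items():
        print(f"{item:<28} ${amount:>6}")
    print(f"{'Total':<28} ${total(BUDGET):>6}")
```

The recomputed subtotals agree with the stated amounts ($9000, $6150, $8150, $2750 and $5200), and the sum of all items equals the stated total of $62550.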