Data collection and document generation system for data-oriented approaches

Yuichi Mori1, Yoshiro Yamamoto2 and Hiroshi Yadohisa3
1 Department of Socio-Information, Okayama University of Science, 1-1 Ridai-cho,
Okayama 700-0005, Japan mori@soci.ous.ac.jp
2 Department of Mathematics, Tokai University, 1117 Kita-Kaname, Hiratsuka
259-1292, Japan yamamoto@sm.u-tokai.ac.jp
3 Department of Culture and Information Science, Doshisha University, 1-3 Tatara
Miyakodani, Kyoto 610-0394, Japan hyadohis@mail.doshisha.ac.jp
Summary. DoSS@d (Data-oriented Statistical System, http://mo161.soci.ous.ac.jp/@d/)
is a web-based statistics system developed mainly for educational use. It archives a
large number of data sets together with the corresponding analysis stories (reports
of the actual analysis processes) and provides an online analysis system. We have now
added to DoSS@d an online data collection system with a real-time analysis function
and an online document generation system with a database registration function. The
data collection system has been developed not only for desktop computers but also for
mobile information devices, including portable phones, that can access the Internet.
The document generation system provides a user interface for generating an XML
document of the collected data and registering it in the DoSS@d database. These
systems reinforce the data collection function, so that users can easily collect data
sets and perform simple interim analysis anywhere and at any time, and they make
DoSS@d more accessible and useful.
Key words: web-based application, databank of data sets, mobile statistics, online
analysis, XML document
1 Introduction
Many teaching strategies have been proposed based on educational trials, and many
data sets suitable for educational use have been published on the Internet. However,
it remains difficult for users to find data sets suited to their intended purpose
and to obtain documents that describe how to analyze the data (we call this kind
of documentation an “analysis story”). Compiling good examples is therefore
considered important for learning the processes and procedures used in past analyses.
Archiving example data sets with the corresponding analysis stories is also expected
to facilitate learning statistical packages in a computational environment, where
users can follow the steps of the example computation as an instructional tool.
Recognizing the potential of archiving such examples, the authors began developing
a kind of databank on the Internet. This databank is an online database of data sets
and analysis stories, and it also incorporates an online analysis system that
performs automatic analysis based on the analysis story (i.e., using the same
parameters as those in the original analysis). This environment has been named
the “Data-oriented Statistical System”, or DoSS@d [MYY03, HMYY04, MHYY05],
where “@d” emphasizes that the system is used for real data. Currently more than
200 examples, classified by research subject and statistical method, are stored in
the database. When the system is used for statistical education, teaching scenarios
can be developed easily, giving students the chance to learn various statistical
techniques using real data sets as well as to master statistical software using the
online analysis function. Students can also run their own analysis with simple
operations to confirm the results of the analysis story, and can easily examine the
effect of using different parameters. Furthermore, statisticians or researchers who
wish to evaluate their analysis can use the system to find suitable data sets for
their evaluation.
Given the main purpose of the system, a large stock of good data sets and analysis
stories is clearly essential, so we should gather as many of them as possible.
We have now added to DoSS@d an online data collection system with a real-time
analysis function and an online document generation system with a database
registration function. The data collection system has been developed not only for
desktop computers but also for mobile information devices, including portable
phones, that can connect to the Internet via wireless communication, so that we can
collect data and perform simple analysis anywhere and at any time. The document
generation system provides a user interface for creating an XML document that
describes the collected data in the DoSS@d format and for registering it in the
DoSS@d database without difficulty. These new systems make DoSS@d more useful and
effective and allow users to collect and upload data rapidly and easily.
This paper first outlines DoSS@d and then describes in detail the prototypes of the
online data collection and document generation systems implemented in DoSS@d.
2 DoSS@d
The overall structure of DoSS@d is shown in Fig. 1. The system is located
at http://mo161.soci.ous.ac.jp/@d/ and originally consisted of three subsystems,
DoDStat@d (data and analysis story database), DoAStat@d (online analysis system)
and DoLStat@d (learning courses), which are shown in dark gray in Fig. 1.
2.1 DoDStat@d (Data-oriented Database of Statistics)
DoDStat@d is the database system of DoSS@d and the core of the whole system. Each
stored data set consists of a data description and the data body. The former is
written in XML and describes attributes of the data such as data name, description,
source/reference, research subject, statistical method, cases, variables and
executable analysis, as displayed in Fig. 3. The latter is provided as tab-, comma-
and space-delimited values, and can be displayed and downloaded through an ordinary
browser by clicking a link in the “Data set” area of the data description page (see
Fig. 3). DoDStat@d also stores analysis stories written in XML. The user can select
an interesting or appropriate data set using retrieval keys such as the research
subjects and statistical methods listed in Fig. 2, as well as plain-text keywords
(Fig. 3 shows one of 41 data sets found by searching with the key “Economics”).
Fig. 1. Structure of DoSS@d
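The retrieval step can be illustrated with a minimal sketch. It assumes a much simplified data description; the element names (name, subject, method), the sample entries and the functions parse_description and search are all hypothetical and do not reproduce the actual DoSS@d schema or code.

```python
import xml.etree.ElementTree as ET

# Invented sample descriptions. The element names (<name>, <subject>, <method>)
# are illustrative assumptions, not the actual DoSS@d schema.
DESCRIPTIONS = [
    "<dataset><name>Household spending</name>"
    "<subject>Economics</subject><method>Regression</method></dataset>",
    "<dataset><name>Sprint times</name>"
    "<subject>Sports</subject><method>ANOVA</method></dataset>",
    "<dataset><name>Exchange rates</name>"
    "<subject>Economics</subject><method>Time series</method></dataset>",
]


def parse_description(xml_text):
    """Turn one XML data description into a flat dict of tag -> text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def search(descriptions, subject=None, method=None, keyword=None):
    """Return the descriptions that match every retrieval key that was given."""
    hits = []
    for xml_text in descriptions:
        d = parse_description(xml_text)
        if subject is not None and d.get("subject") != subject:
            continue
        if method is not None and d.get("method") != method:
            continue
        if keyword is not None and keyword.lower() not in " ".join(d.values()).lower():
            continue
        hits.append(d)
    return hits


if __name__ == "__main__":
    # Search by research subject, as in the "Economics" example in the text.
    for hit in search(DESCRIPTIONS, subject="Economics"):
        print(hit["name"])
```

Combining the keys with a logical AND is a design choice of this sketch; the paper does not state how DoDStat@d combines several retrieval keys.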
2.2 DoAStat@d (Data-oriented Analysis System of Statistics)
DoAStat@d is a web-based application for analyzing any data set stored in
DoDStat@d as well as data sets on the user's local computer. In addition to allowing
users to analyze data sets, DoAStat@d provides a function that lets users easily
obtain the same results as described in the analysis story of the data by
automatically importing the parameters stored in the XML document of the analysis
story. Two versions of DoAStat@d have been implemented: a CGI version using R Server
(DoA R) and a Java version using XploRe Quantlet Server (DoA X). See [HMYY04] for
details.
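The idea of re-running an analysis from the parameters stored in an analysis story can be sketched as follows. This is only an illustration under stated assumptions: the XML layout (analysis and param elements), the toy data and the function names are hypothetical, and a hand-written least-squares fit stands in for the R and XploRe back ends that DoAStat@d actually uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical analysis story; the <analysis>/<param> layout is an assumption.
STORY_XML = """<story>
  <analysis method="regression">
    <param name="response">y</param>
    <param name="predictor">x</param>
  </analysis>
</story>"""

# Invented toy data, keyed by variable name.
DATA = {"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 6.0, 8.0]}


def load_parameters(story_xml):
    """Read the analysis method and its parameters from the story document."""
    analysis = ET.fromstring(story_xml).find("analysis")
    params = {p.get("name"): p.text.strip() for p in analysis.findall("param")}
    return analysis.get("method"), params


def simple_regression(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def run_story(story_xml, data):
    """Repeat the stored analysis using the parameters imported from the story."""
    method, params = load_parameters(story_xml)
    if method != "regression":
        raise ValueError(f"unsupported method in this sketch: {method}")
    return simple_regression(data[params["predictor"]], data[params["response"]])


if __name__ == "__main__":
    slope, intercept = run_story(STORY_XML, DATA)
    print(slope, intercept)
```

Because the parameters live in the document rather than in the user's input, a learner who changes one param value and re-runs the story can compare the result with the original directly.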
2.3 DoLStat@d (Data-oriented Learning System of Statistics)
DoLStat@d is a learning system that provides a variety of learning courses such as
“Statistics introductory course”, “Economics course” and “Visualization course”.
Each course consists of four to seven analysis stories, which are selected from
DoDStat@d and ordered educationally according to the study target of the course.
See [HMYY04] and [MHYY05] for details.
Fig. 2. Retrieval keys in DoDStat@d
Category              Key
Research subjects     Agriculture, Economics, Education, Engineering, Government,
                      Medical, Miscellaneous, Nutrition, Psychology, Science,
                      Social science, Sports
Statistical methods   Test, ANOVA, Graph, Regression, Multivariate, Time series,
                      Descriptive
Fig. 3. An example of data description page
3 Data collection and document generation systems
In addition to the three subsystems described in Sect. 2, we have now added two
more subsystems to DoSS@d: an online data collection system with a real-time
analysis function (middle gray part in Fig. 1) and an online document generation
system with a database registration function (light gray part in Fig. 1).
Until now DoSS@d had no online system for data collection and registration.
The data sets and analysis stories currently stored in DoSS@d were collected by the
authors and their collaborators, converted to the appropriate format and registered
in the database by hand. The new subsystems therefore help us greatly to collect
new data promptly and to register it in the database without difficulty.
On the other hand, when collecting and analyzing data, we often need to collect data
at different places within a short period and to observe the analysis of the interim
data during collection, especially in surveys such as public-opinion polls, exit
polls and traffic volume surveys. In those situations, a rapid and mobile data
collection system with a real-time analysis function is desirable. Such an
environment has recently become feasible because wireless network services and
portable telephone services that give access to the Internet are becoming widely
available. We have therefore given the new subsystems more mobility: we have
implemented data handling functions for mobile information devices, including mobile
computers and portable phones, that can connect to the Internet through wireless
communication systems.
3.1 Data Collection System
Data Collection System consists of three modules:
• Data Collection Manager
• Data Collection Control
• Basic Analysis Control
and a link to Document Generation System.
First, using Data Collection Manager (Fig. 4), which administers all the functions
of Data Collection System, an administrator of the system registers a project in
which data collection is conducted, together with the staff (survey conductors) who
handle the data collection in the project.
A survey conductor creates a data entry form for the project on the web using
Data Collection Control (Fig. 5). More than one data entry form can be registered
to one project. Data Collection Control provides a variety of support tools and
templates for creating an entry item easily based on the item’s type, such as
single-choice, multiple-choice, numeric open-end and text open-end. A finished data
entry form is added to the list in Data Collection Control in two formats: HTML
format for ordinary Internet browsers (Fig. 6) and mobile HTML format for portable
phones (Fig. 7). Data Collection Control allows a conductor to set the open/expire
dates of a form, edit items and modify or delete a form in the list. When the open
date of a form arrives, the form automatically becomes available at a particular URL
on the Internet and stays open until its expire date.
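A data entry form with typed items and an open/expire window could be modeled roughly as below. This is a hypothetical sketch, not the authors' implementation: the class names Item and EntryForm, the type labels and the choice to treat both boundary dates as inclusive are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import date

# Item types named in the text; the exact labels used here are assumptions.
ITEM_TYPES = {"single-choice", "multiple-choice", "numeric", "text"}


@dataclass
class Item:
    label: str
    kind: str
    choices: tuple = ()

    def __post_init__(self):
        # Reject item types that the form builder does not know about.
        if self.kind not in ITEM_TYPES:
            raise ValueError(f"unknown item type: {self.kind}")


@dataclass
class EntryForm:
    title: str
    open_date: date
    expire_date: date
    items: list = field(default_factory=list)

    def is_open(self, today):
        """A form accepts input from its open date through its expire date."""
        return self.open_date <= today <= self.expire_date


if __name__ == "__main__":
    form = EntryForm(
        "Exit poll",
        date(2006, 8, 28),
        date(2006, 9, 1),
        [Item("Vote", "single-choice", ("A", "B")), Item("Age", "numeric")],
    )
    print(form.is_open(date(2006, 8, 30)))
```

Checking the window at request time, rather than switching a stored flag, is one simple way to make a form open and close automatically without a scheduled job.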
Data collectors (or respondents in a questionnaire survey) then enter data into
the data entry form one record at a time. Each submitted record is appended to the
temporary data body on the server. A conductor and the collectors can obtain a
summary of simple statistics of the temporary data at any time by using Basic
Analysis Control (Fig. 8), which is linked from Data Collection Control. Simple
statistics and graphs displayed on a portable phone screen are shown in Fig. 9.
These functions allow them, for example, to observe how the data changes during
collection. If they want to analyze the data in detail, DoAStat@d can be used. They
can also download the data set as it stands at that point in CSV format through an
ordinary browser (mobile HTML does not provide the download function).
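The append-then-summarize cycle described above can be sketched in a few lines. The class TemporaryData, its method names and the sample records are hypothetical, and the data is kept in memory here, whereas the real system stores the temporary data body on the server.

```python
import csv
import io
import statistics
from collections import Counter


class TemporaryData:
    """In-memory stand-in for the temporary data body kept on the server."""

    def __init__(self, fields):
        self.fields = list(fields)
        self.records = []

    def submit(self, record):
        # One record is appended each time a data entry form is submitted.
        self.records.append({f: record[f] for f in self.fields})

    def numeric_summary(self, name):
        """Simple statistics of a numeric item over the data collected so far."""
        values = [float(r[name]) for r in self.records]
        return {
            "n": len(values),
            "mean": statistics.mean(values),
            "min": min(values),
            "max": max(values),
        }

    def frequency(self, name):
        """Frequency table of a choice item over the data collected so far."""
        return Counter(r[name] for r in self.records)

    def to_csv(self):
        """Snapshot of the current data set in CSV format."""
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=self.fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(self.records)
        return buf.getvalue()


if __name__ == "__main__":
    data = TemporaryData(["vote", "age"])
    for record in ({"vote": "A", "age": 30}, {"vote": "B", "age": 40}):
        data.submit(record)
        print(data.frequency("vote"), data.numeric_summary("age"))
```

Recomputing the summary from all stored records on each request keeps the sketch simple; a deployment with many records might maintain running totals instead.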
Fig. 4. Data Collection Manager
Fig. 5. Data Collection Control
3.2 Document Generation System
When a sufficient volume of data has been collected, a conductor checks the data
body and confirms whether it is suitable for DoSS@d. If it is, the conductor creates
the data description file using Document Generator (Fig. 10), which helps to make
an XML document of the data attributes in the DoSS@d format (an XML file is
displayed as in Fig. 3 in an ordinary browser) and automatically generates three
data bodies (HTML format, tab-delimited values and space-delimited values) from the
CSV file of the data. As shown in Fig. 10, most of the attributes are imported from
Data Collection System when the data was collected using that system. The completed
XML file and the four data bodies, including the CSV file, are automatically
registered (uploaded) to DoDStat@d.
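Deriving the description file and the additional data bodies from one CSV file could look like the following sketch. The XML element names, the function names and the sample data are hypothetical and do not reproduce the DoSS@d format, and the upload step is left out.

```python
import csv
import io
import xml.etree.ElementTree as ET
from html import escape


def read_csv(csv_text):
    """Split CSV text into a header row and the data rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return rows[0], rows[1:]


def delimited(header, rows, sep):
    # Values that themselves contain the separator are not quoted in this sketch.
    return "".join(sep.join(row) + "\n" for row in [header] + rows)


def html_table(header, rows):
    """Render the data body as a plain HTML table."""
    head = "".join(f"<th>{escape(h)}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(v)}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


def describe(name, subject, method, header, rows):
    """Build an XML data description; the element names are assumptions."""
    root = ET.Element("dataset")
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "subject").text = subject
    ET.SubElement(root, "method").text = method
    ET.SubElement(root, "cases").text = str(len(rows))
    variables = ET.SubElement(root, "variables")
    for variable in header:
        ET.SubElement(variables, "variable").text = variable
    return ET.tostring(root, encoding="unicode")


def generate(name, subject, method, csv_text):
    """Return the description plus the four data bodies, keyed by format."""
    header, rows = read_csv(csv_text)
    return {
        "xml": describe(name, subject, method, header, rows),
        "csv": csv_text,
        "tab": delimited(header, rows, "\t"),
        "space": delimited(header, rows, " "),
        "html": html_table(header, rows),
    }


if __name__ == "__main__":
    out = generate("Exit poll", "Government", "Descriptive", "vote,age\nA,30\nB,40\n")
    print(out["xml"])
```

Counting the cases and listing the variables from the CSV file itself mirrors the point made in the text that most attributes need not be typed in by hand.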
4 Concluding remarks
Data Collection System can also be used on its own as an online data collection and
analysis tool. Since the system can be used anywhere and at any time through wired
and wireless networks (especially portable phone services), it opens up the
potential of a mobile data collection and analysis environment. Furthermore, since
this system and Document Generation System work as user interfaces between data
collection and DoSS@d, we can increase the number of data sets in DoDStat@d to
cover more of the users’ interests. Thus the new subsystems implemented here
reinforce the data collection function, so that users can easily collect data sets
and perform simple interim analysis, and they make DoSS@d more accessible and
useful.
Fig. 6. An example of data collection form
Fig. 7. Screen shots of data collection forms for portable phone
Fig. 8. Simple statistics of a temporary data
Fig. 9. Screen shot of simple statistics for portable phone
Fig. 10. Data description XML generator
References
[MYY03] Mori, Y., Yamamoto, Y. and Yadohisa, H.: Data-oriented Learning System
of Statistics based on Analysis Scenario/Story (DoLStat). Bulletin of
the International Statistical Institute, 54th Session Proceedings Volume
LX Two Books, Book 2, 74–77 (2003)
[HMYY04] Honda, K., Mori, Y., Yamamoto, Y. and Yadohisa, H.: Web-Based Analysis
System in Data-oriented Statistical System “DoSS@d”. In: Antoch, J.
(ed), COMPSTAT2004 Proceedings in Computational Statistics,
1209–1216, Physica-Verlag (2004)
[MHYY05] Mori, Y., Honda, K., Yadohisa, H. and Yamamoto, Y.: Interactive analysis
system on the web for data-oriented approaches. The 55th Session
of the International Statistical Institute, Abstract Book 306–307 and
CD-ROM (2005)