How to get your hands on the data of your
LMS (e.g. Blackboard)
Alan Berg Co-chair SIG SURF / Community officer Apereo LAI
Acknowledgements
Resources mentioned in Resources.txt, including:
Understanding The Advanced Reporting Database, John Knight, 2005:
– 5 relevant slides are copied from this presentation (orange background).
Clip art: https://openclipart.org/
WARNING!
A lot of information will be covered in a
short time.
Agenda
Who am I
Motivation
BlackBoard Processes
BlackBoard database schema
Pulling data out of BlackBoard
Simple Analysis in R
Brainstorm on possibilities
Wrangler | 07 month 2014
Who am I
I have relevant practical experience.
Developed Building Blocks
Integrated stuff with UvA's SIS
Ran Quality Assurance on a number of large-scale projects, such as UvA's scale-out
Reported bugs
Presented a couple of times (a long time ago) at Blackboard conferences
Old, old stuff: Blackboard 6 usage patterns and implications for the Universiteit van Amsterdam:
http://www.uva.nl/binaries/content/documents/personalpages/b/e/a.m.berg/en/tab-three/tab-three/cpitem%5B41%5D/asset
Pulled out data from Blackboard for research projects
Motivation
To break down walls that get in the way of getting at your data:
By providing researchers with the details needed to communicate with the people that hold the keys to the data.
By providing system/functional administrators with an idea of the processes needed by researchers to use the data.
Questions for the audience (1)
Point to a Google doc
A quick count of the audience roles
A quick question to the audience about what they see as barriers to their research work.
BlackBoard Processes
Motivation: Being specific about processes, filenames and locations so that it is easier to find and do stuff
Take homes
There is a set of best practices that should be followed
BlackBoard files that matter
– bb-config.properties
– bb-tasks.xml
Scheduled tasks for reporting
The process of moving tables to a backup database
Best Practices
Accessing the Database
– Protect your data
– Work with copies of your data
– Remember to test performance
– Use tools to access your databases
Don't reinvent the wheel
– See what's out in the wild
– Re-use and improve artefacts
Don't be arrogant: Ask the team that administers the system
Look at the Open Database Schema
– http://www.edugarage.com/display/BBDN/Blackboard+Open+Database+-+OpenDB
Questions for the audience (2)
Point to a Google doc
Additional best practices for accessing the Database
bb-config.properties
Global configuration file for Blackboard. Options include:
– Database
– JVM tuning
– Content Management
– Server threads
– Mail servers
Performance depends on tuning this file
It is a properties file and follows a specific text format:
– bbconfig.unix.httpd.ssl.portnumber=443
Location similar to:
– /app/blackboard/config/bb-config.properties
Changes are applied via the script PushConfigUpdates.bat/.sh
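A minimal sketch of reading settings from a file in this key=value format. It handles only plain key=value lines and comment lines, not the full Java properties syntax (no line continuations or escapes). The function name parse_properties and the sample text are made up for illustration; only the portnumber key comes from the slide.

```python
def parse_properties(text):
    """Parse simple key=value lines; skips blank lines and # or ! comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            settings[key.strip()] = value.strip()
    return settings


# Illustrative content. On a real server you would read the file from a
# location similar to /app/blackboard/config/bb-config.properties.
sample = """
# excerpt, made up for this example
bbconfig.unix.httpd.ssl.portnumber=443
"""

config = parse_properties(sample)
print(config["bbconfig.unix.httpd.ssl.portnumber"])
```

Reading the file with a dedicated low-privilege account keeps database credentials out of the reporting code itself.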
BlackBoard Processes
Code that can run on any BlackBoard Server to get facts
SECURE YOUR DATASOURCE
TEST AGAINST A COPY OF YOUR DB
1) Create a user that can read bb-config.properties
2) Create code that reads bb-config.properties
3) Run the code through a scheduled task
4) Generate time-stamped reports or text dumps
5) LATER: Improve the code based on requirements
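Step 4 could look roughly like the sketch below: each scheduled run writes its own time-stamped text dump, so earlier runs are never overwritten. The function names, the file-name pattern and the report content are assumptions for illustration, not part of any Blackboard tooling.

```python
import os
import tempfile
from datetime import datetime


def report_filename(prefix, when):
    """Build a sortable, time-stamped name: <prefix>_YYYYMMDD_HHMMSS.txt."""
    return "{}_{}.txt".format(prefix, when.strftime("%Y%m%d_%H%M%S"))


def write_report(directory, prefix, lines, when=None):
    """Write one text dump per run and return the path that was written."""
    when = when or datetime.now()
    path = os.path.join(directory, report_filename(prefix, when))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return path


# Demo: a temporary directory stands in for the real report location,
# and the report line is made-up example content.
demo_dir = tempfile.mkdtemp()
demo_path = write_report(
    demo_dir, "user_report", ["total_users=42"],
    when=datetime(2014, 1, 7, 1, 0, 0),
)
print(os.path.basename(demo_path))
```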
PurgeAccumulator job
By default runs at 1 AM
Does three things
– Summarizes data
– Moves data to a backup DB
– Purges production of data older than a certain date
Configured via bb-tasks.xml
Runs stored procedures in the database
The gruesome details
<task-entry key="bb.stats.purging" version="60">
  <task classname="blackboard.platform.tracking.PurgeApplicationTask">
    <property name="delay" value="21600000" />
    <property name="period" value="86400000" />
    <property name="xml.registered.delay" value="1:00" />
    <property name="xml.registered.period" value="24" />
    <property name="days_to_keep" value="180" />
    <property name="target" value="live" />
    <property name="dev_null" value="/dev/null" />
    <property name="command-line" value="/usr/local/blackboard/tools/admin/PurgeAccumulator.sh" />
  </task>
</task-entry>
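To check what a task entry like this actually schedules, the properties can be pulled out with a few lines of code. The sketch below parses a shortened copy of the entry and converts delay and period to hours. Treating those two values as milliseconds is an assumption (86400000 ms is 24 hours, which agrees with xml.registered.period=24). The function name task_properties is made up.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the task entry from the slide.
TASK_XML = """
<task-entry key="bb.stats.purging" version="60">
  <task classname="blackboard.platform.tracking.PurgeApplicationTask">
    <property name="delay" value="21600000" />
    <property name="period" value="86400000" />
    <property name="days_to_keep" value="180" />
  </task>
</task-entry>
"""

MS_PER_HOUR = 1000 * 60 * 60


def task_properties(xml_text):
    """Return the <property> name/value pairs of a task entry as a dict."""
    root = ET.fromstring(xml_text.strip())
    return {p.get("name"): p.get("value") for p in root.iter("property")}


props = task_properties(TASK_XML)
period_hours = int(props["period"]) / MS_PER_HOUR  # assumed milliseconds
delay_hours = int(props["delay"]) / MS_PER_HOUR    # assumed milliseconds
print(period_hours, delay_hours, props["days_to_keep"])
```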
Schema Description
The number of supporting records is increasing per version
2012 – Partial list
2007 – Full List
Questions for the audience (3)
Point to a Google doc
Based on the 2007 table set, which sort of questions can you ask when combined with SIS?
– Promise to look at sending the relevant SQL statements or publishing to the LASI Utrecht website
Building Blocks - More
Provides supporting services
And JSP (GUI) libraries
From experience, the permissions structure changes between versions.
Subtle changes in services
The database schema is actually more stable.
SQL queries are more raw but more efficient than service calls
It was easier to learn Building Blocks through code completion in an IDE (e.g. Eclipse) than through the documentation
Steep learning curve
– http://workgroups.clemson.edu/DCIT2803_BBDEV/Tutorial/BBTutorial.pdf
Take home: If you just want the data go directly to the
Bb-manifest.xml and Code
Don't do this without a safety net
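<permission type="java.lang.RuntimePermission" name="db.connection.bb_bb60" actions="connect,accept" />

// Blackboard classes come from the Building Blocks API (blackboard.db package assumed).
import java.sql.Connection;
import blackboard.db.BbDatabase;
import blackboard.db.ConnectionManager;
import blackboard.db.ConnectionNotAvailableException;
import blackboard.db.DbUtil;

// Returns a pooled connection, retrying up to 10 times with a one-second pause.
// Returns null if no connection became available.
private Connection bbConnectDatabase() throws ConnectionNotAvailableException,
        InterruptedException
{
    BbDatabase bbDb = DbUtil.safeGetBbDatabase();
    ConnectionManager conman = bbDb.getConnectionManager();
    Connection conn = null;
    int attempts = 0;
    while (conn == null && attempts < 10) {
        try {
            conn = conman.getConnection();
        } catch (ConnectionNotAvailableException cnae) {
            Thread.sleep(1000); // pool exhausted: wait before retrying
            ++attempts;
        }
    }
    return conn;
}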
Getting to the database in Perl
use strict;
use warnings;
use Config::Simple;
use DBI;

# Configuration file location
my $config_file = "/home/aberg/Desktop/BB_PILOT_2014/CONFIGURATION/database.cfg";
my $config = Config::Simple->new($config_file);

# Open database connection
my $dbh = DBI->connect(
    $config->param("ConnectionSettings.dsn"),
    $config->param("ConnectionSettings.dbuser"),
    $config->param("ConnectionSettings.pass")
) or die "Connection Error: $DBI::errstr\n";

generate_user_report();   # report routine, defined elsewhere in the script
$dbh->disconnect;
Perl take home
Easy to connect
Queries to the archive database just need a different configuration than production or development
Perl is a well-known language
Perl is EXCELLENT for text processing
Perl can produce unreadable code and is not really a team language (personal opinion)
Great for writing functionality quickly
Excellent for manipulating and merging data
Understood by many system administrators
Favorite tool – SQL Developer
Everyone has their favorite tools
Can export data efficiently
Practice SQL statements
Known to a wide community
What are the ethics and security implications?
STOP and ask Hendrik L to reinforce the message from his presentation
Questions for the audience (4)
Point to a Google doc
Do you have any favorite tools for use with databases or for combining data sources?
ASR Schema Elements
Table Name(s) | Data Stored | Data Retention in main schema
ACTIVITY_ACCUMULATOR | End user events. | Up to 360 days of data
SYSTEM_TRACKING | Daily summary stats generated by summarization task in PurgeAccumulator process | No limit (never purged)
APPLICATION, COURSE_MAIN, COURSE_ROLES, COURSE_USERS, DATA_SOURCE, INSTITUTION_ROLES, NAVIGATION_ITEM, SYSTEM_ROLES, USERS, USER_ROLES | Stores supporting data referenced by ACTIVITY_ACCUMULATOR for ASR Reporting | No limit. Tables are never purged automatically.
Data Structure
Activity Accumulator Dictionary
Column | Description
PK1 | Primary Key
EVENT_TYPE | Type of Event Posted
USER_PK1 | Relates to USERS.PK1
COURSE_PK1 | Relates to COURSE_MAIN.PK1
GROUP_PK1 | Relates to GROUP.PK1
FORUM_PK1 | Relates to FORUM_MAIN.PK1
INTERNAL_HANDLE | Relates to NAVIGATION_ITEM.INTERNAL_HANDLE
CONTENT_PK1 | Relates to COURSE_CONTENTS.PK1
Activity Accumulator Dictionary (continued)
Column | Description
DATA | Additional information dependent on event type.
TIMESTAMP | Date/Time the event was posted
STATUS | 1 unless error occurred
MESSAGES | Error message details
SESSION_ID | Relationship to SESSIONS.PK1
Remove DATA and MESSAGES, as they might contain personal information?
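If the answer is yes, the two columns can be dropped at export time before the data reaches a researcher. The sketch below filters a CSV export. The function name strip_sensitive and the sample row are made up; the column names come from the dictionary above.

```python
import csv
import io

# Columns that may hold personal information (from the dictionary above).
SENSITIVE_COLUMNS = {"DATA", "MESSAGES"}


def strip_sensitive(csv_text, sensitive=SENSITIVE_COLUMNS):
    """Return the CSV text with the sensitive columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name.upper() not in sensitive]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # keys outside `kept` are ignored
    return out.getvalue()


# Made-up export row, for illustration only.
export = (
    "PK1,EVENT_TYPE,USER_PK1,DATA,MESSAGES\n"
    "1,LOGIN_ATTEMPT,101,some free text,\n"
)
cleaned = strip_sensitive(export)
print(cleaned)
```

Selecting only the safe columns in the SQL statement itself achieves the same result earlier in the pipeline; the filter above is a safety net for exports that were made with select *.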
Activity Accumulator Events
Event | Description | Relationships
LOGIN_ATTEMPT | A user attempted to login to the system. Captures the session id of the attempted login. | SESSION_ID
LOGOUT | A user clicked the logout button | SESSION_ID, USER_PK1
SESSION_INT | A user connected to the system and a session was created | SESSION_ID
PAGE_ACCESS | A page was accessed | SESSION_ID, USER_PK1, NAVIGATION_ITEM
COURSE_ACCESS | A page within a course was accessed | SESSION_ID, USER_PK1, COURSE_PK1, NAVIGATION_ITEM
Data Structure
PK1 values are keys to other tables
PK1 values are effectively random numbers
As long as we do not combine with other tables, the data is pseudo-anonymous. Only combine when we have opt-in
The DATA and MESSAGES columns may contain personal information.
Question for Hendrik L: How anonymous do we have to be?
Question for the audience (5): How secure do we have to be with the data and how long can it live for?
Question for the audience (6): Do we need an audit trail?
Example: Export in SQL developer
SQL Examples - General
select event_type, count(*)
from BackupDB_name.activity_accumulator
group by event_type

EVENT | 2007 | 2012
LOGOUT | 781618 | 1174147
SESSION_INT | 20875358 | 20241290
CONTENT_ACCESS | 5284051 | 14205997
COURSE_ACCESS | 42266577 | 57808666
MODULE_ACCESS | 15199704 | 66422352
LOGIN_ATTEMPT | 3609998 | 7430655
PAGE_ACCESS | 1528547 | 82827763
TAB_ACCESS | 9715716 | 101974
# Note: these counts match the 2012 column of the event table above
eventTypes2007 <- c(1174147, 20241290, 14205997, 57808666, 66422352, 7430655,
                    82827763, 101974)
labels <- c('LOGOUT', 'SESSION_INT', 'CONTENT_ACCESS', 'COURSE_ACCESS',
            'MODULE_ACCESS', 'LOGIN_ATTEMPT', 'PAGE_ACCESS', 'TAB_ACCESS')
pie(eventTypes2007, labels)
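A pie chart shows proportions but not the numbers behind them. As a rough companion, the sketch below computes each event type's share of the total from the same eight counts used in the R vector. It is written in Python purely as an illustration and is not part of the original analysis.

```python
# Same eight counts and labels as in the R vector above.
counts = {
    "LOGOUT": 1174147,
    "SESSION_INT": 20241290,
    "CONTENT_ACCESS": 14205997,
    "COURSE_ACCESS": 57808666,
    "MODULE_ACCESS": 66422352,
    "LOGIN_ATTEMPT": 7430655,
    "PAGE_ACCESS": 82827763,
    "TAB_ACCESS": 101974,
}

total = sum(counts.values())
shares = {event: count / total for event, count in counts.items()}

# Print the event types from largest to smallest share.
for event, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print("{:<15} {:6.2%}".format(event, share))
```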
R makes life easy
Who uses R?
Generating PDF with LaTeX and R
\documentclass{article}
\begin{document}
\SweaveOpts{concordance=TRUE}
<<echo=TRUE, fig=TRUE, keep.source=FALSE>>=
eventTypes2007 <- c(1174147, 20241290, 14205997, 57808666,
                    66422352, 7430655, 82827763, 101974)
labels <- c('LOGOUT', 'SESSION_INT', 'CONTENT_ACCESS', 'COURSE_ACCESS',
            'MODULE_ACCESS', 'LOGIN_ATTEMPT', 'PAGE_ACCESS', 'TAB_ACCESS')
pie(eventTypes2007, labels)
@
\end{document}
Why LaTeX and R
Can be used to produce papers
Turning echo on will show the reviewer the methods used when analyzing data.
Allows the reviewer to test the process with their own data.
Reproducible
Standardizing on well-known and well-documented practices.
Questions for the audience (7)
Point to a Google doc
Is it valid to contemplate a common Dutch LA approach around R and LaTeX?
Traversing the information
Depends on the membership of your experiment.
Examples:
– Whole course opted in: MOOC-like conditions
– An individual user: Individual consent
– A member of a course
– Time limited
Whole course opted in
NEED TO FIND THE PRIMARY KEY OF THE COURSE IN COURSE_MAIN
select pk1 from BackupDB.course_main
where course_id='20072008.1.testcourseonly'

select count(*) from STATS_2007_2008.activity_accumulator
where course_pk1='22144'

select count(distinct user_pk1), count(distinct session_id),
       count(distinct content_pk1), count(distinct event_type)
from BackupDB.activity_accumulator where
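The two-step lookup (course_id to PK1, then PK1 to activity) can be tried out safely away from production. The sketch below uses an in-memory SQLite database as a stand-in for the backup schema. The table layout is cut down to the columns needed, all rows are made up, and the helper name course_summary is hypothetical. Bound parameters replace the quoted literals.

```python
import sqlite3

# In-memory stand-in for the backup database; rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table course_main (pk1 integer primary key, course_id text);
    create table activity_accumulator (
        pk1 integer primary key, event_type text,
        user_pk1 integer, course_pk1 integer, session_id integer);
    insert into course_main values (22144, '20072008.1.testcourseonly');
    insert into activity_accumulator values
        (1, 'COURSE_ACCESS', 10, 22144, 500),
        (2, 'COURSE_ACCESS', 11, 22144, 501),
        (3, 'CONTENT_ACCESS', 10, 22144, 500),
        (4, 'COURSE_ACCESS', 12, 99999, 502);
""")


def course_summary(connection, course_id):
    """Look up the course PK1, then count its events, users and sessions."""
    row = connection.execute(
        "select pk1 from course_main where course_id = ?", (course_id,)
    ).fetchone()
    if row is None:
        return None  # unknown course_id
    course_pk1 = row[0]
    events, users, sessions = connection.execute(
        "select count(*), count(distinct user_pk1), count(distinct session_id) "
        "from activity_accumulator where course_pk1 = ?",
        (course_pk1,),
    ).fetchone()
    return {"course_pk1": course_pk1, "events": events,
            "users": users, "sessions": sessions}


summary = course_summary(conn, "20072008.1.testcourseonly")
print(summary)
```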
GOTCHA: Active users
CRSMAIN_PK1 → USERS_PK1
select users_pk1 from STATS_2007_2008.course_users
where crsmain_pk1='22144'
An individual user: Individual consent
Use the USERS table to find the right PK1
– It also has LAST_LOGIN_DATE
– Dangerous, as you get to see passwords
select pk1 from BackupDB.users
where user_id='xxxxxx'
STUDENT_ID is descriptive
– Reliability depends on the provisioning process
A member of a course
If you know the PK1 of the user and the PK1 of the course, then:
select * from BackupDB.activity_accumulator
where user_pk1='1' and course_pk1='2'
Think in terms of set theory
PKa and PKb and PKc …
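The set-theory view can be made concrete: every condition selects a set of events, and "and" is the intersection of those sets. A minimal sketch with made-up events:

```python
# Made-up events, reduced to the keys that matter here.
events = [
    {"pk1": 1, "user_pk1": 1, "course_pk1": 2},
    {"pk1": 2, "user_pk1": 1, "course_pk1": 5},
    {"pk1": 3, "user_pk1": 1, "course_pk1": 2},
    {"pk1": 4, "user_pk1": 7, "course_pk1": 2},
]

# Each condition on its own yields a set of event keys ...
by_user = {e["pk1"] for e in events if e["user_pk1"] == 1}
by_course = {e["pk1"] for e in events if e["course_pk1"] == 2}

# ... and "user_pk1='1' and course_pk1='2'" is their intersection.
both = by_user & by_course
print(sorted(both))
```

Adding a further condition (a time window, an event type) intersects one more set, which is why the consent scope narrows with each key that is combined.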
Time limited
Total activity in the last $delta_days days
Perl example
# Assumes $delta_days holds a positive number of days
$query = "select count(*) from $owner.activity_accumulator
  where course_pk1='$course_pk1' and user_pk1='$fact_pk1{$user}'
  and timestamp > sysdate - $delta_days";
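The same time-limited count can be written with bound parameters instead of string interpolation, which avoids quoting problems. The sketch below again uses in-memory SQLite as a stand-in. Timestamps are stored as ISO-style text here, which is an assumption for the demo (Oracle would use a date column and sysdate). The rows and the function name activity_since are made up.

```python
import sqlite3
from datetime import datetime, timedelta

# In-memory stand-in; timestamps are text so they compare as strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table activity_accumulator (
        pk1 integer primary key, user_pk1 integer,
        course_pk1 integer, timestamp text);
    insert into activity_accumulator values
        (1, 1, 2, '2014-01-30 09:00:00'),
        (2, 1, 2, '2014-01-01 09:00:00'),
        (3, 3, 2, '2014-01-29 09:00:00');
""")


def activity_since(connection, course_pk1, user_pk1, delta_days, now):
    """Count a user's events in a course during the last delta_days days."""
    cutoff = (now - timedelta(days=delta_days)).strftime("%Y-%m-%d %H:%M:%S")
    row = connection.execute(
        "select count(*) from activity_accumulator "
        "where course_pk1 = ? and user_pk1 = ? and timestamp > ?",
        (course_pk1, user_pk1, cutoff),
    ).fetchone()
    return row[0]


now = datetime(2014, 1, 31, 12, 0, 0)  # fixed so the demo is repeatable
recent = activity_since(conn, course_pk1=2, user_pk1=1, delta_days=7, now=now)
print(recent)
```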
Tool Specific
select internal_handle, count(*) as TOTAL
from stats_2012_2013.activity_accumulator
group by internal_handle
order by TOTAL desc

INTERNAL_HANDLE | TOTAL
Null | 200268422
content | 31251028
announcements_entry | 5082438
my_announcements | 1969977
discussion_board_entry | 1711946
check_grade | 1157869
cp_gradebook2_modify_item | 1047877
eph_ephorus-assignment | 1046151
course_tools_area | 852568
db_thread_list_entry | 606197
classic_course_catalog | 514890
cp_announcements | 442061
groups | 404074
agroup | 382299
Number of unique handles (at UvA) = 345
SAMPLING BIAS
Are we ignoring the majority of events that are not associated with an internal handle?
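A quick calculation puts a number on the sampling-bias question. The sketch below uses only the fourteen handles listed on the slide (out of 345), so the Null share it prints is relative to the listed rows and would be somewhat lower across all handles. The pairing of handle names with counts follows the descending order of the query output.

```python
# Counts for the handles listed on the slide; None stands for Null.
handle_counts = {
    None: 200268422,
    "content": 31251028,
    "announcements_entry": 5082438,
    "my_announcements": 1969977,
    "discussion_board_entry": 1711946,
    "check_grade": 1157869,
    "cp_gradebook2_modify_item": 1047877,
    "eph_ephorus-assignment": 1046151,
    "course_tools_area": 852568,
    "db_thread_list_entry": 606197,
    "classic_course_catalog": 514890,
    "cp_announcements": 442061,
    "groups": 404074,
    "agroup": 382299,
}

listed_total = sum(handle_counts.values())
null_share = handle_counts[None] / listed_total  # share among listed rows only
print("{:.1%} of the listed events have no internal handle".format(null_share))
```

Any analysis that filters on internal_handle therefore leaves most recorded events out of view.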
Circling back
Questions for the audience (3)
Point to a Google doc
Which sort of questions can you ask when combined with SIS?
– Promise to look at sending the relevant SQL statements or publishing to the LASI Utrecht website
Summary (so far)
Each click is recorded by BlackBoard
The activity_accumulator table is the place to be.
You now understand the files that are important
You now understand the basic BlackBoard schema
Privacy and ethics limit how you search the data
Privacy and ethics relate to the PK1s
Privacy and ethics relate to which columns you can export
MORE?