How to get your hands on the data of your LMS (e.g. Blackboard)
Alan Berg
Co-chair SIG SURF / Community officer Apereo LAI

Acknowledgements
Resources mentioned in Resources.txt, including:
Understanding The Advanced Reporting Database, John Knight, 2005. Five relevant slides are copied from this presentation (orange background).
Clip art: https://openclipart.org/

WARNING! A lot of information will be covered in a short time.

Agenda
Who am I
Motivation
Blackboard processes
Blackboard database schema
Pulling data out of Blackboard
Simple analysis in R
Brainstorm over possibilities

Wrangler | 07 maand 2014

Who am I
I have relevant practical experience:
Developed Building Blocks
Integrated stuff with UvA's SIS (student information system)
Ran quality assurance on a number of large-scale projects, such as UvA's scale-out
Reported bugs
Presented a couple of times (a long time ago) at Blackboard conferences
Old, old stuff: Blackboard 6 usage patterns and implications for the Universiteit van Amsterdam: http://www.uva.nl/binaries/content/documents/personalpages/b/e/a.m.berg/en/tab-three/tab-three/cpitem%5B41%5D/asset
Pulled data out of Blackboard for research projects

Motivation
To break down the walls that get in the way of getting at your data:
By providing researchers with the details needed to communicate with the people who hold the keys to the data.
By giving system/functional administrators an idea of the processes researchers need in order to use the data.

Questions for the audience 1
Point to a Google doc.
A quick count of the audience roles.
A quick question to the audience: what do they see as barriers to their research work?
Blackboard processes
Motivation: being specific about processes, filenames and locations makes it easier to find and do stuff.
Take homes:
There is a set of best practices that should be followed.
The Blackboard files that matter: bb-config.properties and bb-tasks.xml.
Scheduled tasks for reporting.
The process of moving tables to a backup database.

Best practices
Accessing the database: http://www.edugarage.com/display/BBDN/Blackboard+Open+Database+-+OpenDB
Protect your data.
Work with copies of your data.
Remember to test performance.
Use tools to access your databases.
Don't reinvent the wheel: see what's out in the wild; re-use and improve artefacts.
Don't be arrogant: ask the team that administers the system.
Look at the Open Database Schema.

Questions for the audience 2
Point to a Google doc: additional best practices for accessing the database?

bb-config.properties
Global configuration file for Blackboard. Options include:
Database
JVM tuning
Content Management Server threads
Mail servers
Performance depends on tuning this file.
It is a properties file.
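bb-config.properties is a plain key=value file, so reading it takes very little code. Here is a minimal Python sketch under one assumption: every entry is a one-line `key=value` pair. It is not a full implementation of the Java properties format, since it ignores line continuations, escape sequences and `:` separators. The function name `read_properties` is hypothetical.

```python
def read_properties(text):
    """Parse simple key=value lines from properties-file text into a dict.

    Simplified sketch (assumption): blank lines and lines starting with
    '#' or '!' are skipped, each remaining line is split on its first '=',
    and line continuations / escape sequences are NOT handled.
    """
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


# The single key below is the example quoted on the slides; a real file has many more.
sample = """
# comment lines are skipped
bbconfig.unix.httpd.ssl.portnumber=443
"""
print(read_properties(sample)["bbconfig.unix.httpd.ssl.portnumber"])  # prints 443
```

On a server you would pass in the file contents, e.g. `with open("/app/blackboard/config/bb-config.properties") as f: props = read_properties(f.read())`. Run this as a user that is allowed to read the file. Treat the result as sensitive, because the file includes the database settings.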
It follows a specific text format, one key=value pair per line:
bbconfig.unix.httpd.ssl.portnumber=443
Location similar to: /app/blackboard/config/bb-config.properties
Changes are applied via the script PushConfigUpdates.bat/.sh.

Blackboard processes
Code that can run on any Blackboard server to get facts
SECURE YOUR DATASOURCE
TEST AGAINST A COPY OF YOUR DB
1) Create a user that can read bb-config.properties.
2) Create code that reads bb-config.properties.
3) Run the code through a scheduled task.
4) Generate time-stamped reports or text dumps.
5) LATER: improve the code based on requirements.

PurgeAccumulator job
Runs at 1 AM by default.
Does three things:
Summarizes data
Moves data to a backup DB
Purges production of data older than a certain date
Configured via bb-tasks.xml.
Runs stored procedures in the database.

The gruesome details
<task-entry key="bb.stats.purging" version="60">
  <task classname="blackboard.platform.tracking.PurgeApplicationTask">
    <property name="delay" value="21600000" />
    <property name="period" value="86400000" />
    <property name="xml.registered.delay" value="1:00" />
    <property name="xml.registered.period" value="24" />
    <property name="days_to_keep" value="180" />
    <property name="target" value="live" />
    <property name="dev_null" value="/dev/null" />
    <property name="command-line" value="/usr/local/blackboard/tools/admin/PurgeAccumulator.sh" />
  </task>
</task-entry>

Schema description
The number of supporting records increases with each version.
[Figure: table lists for 2012 (partial list) and 2007 (full list)]

Questions for the audience 3
Point to a Google doc.
Based on the 2007 table set, which sorts of questions can you ask when the data is combined with the SIS?
Promise to look at sending the relevant SQL statements or publishing them on the LASI Utrecht website.

Building Blocks: more
Provide supporting services and JSP (GUI) libraries.
From experience, the permissions structure changes between versions.
There are subtle changes in services.
The database schema is actually more stable.
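The timing values in the purge task entry are easier to read once converted. This Python sketch uses the standard-library `xml.etree.ElementTree` to pull the `<property>` values out of the `bb.stats.purging` entry. It assumes that `delay` and `period` are in milliseconds: 86400000 ms is 24 hours, which matches `xml.registered.period`. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the task entry shown on the slide
TASK_XML = """
<task-entry key="bb.stats.purging" version="60">
  <task classname="blackboard.platform.tracking.PurgeApplicationTask">
    <property name="delay" value="21600000" />
    <property name="period" value="86400000" />
    <property name="days_to_keep" value="180" />
    <property name="command-line" value="/usr/local/blackboard/tools/admin/PurgeAccumulator.sh" />
  </task>
</task-entry>
"""

MS_PER_HOUR = 60 * 60 * 1000


def find_task_entry(root, key):
    """Return the task-entry element with the given key (root itself or a descendant)."""
    if root.tag == "task-entry" and root.get("key") == key:
        return root
    return root.find(f".//task-entry[@key='{key}']")


def purge_settings(xml_text):
    """Summarize the purge task: schedule in hours (assuming ms values) and retention in days."""
    entry = find_task_entry(ET.fromstring(xml_text.strip()), "bb.stats.purging")
    props = {p.get("name"): p.get("value") for p in entry.iter("property")}
    return {
        "delay_hours": int(props["delay"]) / MS_PER_HOUR,
        "period_hours": int(props["period"]) / MS_PER_HOUR,
        "days_to_keep": int(props["days_to_keep"]),
        "command": props["command-line"],
    }


print(purge_settings(TASK_XML))
```

For a complete bb-tasks.xml, pass `ET.parse(path).getroot()` to `find_task_entry`. The `.//task-entry[@key=...]` search assumes the entries sit somewhere below the root element.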
SQL queries are more raw, but more efficient than service calls.
It was easier to learn Building Blocks through code completion in an IDE (e.g. Eclipse) than through the documentation.
Steep learning curve: http://workgroups.clemson.edu/DCIT2803_BBDEV/Tutorial/BBTutorial.pdf

Take home: if you just want the data, go directly to the database: declare the permission in bb-manifest.xml and write the code. Don't do this without a safety net.

<permission type="java.lang.RuntimePermission" name="db.connection.bb_bb60" actions="connect,accept" />

// Fragment of a Building Block class. BbDatabase, ConnectionManager,
// ConnectionNotAvailableException and DbUtil are Blackboard API classes.
import java.sql.Connection;

private ConnectionManager conman;

private Connection bbConnectDatabase() throws ConnectionNotAvailableException, InterruptedException {
    BbDatabase bbDb = DbUtil.safeGetBbDatabase();
    conman = bbDb.getConnectionManager();
    Connection conn = null;
    int attempts = 0;
    // Retry up to 10 times, sleeping one second after each failed attempt
    while (conn == null && attempts < 10) {
        try {
            conn = conman.getConnection();
        } catch (ConnectionNotAvailableException cnae) {
            Thread.sleep(1000);
            ++attempts;
        }
    }
    return conn; // still null if all 10 attempts failed
}

Getting to the database in Perl
use strict;
use warnings;
use Config::Simple;
use DBI;

# Configuration file location
my $config_file = "/home/aberg/Desktop/BB_PILOT_2014/CONFIGURATION/database.cfg";
my $config = Config::Simple->new($config_file) or die Config::Simple->error();

# Open database connection
my $dbh = DBI->connect($config->param("ConnectionSettings.dsn"),
                       $config->param("ConnectionSettings.dbuser"),
                       $config->param("ConnectionSettings.pass"))
    or die "Connection Error: $DBI::errstr\n";

generate_user_report();   # report subroutine, not shown on the slide
$dbh->disconnect;

Perl take home
Easy to connect.
Only the configuration differs between queries to the archive database and to production or development.
Perl is a well-known language.
Perl is EXCELLENT for text processing.
Perl can generate unreadable code and is not really a team language (personal opinion).
Great for writing functionality quickly.
Excellent for manipulating and merging data.
Understood by many system administrators.

Favorite tool: SQL Developer
Everyone has their favorite tools.
Can export data efficiently.
Good for practising SQL statements.
Known to a wide community.

What are the ethics and security implications?
STOP and ask Hendrik L to reinforce the message from his presentation.
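The Perl take-home point is that only the configuration changes between the archive database and production or development. A short Python sketch can illustrate this, under a few assumptions:

- The INI section names and the `connect` helper are hypothetical.
- The standard-library `sqlite3` module stands in for the real database driver (DBI with a DSN in the Perl example), so the sketch runs anywhere.

```python
import configparser
import sqlite3

# Hypothetical configuration: one INI section per target database. The Perl
# example instead keeps dsn/dbuser/pass in a ConnectionSettings block.
CONFIG_TEXT = """
[archive]
database = :memory:

[development]
database = dev_copy.db
"""


def connection_settings(config_text, target):
    """Return the key/value settings of one section ('archive', 'development', ...)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return dict(parser[target])


def connect(config_text, target):
    """Open a connection for the chosen target; sqlite3 stands in for the real driver."""
    settings = connection_settings(config_text, target)
    return sqlite3.connect(settings["database"])


conn = connect(CONFIG_TEXT, "archive")
print(conn.execute("SELECT 1").fetchone())  # prints (1,)
conn.close()
```

Because the query code only ever receives a connection, switching from the archive to a development copy means changing one argument. The SQL itself stays the same.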
Questions for the audience 4
Point to a Google doc.
Have you any favorite tools for use with databases or for combining data sources?

ASR schema elements
Table name(s) | Data stored | Data retention in main schema
ACTIVITY_ACCUMULATOR | End user events | Up to 360 days of data
SYSTEM_TRACKING | Daily summary stats generated by the summarization task in the PurgeAccumulator process | No limit (never purged)
APPLICATION, COURSE_MAIN, COURSE_ROLES, COURSE_USERS, DATA_SOURCE, INSTITUTION_ROLES, NAVIGATION_ITEM, SYSTEM_ROLES, USERS, USER_ROLES | Supporting data referenced by ACTIVITY_ACCUMULATOR for ASR reporting | No limit; tables are never purged automatically

Data structure
Activity Accumulator dictionary
Column | Description
PK1 | Primary key
EVENT_TYPE | Type of event posted
USER_PK1 | Relates to USERS.PK1
COURSE_PK1 | Relates to COURSE_MAIN.PK1
GROUP_PK1 | Relates to GROUP.PK1
FORUM_PK1 | Relates to FORUM_MAIN.PK1
INTERNAL_HANDLE | Relates to NAVIGATION_ITEM.INTERNAL_HANDLE
CONTENT_PK1 | Relates to COURSE_CONTENTS.PK1
DATA | Additional information dependent on event type
TIMESTAMP | Date/time the event was posted
STATUS | 1 unless an error occurred
MESSAGES | Error message details
SESSION_ID | Relationship to SESSIONS.PK1
REMOVE the DATA and MESSAGES columns, as they might contain personal information?

Activity Accumulator events
Event | Description | Relationships
LOGIN_ATTEMPT | A user attempted to log in to the system. Captures the session id of the attempted login. | SESSION_ID
LOGOUT | A user clicked the logout button. | SESSION_ID, USER_PK1
SESSION_INT | A user connected to the system and a session was created. | SESSION_ID
PAGE_ACCESS | A page was accessed. | SESSION_ID, USER_PK1, NAVIGATION_ITEM
COURSE_ACCESS | A page within a course was accessed. | SESSION_ID, USER_PK1, COURSE_PK1, NAVIGATION_ITEM

Data structure
PK1 values are keys to other tables.
PK1 values are effectively random numbers.
As long as we do not combine with other tables, the data is pseudo-anonymous.
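The suggestion to drop DATA and MESSAGES can be enforced with an explicit column list instead of `select *`. This minimal Python sketch assumes an in-memory SQLite table that mirrors the Activity Accumulator dictionary. The column types and the single row are made up for the example. On the real database, the same column list would go into the export query.

```python
import sqlite3

# Columns from the Activity Accumulator dictionary that we choose to export.
# DATA and MESSAGES are deliberately absent: they may contain personal information.
EXPORT_COLUMNS = ("pk1", "event_type", "user_pk1", "course_pk1", "group_pk1",
                  "forum_pk1", "internal_handle", "content_pk1", "timestamp",
                  "status", "session_id")


def export_rows(conn):
    """Select only the whitelisted columns from activity_accumulator."""
    cols = ", ".join(EXPORT_COLUMNS)
    return conn.execute(f"SELECT {cols} FROM activity_accumulator ORDER BY pk1").fetchall()


# Toy in-memory stand-in for the real table (hypothetical types and data)
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE activity_accumulator (
    pk1 INTEGER PRIMARY KEY, event_type TEXT, user_pk1 INTEGER, course_pk1 INTEGER,
    group_pk1 INTEGER, forum_pk1 INTEGER, internal_handle TEXT, content_pk1 INTEGER,
    data TEXT, timestamp TEXT, status INTEGER, messages TEXT, session_id INTEGER)""")
conn.execute("INSERT INTO activity_accumulator VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
             (1, "COURSE_ACCESS", 10, 20, None, None, "content", 30,
              "free text typed by a user", "2014-01-01 10:00:00", 1, None, 99))

for row in export_rows(conn):
    print(row)
```

A whitelist is used rather than a list of excluded columns. If a later Blackboard version adds a new column, it is not exported by accident.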
Only combine when participants have opted in.
The DATA and MESSAGES columns may contain personal information.
Question for Hendrik L: how anonymous do we have to be?
Questions for the audience 5: how secure do we have to be with the data, and how long can it live?
Questions for the audience 6: do we need an audit trail?

Example: export in SQL Developer

SQL examples: general
select event_type, count(*)
from BackupDB_name.activity_accumulator
group by event_type

EVENT | 2007 | 2012
LOGOUT | 781618 | 1174147
SESSION_INT | 20875358 | 20241290
CONTENT_ACCESS | 5284051 | 14205997
COURSE_ACCESS | 42266577 | 57808666
MODULE_ACCESS | 15199704 | 66422352
LOGIN_ATTEMPT | 3609998 | 7430655
PAGE_ACCESS | 1528547 | 82827763
TAB_ACCESS | 9715716 | 101974

# Counts from the 2012 column of the table
eventTypes2012 <- c(1174147, 20241290, 14205997, 57808666, 66422352, 7430655, 82827763, 101974)
labels <- c('LOGOUT', 'SESSION_INT', 'CONTENT_ACCESS', 'COURSE_ACCESS', 'MODULE_ACCESS', 'LOGIN_ATTEMPT', 'PAGE_ACCESS', 'TAB_ACCESS')
pie(eventTypes2012, labels)

R makes life easy. Who uses R?

Generating PDF with LaTeX and R
\documentclass{article}
\begin{document}
\SweaveOpts{concordance=TRUE}
<<echo=TRUE, fig=TRUE, keep.source=FALSE>>=
eventTypes2012 <- c(1174147, 20241290, 14205997, 57808666, 66422352, 7430655, 82827763, 101974)
labels <- c('LOGOUT', 'SESSION_INT', 'CONTENT_ACCESS', 'COURSE_ACCESS', 'MODULE_ACCESS', 'LOGIN_ATTEMPT', 'PAGE_ACCESS', 'TAB_ACCESS')
pie(eventTypes2012, labels)
@
\end{document}

Why LaTeX and R
Can be used to produce papers.
Turning echo on shows the reviewer the methods used to analyze the data.
Allows the reviewer to test the process with their own data.
Reproducible.
Standardizes on well-known and well-documented practices.

Questions for the audience 7
Point to a Google doc.
Is it valid to contemplate a common Dutch LA (learning analytics) approach around R and LaTeX?

Traversing the information
Depends on the membership of your experiment.
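A pie chart only shows proportions. The same event counts can also be checked as percentages and compared between the two years. This small Python sketch reuses the counts from the 2007/2012 event table; the helper names are hypothetical.

```python
# Event counts per year from the event table: EVENT -> (2007, 2012)
EVENT_COUNTS = {
    "LOGOUT": (781618, 1174147),
    "SESSION_INT": (20875358, 20241290),
    "CONTENT_ACCESS": (5284051, 14205997),
    "COURSE_ACCESS": (42266577, 57808666),
    "MODULE_ACCESS": (15199704, 66422352),
    "LOGIN_ATTEMPT": (3609998, 7430655),
    "PAGE_ACCESS": (1528547, 82827763),
    "TAB_ACCESS": (9715716, 101974),
}


def shares(counts, column):
    """Fraction of all events per event type for one column (0 = 2007, 1 = 2012)."""
    total = sum(pair[column] for pair in counts.values())
    return {event: pair[column] / total for event, pair in counts.items()}


def growth(counts):
    """Ratio of the 2012 count to the 2007 count per event type."""
    return {event: new / old for event, (old, new) in counts.items()}


share_2012 = shares(EVENT_COUNTS, 1)
ratios = growth(EVENT_COUNTS)
for event, share in sorted(share_2012.items(), key=lambda kv: -kv[1]):
    print(f"{event:15s} {share:6.1%}  x{ratios[event]:.2f} vs 2007")
```

Sorting by share makes the shift visible. PAGE_ACCESS is the largest category in 2012, whereas COURSE_ACCESS leads in 2007.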
Examples:
Whole course opted in: MOOC-like conditions
An individual user: individual consent
A member of a course
Time limited

Whole course opted in
NEED TO FIND THE PRIMARY KEY OF THE COURSE IN COURSE_MAIN
select pk1 from BackupDB.course_main where course_id='20072008.1.testcourseonly'
select count(*) from STATS_2007_2008.activity_accumulator where course_pk1='22144'
select count(distinct user_pk1), count(distinct session_id), count(distinct content_pk1), count(distinct event_type)
from BackupDB.activity_accumulator
where course_pk1 = '<PK1 of the course>'
GOTCHA: active users. In COURSE_USERS the link is CRSMAIN_PK1 → USERS_PK1:
select users_pk1 from STATS_2007_2008.course_users where crsmain_pk1='22144'

An individual user: individual consent
Use the USERS table to find the right PK1.
It also has LAST_LOGIN_DATE.
Dangerous, as you get to see passwords.
select pk1 from BackupDB.users where user_id='xxxxxx'
STUDENT_ID is descriptive; its reliability depends on the provisioning process.

A member of a course
If you know the PK1 of the user and the PK1 of the course, then:
select * from BackupDB.activity_accumulator where user_pk1='1' and course_pk1='2'
Think in terms of set theory: Pka and Pkb and Pkc …..

Time limited
Total activity in the last $delta_days days. Perl example (values bound as placeholders instead of interpolated):
my $query = "select count(*) from $owner.activity_accumulator "
          . "where course_pk1 = ? and user_pk1 = ? and timestamp > sysdate - ?";
my ($total) = $dbh->selectrow_array($query, undef,
                                    $course_pk1, $fact_pk1{$user}, $delta_days);

Tool specific
select internal_handle, count(*) as TOTAL
from stats_2012_2013.activity_accumulator
group by internal_handle
order by TOTAL DESC

INTERNAL_HANDLE | TOTAL
Null | 200268422
content | 31251028
announcements_entry | 5082438
my_announcements | 1969977
discussion_board_entry | 1711946
check_grade | 1157869
cp_gradebook2_modify_item | 1047877
eph_ephorus-assignment | 1046151
course_tools_area | 852568
db_thread_list_entry | 606197
classic_course_catalog | 514890
cp_announcements | 442061
groups | 404074
agroup | 382299
Number of unique handles (at UvA) = 345

SAMPLING BIAS: are we ignoring the majority of events that are not associated with an internal handle?
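The "set theory" view treats each known PK1 or time window as a further narrowing of the set of rows. This maps naturally onto a query builder that ANDs together whichever filters are supplied. Below is a minimal Python sketch against a toy SQLite table, with these assumptions:

- The function name and sample rows are hypothetical.
- The time filter compares ISO-8601 text timestamps instead of using Oracle's `sysdate - n`.
- Values are bound as parameters, as in the Perl example, rather than pasted into the SQL string.

```python
import sqlite3


def count_activity(conn, course_pk1=None, user_pk1=None, since=None):
    """Count activity_accumulator rows matching every filter that is given.

    Each filter narrows the set (an intersection: Pka AND Pkb AND ...).
    """
    clauses, params = [], []
    if course_pk1 is not None:
        clauses.append("course_pk1 = ?")
        params.append(course_pk1)
    if user_pk1 is not None:
        clauses.append("user_pk1 = ?")
        params.append(user_pk1)
    if since is not None:
        clauses.append("timestamp > ?")  # ISO-8601 text compares chronologically
        params.append(since)
    sql = "SELECT count(*) FROM activity_accumulator"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql, params).fetchone()[0]


# Toy stand-in with only the columns the filters need (hypothetical data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity_accumulator "
             "(pk1 INTEGER PRIMARY KEY, course_pk1 INTEGER, user_pk1 INTEGER, timestamp TEXT)")
conn.executemany("INSERT INTO activity_accumulator VALUES (?, ?, ?, ?)", [
    (1, 22144, 1, "2013-01-10 09:00:00"),
    (2, 22144, 2, "2013-03-01 12:00:00"),
    (3, 99999, 1, "2013-03-02 08:30:00"),
])

print(count_activity(conn, course_pk1=22144))             # whole course
print(count_activity(conn, course_pk1=22144, user_pk1=1))  # member of a course
print(count_activity(conn, user_pk1=1, since="2013-02-01"))  # time limited
```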
Circling back
Questions for the audience 3
Point to a Google doc.
Which sorts of questions can you ask when the data is combined with the SIS?
Promise to look at sending the relevant SQL statements or publishing them on the LASI Utrecht website.

Summary (so far)
Each click is recorded by Blackboard.
The activity_accumulator table is the place to be.
You now understand the files that are important.
You now understand the basic Blackboard schema.
Privacy and ethics limit how you search the data.
Privacy and ethics relate to the PK1s.
Privacy and ethics relate to which columns you can export.
MORE?