Coding and interpreting log stream data


Patrick Griffin

Assessment Research Centre, MGSE

Presented at the Annual Meeting of AERA, April 6th, 2014, Philadelphia

Data coding in computer-based problem solving assessments

• Current coding processes mainly use a dichotomous success/failure (present/absent) scoring system;

• Greiff, Wüstenberg & Funke (2012) identified three measures that represent dynamic problem solving (DPS):

– Model Building,

– Forecasting and

– Information Retrieval.

– These are applied to a series of steps in complex problems

• students are scored as false (0) or true (1) on the task

• The ATC21S project draws inferences about how students solve problems as well as about the outcome, using a series of automated dichotomous scores, rubrics and partial credit approaches.

ATC21S approach

ATC21S identifies five broad components of collaborative problem solving (CPS) (Hesse, 2014):

• social skills (participation, perspective taking, social regulation);

• cognitive skills (task regulation and knowledge building).

Within these five components, students are assessed on three ordered levels of performance on 19 elements.

Purpose of the assessments

• 11 assessment tasks tapping into different and overlapping skills within this framework

• Provides teachers with

• information to interpret students’ capacity in collaborative problem solving subskills,

• a profile of each student’s performance for formative instructional purposes

Unobtrusive assessment of problem solving

Zoanetti (2010)

• Moved individual problem solving from maths to games

– Recorded interactions between the problem solver and the task environment in an unobtrusive way

ATC21S (2009-2014)

• Collaborative problem solving tasks capture interactions between:

• the problem solvers working together

• the individual problem solver and the task

Activity log files

• Following Zoanetti, the files generated automatically to record these student–task interactions are referred to as ‘session log files’.

• They contain free-form data, referred to as ‘process stream data’: free-form text files with delimited strings of text.

Process stream data

• MySQL database architecture recorded interactions with the task environment to describe solution processes in an unobtrusive way.

• Process stream data describe distinct key-strokes and mouse events such as typing, clicking, dragging, cursor movements, hovering time, action sequences etc. recorded with a timestamp.

• Sequential numbering of events, together with timestamps, enabled analysis of action sequences and periods of inactivity.

• The term ‘process stream data’ describes these time-stamped records (Zoanetti, 2010).
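Records of this kind can be read with a few lines of code. A minimal sketch, assuming a semicolon-delimited layout of sequence number, timestamp, event type and detail (the actual ATC21S field layout is not specified here):

```python
from datetime import datetime

def parse_event(line: str) -> dict:
    """Split one delimited process-stream line into its event fields.

    The field order (seq; timestamp; event; detail) is an assumed
    illustration, not the documented ATC21S log format.
    """
    seq, stamp, event, detail = line.split(";", 3)
    return {
        "seq": int(seq),                      # sequential event number
        "time": datetime.strptime(stamp.strip(), "%Y-%m-%d %H:%M:%S"),
        "event": event.strip(),               # e.g. 'dropShute', 'startDrag'
        "detail": detail.strip(),             # event-specific payload
    }

event = parse_event("17; 2014-04-06 10:15:32; dropShute; ball_3")
print(event["event"])  # dropShute
```

With sequence numbers and timestamps parsed this way, action sequences and periods of inactivity can be analysed directly.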

Common and unique indicators

• ‘Common’ or ‘Global’ events apply to all tasks;

• ‘Unique’ or ‘Local’ events are unique to specific tasks due to the nature of the behaviours and interactions those tasks elicit.

Application example Laughing Clowns

(Screenshots: Student A view and Student B view of the task)

Interpreting the log stream data 1

Event type | Process stream data format | Explanation of data captured

Session Start | Student student_id has commenced task task_id | Records the start of a task with unique student and task identification

Session End | Student student_id has completed task task_id | Records the end of a task with unique student and task identification

Chat | Text Message: “free form of message using the chat box” | Captures the contents of the chat message the students used to communicate with their partner

Ready To Progress | Requested to move to page: page_id | Indicates whether the student is ready to progress or not, and records the navigation endpoint to which the student is ready to progress (for multipage tasks)

Other Click | Screen x coords: x_coordinate; Screen y coords: y_coordinate | Captures the coordinates of the task screen if the student has clicked anywhere outside the domain of the problem

Interpreting log stream data

Event type | Process stream data format | Explanation of data captured

StartDrag | startDrag: ball_id; x,y coordinates of the ball at the start of the drag | Records the identifier and coordinates of the ball being dragged by the student

StopDrag | stopDrag: ball_id; x,y coordinates of the ball at the end of the drag | Records the identifier of the dragged ball and its coordinates at the end of the drag

DropShute | dropShutePosofShuteId: ball_id; x,y coordinates where the ball was dropped; position of the clown head | Records the identifier of the ball and the shute into which it was dropped by the student

Check box | SelectionValue: option_value | Captures whether students agree or disagree on how their machine works

Session logs and chat stream

• process and click stream data are accumulated and stored in session logs

• A chat box tool captures text exchanged between students and stores it in string format.

• All chat messages were recorded with a time stamp.

Recording action and chat data

Interpreting counts and chats

• Each task process log stream was examined for behaviours indicative of cognitive and social skills as defined by Hesse (2014) that could be captured algorithmically.

• Indicators were coded as rule-based indicators through an automated algorithmic process similar to that described by Zoanetti (2010).

• Zoanetti showed how process data (e.g., counts of actions) could be interpreted as an indicator of a behavioural variable (e.g., error avoidance or learning from mistakes).

• For example, in the Laughing Clowns task a count of the ‘dropShute’ actions (dropping the balls into the clown’s mouth) can indicate how well the student managed their resources (the balls).

Direct and inferred indicators

• Indicators that can be captured in all tasks are labelled ‘global’.

• They included total response time, response time to partner questions, action counts, and other behaviours that were observed regardless of the task.

• Indicators that are task-specific were labelled ‘local’.

• There are two categories of local indicators: direct and inferred.

• Direct indicators represent those that can be identified clearly, such as a student performing a particular action.

• Inferred indicators relate to such things as sequences of action/chat within the data. Patterns of indicators are used to infer the presence of behaviour indicative of elements in the Hesse conceptual framework.

Coding indicative actions

• Each indicator was coded with a unique ID code. Using the example of the unique ID code ‘U2L004A’:

– ‘U2’: the Laughing Clowns task;

– ‘L’: a ‘local’ indicator specific to that task (‘G’ would represent a global indicator that could be applied to all tasks);

– ‘004’: the fourth indicator created for this task;

– ‘A’: applicable to student A.
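The ID scheme above lends itself to mechanical decoding. A hypothetical decoder for ‘U’-prefixed codes such as ‘U2L004A’ (the regular expression is an assumption inferred from the description, not a documented ATC21S specification):

```python
import re

def decode_indicator(code: str) -> dict:
    """Decode an indicator ID into task, scope, number and student."""
    m = re.fullmatch(r"(U\d+)([GL])(\d{3})([AB])", code)
    if m is None:
        raise ValueError(f"unrecognised indicator code: {code}")
    task, scope, number, student = m.groups()
    return {
        "task": task,                                   # 'U2' = Laughing Clowns
        "scope": "global" if scope == "G" else "local", # 'G' applies to all tasks
        "number": int(number),                          # '004' -> 4th indicator
        "student": student,                             # 'A' or 'B'
    }

print(decode_indicator("U2L004A"))
# {'task': 'U2', 'scope': 'local', 'number': 4, 'student': 'A'}
```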

• Programming algorithms search for and capture the coded data from the process stream log files;

• Counts of actions in indicators are converted into either dichotomous or partial credit scores.

• Panels used an iterative process to map indicators onto the Hesse framework until a stable allocation was agreed upon.

Algorithms and scoring rules

Indicator code: U2L004A, U2L004B

Task name: Laughing Clowns.

Details and scoring rule: Systematic approach; all positions have been covered. Scoring rule: threshold value. Output: count values.

Algorithm:

Step 1: Find all drop-ball occurrences captured as dropShute and their corresponding positions as dropShuteL, dropShuteR, dropShuteM.

Step 2: Count all occurrences of the action recorded under ‘dropShute’ and their unique positions from the log.

Step 3: Increase the value of the indicator by one if one or more ‘dropShute’ occurs in each of the forms dropShuteR, dropShuteL and dropShuteM.

Step 4: If the total number of unique dropShutes (dropShuteR, dropShuteL and dropShuteM) from the log is less than three, the value of the indicator is set to -1 to indicate missing data.
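The four steps above can be sketched in a few lines; the list-of-event-names input is an assumed representation of the parsed log:

```python
def score_systematic_coverage(events: list[str]) -> int:
    """U2L004-style scoring: one point per unique dropShute position
    covered, or -1 (missing data) when fewer than all three positions
    (L, M, R) appear in the log."""
    positions = {"dropShuteL", "dropShuteM", "dropShuteR"}
    seen = positions.intersection(events)   # Step 2: unique positions covered
    if len(seen) < 3:                       # Step 4: not all positions covered
        return -1
    return len(seen)                        # Step 3: one per covered position

print(score_systematic_coverage(
    ["dropShuteL", "dropShuteM", "dropShuteR", "dropShuteL"]))  # 3
print(score_systematic_coverage(["dropShuteL", "dropShuteL"]))  # -1
```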

Indicator code: Global001A, Global001B

Details and scoring rule: Acceptable time to first action given reading load; time (in seconds) spent on the task before the first action (interpreted as reading time). Scoring rule: threshold time. Output: time.

Algorithm:

Step 1: Find the starting time when a student joins a collaborative session.

Step 2: Find the record of the first action.

Step 3: Find the time of that record (from step 2).

Step 4: Calculate the time difference (between step 1 and step 3), indicating the time before the first action.
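As a sketch of the computation, with timestamped (time, event) tuples as an assumed input representation:

```python
from datetime import datetime

def time_before_first_action(events, fmt="%H:%M:%S"):
    """Global001-style score: seconds from joining the session to the
    first recorded action, interpreted as reading time."""
    start = datetime.strptime(events[0][0], fmt)   # Step 1: session join time
    first = datetime.strptime(events[1][0], fmt)   # Steps 2-3: first action
    return (first - start).total_seconds()         # Step 4: time difference

log = [("10:00:00", "sessionStart"), ("10:00:42", "startDrag")]
print(time_before_first_action(log))  # 42.0
```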

Indicator code: Global005A, Global005B

Details and scoring rule: Interactive chat blocks: count the number of chat blocks (A, B) with no intervening actions. Consecutive chats from the same player count as 1 (e.g., A,B,A,B = 2 chat blocks; A,B,A,B,A,B = 3 chat blocks; AA,B,A,BB = 2 chat blocks). Scoring rule: threshold number. Output: count values.

Algorithm:

Step 1: Find all consecutive chats from students A and B without any intervening action from A or B. Treat two or more consecutive chats from a single student as one chat.

Step 2: Increase the value of the indicator by one for each block found.
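One reading of the worked examples is that a block is a chat from one student answered by the partner, so A,B,A,B collapses to two player–partner pairs. Under that assumption, a sketch with ('chat' or 'action', student) tuples as an assumed input:

```python
def count_chat_blocks(events):
    """Global005-style count of interactive chat blocks: consecutive
    chats from one student collapse to a single chat, actions break a
    run, and each player-partner pair within a run is one block.
    (The pairing rule is an interpretation of the worked examples.)"""
    blocks, run = 0, []
    for kind, student in events + [("action", None)]:  # sentinel flushes run
        if kind == "chat":
            if not run or run[-1] != student:          # collapse repeated chats
                run.append(student)
        else:
            blocks += len(run) // 2                    # each A-B pair = 1 block
            run = []
    return blocks

seq = [("chat", "A"), ("chat", "A"), ("chat", "B"),
       ("chat", "A"), ("chat", "B"), ("chat", "B")]
print(count_chat_blocks(seq))  # 2  (AA,B,A,BB collapses to A,B,A,B)
```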

Coded data and variable identification

Defining indicators

Using indicator data

• Scores from a set of indicators function similarly to a set of conventional test items, requiring stochastic independence of indicators;

• Most indicators were scored ‘1’ if the behaviour was present and ‘0’ if absent for each student. In the Clowns task a player needs to leave a minimum number of balls for his/her partner in order for the task to be completed successfully: if true, ‘1’; if not, ‘0’.

• Frequency-based indicators could be converted into polytomous scores based on threshold values through an iterative judgement and calibration process.
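Threshold-based conversion of a frequency count to a polytomous score can be sketched as follows; the threshold values here are illustrative, not the calibrated ATC21S cut-points:

```python
def polytomous_score(count: int, thresholds=(1, 3, 6)) -> int:
    """Score equals the number of thresholds the raw count meets or
    exceeds, giving an ordered 0..len(thresholds) category."""
    return sum(count >= t for t in thresholds)

print(polytomous_score(0))  # 0  (below every threshold)
print(polytomous_score(4))  # 2  (meets thresholds 1 and 3)
print(polytomous_score(9))  # 3  (meets all thresholds)
```

Supplying a single threshold reduces this to the dichotomous case.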

Forming a dichotomous indicator from frequency data

Polytomous indicator from frequency data

Separating the players - scoring A and B

• Collaboration cannot be summarised by a single indicator (‘students communicated’); it involves communication, cooperation and responsiveness.

• For collaboration, the presence of chat linked to action (before and after a chat event) was used to infer collaboration, cooperation or responsiveness linked to the Hesse framework.

• The patterns of player-partner (A-B) interaction were examined.

• A series of three sequences of player-partner interaction was found to be adequate, yielding the following possible player-partner combinations:

1) A, B, A;

2) A, B, B;

3) A, A, B.

• These combinations apply only to the action of the initiating student (A). Each student was coded separately in the data file, so the perspective changed when the other student (B) was scored.
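Re-coding a pattern from the partner's perspective is a simple relabelling, sketched here (the string representation of patterns is an assumption):

```python
def swap_perspective(pattern: str) -> str:
    """Swap the A/B labels so a pattern recorded from student A's
    perspective is expressed from student B's perspective."""
    return pattern.translate(str.maketrans("AB", "BA"))

print(swap_perspective("ABA"))  # BAB
print(swap_perspective("AAB"))  # BBA
```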

Assigning to A and B

Type | Measurement | Combination | Perspective from student A | Perspective from student B

Interactive chat-action-chat blocks | count | player + player + partner | AAB | BBA

Interactive chat-action-action blocks | count | player + partner + partner | ABB | BAA

| count | player + partner + player | ABA | BAB

| count | player + player + player | AAA | BBB

| count | player + player + partner | AAB | BBA

Interactive chat-chat-action blocks | count | player + partner + player | ABA | BAB

| count | player + partner + partner | ABB | BAA

Interactive action-action-chat blocks (AAC) | count | player + partner + player | ABA | BAB

| count | player + partner + partner | ABB | BAA

| count | player + partner + player | ABA | BAB

Mapping indicators to Hesse Framework

The empirical data were checked against the relevant skill in the conceptual framework (Hesse, 2014):

• relative difficulty was checked for consistency with the framework;

• each indicator was mapped to the relevant skill it was intended to measure;

• the definition of each indicator was refined to clarify the link between the algorithm and the construct;

• frequency was used as a proxy measure of difficulty.

Indicator review cycle

• IRT analysis yielded a hierarchy of the descriptors;

• The substantive order was checked for meaning within a broader collaborative problem solving framework.

• An iterative review process ensured that the conceptual descriptors were supported by empirical item locations, which in turn inform the construct continuum.

Domains of indicators: social and cognitive

• Clusters of indicators interpreted to identify levels of progression;

• The indicators were divided into their two dimensions (social or cognitive) based on their previous mapping,

– then into five dimensions.

• Skills within each dimension were identified to represent the progression from novice to expert.

Parameter invariance and fit

• Multiple calibrations allowed for comparison and analysis of item parameters.

• The stability of parameters remained after the number of indicators was reduced from over 450 to fewer than 200.

• The removal of poorly fitting indicators reduced the standard errors of the item parameters, while maintaining the reliability of the overall set.

Calibration of Laughing Clowns task

VARIABLES                UNWEIGHTED FIT                    WEIGHTED FIT
 item  ESTIMATE  ERROR^  MNSQ  CI           T       MNSQ  CI           T
    1    3.106   0.046   1.13  (0.91,1.09)  -0.5    1.02  (0.75,1.25)  -0.2
    2   -0.686   0.040   0.97  (0.91,1.09)   0.3    0.98  (0.95,1.05)   0.9
    3    1.010   0.040   1.00  (0.91,1.09)  -0.4    1.00  (0.94,1.06)  -0.5
    4    0.454   0.039   1.00  (0.91,1.09)   1.3    1.00  (0.97,1.03)   4.5
    5   -0.435   0.039   1.00  (0.91,1.09)   0.0    1.00  (0.96,1.04)   0.0
    6   -1.409   0.042   0.98  (0.91,1.09)  -0.5    0.99  (0.90,1.10)  -0.5
    7   -0.218   0.039   1.01  (0.91,1.09)   1.0    1.01  (0.97,1.03)   2.8
    8    0.895   0.039   0.98  (0.91,1.09)  -0.5    0.99  (0.95,1.05)  -0.9
    9    0.267   0.039   1.06  (0.91,1.09)  -0.7    1.06  (0.98,1.02)  -0.3
   10   -0.657   0.040   1.00  (0.91,1.09)  -0.2    1.00  (0.95,1.05)  -1.0
   11    1.094   0.040   0.98  (0.91,1.09)   1.0    0.99  (0.94,1.06)   0.5
   12    0.424   0.039   1.05  (0.91,1.09)  -1.6    1.04  (0.97,1.03)  -0.1
   13   -0.523   0.039   0.98  (0.91,1.09)  -0.7    0.98  (0.96,1.04)  -0.9
   14   -1.416   0.042   0.97  (0.91,1.09)   2.8    0.98  (0.90,1.10)   0.2
   15   -0.011   0.039   0.99  (0.91,1.09)  -0.5    0.99  (0.98,1.02)  -0.7
   16    1.464   0.039   1.04  (0.92,1.08)  -0.1    1.02  (0.93,1.07)   0.0
   17   -2.714   0.045   0.94  (0.92,1.08)  -0.1    0.99  (0.80,1.20)  -0.2
   18   -0.646   0.166   0.98  (0.94,1.06)   0.0    0.98  (0.97,1.03)  -0.1

(Variable map for the Laughing Clowns task: the distribution of person ability plotted against item difficulty locations on the logit scale, with each ‘X’ representing 2.9 cases. Persons cluster between roughly -1 and +1 logits; items 1 and 16 sit highest on the scale (most difficult) and items 6, 14 and 17 lowest (easiest).)

Stability of indicator difficulty estimates across countries

Challenges for the future

• Scaling all 11 tasks

• One, two and five dimensions

• Stability of indicator estimates over language, curriculum and other characteristics

• Simplifying the coding process

• Using chat that includes grammatical errors, non-standard syntax, abbreviations, and synonyms or ‘text-speak’.

• Capture these text data in a coded form.

• Complexity and simplicity without loss of meaning, built into task construction as an a priori design feature.

• Design templates for task development and scoring.
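One way such chat text might be captured in coded form is to normalise common abbreviations before matching; the abbreviation table here is purely illustrative, not a documented ATC21S resource:

```python
# Illustrative 'text-speak' expansion table -- an assumption, not ATC21S data.
TEXT_SPEAK = {"u": "you", "r": "are", "thx": "thanks", "pls": "please"}

def normalise_chat(message: str) -> str:
    """Lower-case a chat message and expand known text-speak tokens."""
    return " ".join(TEXT_SPEAK.get(w, w) for w in message.lower().split())

print(normalise_chat("thx u r ready"))  # thanks you are ready
```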
