Weakly Supervised Models of Aspect-Sentiment for Online Course Discussion Forums
Arti Ramesh, Shachi H. Kumar, James Foulds, Lise Getoor

MOOCs
• Massive: attracts thousands of participants
• Open: open access, content, and assessment
• Online: hosted online by education companies in partnership with top universities

Classroom vs. MOOCs
• Classroom
 – Face-to-face interaction between instructor and students
• MOOC discussion forums
 – Primary means of interaction between instructor and students
 – Large numbers of students and posts make forums hard to monitor manually
 – Posts discuss problems in the course: course material, errors, feedback

Example MOOC Posts (post → fine-grained topic)
• "The video is very choppy. Can somebody fix this?" → Lecture-Video
• "Will subtitles be made available for the lectures for this week? I liked the transcripts from last week." → Lecture-Subtitles
• "Will everyone get a certificate or only people in the signature track?" → Certificate
• "When is quiz 4 due?" → Quiz-Deadlines

Predicting Fine-Grained Problems: Challenges
• Labeled data is hard to obtain
 – Only 5-10% of posts contain problems
 – Privacy concerns around data sharing
 – Problems differ across courses
• Unsupervised/weakly supervised approaches are desirable
 – The system is not fine-tuned to one course but can adapt across courses

Related Work
Aspect-sentiment in online reviews
• Semi-supervised generative model with seed words to identify aspect clusters [Mukherjee et al., 2012]
• Unsupervised aspect-sentiment model for online reviews [Brody et al., 2012]
• Hierarchical aspect-sentiment model for online reviews [Kim et al., 2013]
MOOCs
• Predicting instructor intervention in MOOC forums [Chaturvedi et al., 2014]

SeededLDA for MOOC Forums
SeededLDA [Jagarlamudi et al., 2010]
• Guides topic discovery by specifying representative seed words
• Uses the seeds to bias the topic-word and document-topic distributions
• Gathers words related to the seed words into the seeded topics
SeededLDA for MOOCs
• Many courses, but a common set of seed words
• Seed words for MOOCs are drawn from the course syllabus and forums

Hinge-Loss Markov Random Fields & Probabilistic Soft Logic [Bach et al., 2012]
• Hinge-loss Markov random fields (HL-MRFs)
 – Logic-based MRFs that can reason scalably and accurately about both discrete and continuous graph data
 – Efficient inference: convex optimization in continuous space
• Probabilistic Soft Logic (PSL)
 – Templating language for HL-MRFs
 – Weighted logical rules model dependencies
 – Continuous variables in [0,1]

Predicting Fine-Grained Problems and Sentiment: A Joint Prediction Problem
• Analogous to predicting aspect-sentiment in online reviews
• Aspect hierarchy connecting course elements
• HL-MRF framework
 – Combines different features
 – Encodes the coarse-to-fine aspect hierarchy
 – Encodes dependencies between aspect and sentiment
• Jointly models aspect and sentiment

Our Contributions
• Identify fine-grained aspects in online courses
• Extract course-specific features from posts using SeededLDA
• Construct a coarse-to-fine aspect hierarchy to model aspect dependencies
• Construct a weakly supervised joint model for aspect-sentiment using HL-MRFs
• Validate the system using crowdsourced posts sampled from 12 courses

MOOC Aspect-Sentiment Models: SeededLDA
• Coarse aspect seeds
 LECTURE: lecture, video, download, transcript, slide, note
 QUIZ: quiz, assignment, question, midterm, exam, submission
 CERTIFICATE: certificate, score, statement, signature
 SOCIAL: name, course, introduction, study, group
• Sentiment seeds
 POSITIVE: interest, exciting, thank, great, happy, glad, enjoy
 NEGATIVE: problem, difficult, error, issue, unable, misunderstand
 NEUTRAL: coursera, class, hello, everyone, greet, name

SeededLDA Model
• Fine aspect seeds
 LECTURE-VIDEO: video, problem, download, play, player
 LECTURE-AUDIO: volume, low, headphone, sound, audio, hear
 LECTURE-LECTURER: professor, fast, speak, pace, follow, speed
 LECTURE-SUBTITLES: transcript, subtitle, slide, note, lecture
 LECTURE-CONTENT: typo, error, mistake, wrong, right, incorrect
 QUIZ-CONTENT: question, challenge, difficult, understand, typo
 QUIZ-SUBMISSION: submission, submit, quiz, error, unable, resubmit
 QUIZ-GRADING: answer, question, answer, grade, assignment, quiz
 QUIZ-DEADLINE: due, deadline, miss, extend, late

PSL-Joint: Combining Features
• SeededLDA scores for the fine aspect and the coarse aspect are combined to predict the fine aspect of post P
• SeededLDA scores for sentiment and fine aspect are combined to predict the fine aspect

PSL-Joint: Encoding Dependencies
• Dependency between coarse aspect and fine aspect
• Dependency between sentiment and fine aspect

Experimental Evaluation
F1 scores for SeededLDA and PSL-Joint on coarse aspects
 Model       Lecture  Quiz   Certificate  Social
 SeededLDA   0.632    0.657  0.459        0.654
 PSL-Joint   0.630    0.706  0.621        0.659
F1 scores for SeededLDA and PSL-Joint on sentiment
 Model       Positive  Negative  Neutral
 SeededLDA   0.182     0.517     0.356
 PSL-Joint   0.189     0.615     0.434
• PSL-Joint outperforms SeededLDA for most coarse aspects (all except Lecture) and for every sentiment class

Experimental Evaluation
F1 scores for fine-grained aspects under coarse aspect "lecture"
 Model       Content  Video  Audio  Lecturer  Subtitles
 SeededLDA   0.08     0.240  0.684  0.06      0.397
 PSL-Joint   0.410    0.485  0.582  0.323     0.461
F1 scores for fine-grained aspects under coarse aspect "quiz"
 Model       Content  Submission  Deadlines  Grading
 SeededLDA   0.011    0.437       0.214      0.514
 PSL-Joint   0.36     0.416       0.611      0.550
• PSL-Joint distinguishes between lecture-content and quiz-content
• Large improvements for lecture-lecturer (0.06 → 0.323) and quiz-deadlines (0.214 → 0.611)

Interpreting PSL-Joint Predictions
• Post: "There is a typo or other mistake in the assignment instructions (e.g. essential information omitted)."
 – SeededLDA prediction: Lecture-content
 – PSL-Joint prediction: Quiz-content
• Post: "Thanks for the suggestion about downloading the video and referring to the subtitles. The audio is barely audible, even when the volume is set to 100%."
 – SeededLDA prediction: Lecture-subtitles
 – PSL-Joint prediction: Lecture-audio

Conclusion: Fine-Grained Aspect-Sentiment in MOOC Forums
• Automatically detecting problems in forum posts is useful for instructors
• Weakly supervised probabilistic framework to automatically detect aspect and sentiment in online courses
 – SeededLDA and PSL-Joint models encode domain information and predict aspect and sentiment
• PSL-Joint significantly outperforms SeededLDA for many fine aspects, coarse aspects, and sentiment
 – Structural dependencies between aspect and sentiment help prediction
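The SeededLDA slides describe seed words biasing topic-word distributions. As a rough illustration only, the sketch below builds an asymmetric Dirichlet prior over a toy vocabulary in which each coarse aspect's seed words (copied from the "Coarse aspect seeds" list) receive extra pseudo-counts. This informed-prior scheme is a simpler, swapped-in technique, not the SeededLDA model of Jagarlamudi et al.; the `base` and `boost` values and the toy vocabulary are assumptions.

```python
from typing import Dict, List

# Coarse aspect seeds, copied from the "Coarse aspect seeds" slide.
COARSE_SEEDS: Dict[str, List[str]] = {
    "LECTURE": ["lecture", "video", "download", "transcript", "slide", "note"],
    "QUIZ": ["quiz", "assignment", "question", "midterm", "exam", "submission"],
    "CERTIFICATE": ["certificate", "score", "statement", "signature"],
    "SOCIAL": ["name", "course", "introduction", "study", "group"],
}

# Hypothetical toy vocabulary for illustration.
VOCAB = ["video", "download", "quiz", "exam", "certificate", "hello"]


def seeded_topic_word_prior(vocab: List[str], seeds: Dict[str, List[str]],
                            base: float = 0.01, boost: float = 1.0) -> Dict[str, List[float]]:
    """Asymmetric Dirichlet prior per topic: every word gets `base`,
    and each seed word of that topic gets an extra `boost` pseudo-count."""
    index = {w: i for i, w in enumerate(vocab)}
    prior: Dict[str, List[float]] = {}
    for topic, words in seeds.items():
        row = [base] * len(vocab)
        for w in set(words):  # a seed listed twice is boosted only once
            if w in index:
                row[index[w]] += boost
        prior[topic] = row
    return prior


def prior_mean(row: List[float]) -> List[float]:
    """Expected topic-word distribution under a Dirichlet with parameters `row`."""
    total = sum(row)
    return [x / total for x in row]


prior = seeded_topic_word_prior(VOCAB, COARSE_SEEDS)

if __name__ == "__main__":
    for topic, row in prior.items():
        print(topic, [round(p, 3) for p in prior_mean(row)])
```

Seed words present in the vocabulary dominate the expected distribution of their topic, while a topic with no seeds in the vocabulary (SOCIAL here) stays uniform.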
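The HL-MRF/PSL slides mention weighted logical rules over continuous truth values in [0,1]. In PSL, a rule body → head is relaxed with the Lukasiewicz t-norm, and its penalty is the weighted "distance to satisfaction" max(0, body - head), optionally squared. The sketch below computes that penalty for a hypothetical rule in the spirit of the "Combining Features" slide; the predicate names (SeededLdaFine, SeededLdaCoarse, Parent, FineAspect), the weight, and the truth values are illustrative assumptions, not the authors' actual rules.

```python
def luk_and(*truths: float) -> float:
    """Lukasiewicz t-norm: soft conjunction of truth values in [0, 1]."""
    return max(0.0, sum(truths) - (len(truths) - 1))


def distance_to_satisfaction(body: float, head: float) -> float:
    """A soft rule body -> head is satisfied when head >= body; otherwise the gap is penalized."""
    return max(0.0, body - head)


def hinge_potential(weight: float, body: float, head: float, squared: bool = False) -> float:
    """Weighted hinge-loss potential: w * max(0, body - head), or its square."""
    d = distance_to_satisfaction(body, head)
    return weight * (d ** 2 if squared else d)


# Hypothetical rule:
#   2.0 : SeededLdaFine(P, A) & SeededLdaCoarse(P, C) & Parent(A, C) -> FineAspect(P, A)
seed_fine, seed_coarse, parent = 0.8, 0.9, 1.0
body = luk_and(seed_fine, seed_coarse, parent)        # 0.8 + 0.9 + 1.0 - 2 = 0.7

linear_penalty = hinge_potential(2.0, body, head=0.4)                 # 2.0 * 0.3
squared_penalty = hinge_potential(2.0, body, head=0.4, squared=True)  # 2.0 * 0.09
satisfied_penalty = hinge_potential(2.0, body, head=0.9)              # head >= body

if __name__ == "__main__":
    print(body, linear_penalty, squared_penalty, satisfied_penalty)
```

Because each potential is a hinge (or squared hinge) of a linear function of the truth values, the sum of all potentials is convex, which is what makes HL-MRF inference a convex optimization.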
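To make "Efficient inference: convex optimization in continuous space" and the coarse-to-fine dependency concrete, the sketch below runs MAP inference on a tiny two-variable HL-MRF for one post: y = FineAspect(P, quiz-deadline) and c = CoarseAspect(P, quiz). All rules, weights, and SeededLDA scores are hypothetical. PSL's own solver is ADMM-based; here plain projected gradient descent on squared hinge potentials stands in for brevity.

```python
def clip01(v: float) -> float:
    """Project a value onto the feasible interval [0, 1]."""
    return min(1.0, max(0.0, v))


def objective(y: float, c: float, s_fine: float, s_coarse: float) -> float:
    """Sum of hypothetical squared hinge-loss potentials."""
    return (1.0 * max(0.0, s_fine - y) ** 2      # SeededLdaFine(P, A) -> FineAspect(P, A)
            + 1.0 * max(0.0, s_coarse - c) ** 2  # SeededLdaCoarse(P, C) -> CoarseAspect(P, C)
            + 2.0 * max(0.0, y - c) ** 2         # FineAspect(P, A) & Parent(A, C) -> CoarseAspect(P, C)
            + 0.1 * y ** 2 + 0.1 * c ** 2)       # weak negative priors on both variables


def gradient(y: float, c: float, s_fine: float, s_coarse: float):
    """Gradient of `objective` with respect to (y, c)."""
    gap = max(0.0, y - c)
    gy = -2.0 * max(0.0, s_fine - y) + 4.0 * gap + 0.2 * y
    gc = -2.0 * max(0.0, s_coarse - c) - 4.0 * gap + 0.2 * c
    return gy, gc


def map_inference(s_fine: float, s_coarse: float, step: float = 0.05, iters: int = 5000):
    """Projected gradient descent over [0, 1]^2 (a stand-in for PSL's ADMM solver)."""
    y, c = 0.5, 0.5
    for _ in range(iters):
        gy, gc = gradient(y, c, s_fine, s_coarse)
        y, c = clip01(y - step * gy), clip01(c - step * gc)
    return y, c


# SeededLDA is confident about the fine aspect (0.9) but not the coarse one (0.5).
y_map, c_map = map_inference(s_fine=0.9, s_coarse=0.5)

if __name__ == "__main__":
    print(round(y_map, 3), round(c_map, 3), round(objective(y_map, c_map, 0.9, 0.5), 4))
```

The coarse-to-fine rule lets strong fine-aspect evidence raise the coarse "quiz" score above its SeededLDA value (to about 0.717) while the fine score is pulled slightly down (to about 0.753), illustrating how joint inference propagates evidence along the aspect hierarchy.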
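The claims under "Experimental Evaluation" can be checked directly against the F1 tables. The sketch below stores the reported scores as (SeededLDA, PSL-Joint) pairs and lists where PSL-Joint wins and by how much; the dictionary layout and helper names are illustrative choices, and the numbers are copied from the tables.

```python
from typing import Dict, List, Tuple

# Reported F1 scores as (SeededLDA, PSL-Joint) pairs, copied from the tables.
F1: Dict[str, Dict[str, Tuple[float, float]]] = {
    "coarse": {"Lecture": (0.632, 0.630), "Quiz": (0.657, 0.706),
               "Certificate": (0.459, 0.621), "Social": (0.654, 0.659)},
    "sentiment": {"Positive": (0.182, 0.189), "Negative": (0.517, 0.615),
                  "Neutral": (0.356, 0.434)},
    "lecture": {"Content": (0.08, 0.410), "Video": (0.240, 0.485), "Audio": (0.684, 0.582),
                "Lecturer": (0.06, 0.323), "Subtitles": (0.397, 0.461)},
    "quiz": {"Content": (0.011, 0.36), "Submission": (0.437, 0.416),
             "Deadlines": (0.214, 0.611), "Grading": (0.514, 0.550)},
}


def gains(group: str) -> Dict[str, float]:
    """PSL-Joint F1 minus SeededLDA F1 for each label in a table."""
    return {label: round(psl - lda, 3) for label, (lda, psl) in F1[group].items()}


def psl_wins(group: str) -> List[str]:
    """Labels on which PSL-Joint scores higher than SeededLDA."""
    return [label for label, (lda, psl) in F1[group].items() if psl > lda]


if __name__ == "__main__":
    for group in F1:
        print(group, "PSL-Joint better on:", psl_wins(group), "gains:", gains(group))
```

Counting across all four tables, PSL-Joint is ahead on 13 of 16 labels; the exceptions are the coarse Lecture aspect, lecture-audio, and quiz-submission.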