[SAK-15780] SectionManager caching breaks multiple section member updates in a single thread
Created: 25-Feb-2009  Updated: 24-Nov-2009  Resolved: 26-Feb-2009

Status: Closed
Project: Sakai
Component/s: Section Info
Affects Version/s: 2.5.2, 2.5.3, 2.5.4, 2.6.0
Fix Version/s: 2.5.5, 2.6.0, 2.7.0

Type: Bug
Priority: Critical
Reporter: Ray Davis
Assignee: Oliver Heyer
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relate
relates to SAK-13400 Add thread-local caching to SectionMa... Closed
relates to SAK-16602 Section Memberships screen not updati... Closed
Previous Issue Keys:

Description

While loading test data, I found that when I tried adding members to more than one section in a single thread, changes would only show up for one section. For example, if I tried adding 70 students to "Disc 1" and 70 students to "Disc 2", at the end of the run, "Disc 1" would have 70 members but "Disc 2" would have 0 (or vice versa). Tomcat logs, the Sakai events table, and DB queries while the thread was paused all seemed to indicate that the updates were actually progressing, but then were somehow rolled back.

I've tracked the problem down to two ThreadLocal caches used by:

sections/sectionsimpl/sakai/impl/src/java/org/sakaiproject/component/section/sakai/SectionManagerImpl.java

Details in comments.

Comments

Comment by Ray Davis [ 25-Feb-2009 ]
The findGroup method gets and caches a site Group object from SiteService findGroup. That object includes a reference to a Site object. The getSite method gets and caches a different copy of the Site object, which may in turn end up getting references to different copies of Group objects in the SectionManagerImpl cache.

First call to addStudentToSection:
1. Gets and caches a site Group object for the section (with its own copy of the Site object).
2. Calls dropEnrollmentFromCategory, which...
2.a. Gets and caches a new copy of the Site object, which ends up with its own cached copies of the groups.
2.b. Always updates every group's membership list with what it thinks the current state is, whether the student being dropped is already a member or not.
3. Calls group.addMember, which won't make any changes to the cached site's copy of the groups.
4. Calls SiteService saveGroupMemberships to save the state from the cached group's point of view.

Second call to addStudentToSection, specifying a different student and a different section:
1. Gets and caches a site Group object for the new section (with its own copy of the Site object).
2. Calls dropEnrollmentFromCategory, which...
2.a. Fetches the cached copy of the Site object, containing the obsolete copies of the groups.
2.b. Updates the first section's group's membership list with the incorrect (empty) cached state, undoing the first call to addStudentToSection.
3. Goes on to deal with the second section, leaving the first section rolled back.

Comment by Ray Davis [ 25-Feb-2009 ]
One easy (if not necessarily optimal) fix is to clear the cached site object when saveGroupMemberships is called.

It might also be a good idea to cut down on unnecessary noise by changing dropEnrollmentFromCategory to ignore groups which don't contain the specified enrollment. We'd need to make sure the apparent redundancy wasn't put in for some undocumented reason, though.

Comment by Ray Davis [ 25-Feb-2009 ]
This was triggered by the caches implemented for the performance issue. The basic design flaw was that the two caches end up referring to different copies of the same objects. We need to ensure that the same objects are reached no matter how we reference them. This may make the code more complicated, but it should fix the bug.

Josh doesn't remember any reason for dropEnrollmentFromCategory to be so exuberant with its updates, and there are no comments to justify it, so I'll change that too.
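The sequence described in the comments above can be reproduced in a small standalone model. The sketch below is not Sakai code: the class StaleCacheSketch, its in-memory store, and the clearSiteCacheOnSave flag are all hypothetical stand-ins, and it only assumes that the two thread-local caches hold independent copies of the same groups. It shows the second call overwriting the first section from the stale site copy, and the "clear the cached site object on save" idea avoiding that in this one scenario.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Standalone model of the bug; NOT Sakai code. All names here are hypothetical.
final class StaleCacheSketch {
    // Stands in for the persistent store behind SiteService: group id -> member ids.
    private final Map<String, Set<String>> store = new LinkedHashMap<>();

    // Two independent thread-local caches, each holding its own copies of the groups.
    private final ThreadLocal<Map<String, Set<String>>> groupCache =
            ThreadLocal.withInitial(LinkedHashMap::new);
    private final ThreadLocal<Map<String, Set<String>>> siteCache = new ThreadLocal<>();

    private final boolean clearSiteCacheOnSave;

    StaleCacheSketch(boolean clearSiteCacheOnSave, String... groupIds) {
        this.clearSiteCacheOnSave = clearSiteCacheOnSave;
        for (String id : groupIds) {
            store.put(id, new TreeSet<>());
        }
    }

    void addStudentToSection(String student, String groupId) {
        // 1. Get (and cache) a private copy of the target group.
        Set<String> group = groupCache.get()
                .computeIfAbsent(groupId, id -> new TreeSet<>(store.get(id)));
        // 2. Drop the student from every group, as seen by the cached site copy.
        dropEnrollmentFromCategory(student);
        // 3. Add to the cached group; the cached site's copy is not updated.
        group.add(student);
        // 4. Save the membership from the cached group's point of view.
        store.put(groupId, new TreeSet<>(group));
        if (clearSiteCacheOnSave) {
            siteCache.remove(); // the "easy" fix: force a fresh site copy next time
        }
    }

    private void dropEnrollmentFromCategory(String student) {
        Map<String, Set<String>> site = siteCache.get();
        if (site == null) {
            // Load a fresh site copy with its own copies of every group.
            site = new LinkedHashMap<>();
            for (Map.Entry<String, Set<String>> e : store.entrySet()) {
                site.put(e.getKey(), new TreeSet<>(e.getValue()));
            }
            siteCache.set(site);
        }
        // Writes back every group, whether or not the student was a member.
        for (Map.Entry<String, Set<String>> e : site.entrySet()) {
            e.getValue().remove(student);
            store.put(e.getKey(), new TreeSet<>(e.getValue()));
        }
    }

    int memberCount(String groupId) {
        return store.get(groupId).size();
    }

    public static void main(String[] args) {
        for (boolean fix : new boolean[] {false, true}) {
            StaleCacheSketch m = new StaleCacheSketch(fix, "Disc 1", "Disc 2");
            m.addStudentToSection("student-a", "Disc 1");
            m.addStudentToSection("student-b", "Disc 2");
            System.out.println("clearSiteCacheOnSave=" + fix
                    + " Disc 1=" + m.memberCount("Disc 1")
                    + " Disc 2=" + m.memberCount("Disc 2"));
        }
    }
}
```

With the flag off, the model ends with "Disc 1" empty and "Disc 2" holding one member; with it on, each section keeps its one member. The model leaves the group cache untouched, so it says nothing about whether clearing only the site cache is sufficient in other call sequences.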
Comment by Ray Davis [ 25-Feb-2009 ]
I just committed a fix (I hope) to our local 2.5.x-based branch. I'm no longer seeing the functional bug. The dropEnrollmentFromCategory change sped up my test case (distributing 140 students across 3 sections) from 2 min 18 sec to 42 sec.

I'll look into merging to trunk tomorrow.

Comment by Stephen Marquard [ 25-Feb-2009 ]
Ray, is there any UI action in Section Info that would trigger this bug, or does it only show up when calling SectionManager from other code?

Comment by Ray Davis [ 26-Feb-2009 ]
So far I haven't found any code paths in the Section Info tool that would trigger the functional bug, which explains why no one's complained about it so far. The performance bug may be another matter.

Comment by Stephen Marquard [ 26-Feb-2009 ]
The most critical performance issue for us has been the student options (signup, switch), which don't have dropEnrollmentFromCategory in the code path. We haven't noticed other specific performance issues on the instructor/TA options (not to say this wasn't an issue), but they're used much less frequently, so have smaller total impact. Good to have it fixed though.

Comment by Ray Davis [ 26-Feb-2009 ]
A straightforward

    sections> svn merge -c 58022 https://source.sakaiproject.org/svn/msub/berkeley.edu/bspace/sections/sakai_2-5-x .

seemed to work fine in my brief testing, so I've checked this into trunk (rev. 58044) and recommend it also be considered for 2.5.x and 2.6.x.

Comment by Stephen Marquard [ 20-Mar-2009 ]
Notes for QA: this isn't exposed in the UI, so QA testing should focus on regression testing for existing functionality.

Comment by Peter Peterson [ 16-May-2009 ]
QA SUMMARY - regression tested existing functionality per Stephen Marquard's Jira notes for QA
functionality seems to perform as expected
QA PASS

Comment by Jean-François Lévêque [ 18-May-2009 ]
Has regression testing been done on 2.7? AFAICT, this isn't in 2.6 yet.
Comment by Anthony Whyte [ 18-May-2009 ]
Merged to 2.6.0, r62590.

Comment by Jean-François Lévêque [ 19-May-2009 ]
2.5.x merge r62592

Comment by Anthony Whyte [ 22-Jun-2009 ]
Merged to 2.5.5 branch for sakai-2.5.5-rc01.

Comment by Peter Peterson [ 28-Sep-2009 ]
In 2.6.0; removed 2.6.x as fix version in order to ensure a clean 2.6.1 filter.

Generated at Sun Mar 06 04:15:10 CST 2016 using JIRA 6.4.11#64026sha1:78f6ec473a3f058bd5d6c30e9319c7ab376bdb9c.