MATLAB vs. Alzheimer’s Support: A Use of Netscan to Compare Two Communities

Christina Pikas, College of Information Studies, University of Maryland, College Park

Abstract: I used the Microsoft Research Community Technology Group Netscan (http://netscan.research.microsoft.com) and UsenetViews tools and Google Groups (http://groups.google.com) to examine and compare two Usenet groups: comp.soft-sys.matlab and alt.support.alzheimers. Based on my investigations, I describe what Netscan does and does not tell the researcher.

I. MATLAB USER GROUP comp.soft-sys.matlab

A. Description

MATLAB is programming software used by scientists and engineers for complex calculations and modeling. There are many how-to books, but the software has a reputation for being very difficult to learn and use. The developer, The MathWorks, maintains a community page (http://www.mathworks.com/matlabcentral/) where customers can share code, participate in contests, and read the comp.soft-sys.matlab (CSSM) newsgroup. MathWorks development staff and technical support engineers regularly participate. The group has about 10,000 members and about 70,000 messages per year. Posts contain code, error messages, and requests for assistance.

B. Who are the members of comp.soft-sys.matlab?

The members of CSSM are professional engineers and scientists who use MATLAB at work. They post most heavily during the week and ask sophisticated but succinct questions. The average length of posts for 2004 was 26 lines. In addition to MathWorks employees, many posters identify themselves with work addresses from education, government, and industry. Of the top 40 authors for the first quarter of 2005, 40% only replied and did not start any threads. I calculated the newsgroup crowd information for the first quarter of 2005 and charted it (see Chart 1). Posts are rarely cross-posted to other groups.

C. Interesting things

In my previous research on online engineering communities [1], I found that communities with too high a load of students asking for basic help were less likely to be successful. This community is active and appears attractive to students with MATLAB assignments. Additionally, there are complaints about student users [2]. My initial thoughts were:

- Student posts are ignored
- The ratio of students to professionals is low
- My previous findings do not apply to this group; that is, student questions do not adversely impact this online engineering community

I looked at the number of unreplied messages (URM) as a percentage of the total number of messages for October (prime homework season), July (summer break), and January (winter break), 2000-2005. This comparison does not account for other factors that might affect replies, such as vacations, winter storms, and conferences. See Table 1 for the number of URM compared with the total number of posts. I found no significant differences in the percentage of URM by month of the year. Likewise, the percentage of one-time posters does not differ significantly by month (see Table 2). By scanning the most recent month’s activity, I found several student messages that received meaningful replies. Student posts are not ignored, but the ratio of students to professionals is low, and the basic questions do not harm the success of CSSM.

Also, contrary to the opinion that USENET is dying, this group has grown continuously since the earliest date for which archives are available in Google Groups. See the totals for each month on the about page (http://groups.google.com/group/comp.soft-sys.matlab/about). The growth is shown in Charts 2, 3, and 4 below.

II. ALZHEIMER’S SUPPORT GROUP alt.support.alzheimers

A. Description

From the National Institute on Aging, “Alzheimer's disease is the most common form of dementia among older people.
It involves the parts of the brain that control thought, memory, and language.” (http://nihseniorhealth.gov/alzheimersdisease/defined/01.html, accessed 11/16/2005) This is a public group that supports people with the disease and their caregivers. It has about 700 members and 7,000 messages a year. See Chart 5 for a graph of the decline of this group. Posts relay concerns about sick family members, legal questions about caregivers and wills, and requests for advice and support.

B. Who are the members of alt.support.alzheimers?

In contrast to the MATLAB group, the members of this group generally post not from work but from personal addresses at AOL, Hotmail, Yahoo, and similar domains. There are many more off-topic messages and a few spammers. The average message length for 2004 was 45 lines, quite a bit longer than in CSSM. The majority of the posters identify themselves as women caring for sick family members and spouses.

C. Interesting things

I selected this group to be able to use the UsenetViews software provided on DVD-ROM from Microsoft Research. I immediately located a spammer using the Newsgroup Crowd visualization. See the Newsgroup Crowd and the AuthorLines for the spammer in Charts 6 and 7 below. It is interesting that trolls were more apparent in a health-related group, where the activity could be more hurtful. The surprisingly high number of URM (28% of all messages) is explained by looking at the subjects. The three on the first page that were not obviously spam were thank-you messages for assistance received.

III. FUTURE WORK AND WHAT NETSCAN DOES NOT TELL US

The MATLAB group has grown continuously while the Alzheimer’s group has not. This could be due to competing online communities. Further work would include surveying available Alzheimer’s communities, comparing their growth or decline over time, and tracking the migration of the members. Interviews might explain why these people moved.
Future work might also include interviewing participants to see why they join the MATLAB group. Many do not show membership in other USENET groups. Do they trust this group more because MathWorks employees answer questions? Why do they post instead of calling technical support or asking colleagues? How much research do they do before posting a question?

Netscan gathers more statistics in one place than are available on Google Groups or elsewhere. Together with UsenetViews, it provides visualization tools to help the researcher find patterns and explain the usage of the group. It does not, however, support more qualitative investigations. Mining the posts for concepts has to be done via Google. It would be helpful if Netscan or UsenetViews supported searching and compiling data mined from the content. Also, Netscan's numbers do not match Google’s, so there are discrepancies that should be addressed.

NOTES

[1] Christina K. Pikas, “Fostering Collaboration in Engineering Communities of Practice: What Works and What Doesn’t,” October 20, 2005. Forthcoming on the Communities of Practice Learning Center website.

[2] One thread with these complaints includes the following two posts:

Date: Sun, 4 Apr 2004 09:02:47 -0500
Subject: ATTENTION STUDENTS POSTING TO COMP.SOFT-SYS.MATLAB
Begging for help doesn't work here. Most of the respondents here are paid professionals who are not looking to teach you in a few posts a subject that you have failed to learn in a semester of classes. Nor are we in the business of providing off-the-shelf solutions to your programming problems. The urgency of your problem does not apply to us and your gratitude has little value. Your gratitude might have some value someday if you learn to be a competent engineer, scientist, or whatever you are doing... but that won't happen if your strategy is to try to get someone else to do your work.
…
Date: Sun, 4 Apr 2004 12:52:46 -0500
Ditto [deleted] response, Us.
Personally, I'm getting very tired of seeing students abuse this ng. Just stopped myself from posting a nasty (-ish) reply to another this morning. The problem is, it's just TOO DAMNED EASY for lazy students to get answers to their questions here. No thought required. I'm less and less inclined these days to bother replying at all.

IV. TABLES

Table 1: Number of Unreplied Messages (URM) in Octobers, Julys, and Januarys, 2000-2005

Month     URM    Total Messages    % URM
Oct-05    896    6240              14%
Oct-04    781    6260              12%
Oct-03    447    3763              12%
Oct-02    406    3284              12%
Oct-01    345    2600              13%
Oct-00    167    1972               8%

Jul-05    858    5766              15%
Jul-04    724    5966              12%
Jul-03    353    3145              11%
Jul-02    355    2656              13%
Jul-01    255    1807              14%
Jul-00    159    1635              10%

Jan-05    666    6659              10%
Jan-04    438    4097              11%
Jan-03    399    3153              13%
Jan-02    295    2909              10%
Jan-01    150    1478              10%
Jan-00    119    1047              11%

Table 2: Comparison of the percentage of one-time posters

Month     1x Posters    Total People    % 1x posters
Oct-05    1206          2054            59%
Oct-04    1323          2152            61%
Oct-03     773          1332            58%
Oct-02     688          1172            59%
Oct-01     551           947            58%
Oct-00     349           621            56%

Jul-05    1102          1880            59%
Jul-04    1127          1926            59%
Jul-03     559          1033            54%
Jul-02     570          1000            57%
Jul-01     430           718            60%
Jul-00     311           552            56%

Jan-05    1283          2133            60%
Jan-04     840          1373            61%
Jan-03     615          1103            56%
Jan-02     514           927            55%
Jan-01     341           550            62%
Jan-00     283           431            66%

V. CHARTS

Chart 1: Manually calculated Newsgroup Crowd
[Chart: Newsgroup Crowd, comp.soft-sys.matlab, 2005 Q1*. Vertical axis: Days Active in Quarter (0 to 90). Horizontal axis: Avg Posts per Thread (0.75 to 1.75).]
*I reviewed the pieces that MSR CTG used to automate a newsgroup crowd chart for the DVD and manually found that information for the top 20 authors. Each author's lifetime participation was retrieved from Google Groups using an author search.
Chart 2: Total People per Month in comp.soft-sys.matlab
[Chart: vertical axis 0 to 2500; horizontal axis Jan-00 to Jul-05.]

Chart 3: Total Messages per Month in comp.soft-sys.matlab
[Chart: vertical axis 0 to 7000; horizontal axis Jan-00 to Jan-05.]

Chart 4: Number of Posts in comp.soft-sys.matlab
[Chart: vertical axis 0 to 7000; horizontal axis Jan-93 to Jan-05.]

Chart 5: Number of Posts in alt.support.alzheimers
[Chart: vertical axis 0 to 2500; horizontal axis Aug-97 to Aug-05.]

Chart 6: Spammer, many posts per thread, few days of activity

Chart 7: Author lines for user 42214030 for 2004 in alt.support.alzheimers. Note: high numbers of posts on single dates with few posts in each thread. Further examination shows these posts to be spam.
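
Supplementary sketch: recomputing the % URM column of Table 1. The percentages in Table 1 follow directly from the two count columns, so they can be re-derived as a consistency check. The Python below is a minimal sketch under stated assumptions, not the procedure used to build the table: the counts are transcribed from Table 1, and the names TABLE_1, row_percentages, and pooled_percentages are illustrative only. It also pools the six years for each month, which is one simple way to compare the months side by side; it does not perform a significance test.

```python
# Counts transcribed from Table 1: month -> list of (label, URM, total messages).
TABLE_1 = {
    "Oct": [("Oct-05", 896, 6240), ("Oct-04", 781, 6260), ("Oct-03", 447, 3763),
            ("Oct-02", 406, 3284), ("Oct-01", 345, 2600), ("Oct-00", 167, 1972)],
    "Jul": [("Jul-05", 858, 5766), ("Jul-04", 724, 5966), ("Jul-03", 353, 3145),
            ("Jul-02", 355, 2656), ("Jul-01", 255, 1807), ("Jul-00", 159, 1635)],
    "Jan": [("Jan-05", 666, 6659), ("Jan-04", 438, 4097), ("Jan-03", 399, 3153),
            ("Jan-02", 295, 2909), ("Jan-01", 150, 1478), ("Jan-00", 119, 1047)],
}


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def row_percentages(table):
    """Map each row label to its URM share, rounded to a whole percent."""
    return {
        label: round(pct(urm, total))
        for rows in table.values()
        for label, urm, total in rows
    }


def pooled_percentages(table):
    """Map each month to its URM share over all years combined (one decimal)."""
    pooled = {}
    for month, rows in table.items():
        urm_sum = sum(urm for _, urm, _ in rows)
        total_sum = sum(total for _, _, total in rows)
        pooled[month] = round(pct(urm_sum, total_sum), 1)
    return pooled


if __name__ == "__main__":
    for label, value in row_percentages(TABLE_1).items():
        print(f"{label}: {value}%")
    for month, value in pooled_percentages(TABLE_1).items():
        print(f"{month} pooled: {value}%")
```

The same two functions apply unchanged to Table 2 if the one-time-poster counts and total people are substituted for the URM and message counts.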