MySpace Comments¹
Mike Thelwall
School of Computing and Information Technology, University of Wolverhampton, Wulfruna
Street, Wolverhampton WV1 1LY, UK. E-mail: m.thelwall@wlv.ac.uk
Tel: +44 1902 321470 Fax: +44 1902 321478
Purpose - The public messages exchanged between friends in social network sites provide a
record of informal communication on an unprecedented scale and, in some countries, for a
wide cross-section of the population. This study investigates the characteristics of social
network comments to give a broad overview to serve as a baseline for future research.
Design/methodology/approach - English comments from a representative sample of public
MySpace profiles are examined with a collection of exploratory analyses, using automatic
data processing, quantitative techniques and content analyses.
Findings - Comments are normally for general friendship maintenance and are typically
short, with 95% having 57 or fewer words. They contain a combination of standard spelling,
apparently accidental mistakes, slang, sentence fragments, “typographic slang” and
interjections. Several new creative spelling variants derived from previous forms of
computer-mediated communication have become extremely common, including u, ur, :),
haha, and lol. The vast majority of comments, 97%, contain at least one non-standard
language feature, suggesting that members almost universally recognise the informal nature
of this kind of messaging.
Research limitations/implications - The investigation only covers MySpace and only
analyses English comments.
Practical implications - MySpace comments should not be written in, or judged by, standard
linguistic norms and may cause special problems for information retrieval.
Originality/value - This is the first large-scale study of language in social network
comments.
Introduction
The public messages exchanged by social network site members, sometimes called comments
or wall postings, are a new type of text-based communication. These messages are unusual in
that they are public - either world-visible or visible to all of a member’s friends - and can be
permanently associated with the identity of the poster, more directly and publicly so than
listserv postings. The widespread use of social network sites in many countries (boyd &
Ellison, 2007) makes them an important object of study and also gives an opportunity to
investigate informal interpersonal communication on a larger scale than previously possible.
Earlier forms of computer-mediated communication for interpersonal or informal
communication have previously been investigated – typically with a case study approach or a
potentially unrepresentative sample due to the limitations of the technology. These studies
have shown the emergence of many forms of non-standard English and distinctive stylistic
features (reviewed below). In addition to the intrinsic linguistic interest of these phenomena,
they matter for online information retrieval: casual language and spelling errors in social
network sites could make them difficult to search effectively (see
also Baron, 2003) and difficult to translate automatically (Climent, Moré, Oliver, et al., 2007).
Moreover, if social network profiles are typically not rich in useful information, then search
engines might wish to allocate low search rankings to them, and a convenient automatic
mechanism for this would be to penalise slang or incorrect spelling.
This article focuses on English language comments in one social network site,
MySpace, using an exploratory set of predominantly quantitative analyses. The choice of
MySpace is due to its popularity, apparently being the most visited site for U.S. web users at
the start of 2007 (Prescott, 2007), and because of its amenability to quantitative analysis
(Escher, 2007). The analysis includes text length, common words, spelling, grammar and rare
words. It is an initial exploratory study to highlight issues and patterns for future in-depth
investigations.
¹ Thelwall, M. (2009, to appear). MySpace comments. Online Information Review, 31.
Language in Computer-Mediated Communication
Language and CMC types
Perhaps the key issue for early online language researchers was the degree to which Internet
language is similar to spoken rather than written language (Baron, 2003; Crystal, 2006).
Previous findings are ambiguous: its linguistic features can fit between the two (Ko, 1996 –
an educational chatroom) or can be different from both (e.g., modals in Yates, 1996 – a
student-oriented discussion forum). Similarly, a study showed that language in an
international Bulletin Board System (BBS) covering a mixture of recreational and serious
topics tended to be more informal than most written forms and was quite similar to the
language of formal interviews, but with a higher degree of abstract information (Collot &
Belmore, 1996). The problem with attempts to generalise from such studies is that “internet
language” is too broad a category and a more nuanced approach is needed (Herring, 2002).
There are many different computer mediated communication (CMC) modes (Baron,
2003), including email (one-to-one, asynchronous), instant messaging (one-to-one,
synchronous), blogs (one-to-many, asynchronous), live streaming broadcasts (one-to-many,
synchronous), chat applications (many-to-many, synchronous), and listservs or wikis (many-to-many,
asynchronous). Online CMC also varies in the extent to which it is product-oriented
or process-oriented (Baron, 2003). Those with more durable outputs (e.g., blogs) probably
tend to use more carefully chosen language whereas those with less durable outputs (e.g.,
chat, instant messaging) are more oriented towards the process the users are engaged in and
the use of casual language may be more appropriate.
Since CMC services vary in their capabilities and usages, it is useful to have
dimensions through which to compare and analyse their language. In particular it is important
to recognise that Internet language is not homogeneous, but is socially constructed by users
appropriating available technologies (Androutsopoulos, 2006). For example, although similar
kinds of messages are possible with instant messaging and mobile phone text messages, they
are integrated in different ways into people’s lives because of their differing conveniences
(Grinter, Palen, & Eldridge, 2006). Herring’s (2007) faceted classification scheme
summarises a wide range of factors that may influence the language used within a particular
CMC context, distinguishing between medium and situational types. Partially quoting from
Herring (2007), the medium factors include: synchronicity (asynchronous/synchronous);
persistence of transcript (how long the record of the communication is likely to survive);
maximum permitted message length; and whether the messages are private or anonymous.
The situational factors are more complex and often have to be evaluated qualitatively. Again
partially quoting from Herring (2007), situational factors include: participation structure (e.g.,
one-to-one or one-to-many; group size; the number of active participants); participant
demographic, attitudinal, skills-based and other characteristics; purpose of communication
environment; and the topic or theme of group or messages. The research
summarised in Herring’s classification scheme shows that many
different factors influence the kind of language used in online communication, even
within any given type of online communication.
CMC language variations and innovations
It seems that when new communication technologies arrive, there is a burst of creativity as
users develop new styles and patterns of use (e.g., Danet, Ruedenberg, & Rosenbaum-Tamari,
1997; North, 2006). CMC has seen the introduction or expansion in the written use of slang,
emoticons, and abbreviations. Some abbreviations, such as irl (in real life) seem to be specific
to the Internet, and others seem to have spawned from the functions of a CMC device. An
important motivation for use of abbreviations by people in devices for which they are not
convenient may be to show group membership or conformity (Crystal, 1997, in Baron, 2003).
Mobile phone (cellphone) text messages and instant messaging (IM) seem to promote
independence in teenagers, and innovative styles of use are likely to emerge amongst
adolescents since these are known leaders of linguistic change (Eckert, 2003). For the general
population, text messages seem to be used for a variety of purposes, but asking questions and
transmitting personal information seem to be two of the most common uses (Faulkner &
Culwin, 2005).
The types of abbreviations commonly used in text messages include: dropping one or
more letters from words, phonetic spelling, and using symbols or numbers for sounds –
letter/number homophones (e.g., @, h8) (Grinter & Eldridge, 2003). Many of the shortenings
seem to be ad-hoc, created just to speed the writing of the message. Other shortenings include
contractions that remove all or most vowels, clipping the final ‘g’ of words, and strings of
initial letters of the words in standard phrases (e.g., bfn = bye for now) (Thurlow, 2003). In
addition to shortenings there are creative non-standard spellings, such as those designed to
portray an accent (e.g., wiv for with) or as a humorous alternative (e.g., lata for later)
(Thurlow, 2003). Humorous spellings have also been noticed in dating chat rooms (del-Teso-Craviotto,
2006) and misspellings have a long tradition of pre-CMC use in comic literature
(e.g., Kline, 1907). Thurlow’s (2003) analysis also found that a few (student) text messages
were obscurely encoded to the extent that they were incomprehensible to his researchers.
Language innovations other than spelling also occur. For example, within multi-user
environments, like chatrooms, a convention has emerged to preface a comment with the name
of the intended recipient (Werry, 1996). Similarly, short and fragmented sentences can also be
common in chatrooms (Radić-Bojanić, 2006). Herring (2001) emphasises that CMC typing
variations should not be seen as mistakes but as natural adaptations to the affordances of
particular devices, services or contexts. For example, the avoidance of capital letters can speed
typing in a rapid synchronous workplace exchange (Murray, 1990, cited in Herring, 2001).
Variations can also be optimised for expressiveness rather than speed. Examples include
written descriptions of sounds, such as laughing or crying, and the use of repeated letters
(MacKinnon, 1995, cited in Herring, 2001). One of the most complete lists of standard
variations in CMC text is that of Anis (2007) for French mobile phone text messages. In
addition to most of the examples mentioned above having French equivalents, there are
many more variants. Some are language-specific, such as the omission of accents for letters
and the substitution of k for qu, and others are more general such as merging consecutive
words. Anis emphasises that although some of the spelling variants are positively cryptic,
they are expressive and playful, making them apparently effective communication in context.
Social network language
Social network sites are online environments that typically let registered members set up a
personal home page, add their own content, and invoke ‘friend’ connections with other users.
These friend connections are normally two-way with each friend having a picture of the other
on their profile page or friend list pages. Social network sites like MySpace and Facebook
seem to be particularly popular amongst younger users, and to have become near-ubiquitous
amongst some groups, such as U.S. students (Golder, Wilkinson, & Huberman, 2007).
Social network sites are not homogeneous: they have different online environments
and user groups. For instance, Facebook originates from within education and seems to have
more educated users than MySpace (boyd, 2007). Some sites with social network features
target a particular activity, such as news discovery for digg.com (Lerman, 2006).
Young members of social network sites like MySpace seem to use them primarily to
communicate within existing friendship groups rather than to make new friends (boyd, 2008)
or to flirt (Pew Research Center for the People & the Press, 2007), although there are many
varied uses of social network sites (e.g., Fono & Raynes-Goldie, 2006). Many users have
hundreds of ‘friends’ – which may be predominantly acquaintances or strangers (Thelwall,
2008b) – but the majority of interactions probably occur amongst offline friends (Golder et
al., 2007).
Social network sites appear to be integrated into the daily lives of their users (Kim &
Yun, 2007) rather than having a separate partitioned existence. In fact, social network sites
can be important arenas in which to express personal identities (boyd & Heer, 2006).
Although multiple communication modes are typically supported, such as blogs, pictures,
email, instant messaging, and video, one distinctive way of communicating is to write
comments on a friend’s profile page (i.e., their default main page in the social network site).
Comments are an interesting communication phenomenon because they are public – either
world-visible or visible to all of the recipient’s friends. Comments are described differently
by site, for example testimonials and wall postings are alternative names, and some sites also
allow comments about blog postings, pictures and videos. Nevertheless, public conversations
between friends by writing on each other’s profile page seem to be very common. The public
nature of these comments makes them amenable to researchers who can access and analyse
those that are not restricted to the owner’s friends. The relatively permanent nature of social
network comments makes them a potential threat to orthodox standards of language use
because members have, in theory, the space and time to take care with their comments and so
the creators have little defence against accusations of linguistic “sloppiness”.
Although no previous research seems to have analysed grammar and spelling in
social network sites, some have discussed language use. One study analysed language
overlaps in LiveJournal (Herring et al., 2007), showing the existence of multiple language
communities, although sometimes bridged by multilingual individuals or journals with
extensive non-text content. Other studies have analysed swearing in MySpace, showing that it
is very common - occurring in around a third to half of teen MySpaces (Hinduja & Patchin,
2008; Thelwall, 2008a). The prevalence of swearing indicates that social network language
can be highly informal.
Research Questions
This study investigates the comments found in the “Friends Comments” section of MySpace
profile pages, which typically contains a set of text messages written by friends (although
some comments contain images and some seem to be written by spam bots that have accessed
a friend’s login information). This is an exploratory analysis using the information-centred
research philosophy (Thelwall, Wouters, & Fry, 2008) of rapid (often shallow) exploratory
analyses of new information sources to highlight potential applications and to develop
appropriate methods to extract useful data (type ICR4 in terms of:
http://cybermetrics.wlv.ac.uk/icr.html). In particular, a key objective is to give a broad
overview to serve as a baseline for future research. The following research questions are
addressed in the analysis, focusing on linguistic aspects.
1. What is the topic or purpose of typical comments?
2. What is the median length for comments?
3. Are rare words or spellings more frequent than rare words in standard written
English?
4. Are any non-standard spellings common?
5. Are there any common types of non-standard word spellings and words?
6. What proportion of MySpace comments avoids all instances of non-standard English?
Research Design
The overall research design was to download the profile pages of a large random sample of
MySpace users, to extract a random sample of English comments from these pages, and then
to address the research questions with this data.
Data
A sample of MySpace comments was created for analysis via the member ID feature. Each
MySpace member has an ID which uniquely identifies their profile page URL and can be
used to deduce their joining date. A random sample of 30,000 URLs was chosen and
automatically downloaded on July 17, 2007 using SocSciBot 4 (socscibot.wlv.ac.uk),
representing profiles that were created on July 3, 2006. From this collection the following
were rejected:
• Members with 0 or 1 friends (unlikely to be real users)
• Members with private profiles (comments not available)
• Members registered as musicians, film-makers or comedians (not typical users)
All remaining profiles were processed to extract all profile page comments, i.e. the
most recent up to 50 comments, a total of 173,730. All comments that were either only
pictures or were from a small set of standard commercial spam message types were
automatically removed. A random sample of 8,000 out of the remaining 149,913 comments was
then manually checked to filter out any remaining spam, as well as to remove any non-English
comments and any viral messages (e.g., with an instruction to forward the comment).
It is not fully possible to separate English from non-English comments since code-switching
is a recognised phenomenon in online communication (e.g., Axelsson, Abelin, & Schroeder,
2007; Lee, 2007; Siebenhaar, 2006). The final set of comments for analysis consisted of
6,859, containing a total of about 95,000 words.
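The profile rejection rules and the comment extraction step described above can be sketched as follows. This is an illustrative reconstruction only, not the SocSciBot processing actually used: the `Profile` record, its field names and the member-type labels are hypothetical, and each profile's comments are assumed to be stored most recent first.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the member types excluded as "not typical users".
EXCLUDED_TYPES = {"musician", "film-maker", "comedian"}


@dataclass
class Profile:
    """Hypothetical record for one downloaded profile page."""
    friend_count: int
    is_private: bool
    member_type: str = "standard"
    comments: list[str] = field(default_factory=list)  # assumed newest first


def keep_profile(profile: Profile) -> bool:
    """Apply the three rejection rules listed in the text."""
    return (
        profile.friend_count >= 2          # 0 or 1 friends: unlikely to be a real user
        and not profile.is_private         # private: comments not available
        and profile.member_type not in EXCLUDED_TYPES
    )


def extract_comments(profiles, max_per_profile=50):
    """Collect up to max_per_profile most recent comments from each kept profile."""
    collected = []
    for profile in profiles:
        if keep_profile(profile):
            collected.extend(profile.comments[:max_per_profile])
    return collected
```

The later spam, picture-only, viral and non-English filtering is not covered by this sketch because it relied partly on manual checking.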
Methods
The topic or purpose of MySpace comments was investigated through an informal content
analysis by the author of a random sample of 200 comments, excluding spam, viral and non-English
comments. The analysis is subjective because the comments are often short, part of
longer exchanges, and may be decoded by the recipients in ways that the author does not
understand. Thurlow’s (2003) SMS text messages categories are used as a baseline because
SMS messages are also short messages between friends. Anonymised and sometimes
truncated examples are given to illustrate the findings.
To measure comment lengths, all HTML tags were removed from each comment and
the number of characters in the remainder was measured. The comments were then split into
separate words by dividing each comment at whitespace markers (single or multiple: spaces,
tabs, and/or line ends) or punctuation (except hyphens or apostrophes within words). The
number of resulting “words” in each comment was then counted.
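The length measurement just described can be illustrated with a minimal sketch. It is not the author's program: the regular expressions and function names are assumptions chosen to match the stated rule (split at whitespace or punctuation, but keep hyphens and apostrophes that fall inside a word). In particular, this rule would discard emoticons such as :), so the original software presumably handled some symbols differently.

```python
import re

# Assumed pattern for an HTML tag: anything between < and >.
TAG_RE = re.compile(r"<[^>]+>")
# A "word" here is a run of letters/digits, optionally continued by a single
# internal apostrophe or hyphen followed by more letters/digits.
WORD_RE = re.compile(r"[^\W_]+(?:['’\-][^\W_]+)*")


def strip_tags(comment: str) -> str:
    """Remove HTML tags, leaving the remaining characters untouched."""
    return TAG_RE.sub("", comment)


def split_words(comment: str) -> list[str]:
    """Split a comment into "words" at whitespace and punctuation."""
    # Tags are replaced by a space so that words on either side stay separate.
    return WORD_RE.findall(TAG_RE.sub(" ", comment))


def comment_lengths(comment: str) -> tuple[int, int]:
    """Return (number of characters, number of words) for one comment."""
    return len(strip_tags(comment)), len(split_words(comment))
```

For example, `split_words("hey, what's up?")` yields the three tokens `hey`, `what's` and `up`.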
For the third research question, a word frequency distribution for MySpace comments
was calculated by tallying the frequency of all “words” found in the comments, split as described
in the previous paragraph, after converting all capital letters to lower-case. For the
fourth research question, a table of the most frequently occurring words was produced for
comparison with similar tables for British and U.S. English.
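A minimal sketch of the tallying step follows, assuming each comment has already been split into a list of word tokens. The function names are illustrative, not those of the program used in the study. The second function gives the "frequency of frequencies" (how many distinct words occur once, twice, and so on), which is the kind of distribution plotted on a log-log scale.

```python
from collections import Counter


def word_frequencies(tokenised_comments):
    """Tally lower-cased word frequencies over an iterable of token lists."""
    counts = Counter()
    for words in tokenised_comments:
        counts.update(word.lower() for word in words)
    return counts


def frequency_spectrum(counts):
    """Map each frequency to the number of distinct words having that frequency."""
    return Counter(counts.values())
```

The most frequent words are then available through `counts.most_common(100)`.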
For the fifth research question, a set of 400 words occurring only once in the
collection of 6,859 comments was investigated to get a sample of rare words. This is an
artificial sample and the proportions of different types of words are not meaningful. A larger
comment sample would probably have included a lower proportion of correctly spelled
words. This is based upon the assumption that incorrectly spelt words are less likely to be
repeated than correctly spelt words. For example, if all incorrect spellings were unique, then
the number of incorrectly spelt words would increase linearly with the size of a
corpus, whereas the number of unique words in a corpus normally increases logarithmically
with its size (i.e., at a lower rate), following Zipf (1949). The purpose of the sampling process
is hence only to generate a sample of relatively rare spellings. The words in the sample were
classified by the author using an inductive content analysis: initially grouping the words into
similar sets and then formalising the category definitions and re-categorising the words. The
categories chosen by this process overlap and the results are subjective but serve the purpose
of highlighting a variety of types of rare words.
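Drawing the sample of words that occur only once can be sketched as below. The function name and the fixed random seed are assumptions for illustration; the paper does not state how the 400 words were selected beyond their being words occurring once in the collection.

```python
import random
from collections import Counter


def sample_hapaxes(counts: Counter, k: int = 400, seed: int = 0) -> list[str]:
    """Return up to k randomly chosen words that occur exactly once."""
    # Sorting first makes the sample reproducible for a given seed.
    hapaxes = sorted(word for word, n in counts.items() if n == 1)
    rng = random.Random(seed)
    return rng.sample(hapaxes, min(k, len(hapaxes)))
```

The subsequent inductive classification of the sampled words is a manual step and is not represented here.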
Finally, to assess the spelling of MySpace comment words and to identify slang, the
6,859 comments were copied into Microsoft Word and its U.S. English spell-checker used as
the primary dictionary. Two coders (the author and a final year linguistics student) classified
each comment for the presence of any or all of: slang or typographic slang (defined as
informal methods of spelling words), punctuation errors, spelling errors, interjections,
pictograms and non-standard uses of capital letters. The frequency of occurrence of these
features in each comment was not recorded: only its presence or absence. In addition, the
classifiers judged each comment for following an accepted standard grammatical format,
using their own knowledge of the rules of grammar. Here “grammar” is interpreted as
encompassing all language rules apart from those listed above. Inter-coder agreement was
calculated and in all cases of disagreement the author made the final classification decision
(see Appendix for more details of the scheme). A set of simple automatic analyses were also
conducted using a purpose-built program (available from the author) that read each comment
and produced summary statistics. For instance, one part of the program checked each
comment to see whether it was entirely in lower-case and counted those that were.
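Since each feature was coded only as present or absent, inter-coder agreement for one feature reduces to comparing two lists of binary judgements. The following is a generic sketch of percentage agreement and Cohen's kappa under that assumption; it is not taken from the study's own software, and the function names are illustrative.

```python
def percent_agreement(coder_a, coder_b):
    """Proportion of items on which the two coders gave the same label."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("codings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    observed = percent_agreement(coder_a, coder_b)
    n = len(coder_a)
    labels = set(coder_a) | set(coder_b)
    # Chance agreement: product of the two coders' marginal label proportions.
    expected = sum(
        (coder_a.count(label) / n) * (coder_b.count(label) / n) for label in labels
    )
    if expected == 1:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

With codings `[1, 1, 0, 0]` and `[1, 0, 0, 0]`, observed agreement is 0.75 and chance agreement is 0.5, giving a kappa of 0.5.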
Results
Themes
In terms of Thurlow’s (2003) SMS categories, the vast majority of comments (78%) appeared
to be for general friendship maintenance (e.g., Have A Great 4th!; haha keep drinking your
jack, you sick son of a bitch!; hey happy belated; same here i am soo board; TOD LIKES
ICKI PORN ew; Hey, whats up?), rather than for any more practical purpose. The remainder
exchanged some kind of non-trivial information (e.g., I got dat prom video on my page),
arranged external meetings (e.g., You out tonight my dear?) or were (possibly) romantic (e.g.,
I MISS U TRAC!; I LOVE YOU!!!!!!!!!). In contrast to Thurlow’s (2003) text messages,
there were no explicit comments about sex.
Almost half of the comments (43.5%) did not have a clear topic of discussion (e.g.,
hey baby how's it going; You guys are rockers!!; hah thats so me!!) but the main clearly
identifiable topics were: MySpace (11.5%, e.g., ur second top friend is tim happy or what
lol?; Thanks 4 the +; u totally should check out my space lol), birthdays (7.5%, happy
birthday!), and music (7.5%, I love the picture and the song!).
One noticeable feature was creative humour (e.g., Whats Up PROVIDER??! hows
things goin..how was the songfest,, blah blah blah..; Make it back safe and don't be actin
crazy nigga!!!). Although there was only one joke in the set, 25% of the comments appeared
to be humorous in some way (e.g., with lol or :) following a comment, or otherwise judged as
attempting humour; unusual spellings were not counted as humour). Another common
element was an expression of interest in the target of the comment, or someone known to
them. A total of 30.5% of comments contained such requests (e.g., how are you?; what’s up;
are you doing …?; how is …?), although two thirds of these were stock polite greetings like
what’s up, which may primarily serve as salutations or phatic communion (Malinowski,
1923). Finally, love was another common theme. Fully 22% of the comments contained an
expression of love, either through the word love, hugs, hearts or kisses or a variant of miss
you. These seemed to be predominantly expressing friendship (i.e., friendship love) rather
than romantic love (e.g., ha ha miss you cuz; LIL SIS love you!!; jus wanna holla @ you an
show ur page sum luv).
The extent of correct formal written English in comments
The manual checks of 400 random MySpace comments gave the results in Table 1. There was
a high degree of agreement between the classifiers for the categories within the table, as
supported by the Cohen's kappa values (Neuendorf, 2002), which was probably due to the
prescriptive classification scheme. The differences occurred mostly in cases where
instances of non-standard English could be classified in multiple ways, or where there were so
many non-standard features that judging grammar was difficult. For example the comment:
“wat it do castro,watz with u now these dayz homie” was not coded for “Other non-standard
English grammar” because “wat it do” and “watz with u” were judged to be slang phrases.
Another common type of issue is represented by the comment: “o sorry i dont know the
password no my parents got an email not me”, which could have been classified as having
non-standard grammar due to sentences run together without conjunctions. Instead it was
classified as having non-standard punctuation, assuming that the primary issue was the
absence of all punctuation rather than a non-standard sentence construction.
Table 1. Types of non-standard English found in MySpace comments

| Aspect of non-standard English* | Comments containing | Inter-coder agreement (Kappa) |
|---|---|---|
| Typographic slang or abbreviations (e.g., omg, lol, hugz, @) | 41% | 94.3% (.882) |
| Slang, including dialect, swearing, and idiomatic slang sayings | 51% | 88.5% (.771) |
| Non-standard spelling other than the above | 33% | 91.0% (.789) |
| Non-standard punctuation | 81% | 95.3% (.829) |
| Pictograms | 16% | 99.5% (.981) |
| Interjections (e.g., haha, muahh, huh, but not oh) | 13% | 98.0% (.913) |
| Non-standard capitalisation | 75% | 99.0% (.973) |
| Other non-standard English grammar | 56% | 91.5% (.824) |
| Not standard formal written English (i.e., any of the above) | 97% | 99.2% (.866) |

*See Appendix for more details of classes.
From Table 1 it is clear that comments entirely in standard formal written English are
extremely rare. Examples of comments judged completely correct include: “Happy
Thanksgiving!”, “I like visitors. Xk” and the possibly facetious “Dear Friend, It says your
birthday is June 20. That must be incorrect! I do believe your birthday is November 30.” The
most common causes of “other non-standard English grammar” were incomplete sentences
and sentences merged together without punctuation or conjunctions. The incomplete
sentences often missed a pronoun (e.g., “Just sayin sup.”) or a (main or auxiliary) verb (e.g.,
“how you been”).
Additional punctuation and capitalisation statistics
This section reports automatic analyses of the 6,859 comment lines judged to be valid non-viral,
non-spam and English. After excluding escaped characters and all leading and trailing
white-space characters, all except 2 of the comments were non-null and were automatically
processed for the patterns below to see whether some non-standard language features were
common.
• All upper case: 7.5% (515)
• All lower case: 37.9% (2,600)
• Ending in a valid sentence terminator (full stop, quotes, ! or ?): 49.4% (3,389)
• Starting with a letter of the alphabet: 98.6% (6,760)
• Starting with an upper case letter of the alphabet, if starting with a letter of the
alphabet: 50.9% (3,439)
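Checks of this kind are simple to automate. The sketch below is one possible reading of the five patterns, not the purpose-built program itself: it assumes that "all upper case" and "all lower case" refer only to the alphabetic characters in a comment, and that either straight quote character counts as "quotes". How the original program treated comments without any letters is not stated.

```python
from collections import Counter

# Assumed set of valid sentence terminators: full stop, !, ? and straight quotes.
TERMINATORS = (".", "!", "?", '"', "'")


def surface_features(comment: str) -> dict:
    """Flag the capitalisation and punctuation patterns for one comment."""
    text = comment.strip()
    letters = [ch for ch in text if ch.isalpha()]
    first = text[:1]
    return {
        "all_upper": bool(letters) and all(ch.isupper() for ch in letters),
        "all_lower": bool(letters) and all(ch.islower() for ch in letters),
        "ends_with_terminator": text.endswith(TERMINATORS),
        "starts_with_letter": first.isalpha(),
        "starts_with_upper": first.isalpha() and first.isupper(),
    }


def summarise(comments):
    """Return (number of non-null comments, count of comments showing each pattern)."""
    totals = Counter()
    valid = 0
    for comment in comments:
        if not comment.strip():
            continue  # skip null comments
        valid += 1
        totals.update(name for name, hit in surface_features(comment).items() if hit)
    return valid, dict(totals)
```

Dividing each count by the number of non-null comments gives percentages comparable in form to those listed above.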
Comment lengths
This section analyses the distribution of comment lengths, as measured in characters and
words (after excluding escaped characters). The median number of words per comment is 14
and the median number of characters per comment is 68. Figure 1 shows the ‘hooked power
law’ shape (see similar graphs (Pennock, Flake, Lawrence, Glover, & Giles, 2002)). There is
probably a basic power law (e.g., Barabási & Albert, 1999) in comment lengths, with shorter
comments being much more common than longer comments. The hook shape at the top left of
the graph shows that very short comments are much rarer than would be expected for a pure
power law. This probably reflects the need to write long enough comments to convey a non-trivial
message. Almost all (95%) MySpace comments have 57 or fewer words. Hence,
although comments are sometimes very long, the typical comment is about the length of a
short sentence, and the overwhelming majority are not longer than a few sentences.
Figure 1. Distribution of 6,859 comment lengths (words). Note the log-log scale.
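The summary statistics quoted in this section (the median and the length below which 95% of comments fall) and the distribution plotted in Figure 1 can be derived from a list of per-comment word counts. The sketch below assumes a nearest-rank definition of the percentile, since the paper does not say which definition was used; the function names are illustrative.

```python
import statistics
from collections import Counter


def percentile_nearest_rank(values, p: int):
    """Smallest value such that at least p% of the values are <= it (p an integer)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values supplied")
    # Integer ceiling of p * n / 100, avoiding floating-point rounding.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]


def length_summary(word_counts):
    """Return the median, the 95th percentile and the length distribution."""
    return (
        statistics.median(word_counts),
        percentile_nearest_rank(word_counts, 95),
        Counter(word_counts),  # length -> number of comments of that length
    )
```

Plotting the returned distribution with both axes logarithmic would give a graph of the same form as Figure 1.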
Word frequency distribution
Figure 2 reports the distribution of word frequencies, showing a visually almost perfect power
law. Classic text should illustrate a perfect power law, with a few words being very common
(i.e., having a high word frequency – a point on the right of the graph below) and many words
being rare (i.e., having a low word frequency – a point on the left of the graph below). The
linear fit on the left of the graph is not quite perfect, and the straight line pattern evident for
frequencies 2 to 10 is not matched by word frequency 1, which is higher than the line would
predict. Although the difference looks small on the graph, it is large in absolute terms because of the logarithmic scale. This is
in contrast to similar graphs for British English and academic web sites, for example, in
which the lines are straight and the point for word frequency 1 does not deviate (Thelwall,
2005). This confirms that there are more unique words in comments than in “normal” text.
This cannot be the result of the typical short length of MySpace comments resulting in a high
proportion of unique words in each comment (a high “type/token ratio” in the terminology of
Chafe & Danielewicz, 1987), because, in general, text lengths vary the slope of the line in
Figure 2 but not its overall shape. This suggests that there is an additional process at work,
which could be a force for creative variety in spelling or word choice, or simply extra
carelessness in spelling.
Figure 2. Distribution of word frequencies in 6,859 comments. Note the log-log scale.
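The visual comparison described above, between the straight line through frequencies 2 to 10 and the point for frequency 1, can be made numerically. The following sketch fits an ordinary least-squares line in log-log space and reports how far an observed point lies above the extrapolated line. It is an illustration of the idea rather than the analysis performed for the paper, which relied on inspection of the graph; `spectrum` is assumed to map each word frequency to the number of distinct words with that frequency.

```python
import math


def fit_loglog(spectrum, lo=2, hi=10):
    """Least-squares line through (log10 frequency, log10 count) for lo <= f <= hi."""
    points = [
        (math.log10(f), math.log10(n))
        for f, n in spectrum.items()
        if lo <= f <= hi and n > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def observed_over_predicted(spectrum, f, slope, intercept):
    """Ratio of the observed count at frequency f to the fitted line's prediction."""
    predicted = 10 ** (intercept + slope * math.log10(f))
    return spectrum[f] / predicted
```

A ratio well above 1 at frequency 1 would correspond to the excess of unique words discussed in the text.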
Common words
Table 2 reports the most common MySpace comment words, after converting all capital
letters to lower-case. The table highlights words that are not found in the top 100 for general
British English, as calculated from the British National Corpus (Leech et al., 2001), and the
top 100 from general written American English, as extracted from the Brown corpus
(http://www.giwersworld.org/computers/linux/common-words-freq.phtml). Note that the
methods used for the British National Corpus statistics are not quite the same as those here,
and both corpora cover language from at least twenty years ago. In particular, the Brown
corpus word list has apostrophes removed (e.g., don’t -> dont) and the British list splits
compound abbreviated words at apostrophes (e.g., I’m counts as two components, I and ’m).
Hence the comparison is approximate and serves only to draw attention to potentially
significant words.
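The highlighting step amounts to a set comparison between the top-ranked comment words and a reference list. The sketch below is a hypothetical illustration: the function name and the `strip_apostrophes` option are assumptions, the latter mimicking the apostrophe removal noted for the Brown corpus list. It does not reproduce the splitting of words at apostrophes used for the British list.

```python
from collections import Counter


def words_missing_from_reference(counts: Counter, reference, n=100,
                                 strip_apostrophes=False):
    """Return the top-n words (in rank order) that are absent from a reference list."""
    def normalise(word):
        word = word.lower()
        return word.replace("'", "") if strip_apostrophes else word

    reference_set = {normalise(word) for word in reference}
    top_words = [word for word, _ in counts.most_common(n)]
    return [word for word in top_words if normalise(word) not in reference_set]
```

Because the reference lists were compiled with different conventions, the output of such a comparison is approximate, as noted in the text.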
Several abbreviations are included in Table 2: u and ya for you, ur for your, im
normally for I’m, whats for what’s, dont for don’t. A few non-words are also present, such as
lol (laugh out loud), :) and haha. The digit 2 is often used as a homophone for to or too.
The rank order of the word frequencies seems more similar to spoken than written
English. For example, I is the most frequent word in conversational British English
(Kilgarriff, 1997), as in the British National Corpus (Burnard, 1995) and the second most
common in general spoken British English (Leech et al., 2001), but is only seventeenth most
common in general written British English (Leech et al., 2001) (see descriptions and data at
http://www.kilgarriff.co.uk/bnc-readme.html and http://www.comp.lancs.ac.uk/ucrel/bncfreq/flists.html).
Nevertheless, there are also clear deviations from spoken English, not
only in terms of spellings and the lack of pause-fillers like er (Leech et al., 2001) but also in
words like love (ranked 555 in spoken British English (Kilgarriff, 1997)), happy (ranked 503
in spoken British English), and miss (ranked 1,052 in spoken British English). These three
word frequencies are probably closer to those of a written genre: letter-writing (Leech,
Rayson, & Wilson, 2001).
Also noticeable in Table 2 are words related to movement (come, go, going, back)
and time (day, weekend) that seem to fit an orientation towards small talk.
Table 2. The most common words in the comments sections. Bold words are not in the top
100 for general British English, and italic words are not in the top 100 for general American
English.
Rank Word
1-10 i, you, to, the, and, a, u, me, hey, my
11-20 it, for, in, love, is, that, so, up, your, on
21-30 have, of, are, just, lol, but, we, how, be, ya
31-40 at, was, well, what, get, like, good, im, know, out
41-50 been, this, with, see, hope, all, do, not, if, happy
51-60 miss, going, go, time, i'm, ur, back, some, got, there
61-70 when, can, will, thanks, its, or, by, from, now, whats
71-80 say, day, new, hi, much, one, no, about, haha, call
81-90 come, :), soon, too, need, birthday, 2, am, had, here
91-100 dont, doing, as, think, man, page, great, did, weekend, work
Table 3 reports the same frequencies as Table 2, but retains the original case of the words.
The high frequency of words with a capitalised initial letter seems consistent with a letter-writing
style, but it is interesting that You, U and YOU appear; perhaps it is logical to capitalise U
since I is a capital letter.
Table 3. The most common words in the comments sections, retaining letter case. Words
including an upper-case letter are in bold.
Rank Word
1-10 you, to, I, i, the, and, a, u, me, it
11-20 my, for, in, is, that, on, up, your, so, have
21-30 of, are, hey, love, but, lol, be, just, was, at
31-40 we, ya, out, get, Hey, know, like, how, well, been
41-50 with, good, see, this, what, all, im, do, not, going
51-60 time, go, hope, back, if, miss, there, will, can, ur
61-70 got, some, when, or, its, from, say, U, by, :)
71-80 now, about, 2, much, haha, You, Love, one, soon, call
81-90 come, need, new, too, am, I'm, whats, had, doing, day
91-100 no, YOU, think, as, here, dont, man, work, A, really
Rare words
Table 4 reports the classification of a random sample of 400 words occurring only once in the
collection. Whilst Table 4 includes many valid words and numbers, there is evidence of a
systematic pattern of new word creation and deliberately made-up spellings. Several new
common practices are evident: substituting numbers for similar-looking letters; truncating
words; lengthening words by repeating letters; phonetic spellings; and substituting z for s.
Two of the word types in Table 4, words with repeated letters and interjections, seem to be
devices to emphasise the importance of words or to convey emotion (as with emoticons). Giving
emphasis is also an important function of swearing (Jay, 2000).
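Some of the spelling practices listed above are regular enough to be detected mechanically. The Python sketch below is one possible set of heuristics, not the manual classification used for Table 4: it tries to map a lengthened word, a digit-for-letter spelling or a z-for-s spelling back to a dictionary word. The function names, the digit mapping (0, 3 and 5 only, taken from the examples in Table 4) and the tiny dictionary in the usage note are all assumptions for illustration.

```python
import re
from itertools import product

RUN = re.compile(r"([a-z])\1+")                # a run of one repeated letter
DIGIT_LOOKALIKES = str.maketrans("035", "oes")  # assumed look-alike mapping


def shortened_forms(word):
    """Yield every form with each repeated-letter run cut to one or two letters."""
    w = word.lower()
    runs = list(RUN.finditer(w))
    for choice in product((1, 2), repeat=len(runs)):
        parts, last = [], 0
        for run, keep in zip(runs, choice):
            parts.append(w[last:run.start()])
            parts.append(run.group(1) * keep)
            last = run.end()
        parts.append(w[last:])
        yield "".join(parts)


def base_of_lengthened(word, dictionary):
    """Return a dictionary word that `word` lengthens, or None."""
    w = word.lower()
    if w in dictionary:
        return None
    for form in shortened_forms(w):
        if form in dictionary:
            return form
    return None


def undo_digit_substitution(word, dictionary):
    """Return the dictionary word obtained by replacing look-alike digits, or None."""
    w = word.lower()
    candidate = w.translate(DIGIT_LOOKALIKES)
    return candidate if candidate != w and candidate in dictionary else None


def undo_z_for_s(word, dictionary):
    """Return the dictionary word obtained by replacing z with s, or None."""
    w = word.lower()
    candidate = w.replace("z", "s")
    if candidate != w and w not in dictionary and candidate in dictionary:
        return candidate
    return None
```

With a dictionary containing so, hello, not and says, the calls base_of_lengthened("soooooooo", dictionary), undo_digit_substitution("n0t", dictionary) and undo_z_for_s("sayz", dictionary) recover so, not and says respectively. Phonetic spellings and truncations are not covered, and heuristics of this kind would still need human checking.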
Table 4. Classification of 400 words occurring once.
Type | Definition | Number | Examples
Correct spelling other than proper nouns | Non-slang word found in dictionary (including standard grammatical variations) | 150 |
Name or other proper noun | Identified as such through personal knowledge or web searches | 75 | kontiwa, jens, ap, chc, andreas, bridi
Number or code | | 14 | 651, 8850, 3y, 2am, 8888888888888880, 7772, 89, r34, 808
Non-English | Identified as a non-noun dictionary word in a non-English language | 6 | prego, vhiida (vida), bleu, musica, pelo, interesante
Apparent typo | Judged a small spelling variation of a recognised word. | 49 | copyed, sumit, riends, doign, frend, experance, tomarow, materal, miester, internt, crys, andress, manillow, chrismtas, privlaged, punkin (pumpkin), valintines, arund, dewin (doing), roomate, cousinn, aout (about), visiiting, cann, encuragement, mixs, bearly, sappose, freinds, centery, dosent, appreaceate, apreciation, rteeth (teeth), locuacious, dosnt, cheak, skys, layed, sigle, goos (good), whent, destraction, biusines, earler, gpoing, wak
Two words with merged spelling | Two words normally written separately | 9 | babymamma, thankya, yawhy, soundfreak, carcrash, lotsa, whatchu, coinslot, yeadat
Slang or made-up word | Non-dictionary word or described as slang in dictionary; not used as a proper noun. | 23 | dokie, scaggy, dangit, hotty, aght (alright), alreet, oik, mangina, yute, cracka, badboii, gawd, yids, numnuts, croc, favs, wuddup, roxes (rocks), mcwalmartenheimer, picy, goina, evrythang
Deliberately made-up spelling | Judged a large spelling variation of a recognised word or part of a systematic spelling variation pattern. | 28 | pt (put), finishin, styllz, n0t, deyz, nathin (nothing), l3t (let), getin, slidin, startz, altho, wurd, seri0us, c0de, choon (tune), reazun, a0l, hoez, mackin (making), 5tyll, sayz, bos (boss), niger
Word with repeated letter | Spelling variant of existing word with at least one extra repeated letter | 31 | myspacee, byeee, chiick, weirdddd, duhh, soooooooo, sleeep, misss, loveee, nobbbin, joyceeeeeeeeeeeee, sweeeet, loove, gwaannnin, hiiigh, beee, crazyyy, helllooooooooooooooooooooo, herrrrrre, goooodd, annnnt, okayyyy, souuulful, weeeeeel, meliiiiiissssssaaaaaaaa, homelesssssss, livee, congratss, happpyy, moneyyy, geeetair (guitar)
Interjection | Judged to be describing a vocal sound | 13 | boohoo, wuhu, heheh, muhahahahahahahahahahaha, awwwwwwwwwww, mmmuuuaaaahhhzzz, teeheee, muahzz, whew, aahahahahahahahahahaaha, awwwwwww, bya, hahahhah
Unknown | | 2 | blac, tc
Total | | 400 |
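The sampling step behind Table 4 (a random sample of words that occur only once in the collection) could be reproduced along the following lines. This is a minimal sketch: the function name, the fixed seed and the assumption that the collection is available as a flat list of tokens are illustrative and are not details given in the paper.

```python
import random
from collections import Counter


def sample_hapaxes(tokens, k=400, seed=0):
    """Draw a reproducible random sample of words that occur exactly once."""
    counts = Counter(tokens)
    # Sort first so that the sample depends only on the seed, not on input order.
    hapaxes = sorted(word for word, count in counts.items() if count == 1)
    rng = random.Random(seed)
    return rng.sample(hapaxes, min(k, len(hapaxes)))
```

The sampled words would then be classified by hand using the definitions in Table 4.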
Discussion and limitations
The results give some answers to four research questions exploring MySpace comment
language. First, the “normal” length for MySpace comments seems to be about 14 words (for
comments that contain at least one word) and almost all comments are not longer than a few
sentences. Hence, comments are typically brief communications.
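For readers who wish to compute comparable length statistics on their own data, the sketch below summarises comment lengths in words, ignoring comments without any word. It uses the median as one reading of "normal" length and a nearest-rank percentile; both choices, and the whitespace definition of a word, are assumptions rather than the method reported here.

```python
import math
import statistics


def length_summary(comments, percentile=95):
    """Return (median length, percentile length) in words for non-empty comments."""
    lengths = sorted(n for n in (len(c.split()) for c in comments) if n > 0)
    if not lengths:
        raise ValueError("no comment contains a word")
    # Nearest-rank percentile: the smallest length covering the given share.
    rank = math.ceil(percentile / 100 * len(lengths))
    return statistics.median(lengths), lengths[rank - 1]
```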
The distribution of word frequencies differs from standard English in the sense that rare
words or spellings are more frequent than would be expected for a pure power law. This
confirms the casual observation that typos, slang and innovative spellings are a significant
part of MySpace commenting. A few non-standard spellings were common enough to be in
the top 100 for MySpace comments. These include abbreviated spellings, abbreviations,
pictograms, and interjections. It will be interesting to see if any of these become accepted as
recognised alternative dictionary spellings because of their widespread use. There were some
patterns in the types of rare non-standard word spellings and words used in MySpace. The
main patterns were the use of repeated letters, probably for emphasis or playfulness, and the
typing of interjections, typically expressing emotion.
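One simple way to examine the rank-frequency claim above is to fit a straight line to log frequency against log rank (a pure power law gives a straight line) and to report the share of distinct words that occur only once. The Python sketch below does this with an ordinary least-squares fit; it is an illustrative check under these assumptions and is not necessarily the test used in this study.

```python
import math
from collections import Counter


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def hapax_share(tokens):
    """Fraction of distinct words that occur exactly once."""
    counts = Counter(tokens)
    return sum(1 for c in counts.values() if c == 1) / len(counts)
```

A long flat tail of once-only spellings would show up as a high hapax share and as observed low-rank frequencies lying above the fitted line.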
Although MySpace comment text seems intuitively closer to spoken than written
English, and this is backed up by the prevalence of the personal pronouns you and I, there are
many features, including those discussed above, that make MySpace comments a clearly
distinctive variety. There is also a technical problem for comparing MySpace text to spoken
language, which is that spoken language has to be transcribed, and this transcription
necessarily uses correct spellings, phonetic spelling, or a system based upon pronunciation for
all recognisable words used. Hence, it is difficult to quantitatively compare the two,
especially due to the prevalence of contractions using apostrophes (often omitted in MySpace
comments) such as ’s and n’t in spoken English. In contrast, the non-standard spelling and
grammar probably make MySpace comments clearly distinct from similar written forms, such
as the personal letter.
The vast majority of MySpace comments do not exclusively use formal written English
(grammar and spelling; a lack of slang), and various types of non-standard spelling are
prevalent. Comments written entirely in formal English are hence likely to stand out as
inappropriate, which may detract from their message or stigmatise the commenter as
abnormal or inexperienced. In contrast, a range of “incorrect” styles, such as lower-case
messages, are common.
This research has limitations in the extent to which it can be generalised to other social
network sites and languages. It would not be reasonable to hypothesise that the patterns of
spelling and language use in MySpace would also be found in all other social network sites.
Facebook comments (i.e., wall posts) may have fewer spelling errors and creative spellings
because of Facebook’s tendency to have more educated members (boyd, 2007) using it in an
educational context (Golder et al., 2007). Similarly, it seems likely that different spelling
patterns may emerge amongst different language user groups. In Chinese, for example,
repeated characters (logograms) may not always make linguistic sense - although ASCII
characters are sometimes used anyway for speed (Lee, 2007). Nevertheless, repeated
characters representing useful adjectives have been identified in online Chinese, such as 漂漂
(beautiful-beautiful, Yang, 2007). In contrast, in Japanese it seems that similar emphasis may
be gained instead by the use of additional or alternate symbols designed for expressiveness
(Nishimura, 2007).
Conclusions
MySpace comments are typically informal and often creative and fun. Their language seems
to diverge from formal written English because of the need to convey meanings that are
difficult to communicate quickly in standard written forms, for example expressing emotion
or emphasis. MySpace comments written entirely in formal English are rare. They contain a
combination of standard spelling, apparently accidental mistakes, slang, sentence fragments,
typographic slang and interjections. Several new spellings have become commonplace,
including u, ur, :), haha, and lol. Although some codes of practice are developing, it seems
unlikely that a new formal MySpace grammar will emerge because of the playfulness with
language evident in many of the deliberately incorrectly spelt words (e.g., one commenter
doubled every letter i in each word).
The variety in grammar and spelling poses new challenges for future research in
natural language processing and information retrieval because the former, and the latter to
some extent, rely upon regular, predictable patterns in text in order to process it effectively.
For example, one common application of natural language processing is in automatic
translation. Whilst this would be useful for MySpace comments – perhaps to support bilingual
friendships – specialist research is needed because techniques developed for standard English
are unlikely to work well with MySpace comments. In information retrieval research,
techniques such as latent semantic analysis (Deerwester, Dumais, Furnas, et al., 1990) could
be used to help identify new synonyms for existing words, but all such techniques rely upon
spellings being used frequently enough to form a statistically identifiable pattern. To help
both types of research, it would be useful to extend the findings in the current paper to
different languages, and to different social network sites. It would also be useful to analyse
MySpace comments on a broader scale as grammatical units and also as dialogs between the
profile owners. This should give a wider perspective on the linguistic phenomenon of social
network commenting.
In terms of the wider applications of the findings, social network comments should
not be assessed by employers in terms of formal written English standards because, at least in
MySpace, comments written in formal English would mark members as deviant rather than
linguistically skilled. The
frequency of non-standard spelling and slang also has implications for web information
retrieval. MySpace is indexed by Google (34.9 million pages reported indexed on June 16,
2008) and other search engines despite it seeming unlikely that many pages would ever be
valuable in search results, with the main exceptions being musicians’ spaces. In particular,
given the predominance of very personal communications in comments, it seems logical that
search engines should take steps to keep them out of search results, for example by allocating
them a low rank. This research suggests that this could be achieved in a generic manner by
penalising the ranking of pages with many non-standard spellings, especially if these are close
to the search keywords.
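The ranking suggestion above could take many forms. The following Python sketch shows one hypothetical scoring rule that lowers a page's base score according to its share of non-dictionary words and the number of such words within a few tokens of a search keyword. The function names, the weights, the window size and the use of a plain word list as the test for "non-standard" are all assumptions for illustration; no search engine is claimed to work this way.

```python
def is_nonstandard(token, dictionary):
    """True for a token containing a letter that is not in the dictionary."""
    t = token.lower()
    return any(c.isalpha() for c in t) and t not in dictionary


def nonstandard_share(tokens, dictionary):
    """Share of word-like tokens that are not in the dictionary."""
    words = [t for t in tokens if any(c.isalpha() for c in t)]
    if not words:
        return 0.0
    return sum(is_nonstandard(t, dictionary) for t in words) / len(words)


def nonstandard_near_keywords(tokens, keywords, dictionary, window=3):
    """Count non-standard tokens within `window` positions of any keyword."""
    lowered = [t.lower() for t in tokens]
    near = set()
    for i, token in enumerate(lowered):
        if token not in keywords:
            continue
        for j in range(max(0, i - window), min(len(lowered), i + window + 1)):
            if j != i and is_nonstandard(lowered[j], dictionary):
                near.add(j)
    return len(near)


def penalised_score(base_score, tokens, keywords, dictionary,
                    share_weight=1.0, near_weight=0.5, window=3):
    """Divide the base score by one plus a weighted non-standard spelling penalty."""
    penalty = (share_weight * nonstandard_share(tokens, dictionary)
               + near_weight * nonstandard_near_keywords(
                   tokens, keywords, dictionary, window))
    return base_score / (1.0 + penalty)
```

A page written entirely in dictionary words keeps its base score, whereas a comment such as "i luv ur new guitar" is penalised for a query on guitar. Proper nouns and specialist terms would also be penalised by this rule, so a practical version would need a much richer notion of standard spelling.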
References
Androutsopoulos, J. (2006). Introduction: Sociolinguistics and computer-mediated
communication. Journal of Sociolinguistics, 10(4), 419–438.
Anis, J. (2007). Netography: Unconventional spelling in French SMS text messages. In B.
Danet & S. C. Herring (Eds.), The multilingual internet: Language, culture, and
communication online (pp. 87-115). Oxford: Oxford University Press.
Axelsson, A.-S., Abelin, A., & Schroeder, R. (2007). Anyone speak Swedish? Tolerance for
language shifting in graphical multiuser virtual environments. In B. Danet & S. C.
Herring (Eds.), The multilingual internet: Language, culture, and communication
online (pp. 362-381). Oxford: Oxford University Press.
Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science,
286(5439), 509-512.
Baron, N. S. (2003). Language of the Internet. In A. Farghali (Ed.), The Stanford Handbook
Biber, D. (2003). Variation among University spoken and written registers: A new
multi-dimensional analysis. In P. Leistyna & C. F. Meyer (Eds.), Corpus Analysis:
Language Structure and Language Use (pp. 47-70). Amsterdam: Rodopi.
boyd, d. (2007). Viewing American class divisions through Facebook and MySpace.
Apophenia Blog Essay (June 24), Retrieved July 12, 2007 from:
http://www.danah.org/papers/essays/ClassDivisions.html.
boyd, d. (2008). Why youth (heart) social network sites: The role of networked publics in
teenage social life. In D. Buckingham (Ed.), Youth, identity, and digital media (pp.
119-142). Cambridge, MA: MIT Press.
boyd, d., & Ellison, N. (2007). Social network sites: Definition, history, and scholarship.
Journal of Computer-Mediated Communication, 13(1), Retrieved December 10, 2007
from: http://jcmc.indiana.edu/vol2013/issue2001/boyd.ellison.html.
boyd, d., & Heer, J. (2006). Profiles as conversation: Networked identity performance on
Friendster. Proceedings of the Hawai'i International Conference on System Sciences
(HICSS-39, January 4-7), Retrieved July 3, 2007 from:
http://www.danah.org/papers/HICSS2006.pdf.
Burnard, L. (1995). Users' reference guide to the British National Corpus. Oxford: Oxford
University Computing Services.
Chafe, W., & Danielewicz, J. (1987). Properties of spoken and written language. In R.
Horowitz & S. J. Samuels (Eds.), Comprehending oral and written language (pp. 83-113). San Diego: Academic Press, Inc.
Climent, S., Moré, J., Oliver, A., Sánchez, I., Taulé, M. & Salvatierra, M. (2007). Enhancing
the Status of Catalan versus Spanish in Online Academic Forums: Obstacles to
Machine Translation. In B. Danet & S. C. Herring (Eds.), The multilingual internet:
Language, culture, and communication online (pp. 209-230). Oxford: Oxford
University Press.
Collot, M., & Belmore, N. (1996). Electronic language: A new variety. In S. C. Herring (Ed.),
Computer-mediated communication - linguistic, social and cross-cultural
perspectives (pp. 13-28). Amsterdam: John Benjamins.
Crystal, D. (2006). Language and the Internet (2nd ed.). Cambridge, UK: Cambridge
University Press.
Danet, B., Ruedenberg, L., & Rosenbaum-Tamari, Y. (1997). 'Hmmm.Where's That Smoke
Coming From?' Writing, Play and Performance on Internet Relay Chat. Journal of
Computer-mediated Communication, 2(4), Retrieved March 3, 2008 from:
http://jcmc.indiana.edu/vol2002/issue2004/danet.html.
del-Teso-Craviotto, M. (2006). Language and sexuality in Spanish and English dating chats.
Journal of Sociolinguistics, 10(4), 460-480.
Eckert, P. (2003). Language and gender in adolescence. In J. Holmes & M. Meyerhoff (Eds.),
The Handbook of Language and Gender (pp. 381-400). Oxford: Blackwell.
Escher, T. (2007). The geography of (online) social networks. Web 2.0, York University,
Retrieved September 18, 2007 from: http://people.oii.ox.ac.uk/escher/wpcontent/uploads/2007/2009/Escher_York_presentation.pdf.
Faulkner, X., & Culwin, F. (2005). When fingers do the talking: a study of text messaging.
Interacting with Computers, 17(2), 167-185.
Fono, D., & Raynes-Goldie, K. (2006). Hyperfriendship and beyond: Friendship and social
norms on Livejournal, Association of Internet Researchers (AOIR-6), Chicago. In M.
Consalvo & C. Haythornthwaite (Eds.), Internet research annual volume 4: Selected
papers from the Association of Internet Researchers conference. New York: Peter
Lang.
Golder, S. A., Wilkinson, D., & Huberman, B. A. (2007). Rhythms of social interaction:
Messaging within a massive online network, 3rd International Conference on
Communities and Technologies (CT2007), East Lansing, MI.
Grinter, R. E., & Eldridge, M. (2003). Wan2tlk? Everyday text messaging. CHI 2003, 441-448.
Grinter, R. E., Palen, L., & Eldridge, M. (2006). Chatting with teenagers: Considering the
place of chat technologies in teen life. ACM Transactions on Computer-Human
Interaction, 13(4), 423-447.
Herring, S. C., Paolillo, J., Ramos-Vielba, I., Kouper, I., Wright, E., Stoerger, S., et al.
(2007). Language Networks on LiveJournal. Proceedings of the Fortieth Hawaii
International Conference on System Sciences (HICSS-40), Retrieved November 21,
2007 from: http://www.blogninja.com/hicss07.pdf.
Herring, S. C. (2001). Computer-mediated discourse. In D. Schiffrin, D. Tannen & H. E.
Hamilton (Eds.), Discourse Analysis. Malden, MA: Blackwell.
Herring, S. C. (2002). Computer-mediated communication on the Internet. Annual Review of
Information Science and Technology, 36, 109-168.
Herring, S. C. (2007). A faceted classification scheme for computer-mediated discourse.
Language@Internet, 4, Retrieved June 12, 2008 from:
http://www.languageatinternet.de/articles/2007/2761.
Hinduja, S., & Patchin, J. W. (2008). Personal information of adolescents on the Internet: A
quantitative content analysis of MySpace. Journal of Adolescence, 31(1), 125-146.
Jay, T. (2000). Why we curse. New York: John Benjamins.
Kilgarriff, A. (1997). Putting frequencies in the dictionary. International Journal of
Lexicography, 10(2), 135-155.
Kim, K.-H., & Yun, H. (2007). Cying for me, Cying for us: Relational dialectics in a Korean
social network site. Journal of Computer-Mediated Communication, 13(1), Retrieved
December 19 from: http://jcmc.indiana.edu/vol13/issue11/kim.yun.html.
Kline, L. W. (1907). The psychology of humor. The American Journal of Psychology, 18(4),
421-441.
Ko, K.-K. (1996). Structural characteristics of computer-mediated language: A comparative
analysis of InterChange discourse. Electronic Journal of Communication, 6(3),
Retrieved February 27, 2008, from: http://www.cios.org/www/ejc/v2006n2396.htm.
Deerwester, S., Dumais, S., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing
by Latent Semantic Analysis. Journal of the American Society for Information
Science, 41(6), 391-407.
Lee, C. K. M. (2007). Text-making practices beyond the classroom context: Private instant
messaging in Hong Kong. Computers and Composition, 24(3), 285-301.
Leech, G., Rayson, P., & Wilson, A. (2001). Word frequencies in written and spoken English:
Based on the British National Corpus. London: Longman.
Lerman, K. (2006). Social networks and social information filtering on Digg. ArXiv.org,
Retrieved April 23, 2007 from: http://arxiv.org/abs/cs.HC/0612046.
Malinowski, B. (1923). The problem of meaning in primitive languages. In C. K. Ogden & I.
A. Richards (Eds.), The Meaning of Meaning (pp. 296-346). London: Routledge & Kegan Paul.
Neuendorf, K. (2002). The content analysis guidebook. London: Sage.
Nishimura, Y. (2007). Linguistic innovations and interactional features in Japanese BBS
communication. In B. Danet & S. C. Herring (Eds.), The multilingual internet:
Language, culture, and communication online (pp. 163-183). Oxford: Oxford
University Press.
North, S. (2006). Making connections with new technologies. In J. Maybin & J. Swann
(Eds.), The art of English: Everyday creativity (pp. 209-230). Basingstoke,
Hampshire: Palgrave Macmillan.
Pennock, D., Flake, G. W., Lawrence, S., Glover, E. J., & Giles, C. L. (2002). Winners don't
take all: Characterizing the competition for links on the web. Proceedings of the
National Academy of Sciences, 99(8), 5207-5211.
Pew Research Center for the People & the Press. (2007). Social networking websites and
teens: An overview. Retrieved June 4, 2007, from
http://www.pewinternet.org/PPF/r/198/report_display.asp
Prescott, L. (2007). Hitwise US consumer generated media report. Retrieved March 19, 2007
from: http://www.hitwise.com/.
Radić-Bojanić, B. (2006). Fragmentation/integration and involvement/detachment in
chatroom discourse. Skase Journal of Theoretical Linguistics, 3(1), Retrieved March
3, 2008 from: http://www.skase.sk/Volumes/JTL2005/2004.pdf.
Siebenhaar, B. (2006). Code choice and code-switching in Swiss-German Internet Relay Chat
rooms. Journal of Sociolinguistics, 10(4), 481-506.
Thelwall, M., Wouters, P., & Fry, J. (2008). Information-Centred Research for large-scale
analysis of new information sources. Journal of the American Society for Information
Science and Technology, 59(9), 1523-1527.
Thelwall, M. (2005). Text characteristics of English language university web sites. Journal of
the American Society for Information Science and Technology, 56(6), 609-619.
Thelwall, M. (2008a). Fk yea I swear: Cursing and gender in a corpus of MySpace pages.
Corpora, 3(1), 83-107.
Thelwall, M. (2008b). Social networks, gender and friending: An analysis of MySpace
member profiles. Journal of the American Society for Information Science and
Technology, 59(8), 1321-1330.
Thurlow, C. (2003). Generation Txt? The sociolinguistics of young people's text-messaging.
Discourse Analysis Online, 1(1), Retrieved January 3, 2008 from:
http://extra.shu.ac.uk/daol/articles/v2001/n2001/a2003/thurlow2002003-paper.html.
Werry, C. (1996). Linguistic and interactional features of Internet Relay Chat. In S. C.
Herring (Ed.), Computer-mediated communication: Linguistic, social and cross-cultural perspectives (pp. 47-61). Philadelphia: John Benjamins.
Yang, C. (2007). Chinese Internet language: A sociolinguistic analysis of adaptations of the
Chinese writing system. language@internet, 4, Retrieved February 27, 2008 from:
http://www.languageatinternet.de/articles/2007/1142.
Yates, S. J. (1996). Oral and written aspects of computer conferencing. In S. C. Herring (Ed.),
Computer-Mediated Communication: Linguistic, social and cross-cultural
perspectives (pp. 29-46). Amsterdam: John Benjamins.
Zipf, G. K. (1949). Human behavior and the principle of least effort: An introduction to
human ecology. Cambridge, MA: Addison-Wesley.
Appendix: Table 1 classification instructions.
Regard comments as correct if they fit either U.S. or British English.
Typographic slang includes acronyms like: omg, lol, xx, ur, u, r, y, 2, z for s (e.g., hugz,
boyz), @, shortenings (e.g., b/c for because), x for ks (e.g., thanx); luv for love; wat for what,
numerical shortenings like: l8r, m8.
Slang includes: da, yeah, yep, ya for you, dude, man, chink, yo, like (as interjection), witcha,
witchu; shortenings (e.g., cuz, bro, sup, wit, in(ing)), swearing (fuck, ass, god), sayings like
pour it up, whatever, what's good; this includes all written attempts to echo dialect or non-standard pronunciation.
Spelling: Do not count slang or typographic slang as wrong spelling, but do count multiple-letter slang (e.g., hellllooo) as wrong spelling. Assume that all proper nouns are correct but
otherwise assume non-slang terms not in a dictionary are spelling mistakes.
Punctuation: Commas, apostrophes (in possessives, but also in words like don't), full-stops,
colons, semi-colons used where the text seems to need them for standard English (but see the
grammar vs. punctuation section). Count the use of multiple consecutive punctuation marks
as an error (e.g., !!!, ?!) unless it is an ellipsis (exactly three full stops). Do not count a missing full-stop at the end of a comment as a punctuation or grammar error if it follows a closing
statement, e.g., "see ya" or "Later, kate xx" or "ttfn". A space should follow punctuation
except for apostrophes, quotes, and sentence endings.
Pictograms: Any text pictures, e.g., :-) ^..^ also include ♥ as a picture.
Interjections: e.g., huh, haha, mwuahh, but not "oh".
Capitals correct: The use of capital letters for proper nouns, I, and sentence beginnings is
correct - or title case is used if the comment seems to be a title or caption - or the capitals are
appropriate for a letter format (e.g., Dear Jane, How are you?). Capitals are optional following
a colon.
Grammar: Ignore all of the above errors for this section, and check if the sentence deviates
from standard formal English in any other way. For example, check for subject-verb
agreement, incomplete sentences, missing verbs, pronouns or nouns, sentences that don't make
sense, sentences or phrases run together without joining words (e.g., and, but).
Grammar vs. punctuation errors: In cases where the grammar is wrong because
punctuation is missing, record a punctuation error if the author is deliberately avoiding all
punctuation, but record a grammar error if the author is using some punctuation but has
missed out some. E.g., the comment "how are you what are you doing today" would count as
a punctuation error (no punctuation at all) but the comment "how are you what are you doing
today?" would count as a grammar error rather than a punctuation error because the author
should have used something like "?", ".", or "and" between the two phrases.
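The rule on multiple consecutive punctuation marks in the Punctuation instruction is mechanical enough to approximate in code. The Python sketch below flags any run of two or more punctuation marks other than an ellipsis of exactly three full stops. The set of marks and the function name are assumptions, and the classification in the study was carried out by human coders following the instructions above, not by this heuristic.

```python
import re

# Runs of two or more marks drawn from an assumed set of punctuation characters.
PUNCTUATION_RUN = re.compile(r"[.,;:!?]{2,}")


def has_multiple_punctuation_error(comment):
    """True if the comment has a punctuation run other than a three-dot ellipsis."""
    return any(match.group() != "..."
               for match in PUNCTUATION_RUN.finditer(comment))
```

Pictograms such as :) are not flagged because brackets are outside the assumed set, but an abbreviation followed by a comma (as in "e.g.,") would be, so the output would need manual review.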