lab1.3

Java Review Programming Project 3
Lab Name: Flesch Readability Index
Purpose: To work with complex algorithms.
Required Files:
Introduction: In this project you will implement a class that determines the Flesch Readability
Index for a piece of text. This method of calculating the readability of a piece of text was devised
by Rudolf Flesch, author of Why Johnny Can't Read and The Art of Readable Writing. When you
check the spelling and grammar in a Word document, you can have the readability statistics
displayed, including the Flesch index.
The Flesch Readability Index is a number, generally between 0 and 100, that indicates how easy a
piece of text should be to read. The lower the number, the harder the text is to read. A general
breakdown of reading levels based on the Flesch Index is:
Flesch Score    Approximate Grade Level
90 to 100       5th grade
80 to 90        6th grade
70 to 80        7th grade
60 to 70        8th to 9th grade
50 to 60        10th to 12th grade (high school)
30 to 50        13th to 16th grade (college level)
0 to 30         College graduate
The index is calculated by a fixed set of rules for counting the number of sentences, words, and
syllables in a piece of text. This can be automated via a computer program. Here is an example.
Consider the following sentence:
It was an extraordinarily windy day, and thus the riders were faced with several arduous climbs
up the mountain, with the wind trying to push them back down the road.
The Readability Index for that sentence is 58. The following conveys almost the same idea,
It was a very windy day. The riders had many hard climbs up mountains. The wind kept pushing
them back down the road.
but has a Readability Index of 92. This method does not perform any linguistic analysis, so the
results can be misleading, but it usually produces a reasonable answer.
Problem Description: Complete the following method in class Flesch:
/*
pre: text != null
post: returns an int array with 4 elements representing
the following information: [0] = Flesch reading score,
[1] = number of sentences, [2] = number of words,
[3] = number of syllables. If the number of words or
the number of sentences equals 0, the Flesch score
is set to 1000.
*/
public int[] getReadabilityStats(String text)
1. The readability index itself is calculated by the following formula:
Index = 206.835 - ( 84.6 * ASW ) - ( 1.015 * AWS )
rounded to the nearest integer.
ASW is Average Syllables per Word = total number of syllables / total number of words
AWS is Average Words per Sentence = total number of words / total number of
sentences
2. The program needs to count the number of words, number of syllables, and number of
sentences. Certain assumptions are made about what is a word, syllable, and sentence in
order to make it easier to write a program to do this analysis.
3. A word is a sequence of one or more characters delimited by white space or by the sentence
terminators listed in rule 5, whether or not it is an actual English word. White space is
defined as a space, a tab ('\t'), a new line character ('\n'), and the end of the String itself.
4. To count the total number of syllables, use the following rules. Note that these rules are a
heuristic, which is defined as "A rule of thumb or guideline (as opposed to an invariant
procedure). Heuristics may not always achieve the desired outcome, but they are
extremely valuable to problem-solving processes." Heuristics are valuable because they
simplify the problem-solving process. The following rules will sometimes give you the
wrong number of syllables for a word, but they usually give the right answer and are
much easier to implement than storing ALL the words that might be encountered along
with their syllable counts.
a. Each group of adjacent vowels counts as one syllable. Vowels consist of upper
and lower case a, e, i, o, u, and y. For example, the "ea" in "real" contributes one
syllable, but the "e" and the "a" in "regal" count as two syllables. "Happy" has
two syllables because of the 'a' and the 'y'.
b. Each word has at least one syllable, even if rule a gives it a count of 0. Thus the
String "Shhhhhh Shhhhhhh" has 2 words and each word has 1 syllable.
5. Count all the sentences. Each occurrence of a period, colon, semicolon, question mark,
or exclamation mark counts as a sentence. Thus the String "Gack!!!" has 1 word with 1
syllable, but 3 sentences. (Again, this set of rules is a heuristic: a set of rules that often
gives a good answer but occasionally gives bad or nonsensical answers. Under these
rules it is possible to have a sentence with no words.)
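Rules 3 to 5 reduce to a few single-character tests plus a vowel-group counter. The sketch below is one possible way to express them in Java; the class and method names (FleschRules, isVowel, isTerminator, isWhiteSpace, countSyllables) are hypothetical helpers chosen for illustration, not part of the required interface.

```java
/**
 * Hypothetical helpers for rules 3 to 5 (names are illustrative only).
 */
public class FleschRules {

    // Rule 4a: a, e, i, o, u and y, in either case, are vowels.
    static boolean isVowel(char c) {
        return "aeiouyAEIOUY".indexOf(c) >= 0;
    }

    // Rule 5: each of these characters ends a sentence.
    static boolean isTerminator(char c) {
        return ".:;?!".indexOf(c) >= 0;
    }

    // Rule 3: space, tab and newline separate words
    // (the end of the String is handled by whoever scans the text).
    static boolean isWhiteSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n';
    }

    // Rules 4a and 4b: count groups of adjacent vowels, with a minimum of 1.
    static int countSyllables(String word) {
        int groups = 0;
        boolean inVowelGroup = false;
        for (int i = 0; i < word.length(); i++) {
            boolean vowel = isVowel(word.charAt(i));
            if (vowel && !inVowelGroup) {
                groups++; // a new vowel group starts here
            }
            inVowelGroup = vowel;
        }
        return Math.max(groups, 1); // rule 4b: every word has at least one syllable
    }

    public static void main(String[] args) {
        for (String w : new String[] {"real", "regal", "Happy", "Shhhhhh", "legibility"}) {
            System.out.println(w + " -> " + countSyllables(w));
        }
    }
}
```

Tracking only "was the previous character a vowel?" is enough for rule 4a: a syllable is counted on each transition from a non-vowel (or the start of the word) into a vowel.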
Examples:
Test sentence 1: "This is a sentence. So is this!"
Number of sentences: 2
Number of words: 7
Number of syllables: 9
Flesch readability index: 95
Test sentence 2: "The following index was invented by Flesch as a simple tool to estimate the
legibility of a document without linguistic analysis."
Number of sentences: 1
Number of words: 21
Number of syllables: 42
Flesch readability index: 16
Test sentence 3: "Wette. It 'reven hem, or was revenrage. With hey kince kin himply to justron'
wer\", \"stere what willi?"
Number of sentences: 3
Number of words: 18
Number of syllables: 28
Flesch readability index: 69
Test sentence 3 merely shows that the algorithm works whether or not the input is standard
English. You could even run the algorithm on source code, although the answer would not be
very helpful or meaningful.
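The counts in the examples can be checked against the formula from step 1. The sketch below assumes a hypothetical helper fleschIndex that takes the three counts and applies the 1000 sentinel from the getReadabilityStats postcondition; the casts to double matter, because Java integer division would silently truncate the two averages.

```java
/**
 * Sketch of the Flesch formula plus the 1000 sentinel (fleschIndex is a hypothetical name).
 */
public class FleschFormula {

    static int fleschIndex(int sentences, int words, int syllables) {
        if (words == 0 || sentences == 0) {
            return 1000; // postcondition: the averages are undefined, so report 1000
        }
        double asw = (double) syllables / words; // average syllables per word
        double aws = (double) words / sentences; // average words per sentence
        // Math.round(double) returns a long, rounded to the nearest integer.
        return (int) Math.round(206.835 - 84.6 * asw - 1.015 * aws);
    }

    public static void main(String[] args) {
        System.out.println(fleschIndex(2, 7, 9));   // test sentence 1
        System.out.println(fleschIndex(1, 21, 42)); // test sentence 2
        System.out.println(fleschIndex(3, 18, 28)); // test sentence 3
    }
}
```

For test sentence 1, for example, 206.835 - 84.6 * (9 / 7) - 1.015 * (7 / 2) is about 94.51, which rounds to 95.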
Limitations: Do not use the StringTokenizer class or the split method from the String class.
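Without split or StringTokenizer, words can still be extracted by walking the String with charAt and remembering where the current word started. This is a minimal sketch under the word definition in rule 3; the class name ManualTokenizer and the method words are hypothetical, and a full solution may not need to store the words at all.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical word extractor that avoids split and StringTokenizer.
 */
public class ManualTokenizer {

    // A character ends a word if it is white space (rule 3) or a sentence terminator (rule 5).
    static boolean isDelimiter(char c) {
        return c == ' ' || c == '\t' || c == '\n' || ".:;?!".indexOf(c) >= 0;
    }

    // Collects the words of text using index arithmetic only.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        int start = -1; // index where the current word began, or -1 when not inside a word
        for (int i = 0; i < text.length(); i++) {
            if (isDelimiter(text.charAt(i))) {
                if (start >= 0) {
                    result.add(text.substring(start, i)); // a delimiter closes the word
                    start = -1;
                }
            } else if (start < 0) {
                start = i; // first character of a new word
            }
        }
        if (start >= 0) {
            result.add(text.substring(start)); // the end of the String also ends a word
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("This is a sentence. So is this!"));
        System.out.println(words("Gack!!!"));
    }
}
```

Running main should print [This, is, a, sentence, So, is, this] followed by [Gack].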
Hints:
1. Do not try to do this with one big method. Create helper methods, each of which does some
small part of the task.
2. Do not start by coding. You must design your algorithm first. Think carefully about
how you would do the problem with a pencil and paper.
a. One algorithm is to step through the text one character at a time. You need to
carefully consider all the possible cases and what actions must occur when the
case changes. Possible cases include being in a word, being in a vowel cluster,
and being at the end of a sentence. This is not a complete list.
b. Another algorithm would be to parse the original String (break it up into its
component parts) into sentences and count them, then parse each sentence into
words and count them, and finally parse each word into syllables. This seems like
an attractive and intuitive approach but turns out to be fairly complicated.
3. Each sentence terminator counts as a single sentence, so the text "RATS!!!!" would have
4 sentences. This simplification makes the program easier to write. Do not try to cover
special cases. Follow the rules as stated, even though there are cases where they seem
to fail. Remember that the Flesch Index is a simple heuristic for estimating readability.
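Hint 2a (one pass, one character at a time) can be sketched as a small state machine with two flags: "inside a word" and "inside a vowel group". The version below is a sketch, not a reference solution: it assumes the class name Flesch and the method signature from the problem description, while the private helpers are hypothetical. It treats the position just past the end of the String as white space so that the last word is closed.

```java
import java.util.Arrays;

/**
 * One-pass sketch of getReadabilityStats (hint 2a). Helper names are hypothetical.
 */
public class Flesch {

    private static boolean isVowel(char c) {
        return "aeiouyAEIOUY".indexOf(c) >= 0;
    }

    private static boolean isTerminator(char c) {
        return ".:;?!".indexOf(c) >= 0;
    }

    private static boolean isWhiteSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n';
    }

    public int[] getReadabilityStats(String text) {
        int sentences = 0, words = 0, syllables = 0;
        boolean inWord = false;       // currently inside a word?
        boolean inVowelGroup = false; // was the previous character of this word a vowel?
        int wordSyllables = 0;        // vowel groups seen so far in the current word

        // Loop one past the end, treating the end of the String as white space.
        for (int i = 0; i <= text.length(); i++) {
            char c = (i < text.length()) ? text.charAt(i) : ' ';
            if (isWhiteSpace(c) || isTerminator(c)) {
                if (inWord) { // a word just ended
                    words++;
                    syllables += Math.max(wordSyllables, 1); // rule 4b
                    inWord = false;
                }
                if (isTerminator(c)) {
                    sentences++; // rule 5: every terminator counts
                }
            } else {
                if (!inWord) { // a new word starts
                    inWord = true;
                    wordSyllables = 0;
                    inVowelGroup = false;
                }
                boolean vowel = isVowel(c);
                if (vowel && !inVowelGroup) {
                    wordSyllables++; // rule 4a: a new vowel group
                }
                inVowelGroup = vowel;
            }
        }

        int score;
        if (words == 0 || sentences == 0) {
            score = 1000; // postcondition sentinel
        } else {
            double asw = (double) syllables / words;
            double aws = (double) words / sentences;
            score = (int) Math.round(206.835 - 84.6 * asw - 1.015 * aws);
        }
        return new int[] {score, sentences, words, syllables};
    }

    public static void main(String[] args) {
        Flesch f = new Flesch();
        System.out.println(Arrays.toString(f.getReadabilityStats("This is a sentence. So is this!")));
        System.out.println(Arrays.toString(f.getReadabilityStats("Gack!!!")));
    }
}
```

For the two calls in main this should print [95, 2, 7, 9] and [122, 3, 1, 1]; the second shows how the "every terminator counts" rule inflates the sentence count, exactly as the hints describe.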
Extras: Notice that, more often than not, an e at the end of a word should not count as a vowel.
Word        Actual Number of Syllables    Number of Syllables With Given Rules
role        1                             2
estimate    3                             4
sentence    2                             3
One way to deal with this is to not treat any e at the end of a word as a vowel. This complicates
the problem. Implement this new set of rules for determining the number of syllables in a word:
a. Each group of adjacent vowels counts as one syllable. Vowels consist of upper and lower
case a, e, i, o, u and y at the end of a word. For example, the "ea" in "real" contributes one
syllable, but the "e" and the "a" in "regal" count as two syllables. "Happy" has two
syllables, because of the 'a' and the 'y' at the end.
b. A lone "e" at the end of a word that is not part of a larger vowel cluster is an exception to
the previous rule. So "role" has a single syllable according to these rules.
c. Each word has at least one syllable, even if rules a and b give it a count of 0. Thus the
String "Shhhhhh Shhhhhhh" has 2 words and each word has 1 syllable.
How does this set of rules affect the syllable count? Does it make the syllable count more or less
accurate?
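To experiment with the question above, the extra rules can be sketched as a variant of the basic syllable counter that drops a lone final e before scanning. Assumptions, stated plainly: y is treated as a vowel in every position (as in rule 4a), "lone" means the e is not preceded by another vowel, and "end of a word" means the last character of the word String as passed in, so a word with trailing punctuation such as "role," would not be adjusted. The class name SilentE is hypothetical.

```java
/**
 * Hypothetical syllable counter for extra rules a to c (lone final e is silent).
 */
public class SilentE {

    static boolean isVowel(char c) {
        return "aeiouyAEIOUY".indexOf(c) >= 0;
    }

    static int countSyllables(String word) {
        int n = word.length();
        // Rule b: a final 'e' that is not part of a larger vowel cluster does not count.
        boolean loneFinalE = n >= 1
                && Character.toLowerCase(word.charAt(n - 1)) == 'e'
                && (n == 1 || !isVowel(word.charAt(n - 2)));
        int limit = loneFinalE ? n - 1 : n; // skip the lone e when scanning

        int groups = 0;
        boolean inVowelGroup = false;
        for (int i = 0; i < limit; i++) {
            boolean vowel = isVowel(word.charAt(i));
            if (vowel && !inVowelGroup) {
                groups++; // rule a: a new vowel group
            }
            inVowelGroup = vowel;
        }
        return Math.max(groups, 1); // rule c: at least one syllable
    }

    public static void main(String[] args) {
        for (String w : new String[] {"role", "estimate", "sentence", "free", "the"}) {
            System.out.println(w + " -> " + countSyllables(w));
        }
    }
}
```

With these assumptions the three words from the table above come out as 1, 3 and 2. "free" keeps its single syllable because its final e belongs to the cluster "ee", and "the" falls back to 1 through rule c.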