Session 05
Java Strings and Files

Exercise
Complete the “quick-and-dirty” class CharacterCounter, which contains only a main() method that displays the number of non-space characters on the command line after the command. For example:

    $ java CharacterCounter
    0
    $ java CharacterCounter a
    1
    $ java CharacterCounter a bc def ghij
    10

CharacterCounter template

    public class CharacterCounter
    {
        public static void main( String[] args )
        {
            int characterCount = 0 ;

        } // end main
    } // end class CharacterCounter

StringTokenizer
• Useful tool for processing a String object
• Allows you to sequentially walk down a String and extract “words”/tokens that are delimited by specified characters
• What delimiter normally aids us in parsing a long string into words?

General usage of a StringTokenizer:
– create one using a constructor that takes a string argument to process
– send one of two messages: hasMoreTokens() and nextToken()
– use a stereotypical loop to process a sequence of strings

A default StringTokenizer uses whitespace (space, tab, newline, carriage return, and form feed) as delimiters.

StringTokenizer Example

    import java.util.StringTokenizer;

    public class EchoWordsInArgumentV1
    {
        public static void main( String[] args )
        {
            StringTokenizer words = new StringTokenizer( args[0] );
            while( words.hasMoreTokens() )
            {
                String word = words.nextToken();
                System.out.println( word );
            } // end while
        } // end main
    } // end class EchoWordsInArgumentV1

    $ java EchoWordsInArgumentV1 "StringTokenizer, please process me."
    StringTokenizer,
    please
    process
    me.

• Notice the quotes ("") in the command line so the whole string is read as args[0].
• The comma (“,”) and period (“.”) are part of the words and not delimiters by default.

StringTokenizer Example 2
• Fortunately, we can construct a StringTokenizer that uses specified characters for delimiters.
• The designer of the StringTokenizer was planning ahead for future usage!

    $ java EchoWordsInArgumentV2 "StringTokenizer, please process me."
    StringTokenizer
    please
    process
    me

StringTokenizer Example 2

    import java.util.StringTokenizer;

    public class EchoWordsInArgumentV2
    {
        public static void main( String[] args )
        {
            String delimiters = " .?!()[]{}|?/&\\,;:-\'\"\t\n\r";
            StringTokenizer words = new StringTokenizer( args[0], delimiters );
            while( words.hasMoreTokens() )
            {
                String word = words.nextToken();
                System.out.println( word );
            } // end while
        } // end main
    } // end class EchoWordsInArgumentV2

UNIX/Linux pipe
• “|” character on the command line
• Allows the output of one program to be sent as input to another program, like the UNIX “sort” utility.

    $ java EchoWordsInArgumentV2 "StringTokenizer, please process me." | sort
    StringTokenizer
    me
    please
    process

• Is this sorted? How can we fix this?

StringTokenizer Example 3

    import java.util.StringTokenizer;

    public class EchoWordsInArgumentV3
    {
        public static void main( String[] args )
        {
            String delimiters = " .?!()[]{}|?/&\\,;:-\'\"\t\n\r";
            StringTokenizer words = new StringTokenizer( args[0], delimiters );
            while( words.hasMoreTokens() )
            {
                String word = words.nextToken();
                word = word.toLowerCase();
                System.out.println( word );
            } // end while
        } // end main
    } // end class EchoWordsInArgumentV3

    $ java EchoWordsInArgumentV3 "StringTokenizer, please process me." | sort
    me
    please
    process
    stringtokenizer

Java File I/O
• Allows us to write and read “permanent” information to and from disk
• How would file I/O help improve the capabilities of the MemoPadApp?

Java File I/O Example: Echo.java
• echoes all the words in one file to an output file, one per line.

    $ java Echo hamlet.txt hamlet.out
    $ less hamlet.out
    1604
    the
    tragedy
    of
    hamlet
    prince
    of
    denmark
    by
    william
    shakespeare
    ...
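
As a rough illustration of what “permanent” information means in practice, the sketch below writes a few strings to a file and then reads them back. The class name FileRoundTrip, its two helper methods, and the default file name memos.txt are made up for this example and are not part of the course code. It uses try-with-resources, which is one way (not the only way) to make sure the file is closed and the buffered output is flushed.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration; not part of the course code.
class FileRoundTrip
{
    // Write each string on its own line. Leaving the try block closes
    // the writer, which also flushes any buffered output to disk.
    static void writeLines( String path, List<String> lines ) throws IOException
    {
        try ( PrintWriter out = new PrintWriter( new FileWriter( path ) ) )
        {
            for ( String line : lines )
            {
                out.println( line );
            }
        }
    }

    // Read the file back, one line at a time, until readLine() returns null.
    static List<String> readLines( String path ) throws IOException
    {
        List<String> lines = new ArrayList<>();
        try ( BufferedReader in = new BufferedReader( new FileReader( path ) ) )
        {
            String buffer;
            while ( ( buffer = in.readLine() ) != null )
            {
                lines.add( buffer );
            }
        }
        return lines;
    }

    public static void main( String[] args ) throws IOException
    {
        // "memos.txt" is an assumed default file name.
        String path = args.length > 0 ? args[0] : "memos.txt";
        writeLines( path, Arrays.asList( "buy milk", "call home" ) );
        for ( String memo : readLines( path ) )
        {
            System.out.println( memo );
        }
    }
}
```

Running it a second time with only the read half would still find the two lines on disk, which is the property an in-memory program lacks.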

Study Echo.java’s File I/O
• the reader and writer classes have constructors that allow convenient and flexible processing
• send input message: readLine()
• send output messages: print() and println()
• use a stereotypical loop to process a file of lines
• use the stereotypical StringTokenizer loop as the inner loop

    import java.io.*;
    import java.util.StringTokenizer;

    public class Echo
    {
        public static void main( String[] args ) throws IOException
        {
            String delimiters = " .?!()[]{}|?/&\\,;:-\'\"\t\n\r";

            BufferedReader inputFile = new BufferedReader( new FileReader( args[0] ) );
            PrintWriter outputFile = new PrintWriter( new FileWriter( args[1] ) );

            String buffer = null;
            while( true )
            {
                buffer = inputFile.readLine();
                if ( buffer == null ) break;   // readLine() returns null at end of file

                buffer = buffer.toLowerCase();
                StringTokenizer tokens = new StringTokenizer( buffer, delimiters );
                while( tokens.hasMoreTokens() )
                {
                    String word = tokens.nextToken();
                    outputFile.println( word );
                } // end while
            } // end while(true)...

            // Close both files; closing the writer flushes buffered output,
            // which could otherwise be missing from the output file.
            inputFile.close();
            outputFile.close();
        } // end main
    } // end class Echo

wc - UNIX/Linux utility
• wc prints the number of lines, words, and characters (bytes, by default) in a file to standard output.
• For example:

    $ wc hamlet.txt
    4792 31957 196505 hamlet.txt

Exercise
• Using Echo.java as your starting point, create a WordCount.java program that does the same thing as wc, i.e., prints the number of lines, words, and characters in a file to standard output.
For example:

    $ java WordCount hamlet.txt
    4792 32889 130156

    import java.io.*;
    import java.util.StringTokenizer;

    public class WordCount
    {
        public static void main( String[] args ) throws IOException
        {
            String delimiters = " .?!()[]{}|?/&\\,;:-\'\"\t\n\r";

            BufferedReader inputFile = new BufferedReader( new FileReader( args[0] ) );

            String buffer = null;
            int    chars  = 0;
            int    words  = 0;
            int    lines  = 0;

            while( true )
            {
                buffer = inputFile.readLine();
                if ( buffer == null ) break;   // readLine() returns null at end of file
                lines++;

                buffer = buffer.toLowerCase();
                StringTokenizer tokens = new StringTokenizer( buffer, delimiters );
                while( tokens.hasMoreTokens() )
                {
                    String word = tokens.nextToken();
                    words++;
                    chars += word.length();    // counts only characters inside tokens
                } // end while
            } // end while( true )...

            inputFile.close();

            System.out.println( "" + lines + " " + words + " " + chars );
        } // end main
    } // end class WordCount

Why the difference in the number of words and number of characters?

    $ wc hamlet.txt
    4792 31957 196505 hamlet.txt
    $ java WordCount hamlet.txt
    4792 32889 130156
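
One plausible explanation, sketched here under stated assumptions rather than offered as the definitive answer: wc treats only whitespace as a word separator and counts every character, including spaces, punctuation, and newlines, while WordCount also splits on punctuation and counts only the characters inside tokens. The class CountComparison and its methods wcStyle and tokenStyle are hypothetical names invented for this comparison. Note also that wc reports bytes by default, whereas this sketch counts Java chars; the two agree for plain ASCII text.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.StringTokenizer;

// Hypothetical comparison class for illustration; not part of the course code.
class CountComparison
{
    // Same delimiter set as the WordCount program.
    static final String DELIMITERS = " .?!()[]{}|?/&\\,;:-\'\"\t\n\r";

    // wc-style counting: every character counts, a line is a '\n',
    // and a word is a maximal run of non-whitespace characters.
    // Returns { lines, words, chars }.
    static long[] wcStyle( Reader in ) throws IOException
    {
        long lines = 0, words = 0, chars = 0;
        boolean inWord = false;
        int c;
        while ( ( c = in.read() ) != -1 )
        {
            chars++;
            if ( c == '\n' ) lines++;
            if ( Character.isWhitespace( c ) )
            {
                inWord = false;
            }
            else if ( !inWord )
            {
                inWord = true;   // first character of a new word
                words++;
            }
        }
        return new long[] { lines, words, chars };
    }

    // WordCount-style counting: tokens between delimiters, and only the
    // characters inside those tokens. Returns { words, chars }.
    static long[] tokenStyle( String text )
    {
        long words = 0, chars = 0;
        StringTokenizer tokens = new StringTokenizer( text, DELIMITERS );
        while ( tokens.hasMoreTokens() )
        {
            words++;
            chars += tokens.nextToken().length();
        }
        return new long[] { words, chars };
    }

    public static void main( String[] args ) throws IOException
    {
        String sample = "A well-known line.\n";
        long[] wc = wcStyle( new StringReader( sample ) );
        long[] tk = tokenStyle( sample );
        System.out.println( wc[0] + " " + wc[1] + " " + wc[2] );  // prints "1 3 19"
        System.out.println( tk[0] + " " + tk[1] );                 // prints "4 14"
    }
}
```

On the sample line, the hyphen in "well-known" yields one extra word for the tokenizer, while the dropped spaces, punctuation, and newline account for the smaller character count. That is the same direction as the hamlet.txt numbers.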