Text Segmentation in the Informedia Project

Faculty Mentor: Alex Hauptmann (alex@cs.cmu.edu)
Students: Jichuan Chang (cjc@cs.cmu.edu), Ningning Hu (hnn@cs.cmu.edu), Zhirong Wang (zhirong@cs.cmu.edu)

Abstract

In this paper we report our experiences in building a text segmenter for the Informedia project. Several methods are used in our project; their experimental results are compared and analyzed. Based on the application context of the Informedia project, we choose relaxed error metrics for performance evaluation.

1. Introduction

Segmentation is an integral and critical process in the Informedia digital video library. The success of information retrieval in Informedia hinges on the critical assumption that we can segment a whole news broadcast into individual paragraphs or stories. The segmentation task can be conducted on different media (video, speech, text, etc.), and their results can be integrated to achieve better performance. This paper reports our experiences in building a closed-caption text segmenter to aid the segmentation of audio and video data.

The text segmentation problem focuses on how to identify story boundaries, where one region of text ends and another begins, within a document. This work was motivated by the observations that such a seemingly simple problem can actually prove quite difficult to automate [3], and that a tool for partitioning a stream of undifferentiated text into coherent regions would be of great benefit to a number of existing applications. Consider the following scenario: a video-on-demand application responds to a news event query by providing the user with a stream of video containing related news clips. This application may be able to accurately locate positions in its database which are highly relevant to the query, but it may be unable to determine how much of the neighboring data should be provided to the user. Without an accurate segmentation tool, the user will be flooded with overly abundant or unrelated information (commercials, for example). A text segmenter also helps to detect subtopics in a long passage, allowing the reader to quickly jump to the topics of most interest. Because segmentation provides additional structural information about a document, such tools can also be used in information extraction and summarization tasks, for example to quickly build an outline of the key points in a long passage.

We treat text segmentation as the task of automatically locating topic boundaries. It can be refined into a classification problem: given a block of consecutive words (or sentences), a segmenter should tell us whether there is a boundary in this block, after observing a set of labeled data. Different classification methods are used in our project to compare their performance and find the best one. These methods include a Neural Network (BP network), Naive Bayes classification, and a Support Vector Machine.

Sentences within a correctly segmented text region are semantically coherent, belonging to the same topic. Story boundaries in a news transcript usually correspond to topic shifts between different parts of the document. This observation suggests performing text segmentation by detecting topic changes, so we also use topic change detection to assist the segmenter. The underlying topics of news stories are identified using the Expectation Maximization (EM) clustering method.
2. Some Previous Work

In this section we briefly discuss some previous approaches to the text segmentation problem.

2.1 Exponential Models

Beeferman's paper "Text Segmentation Using Exponential Models" [3] introduces a statistical approach to partitioning text automatically into coherent segments. The approach enlists both short-range and long-range language models to help it sniff out likely sites of topic changes in text. To aid its search, the system consults a set of simple lexical hints it has learned to associate with the presence of boundaries through inspection of a large corpus of annotated data.

2.2 Machine Learning

Litman's paper "Combining Multiple Knowledge Sources for Discourse Segmentation" [9] predicts discourse segment boundaries from linguistic features of utterances, using a corpus of spoken narratives as data. The paper presents two methods for developing segmentation algorithms from training data: hand tuning and machine learning. When multiple types of features are used, results approach human performance on an independent test set (both methods) and under cross-validation (machine learning).

2.3 Lexical Cohesion

Kozima's paper "Text Segmentation Based on Similarity Between Words" [10] proposes an indicator of text structure, called the lexical cohesion profile (LCP), which locates segment boundaries in a text. A text segment is a coherent scene; the words in a segment are linked together via lexical cohesion relations. LCP records the mutual similarity of words in a sequence of text. The similarity of words, which represents their cohesiveness, is computed using a semantic network. Comparison with text segments marked by a number of subjects shows that LCP closely correlates with human judgments. LCP may also provide valuable information for resolving anaphora and ellipsis. Kozima generalizes lexical cohesiveness to apply to a window of text, plots the cohesiveness of successive text windows in a document, and identifies the valleys in this measure as segment boundaries.

2.4 Text Tiling

Hearst's paper "Multi-paragraph Segmentation of Expository Text" [8] describes TextTiling, an algorithm for partitioning expository texts into coherent multi-paragraph discourse units which reflect the subtopic structure of the texts. The algorithm uses domain-independent lexical frequency and distribution information to recognize the interactions of multiple simultaneous themes. Two fully implemented versions of the algorithm are described and shown to produce segmentations that correspond well to human judgments of the major subtopic boundaries of thirteen lengthy texts. A cosine measure is used to gauge the similarity between constant-size blocks of morphologically analyzed tokens. First-order rates of change of this measure are then calculated to decide the placement of boundaries between blocks, which are then adjusted to coincide with the paragraph segmentation provided as input to the algorithm. This approach leverages the observation that text segments are dense with repeated content words. Relying on this fact, however, may limit precision, because the repetition of concepts within a document is subtler than can be recognized by a "bag of words" tokenizer and morphological filter alone.
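To make the idea concrete, the following is a minimal sketch (our own illustration, not Hearst's implementation) of the core TextTiling-style computation: adjacent fixed-size blocks of tokens are compared with a cosine measure, and pronounced valleys in the resulting score sequence are taken as candidate boundaries. Function and parameter names are ours.

    from collections import Counter
    from math import sqrt

    def cosine(a, b):
        # cosine similarity between two bag-of-words Counters
        dot = sum(a[w] * b[w] for w in set(a) & set(b))
        norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def gap_scores(tokens, block_size=20):
        # similarity between the blocks of tokens before and after each gap position
        scores = []
        for gap in range(block_size, len(tokens) - block_size + 1):
            left = Counter(tokens[gap - block_size:gap])
            right = Counter(tokens[gap:gap + block_size])
            scores.append((gap, cosine(left, right)))
        return scores

    # Gaps whose similarity score is a deep local minimum ("valley") relative to the
    # neighboring peaks are candidate subtopic boundaries.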
2.5 Dragon's Approach to Text Segmentation

Allan's paper "Topic Detection and Tracking Pilot Study Final Report" [11] describes several approaches to text segmentation. One of them comes from Dragon.

Dragon's approach to segmentation is to treat a story as an instance of some underlying topic, and to model an unbroken text stream as an unlabeled sequence of these topics. In this model, finding story boundaries is equivalent to finding topic transitions. At a certain level of abstraction, identifying topics in a text stream is similar to recognizing speech in an acoustic stream. Each topic block in a text stream is analogous to a phoneme in speech recognition, and each word or sentence (depending on the granularity of the segmentation) is analogous to an "acoustic frame". Identifying the sequence of topics in an unbroken transcript therefore corresponds to recognizing phonemes in a continuous speech stream. Just as in speech recognition, this situation can be analyzed using classic Hidden Markov Model (HMM) techniques, in which the hidden states are topics and the observations are words or sentences.

3. Our Approach

We approach the text segmentation problem with several methods. As there are many tools in the public domain which already implement most of these methods, our work mainly falls in the preparation of data, feature selection, parameter tuning, and the comparison and analysis of results.

3.1 Data Collection and Preparation

The closed-caption raw data come from the Informedia project and include several thousand transcripts under different classes: CNN World View, The World Today, Early Prime, and Science & Technology. We choose to use the CNN World View transcripts from 1999 to October 2000 (about 500 passages). The data come in a proprietary format (see the example below), with timing information alongside the text (">>>" indicates a boundary).

    001630  CENTURY >>> WE PEOPLE TEND TO
    001631  PUT THINGS LIKE THE PASSING OF A
    001633  MILLENIUM IN SHARP FOCUS. WE
    001633  CELEBRATE, CONTEMPLATE, EVEN
    001635  WORRY A BIT, SOMETIMES WORRY A
    001636  LOT. AFTER ALL, IT'S SOMETHING
    001638  THAT HAPPENS ONLY ONCE EVERY
    001641  ONE THOUSAND YEARS. A BIG
    001641  DEAL? PERHAPS NOT TO ALL LIVING
    001642  THINGS, AS CNN'S RICHARD
    001643  BLYSTONE FOUND OUT WHEN HE
    001654  CONSIDERED ONE VERY OLD TREE.
    001654  >>> HO HUM. ANOTHER MILLENNIUM. THE GREAT YEW

The raw data are pre-processed in several logical steps (by one or more programs):

(1) Capitalize all words; remove non-printable characters and timing information.

(2) Remove stories that are too short (fewer than 50 words) or too long (more than 1500 words); these two limits are based on the actual distribution of story lengths. Below is the pre-processed example in one-sentence-per-line format:

    WE PEOPLE TEND TO PUT THINGS LIKE THE PASSING (omitted for short).
    WE CELEBRATE, CONTEMPLATE, EVEN WORRY A BIT, SOMETIMES WORRY A LOT.
    AFTER ALL, IT'S SOMETHING THAT HAPPENS ONLY (omitted for short).
    A BIG DEAL PERHAPS NOT TO ALL LIVING THINGS, (omitted for short).
    >>> HO HUM. ANOTHER MILLENNIUM.

(3) Stem words using a standard algorithm (Porter's stemming algorithm).

(4) Merge numbers, titles, dates and times, and abbreviations into their class names. For example, "3:50 PM" and "10:00 AM" are both represented as __TIME__.

(5) For the different methods we use, divide the intermediate data into fixed-length text blocks (in words or sentences).

(6) Some of the tools we use already exclude stop words (common words like "the" and "where" which rarely help the classification) from the input data, so we remove them only when needed.

(7) For some tools (ANN and SVM), transform each block of text into a vector. A sketch of the main pre-processing steps appears below.
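The following sketch illustrates steps (1)-(4) and (6) above. The regular expressions, stop word list, and thresholds are illustrative stand-ins rather than the exact ones used in our programs, and NLTK's Porter stemmer stands in for the stemming step.

    import re
    from nltk.stem import PorterStemmer  # stands in for step (3), Porter stemming

    STOP_WORDS = {"THE", "A", "OF", "TO", "AND", "IS", "IN", "IT", "THAT", "WHERE"}
    stemmer = PorterStemmer()

    def preprocess(raw_transcript, remove_stop_words=False):
        # (1) capitalize, drop 6-digit timing codes and non-printable characters
        text = re.sub(r"\b\d{6}\b", " ", raw_transcript.upper())
        text = "".join(c if c.isprintable() else " " for c in text)
        # (4) merge times and numbers into their class names
        text = re.sub(r"\b\d{1,2}:\d{2}\s*(AM|PM)\b", "__TIME__", text)
        text = re.sub(r"\b\d+\b", "__NUM__", text)
        # split into stories on the ">>>" boundary markers
        stories = [s.split() for s in text.split(">>>") if s.strip()]
        # (2) drop stories that are too short or too long
        stories = [s for s in stories if 50 <= len(s) <= 1500]
        processed = []
        for words in stories:
            # (3) stem each word; (6) optionally drop stop words
            words = [stemmer.stem(w.lower()).upper() for w in words]
            if remove_stop_words:
                words = [w for w in words if w not in STOP_WORDS]
            processed.append(words)
        return processed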
Given the selected feature space, a vector is composed of the occurrence counts of the distinct words chosen as features.

3.2 Support Vector Machine

Support vector machines are based on the Structural Risk Minimization principle from computational learning theory. A nice property of SVMs is that their classification performance is independent of the dimensionality of the feature space, which is particularly useful for our text segmentation problem (it usually involves tens of thousands of features -- words). It has also been reported that SVMs achieve substantial improvements over current text classification methods.

In our experiment, we divide each passage into blocks of exactly 2 consecutive sentences. For each block, if there is a boundary between the two sentences, it is labeled "yes" and called a "boundary block"; otherwise the block is labeled "no" and called a "background block". We have tried two SVM classification tools:

Rainbow: Rainbow is a statistical text classification tool built on top of the Bow library. It first builds an index file by counting words in the training data; an SVM classifier is then trained and used to classify the testing data. In our experiment, the performance is similar to Naive Bayes classification (although achieved in a much longer time).

SVMlight: SVMlight is an SVM tool built for text classification. It accepts vector (of word counts) or sparse vector input. After counting the number of distinct words, we realized that even for SVM there are too many features to train a classifier. In order to reduce the dimension of the feature space to a few hundred, we decided to choose only the words with the highest average mutual information. To simplify the computation, we actually chose the words from the sentences sitting just before and after a boundary which have the largest difference between their occurrence probabilities in boundary blocks and background blocks. The result is disappointing, mainly because SVMlight takes too long to learn. For a simple case with 500 training examples (actually 3 passages), the training process finished in 15 minutes (our machine runs Linux on an Intel Pentium III box). The training time increases quickly with the training data size: when we increased the training set to 1600 examples (about 8 passages), it failed to finish after 15 hours.*

* Actually, I never finished the training process in 2 weeks and finally gave it up.

3.3 Neural Network

We use the stochastic gradient descent version of the Back Propagation algorithm for feedforward networks containing 2 layers of sigmoid units. The network structure is illustrated in Figure 1. Units in each layer are connected with all units from the preceding layer. The output is a vector of 2 components, corresponding to the probabilities that the input block is or is not a boundary block. Below we discuss two other network parameters.

    [Figure 1. Structure of the 2-layer Back Propagation Network: 100 input units, 10 hidden units, 2 output units]

Input Units and Hidden Units

The number and features of the input units are determined by experiment. We choose to use an n-vector (n = 100 or 200) counting the occurrences of the top n words with the highest mutual information (the same features used with SVMlight). More input units are expected to improve performance. In our experiments, Neural Networks with 200 input units outperform those with 100 units by 5%, but the computation cost increases much more quickly.
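A sketch of this simplified feature selection (used both for SVMlight and for the network's input vector): words are ranked by the difference between their occurrence probabilities in boundary and background blocks, the top n are kept, and each block becomes a vector of counts over those words. The helper names here are ours, not part of Rainbow or SVMlight.

    from collections import Counter

    def select_features(boundary_blocks, background_blocks, n=100):
        # occurrence probability of each word: the fraction of blocks containing it
        def occurrence_prob(blocks):
            counts = Counter(w for block in blocks for w in set(block))
            return {w: c / len(blocks) for w, c in counts.items()}

        p_boundary = occurrence_prob(boundary_blocks)
        p_background = occurrence_prob(background_blocks)
        vocabulary = set(p_boundary) | set(p_background)
        # rank by how differently a word behaves in the two kinds of blocks
        ranked = sorted(vocabulary,
                        key=lambda w: abs(p_boundary.get(w, 0.0) - p_background.get(w, 0.0)),
                        reverse=True)
        return ranked[:n]

    def block_to_vector(block, features):
        counts = Counter(block)
        return [counts[w] for w in features]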
The number of hidden units is also determined by experiment and by the tradeoff between accuracy and computation cost. We finally choose to use 100 input units and 10 hidden units (see Figure 1).

Merging False Alarms

Observing the classification results of the ANN, we noticed an interesting phenomenon: about 15% of the false alarms are "clustered" around some true boundary. For example, below are the classification results for a short passage, where boundary blocks are represented as 0's and background blocks as 1's. The three consecutive 0's in the before-merging line show one such "false alarm cluster". Because usually more than two sentences contribute to a story's introduction and conclusion (sign-offs), features in such sentences (suggesting the existence of a nearby boundary) are also learned by our Neural Network. Such features cause some confusion when our segmenter tries to distinguish between a boundary block and a background block.

    Reference Classification:        111111011111011110111111
    Classification (Before Merging): 110111000101011110111111
    Classification (After Merging):  110111011101011110111111

One method to reduce such confusion is to include temporal information, which may help the segmenter distinguish the first boundary block in the introduction part from the following background blocks. Instead, we chose a much simpler, brute-force method -- merging such false alarms. Assuming that the input data are arranged in their sequence of occurrence, we simply traverse each false alarm cluster, select the one block with the highest target value (the one our segmenter considers most likely to contain a boundary), and change the other 0's into 1's. This method is simple and effective, except that it might also remove a true boundary while leaving a false alarm in place, which slightly reduces the recall value. We cannot recover such errors, but the relaxed error metrics will not count them.

Stop Words

We also studied the impact of removing stop words in our experiments. Removing stop words is common practice in text processing applications, so why bother to observe the effect of classification with stop words retained? Because our mutual information statistics showed that stop words occupy more than 2/3 of the top 50 feature words, with different importance in the two groups. We ran experiments on input data with and without stop word removal, trying to find out whether stop words really matter in our classification. The results show that segmentation with stop words increases the recall value but decreases the precision. This suggests that there probably exist special patterns of stop words in boundary blocks, which help to identify more true boundaries; but such patterns also occur in background blocks and introduce more false alarms.

3.4 Naive Bayes Classification

Naive Bayes classification is a powerful method widely used in text classification applications. The Rainbow toolkit is used in our experiments. One of the major problems in Naive Bayes classification (and in the other methods) is the selection of training data. After cutting the raw data into blocks (size = 2 sentences), only 7% of the blocks are boundary blocks. After several initial experiments, we realized that such a low frequency of boundary blocks cannot effectively train our classifier (it could only identify 10% of the true boundaries). Increasing the percentage of boundary blocks in the training data can effectively improve the recall value, but it also hurts the precision of segmentation.
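One simple way to obtain a given percentage of boundary blocks is to down-sample the background blocks, as in the sketch below. This is our illustration under that assumption; the project's actual scripts may have balanced the data differently.

    import random

    def rebalance(blocks, labels, target_boundary_fraction=0.5, seed=0):
        # labels: 1 = boundary block, 0 = background block
        rng = random.Random(seed)
        boundary = [(b, l) for b, l in zip(blocks, labels) if l == 1]
        background = [(b, l) for b, l in zip(blocks, labels) if l == 0]
        # keep just enough background blocks for boundaries to reach the target fraction
        n_background = int(len(boundary) * (1 - target_boundary_fraction)
                           / target_boundary_fraction)
        sample = boundary + rng.sample(background, min(n_background, len(background)))
        rng.shuffle(sample)
        return [b for b, _ in sample], [l for _, l in sample]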
We did experiments to choose a suitable percentage of boundary blocks in our training data. A good tradeoff between precision and recall depends on the application context. In our project, we assume that lower precision will only provide the user with shorter news clips, which is better than flooding the user with unrelated information (the result of lower recall). This tradeoff leads to a relative preference for recall in our experiments. Also, different classification methods prefer different percentages of boundary blocks in the training data. For Naive Bayes and Neural Network classification, we use 50% boundary blocks in the training data.

3.5 Topic Change Detection

Topic change detection was used in Dragon's approach to text segmentation and proved to be quite effective (67% recall and 65% precision). Dragon uses a multi-pass k-means algorithm to construct the clusters, while we choose to use Rainbow's EM clustering to attack this problem. There are two important parameters to be determined in our method:

Number of topics. When clustering documents, one must provide the number of clusters in the data set. Dragon borrowed a trick from the speech recognition field, using thresholding to limit the size of the search space (the number of clusters) and iteratively merging topics and creating new ones. In our project, due to the limitations of the Rainbow tool, we have to choose the number by intuition and experiment.

Size of sliding window. We have tried different window sizes in the topic change detection method, and 8 sentences works best. As the size of the text window grows, more boundary blocks are combined into one text window, which decreases the number of identifiable boundary blocks. We stop at 8 sentences per window, because beyond that point the portion of such errors is no longer negligible.

Table 1 shows the results of segmenters built with different sliding window sizes and numbers of topics. According to these results, clustering into more topics can improve the overall classification accuracy; but, limited by Rainbow's processing capacity, we only had time to obtain these results.

                   Recall                          Precision
    Window size    4         6         8           4         6         8
    8 topics       0.321839  0.256177  0.311724    0.196568  0.248424  0.366
    16 topics      0.421456  0.360208  0.38069     0.198198  0.267633  0.353846

    Table 1. Segmentation performance using the topic change detection method

3.6 Fixed-Length Text Segmentation Using a Naive Bayes Classifier

We cut our news stories into small passages with a fixed window. These passages were then labeled with one of two classes: if a passage includes a boundary sentence, it is labeled "yes" (there is a boundary in this passage); otherwise it is labeled "no". We divided the data set into two parts, one for training and the other for testing. The testing data include CNN World View 2000 from July to October. The objective is to use these two categories of data to build a classifier. Since Naive Bayes classifiers are among the most successful known algorithms for learning to classify text documents, we applied a Naive Bayes classifier to our problem (a sketch of this setup follows).
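The project used the Rainbow toolkit for this classifier; purely as an illustration, the sketch below sets up a comparable fixed-window Naive Bayes pipeline with scikit-learn. The window size, helper names, and placeholder variables are ours.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    def make_windows(words, is_boundary_word, window=30):
        # cut the word stream into fixed windows; a window is "yes" if it spans a boundary
        texts, labels = [], []
        for i in range(0, len(words) - window + 1, window):
            texts.append(" ".join(words[i:i + window]))
            labels.append("yes" if any(is_boundary_word[i:i + window]) else "no")
        return texts, labels

    # train_words/train_flags and test_words/test_flags would come from the
    # pre-processed transcripts (with July-October 2000 held out for testing):
    # train_texts, train_labels = make_windows(train_words, train_flags)
    # test_texts, test_labels = make_windows(test_words, test_flags)
    # vectorizer = CountVectorizer()
    # classifier = MultinomialNB().fit(vectorizer.fit_transform(train_texts), train_labels)
    # predictions = classifier.predict(vectorizer.transform(test_texts))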
4. Experimental Results and Analysis

4.1 Error Metrics

Given the experimental results, how can we evaluate the performance of the different segmentation methods? Two useful indicators are precision and recall, the conventional information retrieval metrics. For our segmentation task,

    Recall    = # actual boundaries identified / # total boundaries
    Precision = # actual boundaries identified / # boundaries identified

Researchers have also proposed other measures for the text segmentation problem. For fixed-length segments, Dragon [11] uses the fraction of overlap between a segment and the relevant story as its measure of relevance; for text segmentation based on language models, [3] proposes an error metric based on the distance (in number of words) between an identified boundary and the neighboring actual boundaries.

A similar but much simplified idea is used in our approach. We use a sentence as the minimum unit of segmentation. In the simplest case, a boundary is correct if and only if it is a true boundary. But considering our application of interactive query, a segmentation method is almost satisfactory if it always comes close to the true boundary. The closeness can be defined in units of words or sentences. Here we relax our correctness criterion to accept all boundaries that are one or two sentences away from a true boundary. We call the distance between an identified boundary and the closest true boundary DR (degree of relaxation). Figure 2 illustrates the relaxed failure model for our sentence-based segmentation methods.

    [Figure 2. Failure model of sentence-based text segmentation (adapted from [3]). Outcomes per sentence position: OK (YY0), Miss (YN), False Alarm (NY), OK (NN), OK (YY1), OK (YY0). YY# means that under degree of relaxation #, the identified boundary is counted as OK.]

Below are our results with the relaxed error metric for the ANN method (10 hidden units); the relaxed error metrics help to reduce the error introduced by false alarm merging.

          Before merging           After merging
    DR    Precision   Recall       Precision   Recall
    0     0.241       0.554        0.263       0.516
    1     0.290       0.666        0.331       0.648
    2     0.336       0.772        0.383       0.749

    Table 2. Performance of ANN segmentation

4.2 Performance Evaluation

(1) SVM: Rainbow and SVMlight

                 SVMlight   Rainbow
    Recall       0.07       ??
    Precision    0.223      ??

    Table 3. Segmentation results of Rainbow and SVMlight

(2) ANN

2.1. Impact of Training Data Distribution

According to the following data, we choose to use 50% boundary blocks in our training data, because this distribution provides a rather high recall value (71% after merging) and an acceptable precision value (33% after merging). This actually means the average length of our segments is 5 sentences, corresponding to about 30 seconds of news broadcast.

    [Figure 3. Impact of training data distribution: recall and precision (with and without merging) for %Y = 25%, 33%, 50%, and 67% boundary blocks in the training data.]

2.2. Impact of Stop Word Removal

Stop word removal helps to improve the recall value but hurts the precision. Table 3 gives the results of the ANN with 100 input units, using 50% boundary blocks in the training data. The same trend can be observed using different training data.

    %Y = 50%,          No Stop Words   With Stop Words   No Stop Words   With Stop Words
    100 input units    (No merge)      (No merge)        (Merged)        (Merged)
    Recall             0.597           0.721             0.705           0.846
    Precision          0.201           0.115             0.327           0.208

    Table 3. Impact of stop word removal

2.3. Impact of the Number of Features (Number of Input Units)

    %Y = 50%     100 Input Units   200 Input Units   100 Input Units   200 Input Units
                 (No merge)        (No merge)        (Merged)          (Merged)
    Recall       0.597             0.628             0.705             0.739
    Precision    0.201             0.137             0.327             0.219

    Table 4. Impact of the number of features

The effect of increasing the number of input units is the same as that of stop word removal, and the same trend can be observed using different training data. The second reason we chose to use 100 input units is that it greatly reduces the computation cost.
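For reference, the relaxed metric of Section 4.1 can be computed as in the following sketch (our illustration, not the project's evaluation script): an identified boundary is accepted if some true boundary lies within DR sentences of it.

    def relaxed_precision_recall(predicted, reference, dr=1):
        # predicted, reference: sentence indices at which boundaries are placed
        matched_predicted = sum(1 for p in predicted
                                if any(abs(p - r) <= dr for r in reference))
        matched_reference = sum(1 for r in reference
                                if any(abs(p - r) <= dr for p in predicted))
        precision = matched_predicted / len(predicted) if predicted else 0.0
        recall = matched_reference / len(reference) if reference else 0.0
        return precision, recall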
(3) Naive Bayes Classification

                 %Y = 25%   %Y = 33%   %Y = 50%
    Recall       0.589      0.777      0.888
    Precision    0.122      0.100      0.009

    Table 5. Impact of training data distribution on the Naive Bayes segmenter

(4) Topic Change Detection Method

                     Recall                  Precision
    Window size      4      6      8         4      6      8
    # topics = 8     0.322  0.256  0.312     0.197  0.248  0.366
    # topics = 16    0.421  0.360  0.381     0.198  0.268  0.354

    Table 6. Impact of window size and number of topics

The results of the topic change detection method are very different from those of the other methods, with much lower recall but a relatively high precision value. We can say that the TCD method is a conservative segmenter which is not tempted to identify too many boundaries, because it uses global information only. Such global information could be combined with the ANN method, which uses only information within 2 consecutive sentences. Because the window size in the TCD method differs from that of the ANN and Naive Bayes methods, we did not test voting among the different methods, but we believe future work in this direction could improve segmentation accuracy by integrating their strengths.

(5) Fixed-Length Segmentation

Here are the results of fixed-length segmentation using the Naive Bayes classifier.

    Correct: 55313 out of 65359 (84.63% accuracy)

                  Predicted No   Predicted Yes   Total   Acc (%)
    Actual No     49330          7177            56507   87.30
    Actual Yes    2869           5983            8852    67.59

    (Confusion details: rows are actual classes, columns are predicted classes; average accuracy 84.63%.)

    Table 7. Fixed-length segmentation using a fixed window size (30 words) [3]

Table 8 gives the recall and precision of this method.

                                    Recall   Precision
    30 words (fixed window size)    0.68     0.45

    Table 8. Fixed-length segmentation using a fixed window size (30 words) [2]

(6) Performance of the Different Segmentation Methods

    [Figure 4. Segmentation accuracy: precision and recall of the SVM, ANN, NB, TCD, and FL segmenters.]

The chart above shows the performance of the different methods we have tried; the best tradeoff point between precision and recall of each method is selected and compared. All of our methods suffer from rather low precision but higher recall (although topic change detection also has relatively low recall). We currently choose the Fixed-Length (FL) segmenter because it reaches the best tradeoff point among all these methods. Our results differ from published segmentation results, partly because of the difference in data sets: the closed-caption transcripts we use are much noisier (with noise words, omitted sentences, and incorrect labels), which proved more difficult to work with.

5. Conclusion

Compared with published methods, we applied some simple and traditional machine learning methods to the problem of text segmentation and achieved slightly higher recall but rather lower precision. This leaves a lot of room for improving our methods, for example by integrating time series analysis with our ANN classification or using more topics to cluster news stories. We can also combine the current methods with more sophisticated ones (such as Dragon's approach or Hearst's algorithm), or even with segmentation information coming from other media (such as video segmentation and speech recognition).

References
[1] Y. Yang, T. Ault, T. Pierce and C. Lattimer, "Improving text categorization methods for event tracking", Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 65-72, 2000.

[2] T. Mitchell, Machine Learning, McGraw-Hill, 1997.

[3] D. Beeferman, A. Berger, and J. Lafferty, "Text Segmentation Using Exponential Models", Proceedings of Empirical Methods in Natural Language Processing, AAAI 97, Providence, RI, 1997.

[4] M. A. Hearst and C. Plaunt, "Subtopic structuring for full-length document access", Proceedings of the ACM SIGIR-93 International Conference on Research and Development in Information Retrieval, pp. 59-68, Pittsburgh, PA, 1993.

[5] A. Merlino, D. Morey, and M. Maybury, "Broadcast News Navigation using Story Segmentation", ACM Multimedia 1997, November 1997.

[6] A. Hauptmann and M. Witbrock, "Story Segmentation and Detection of Commercials in Broadcast News Video", ADL-98 Advances in Digital Libraries Conference, Santa Barbara, CA, April 22-24, 1998.

[7] J. Lafferty, A. Berger, and D. Beeferman, "Statistical Models for Text Segmentation", Machine Learning, Special Issue on Natural Language Learning (C. Cardie and R. Mooney, eds.), 34(1-3), pp. 177-210, 1999.

[8] M. A. Hearst, "Multi-paragraph segmentation of expository text", Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, 1994.

[9] D. Litman and R. J. Passonneau, "Combining Multiple Knowledge Sources for Discourse Segmentation", Proceedings of the ACL, 1995.

[10] H. Kozima, "Text Segmentation Based on Similarity Between Words", Proceedings of the ACL, 1993.

[11] J. Allan, J. G. Carbonell, G. Doddington, J. Yamron and Y. Yang, "Topic Detection and Tracking Pilot Study Final Report", Proceedings of the Broadcast News Transcription and Understanding Workshop (sponsored by DARPA), February 1998.

Appendix A: Top 50 Features Used in the ANN Experiments

We count the words appearing in the first and last sentences of stories separately, so each list of 50 words actually comes from two groups: 25 words from the ending sentences and 25 from the beginning sentences.

Without Stop Word Removal:
CNN THE OF IT TO REPORTS AND THAT IS THIS CAPTIONING ON OF A I THAT AND IN IT WORLDVIEW THEY WE __USA__ ON

After Stop Word Removal:
CNN REPORTS CAPTIONING __NUM__ CLOSED WORLDVIEW ADDITION REPORTING BELL LONDON PROVIDED ORDERED WORLDVIEW __USA__ UNITED CLINTON THINK PRESIDENT STATES __NUM__ CNN SPACE WORLD IRAQ A __NUM__ THEY WE ARE CLOSED BY THERE WORLDVIEW AT I DO ADDITION YOU HE BE THERE DO TO HAVE FOR UNITED THIS WILL CLINTON FROM THINK HEART JERUSALEM ATLANTIC WASHINGTON WALTER RODGERS PEOPLE COMMUNICATION WHITE THANK CORRESPONDENT MOSCOW PRINCESS DAY DIANA WELL ISRAEL NEW JUDY AHEAD GET TODAY FIRST MILITARY MONEY