Like other scientific disciplines, paleontology advances through the discovery of new information and the revision of existing knowledge. The discovery process generates large volumes of data that can be overwhelming if not properly stored and used. For example, the treatise on invertebrate macrofossils edited by Raymond in 1959 blazed the trail for similar works that came later. Many paleontological volumes provide information on fossil specimens that have been formally named. In palynology, problems can arise with palynomorph classifications and interpretations because they are subjective, depending on human judgment and different levels of training. As a result, the same palynomorph can be interpreted or classified differently, resulting in junior synonyms and emended descriptions that can confuse students and new researchers. It is therefore important to provide a framework for composing a standardized description of each taxon that draws on the diverse observations of various taxonomists.

Description #1: Proximate, acavate cyst, large (75-95 µm) with irregular perforated parasutural crests, well developed paratabulation (including parasulcal paraplates), and precingular archaeopyle.

Description #2: Large proximate cyst with irregular perforated crests, well developed paratabulation.

[Figure labels: Ifecysta sp. 1, Ifecysta sp. 2, Diphyes sp., Achomosphaera sp.; scale bar 40 µm]
Expert involvement in creating the start/stop word list can improve the accuracy of results.

The main objective of this study is to propose a framework that uses text mining techniques to develop a taxon description recommendation system.
Text mining applies intelligent methods and algorithms to extract knowledge and meaningful patterns from large amounts of unstructured text or documents for decision-making. Common characteristics and features in the interpretations made by different scholars can therefore be expected to be captured and used for standardized descriptions, minimizing the issue of subjective human judgment.

Descriptive terms can be used for: (1) determining the fundamental dimensions of a taxon group, (2) finding a target dinocyst, and (3) clustering.

[Figure labels: Diphyes sp., Achomosphaera sp.]
* COL1, COL2 & COL3 represent the principal components (SVD variables) of the descriptive terms.

[Figure: four different descriptions for the same dinocyst; scale bar 40 µm]
By analyzing different descriptions composed by various scholars, a list of descriptive terms can be generated and used to develop a more complete (standardized) description for an existing or a new dinocyst. As a result, the subjective nature of dinocyst description (human judgment or level of training) can be minimized.

[System diagram]
DinoSys Sample Database, DinoSys, CHRONOS, User Input, …
Text Preprocessing & Pattern Identification Module
Variable Reduction Module
Clustering Module
Model Creation Module
Model Evaluation Module
Model Selection Module

Finding hidden textual patterns for potential technology by grouping similar descriptions

Data (document) cleansing
Text processing (parsing, stop words and start words, parts of speech, stemming, synonyms, jargon, abbreviations)
Term frequency matrix
Weighting scheme
Variable Reduction & Transformation
SVD (Singular Value Decomposition): principal components decomposition
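The text processing steps listed above (parsing, stop words, term frequency matrix, weighting scheme) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the poster does not name its weighting scheme, so a smoothed TF-IDF weight is assumed here, and the stop-word list and all function names are hypothetical. The two input strings are the two sample descriptions quoted on the poster.

```python
import math
import re
from collections import Counter

# Two descriptions of the same palynomorph, as quoted on the poster.
DESCRIPTIONS = [
    "Proximate, acavate cyst, large (75-95 µm) with irregular perforated "
    "parasutural crests, well developed paratabulation (including "
    "parasulcal paraplates), and precingular archaeopyle.",
    "Large proximate cyst with irregular perforated crests, "
    "well developed paratabulation.",
]

# Hypothetical stop-word list (assumption); the poster suggests that
# domain experts should curate the real start/stop word lists.
STOP_WORDS = {"with", "and", "including", "µm"}


def tokenize(text):
    """Parse a description: lowercase, keep letter-only tokens, drop stop words."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_frequency_matrix(docs):
    """Return (vocabulary, matrix) with one row per document, one column per term."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted({term for c in counts for term in c})
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def tfidf(matrix):
    """Weight raw counts by a smoothed inverse document frequency (assumed scheme)."""
    n_docs = len(matrix)
    n_terms = len(matrix[0])
    doc_freq = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + df)) + 1 for df in doc_freq]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in matrix]


def shared_terms(vocab, matrix):
    """Terms used in every description: candidates for a standardized lexicon."""
    return [t for j, t in enumerate(vocab) if all(row[j] > 0 for row in matrix)]


if __name__ == "__main__":
    vocab, tf = term_frequency_matrix(DESCRIPTIONS)
    weights = tfidf(tf)
    print("shared terms:", shared_terms(vocab, tf))
    for term, w in zip(vocab, weights[0]):
        print(f"{term:15s} {w:.3f}")
```

In a fuller pipeline, the weighted matrix would then be passed to an SVD routine to obtain the reduced variables. Stemming, part-of-speech tagging, and synonym handling are omitted from this sketch.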
Roll-up terms: use the n terms with the largest term weights
Input (independent variable): the SVD variables
Output (dependent variable): clusters
Descriptive models: regression, neural network, decision tree, etc.
Model Evaluation & Model Selection
Descriptive Model

Standardized Taxon Recommendation Modeling Module
The descriptive terms identified from the clustering analysis during the text mining process provide a collection of terms that are commonly used in the descriptions of the samples in the respective cluster.
1. Those descriptive terms are analyzed to determine the fundamental dimensions for a taxon group.
2. Then, those descriptive terms and additional information gathered during the investigation process, with or without human intervention, are used to suggest a basic standard lexicon from which domain experts can develop a standardized taxon description recommendation.

Test the proposed framework using the Dinosys database (permission has been granted).
Improve result accuracy with expert intervention in creating the start/stop word list.
Develop a text-mining-enhanced search engine that takes free-form text input (description, terms, key words, etc.) and then intelligently interacts, interprets, and translates a user’s search intention into suggestions and recommendations.
Incorporate image and pattern matching to improve the efficiency of dinocyst search.

The authors would like to express their gratitude to Dr. Jan Willem Weegink (NGTO, The Netherlands) and the Dinosys group for permission to use the Dinosys database to test our ideas, to Dr. Lucy Edwards at USGS for her insightful suggestions and encouragement, to Dr. Cinzia Cervato at Iowa State University for her expert critique, and to Dr. Martin Head at Brock University, Canada, for his support.