Natural language processing tools
Lê Đức Trọng
Crawler and Parser tools
• Crawler tools:
• crawler4j:
• Apache HttpClient:
• Parser tools:
• htmlParser:
• jsoup HTML parser:
• NekoHTML parser:
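A crawler such as crawler4j works in a loop. It fetches a page (the job HttpClient does), extracts the page's links with an HTML parser, and queues any URLs it has not seen yet. The loop can be sketched in Python using only the standard library. This sketch does not use the API of any tool listed above. `html.parser.HTMLParser` stands in for jsoup or NekoHTML. `fetch` is a hypothetical callback standing in for an HTTP client, so the example runs against an in-memory set of pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:  # attrs is a list of (name, value) pairs
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from `seed`.

    `fetch(url)` is a hypothetical hook returning the page HTML, or None
    when the page cannot be retrieved.
    """
    visited = []
    seen = {seed}
    frontier = deque([seed])
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical in-memory "web" used instead of real HTTP requests.
pages = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a> <a href="/missing">?</a>',
    "http://example.com/b": '<a href="/">home</a>',
}
print(crawl("http://example.com/", pages.get))
# ['http://example.com/', 'http://example.com/a', 'http://example.com/b']
```

A production crawler also needs politeness delays, robots.txt handling, URL normalization and error recovery. Dedicated crawler tools typically handle these, and this sketch omits them.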
Vietnamese NLP – Tools
• JVnTextPro:
• Sentence segmentation, sentence tokenization, word segmentation, POS tagging
• VnToolkit:
• An automatic tagger for Vietnamese texts
• A tokenizer for automatic word segmentation of Vietnamese texts
• A sentence detector for automatically detecting sentences in Vietnamese texts
• VLSP Tools:
• Vietnamese Chunking
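Vietnamese text puts spaces between syllables, not between words. A word segmenter therefore has to decide which adjacent syllables form a single word (for example "sinh viên", student). A simple dictionary-based baseline is forward maximum matching. It is sketched below together with a naive punctuation-based sentence splitter. Several assumptions apply. The lexicon is a hypothetical toy. Multi-syllable words are joined with underscores, a convention several Vietnamese segmenters use. The statistical methods inside JVnTextPro or VnToolkit are not reproduced here.

```python
import re
import unicodedata

# Hypothetical toy lexicon of multi-syllable words (lowercase, NFC).
# Real tools rely on large dictionaries and statistical models instead.
LEXICON = {"sinh viên", "đại học", "bách khoa", "hà nội"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def segment_words(sentence, lexicon, max_syllables=4):
    """Forward maximum matching: at each position take the longest lexicon match."""
    sentence = unicodedata.normalize("NFC", sentence)
    # Syllables are runs of word characters; punctuation marks are separate tokens.
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    words, i = [], 0
    while i < len(tokens):
        for n in range(min(max_syllables, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            # A single syllable is always accepted as a fallback word.
            if n == 1 or candidate.lower() in lexicon:
                words.append("_".join(tokens[i:i + n]))
                i += n
                break
    return words


text = "Sinh viên học tại Đại học Bách khoa Hà Nội. Họ học rất chăm!"
for s in split_sentences(text):
    print(segment_words(s, LEXICON))
# ['Sinh_viên', 'học', 'tại', 'Đại_học', 'Bách_khoa', 'Hà_Nội', '.']
# ['Họ', 'học', 'rất', 'chăm', '!']
```

Greedy matching fails on ambiguous syllable sequences. This weakness is the main reason the tools above use trained models.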
NLP Toolkits
• LingPipe:
• Find the names of people, organizations or locations in news
• Automatically classify Twitter search results into categories
• Suggest correct spellings of queries
• Mallet - Machine Learning for Language Toolkit:
• Statistical NLP, document classification, clustering, topic modeling, information extraction
• Stanford NLP software:
• Word segmentation, part-of-speech tagging, named entity recognition,
chunking, parsing, classification and coreference resolution
• NLTK (Natural Language Toolkit):
• Open source Python modules, linguistic data and documentation for research
and development in natural language processing and text analytics.
• OpenNLP:
• Tokenization, sentence segmentation, part-of-speech tagging, named entity
extraction, chunking, parsing, and coreference resolution
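LingPipe lists query spelling suggestion among its applications. A minimal baseline for that task picks the dictionary word with the smallest Levenshtein edit distance to the query and breaks ties by corpus frequency. This is a swapped-in illustration, not LingPipe's method. The word counts are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def suggest(query, vocabulary, max_distance=2):
    """Return the closest vocabulary word within max_distance, else the query.

    `vocabulary` maps word -> corpus frequency. Ties on distance go to the
    more frequent word, then to the alphabetically first one.
    """
    candidates = [
        (edit_distance(query, word), -freq, word)
        for word, freq in vocabulary.items()
    ]
    within = [c for c in candidates if c[0] <= max_distance]
    if not within:
        return query
    return min(within)[2]


vocab = {"parsing": 50, "parser": 30, "tagging": 40}  # hypothetical counts
print(suggest("parsng", vocab))  # parsing
print(suggest("xyz", vocab))     # xyz
```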
Machine learning libraries
• Conditional random fields (CRF)
• CRF:
• Maximum entropy (Maxent)
• OpenNLP, Mallet
• Support vector machine (SVM)
• LIBSVM:
• SVMlight:
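A maximum entropy classifier models p(y | x) ∝ exp(Σ_f w(f, y)), summing over the binary features f active in x. Training maximizes the conditional log-likelihood. For each weight, the gradient is the observed feature count minus the expected feature count. The pure-Python sketch below trains such a model by plain stochastic gradient ascent, without regularization, on hypothetical toy part-of-speech data. OpenNLP and Mallet ship their own optimizers and feature handling, which this sketch does not reproduce.

```python
import math
from collections import defaultdict


def predict_proba(weights, features, labels):
    """p(y | x) proportional to exp(sum of weights[(f, y)] over active features f)."""
    scores = {y: sum(weights[(f, y)] for f in features) for y in labels}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def train_maxent(data, labels, epochs=50, lr=0.5):
    """Stochastic gradient ascent on the conditional log-likelihood.

    data: list of (active_feature_list, gold_label) pairs.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for features, gold in data:
            probs = predict_proba(weights, features, labels)
            for f in features:
                for y in labels:
                    # Observed count (1 if y is gold) minus expected count p(y | x).
                    grad = (1.0 if y == gold else 0.0) - probs[y]
                    weights[(f, y)] += lr * grad
    return weights


def classify(weights, features, labels):
    probs = predict_proba(weights, features, labels)
    return max(labels, key=lambda y: probs[y])


labels = ["NOUN", "VERB"]
data = [  # hypothetical toy training data
    (["word=running", "suffix=ing"], "VERB"),
    (["word=eat", "suffix=at"], "VERB"),
    (["word=dog", "suffix=og"], "NOUN"),
    (["word=city", "suffix=ty"], "NOUN"),
]
w = train_maxent(data, labels)
print(classify(w, ["word=jumping", "suffix=ing"], labels))  # VERB
```

The unseen word "jumping" is still tagged VERB because the shared feature suffix=ing was learned from "running". Overlapping feature functions like this are what maxent models (and CRFs, their sequence counterpart) are designed to exploit.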