Python Text Processing - Chunk Classification
Classification-based chunking classifies text as groups of words rather than as individual words. A simple scenario is tagging the text at the sentence level. We will use a corpus to demonstrate the classification. We choose the corpus conll2000, which contains data from the Wall Street Journal (WSJ) corpus annotated for noun-phrase chunking.
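Before looking at the corpus, it helps to see how chunks are represented as per-word labels. The conll2000 corpus uses the IOB scheme, where each word carries a tag like B-NP (begins a noun phrase), I-NP (inside one), or O (outside any chunk). The sketch below (plain Python, no NLTK required) groups IOB-labelled words back into chunks; the example triples are illustrative, not taken from the corpus.

```python
# Illustrative (word, pos, iob) triples in the IOB scheme used by conll2000.
tagged = [('Confidence', 'NN', 'B-NP'),
          ('in', 'IN', 'O'),
          ('the', 'DT', 'B-NP'),
          ('pound', 'NN', 'I-NP')]

def iob_to_chunks(triples):
    """Group (word, pos, iob) triples into chunks of words."""
    chunks, current = [], []
    for word, pos, iob in triples:
        if iob.startswith('B-'):          # a new chunk begins
            if current:
                chunks.append(current)
            current = [word]
        elif iob.startswith('I-') and current:
            current.append(word)          # continue the open chunk
        else:                             # 'O' closes any open chunk
            if current:
                chunks.append(current)
                current = []
    if current:
        chunks.append(current)
    return chunks

print(iob_to_chunks(tagged))  # [['Confidence'], ['the', 'pound']]
```

A chunk classifier's job is exactly to predict these B-/I-/O labels for each word.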
First, we add the corpus to our environment using the following command.
>>> import nltk
>>> nltk.download('conll2000')
Let's have a look at the first few sentences in this corpus.
from nltk.corpus import conll2000
x = conll2000.sents()
for i in range(3):
    print(x[i])
    print('\n')
Output
When we run the above program we get the following output −
['Confidence', 'in', 'the', 'pound', 'is', 'widely', ...]
['Chancellor', 'of', 'the', 'Exchequer', 'Nigel', 'Lawson', ...]
['But', 'analysts', 'reckon', 'underlying', 'support', 'for', ...]
Next, we use the function tagged_sents() to get the sentences along with their part-of-speech tags.
from nltk.corpus import conll2000
x = conll2000.tagged_sents()
for i in range(3):
    print(x[i])
    print('\n')
Output
When we run the above program we get the following output −
[('Confidence', 'NN'), ('in', 'IN'), ...]
[('Chancellor', 'NNP'), ('of', 'IN'), ...]
[('But', 'CC'), ('analysts', 'NNS'), ...]
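These (word, tag) pairs are what a chunk classifier consumes: for each word it predicts a chunk label from features of the surrounding tags. A minimal sketch of such a feature extractor (plain Python; the feature names and the `<START>`/`<END>` sentinels are illustrative choices, not part of the corpus):

```python
def chunk_features(tagged_sent, i):
    """Features for classifying the chunk tag of token i in a
    POS-tagged sentence: the current tag plus its neighbours."""
    word, pos = tagged_sent[i]
    prev_pos = tagged_sent[i - 1][1] if i > 0 else '<START>'
    next_pos = tagged_sent[i + 1][1] if i < len(tagged_sent) - 1 else '<END>'
    return {'pos': pos, 'prevpos': prev_pos, 'nextpos': next_pos}

sent = [('Confidence', 'NN'), ('in', 'IN'), ('the', 'DT')]
print(chunk_features(sent, 0))
# {'pos': 'NN', 'prevpos': '<START>', 'nextpos': 'IN'}
```

Feature dictionaries like these can be fed to any of NLTK's classifiers (for example, a naive Bayes or maximum-entropy classifier) to learn the B-/I-/O chunk labels from the conll2000 training data.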