- Python Text Processing - Home
- Python Text Processing - Introduction
- Python Text Processing - Environment
- Python Text Processing - String Immutability
- Python Text Processing - Sorting Lines
- Python Text Processing - Counting Token in Paragraphs
- Python Text Processing - Binary ASCII Conversion
- Python Text Processing - Strings as Files
- Python Text Processing - Backward File Reading
- Python Text Processing - Filter Duplicate Words
- Python Text Processing - Extract Emails from Text
- Python Text Processing - Extract URL from Text
- Python Text Processing - Pretty Print
- Python Text Processing - State Machine
- Python Text Processing - Capitalize and Translate
- Python Text Processing - Tokenization
- Python Text Processing - Remove Stopwords
- Python Text Processing - Synonyms and Antonyms
- Python Text Processing - Translation
- Python Text Processing - Word Replacement
- Python Text Processing - Spelling Check
- Python Text Processing - WordNet Interface
- Python Text Processing - Corpora Access
- Python Text Processing - Tagging Words
- Python Text Processing - Chunks and Chinks
- Python Text Processing - Chunk Classification
- Python Text Processing - Classification
- Python Text Processing - Bigrams
- Python Text Processing - Process PDF
- Python Text Processing - Process Word Document
- Python Text Processing - Reading RSS feed
- Python Text Processing - Sentiment Analysis
- Python Text Processing - Search and Match
- Python Text Processing - Text Munging
- Python Text Processing - Text wrapping
- Python Text Processing - Frequency Distribution
- Python Text Processing - Summarization
- Python Text Processing - Stemming Algorithms
- Python Text Processing - Constrained Search
Python Text Processing Useful Resources
Python - Text Processing Introduction
Text processing has a direct application to Natural Language Processing, also known as NLP. NLP is aimed at processing the languages spoken or written by humans when they communicate with one another. This is different from the communication between a computer and a human where the communication is wither a computer program written by human or some gesture by human like clicking the mouse at some position. NLP tries to understand the natural language spoken by humans and classify it, analyses it as well if required respond to it. Python has a rich set of libraries which cater to the needs of NLP. The Natural Language Tool Kit (NLTK) is a suite of such libraries which provides the functionalities required for NLP.
Below are some applications which use NLP and indirectly python's NLTK.
Summarization
Many times, we need to get the summary of a news article, a movie plot or a big story. They are all written in human language and without NLP we have to rely on another human's interpretation and presentation of such summary to us. But with help of NLP we can write programs to use NLTK and summarize the long text with various parameters, like what is the percentage of text we want in the final output, choosing the positive and negative words for summarization etc. The online news feeds rely on such summarization techniques to present news insights.
Voice Based Tools
The voice-based tools like apples Siri or Amazon Alexa rely on NLP to understand the interaction mad with humans. They have a large training data set of words, sentences and grammar to interpret the question or command coming from a human and process it. Though it is about voice, indirectly it also gets translated to text and the resulting text form the voice is taken through the NLP system to produce result.
Information Extraction
Web scrapping is a common example of extracting data form the web pages using python code. Here it may not be strictly NLP based but it does involve text processing. For example, if we need to extract only the headers present in a html page, then we look for the h1 tag int he page structure and find a way to extract the text between only those tags. This need text processing program from python.
Spam Filtering
The spam in emails can be identified and eliminated by analysing the text in the subject line as well as in the content of the message. As the spam emails are usually sent in bulk to many recipients, even if their subjects and contents have little variation, that can be matched and tagged to mark them as spam Again it needs the use of the NLTK libraries.
Language Translation
Computerized language translation relies heavily on NLP. As more and more languages are used in the online platform, it becomes a necessity to automate the translation from one human language to another. This will involve programming to handle the vocabulary, grammar and context tagging of the languages involved in translation. Again, NLTK is used to handle such requirements.
Sentiment Analysis
To find out the overall reaction to the performance of a movie, we may have to read thousands of feedback posts from the audience. But that too can be automated by using the classification of positive an negative feedback through words and sentence analysis. And then measuring the frequency of positive and negative reviews to find the overall sentiment of the audience. This obviously needs the analysis of the human language written by the audience and NLTK is used heavily here for processing the text.