
Python Text Processing - Backward File Reading



When we read a file normally, its contents are read line by line from the beginning of the file. But there are scenarios where we want to read the last line first. For example, a file may keep its latest records at the bottom, and we want to read those latest records first. To do this, we install the file-read-backwards package using the command below.

pip3 install file-read-backwards 

Example - Reading File Line By Line

Before reading the file backwards, let's read its content line by line from the beginning, so that we can compare this result with the backward reading.

main.py

# Read all lines of the file into a list
with open("GodFather.txt", "r") as BigFile:
    data = BigFile.readlines()

# Print each line together with its index
for i, line in enumerate(data):
    print("Line No- ", i)
    print(line)

Output

When we run the above program, we get the following output −

Line No-  0
Vito Corleone is the aging don (head) of the Corleone Mafia Family. 

Line No-  1
His youngest son Michael has returned from WWII just in time to ...

Example - Reading Lines Backward

Now, to read the file backwards, we use the FileReadBackwards class from the installed module.

main.py

from file_read_backwards import FileReadBackwards

with FileReadBackwards("GodFather.txt", encoding="utf-8") as BigFile:
    # Iterate over the lines, starting from the last line and moving up
    for line in BigFile:
        print(line)

Output

When we run the above program, we get the following output −

The Don barely survives, which leads his son Michael to begin a violent...

Comparing this with the previous output, you can verify that the lines have been read in reverse order.
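For a small file, the same last-to-first order can also be produced with the standard library alone. The following is only a minimal sketch, not how the file-read-backwards package works internally: it assumes the whole file fits in memory, and the helper name read_lines_reversed and the sample file are hypothetical, made up for this illustration.

```python
import os
import tempfile


def read_lines_reversed(path, encoding="utf-8"):
    """Yield the lines of a text file from last to first, without line endings.

    Assumption: the file fits in memory, because all lines are loaded first.
    """
    with open(path, "r", encoding=encoding) as handle:
        lines = handle.read().splitlines()
    # reversed() walks the list from its last element to its first
    for line in reversed(lines):
        yield line


# Hypothetical sample data, written to a temporary file for the demonstration
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "sample.txt")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("first line\nsecond line\nthird line\n")
    reversed_lines = list(read_lines_reversed(sample_path))

print(reversed_lines)
```

Because this sketch loads every line before reversing, it is likely a poor fit for very large files. For those, the installed package is probably the better choice.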

Reading Words Backward

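We can also read the words in the file backward. For this, we first read the lines backwards, then tokenize each line into words and apply the reverse() function to the resulting list. In the example below, the word tokens from the same file are printed backwards using both the file-read-backwards package and the nltk module. The nltk module must be installed as well, and word_tokenize needs NLTK's tokenizer data, which may have to be downloaded once with nltk.download (for example nltk.download('punkt'); the exact resource name depends on the NLTK version).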

main.py

import nltk
from file_read_backwards import FileReadBackwards

with FileReadBackwards("GodFather.txt", encoding="utf-8") as BigFile:
    # Iterate over the lines from last to first,
    # tokenizing each line and reversing its tokens
    for line in BigFile:
        nltk_tokens = nltk.word_tokenize(line)
        nltk_tokens.reverse()
        print(nltk_tokens)

Output

When we run the above program we get the following output −

['.', 'apart', 'family', 'Corleone'..., 'The']
['.', 'men', 'hit', 'his', 'of', 'some', ...'This']
...
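If nltk is not available, a rough approximation of the word reversal can be sketched with str.split(). This is an assumption-laden substitute, not the tokenizer used above: str.split() separates on whitespace only, so punctuation stays attached to the neighbouring word instead of becoming its own token. The helper name reverse_words and the sample lines are hypothetical.

```python
def reverse_words(line):
    """Return the whitespace-separated words of a line in reverse order."""
    words = line.split()
    words.reverse()
    return words


# Hypothetical lines standing in for the contents of a file
sample_lines = ["first line here", "second line follows"]

# Walk the lines from last to first and reverse the words within each line
reversed_word_lists = [reverse_words(line) for line in reversed(sample_lines)]

for words in reversed_word_lists:
    print(words)
```

Since it splits on whitespace only, a sentence ending in "apart." yields the single word 'apart.' here, whereas the nltk output above lists '.' and 'apart' as separate tokens.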