Python is a simple yet powerful programming language that can be applied to many areas, such as scientific computing and natural language processing. One application of Python that I find particularly fascinating is text processing.
In this article, I’ll discuss how to calculate the number of times a word occurs in a text file using Python.
Let’s see what steps need to be followed to calculate the frequency of a word in a text file:
- Open the text file for reading in Python code using the open(filename, "r") function.
- Read the text from the file object returned by open(filename, "r") in Step 1, using its read() method.
- Split the text returned by read() in Step 2, using the split() method.
- split() breaks the text at whitespace (spaces, tabs, and newlines) and stores the words in a Python list.
- Set a counter such as occurrence_of_word = 0 and iterate over the list from Step 4, incrementing the counter whenever a word matches the specified word whose frequency is to be calculated.
- After iterating over the whole list of words, occurrence_of_word holds the frequency of the specified word in the text file.
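Before putting everything together, here is a small sketch of what Steps 3 and 4 produce. The sample strings are made up for illustration and are not taken from the article's text file.

```python
# split() with no arguments breaks a string on any run of whitespace
# (spaces, tabs, newlines) and returns the pieces as a list.
sample_text = "Python is  easy\nto learn"  # illustrative string only
words = sample_text.split()
print(words)  # ['Python', 'is', 'easy', 'to', 'learn']

# Punctuation stays attached to the word, so "Python," is not equal to "Python".
punctuated = "I like Python, really".split()
print(punctuated)  # ['I', 'like', 'Python,', 'really']
```

Because of the second case, a word followed directly by a comma or full stop will not be counted by a plain equality check.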
Let’s put all 6 of these steps together as Python code.
with open("testing.txt", "r") as f:  # Step 1 (the file is closed automatically)
    data = f.read()  # Step 2
words = data.split()  # Step 3 and 4
# Specified word whose frequency is to be calculated
count_frequency_word = "Python"
occurrence_of_word = 0
for i in words:  # Step 5
    # Compare in lower case so that the match ignores case
    if i.lower() == count_frequency_word.lower():
        occurrence_of_word += 1
print(occurrence_of_word)  # Step 6
Let’s use this Python code to calculate how many times the word “Python” occurs in the testing.txt file, which contains the text below.
Python is a Simple Language Python have simple syntax as compared to other Languages Python is easy to learn
Running the above code on testing.txt with “Python” as count_frequency_word prints 3, since the word Python occurs three times in the text file.
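To reproduce this result without creating the file by hand, the following sketch writes the sample text to a temporary file and counts the word. The helper name count_word, the use of tempfile, and the line breaks in the sample are my own assumptions for illustration; the counting logic is the same as in the code above.

```python
import os
import tempfile


def count_word(path, target):
    """Count case-insensitive matches of target among whitespace-separated words."""
    with open(path, "r") as f:
        words = f.read().split()
    return sum(1 for w in words if w.lower() == target.lower())


# Sample text from the article; the line breaks are assumed.
sample = (
    "Python is a Simple Language\n"
    "Python have simple syntax as compared to other Languages\n"
    "Python is easy to learn\n"
)

# Write the sample to a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "testing.txt")
    with open(path, "w") as f:
        f.write(sample)
    result = count_word(path, "Python")

print(result)  # 3
```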
This Python code treats python/Python/PYTHON/PYthON as the same word, so all of them are counted when calculating the frequency of the word Python.
But if you explicitly want to treat python/Python/PYTHON/PYthON as different words, and count only exact occurrences of Python and not the other variants (python/PYTHON/PYthON), then remove both lower() calls from the if condition in the above Python code and write it as i == count_frequency_word.
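The two behaviours can also be kept side by side behind a flag. This is only a sketch: the function name count_occurrences and the case_sensitive parameter are hypothetical names I introduce here, not part of the code above.

```python
def count_occurrences(words, target, case_sensitive=False):
    """Count how many items in words match target."""
    if case_sensitive:
        # Exact match: "Python" and "python" are different words.
        return sum(1 for w in words if w == target)
    # Case-insensitive match: compare lower-cased forms.
    target_lower = target.lower()
    return sum(1 for w in words if w.lower() == target_lower)


words = "Python python PYTHON PYthON Python".split()
print(count_occurrences(words, "Python"))                       # 5
print(count_occurrences(words, "Python", case_sensitive=True))  # 2
```

With the flag left at its default, all five variants are counted; with case_sensitive=True, only the two exact occurrences of Python are.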