Python is a simple yet powerful programming language that can be applied to many areas, such as scientific computing and natural language processing. One application I find especially fascinating is text processing.
In this article, I’ll discuss how to find the most frequently used word in a text file using Python.
Let’s see which steps need to be followed to find the most commonly used word in a text file:
- Open the text file for reading in Python code using the open(filename, "r") function
- Read the text from the file object returned by open(filename, "r") in Step 1, using the read() method
- Split the text returned by read() in Step 2, using the split() method
- split() breaks the text at whitespace and stores the words in a Python list
- Import the Counter class from the collections module with the statement from collections import Counter
- Pass the list from Step 4 to Counter as Counter(list)
- Call the most_common() method on the Counter object returned in Step 6
- most_common() returns a list in which every element is a tuple
- The first tuple in the list contains (most common word, number of times it occurred in the file)
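The Counter steps can be tried on a short in-memory string before touching a file. This is only a minimal sketch: the sample sentence and the variable names are made up for illustration.

```python
from collections import Counter

# Hypothetical sample sentence, used only for illustration
text = "the cat and the dog and the bird"

words = text.split()            # split on whitespace into a list of words
counts = Counter(words)         # map each word to its number of occurrences
ranked = counts.most_common()   # list of (word, count) tuples, highest count first

top_word, top_count = ranked[0]
print(top_word, top_count)      # the 3
```

Each element of ranked is a (word, count) tuple, so ranked[0][0] is the word and ranked[0][1] is its count.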
Let’s put all 9 of these steps together as Python code for finding the most commonly used word in a text file.
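    from collections import Counter

    # Read the whole file into memory as a single string
    with open("somefile.txt", "r") as f:
        data = f.read()

    # Split on whitespace to get a list of words
    words = data.split()

    # Count how many times each word occurs
    counting = Counter(words)

    # most_common() returns (word, count) tuples, most frequent first
    list_of_word_and_frequency = counting.most_common()
    most_common_word, frequency = list_of_word_and_frequency[0]

    print("Most Commonly used word in Text File is => ", most_common_word)
    print("Frequency of most commonly used word in Text File is => ", frequency)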
Let’s use this code to find the most commonly used word in a text file, somefile.txt, which contains the text below.
Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together. Python's simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed.
Running the Python code above on this file shows that its most commonly used word is “and”, and its frequency is 6.
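As a quick check of this result, the same counting can be run on the sample text held in a string instead of being read from somefile.txt. This is only a sketch, and the variable names are illustrative.

```python
from collections import Counter

# The sample text from above, stored in a string instead of somefile.txt
sample_text = (
    "Python is an interpreted, object-oriented, high-level programming language "
    "with dynamic semantics. Its high-level built in data structures, combined "
    "with dynamic typing and dynamic binding, make it very attractive for Rapid "
    "Application Development, as well as for use as a scripting or glue language "
    "to connect existing components together. Python's simple, easy to learn "
    "syntax emphasizes readability and therefore reduces the cost of program "
    "maintenance. Python supports modules and packages, which encourages program "
    "modularity and code reuse. The Python interpreter and the extensive standard "
    "library are available in source or binary form without charge for all major "
    "platforms, and can be freely distributed."
)

counts = Counter(sample_text.split())

# most_common(1) returns a list holding only the top (word, count) tuple
print(counts.most_common(1))  # [('and', 6)]
```

Note that split() separates on whitespace only, so tokens such as “Python” and “Python's” are counted as different words.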
Moreover, you can use this program on fairly large files and still get good performance. It does not import any external dependencies; it relies only on Python’s built-in open() function and on Counter from the standard-library collections module. Counter is a dictionary subclass, so counting the words takes a single pass over the list.
But if you need to find the most commonly used word in a very large file, say 1 GB or more, then a library such as NumPy or pandas may be a better option.
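For files in that size range, it is worth noting that read() loads the entire file into memory at once. Below is a minimal sketch of a lower-memory variant that uses only the standard library and reads the file one line at a time. It is an alternative to the program above rather than a NumPy or pandas approach, and the function name most_common_word_streaming is hypothetical.

```python
from collections import Counter


def most_common_word_streaming(path):
    """Return (word, count) for the most frequent word, reading line by line.

    Hypothetical helper for illustration; returns None if the file has no words.
    """
    counts = Counter()
    with open(path, "r") as f:
        for line in f:                    # the file object yields one line at a time
            counts.update(line.split())   # add this line's words to the running totals
    top = counts.most_common(1)           # list with at most one (word, count) tuple
    return top[0] if top else None
```

Calling most_common_word_streaming("somefile.txt") should give the same (word, count) pair as the program above, while keeping only one line of text plus the running counts in memory at a time.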
I hope this article helped you understand how Python can be used to find the most commonly used word in a text file.
Happy Coding 🥳 🥳