Python can convert a Docx file to a text file using the docx2txt package, developed by Ankush Shah. The package provides a process() function that takes the path to a docx file as a parameter and returns the text it contains as a string, which can then be written to a text file.
Let's see step by step how a Docx file can be converted to a text file using Python:
- First, install the docx2txt package using pip (pip install docx2txt)
- Import the package into your Python code file with the import docx2txt statement
- Pass the path of the docx file to docx2txt.process(parameter) as the parameter
- Save the string returned by docx2txt.process(parameter) to a txt file opened with the open() function
Note that on a Mac laptop where both Python 2.7 and Python 3 are installed, a plain pip install docx2txt may install the package for Python 2.7. If you then run your program with Python 3, it fails with ModuleNotFoundError: No module named 'docx2txt'. To avoid this, install the package with python3 -m pip install docx2txt, which installs docx2txt for the Python 3 interpreter that the python3 command runs.
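If you are not sure which interpreter is running your program, a quick check like the one below can help. This is only a minimal sketch: is_installed() is a hypothetical helper name, not part of docx2txt, and it relies only on the standard library (sys and importlib).

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if module_name can be found by the running interpreter."""
    # find_spec returns None when a top-level module cannot be located
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # sys.executable is the path of the interpreter running this script
    print("Interpreter:", sys.executable)
    if is_installed("docx2txt"):
        print("docx2txt is available for this interpreter")
    else:
        # Suggest installing with the very same interpreter
        print("docx2txt is missing; try:", sys.executable, "-m pip install docx2txt")
```

Running this with python3 shows whether the package was installed for that interpreter or for a different one.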
Below is a Python code example showing how a Docx file (for example, test.docx) can be converted to a text file (for example, output.txt):
    import docx2txt

    # Passing docx file to process function
    text = docx2txt.process("test.docx")

    # Saving content inside docx file into output.txt file
    with open("output.txt", "w") as text_file:
        print(text, file=text_file)
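The same conversion can also be wrapped in a small reusable function. The sketch below is one possible way to do it, not the package's own API: docx_to_txt() and its extract parameter are hypothetical names introduced here for illustration. It writes the file as UTF-8 explicitly, on the assumption that the document may contain non-ASCII characters, and unlike print() it does not append a trailing newline.

```python
from pathlib import Path


def docx_to_txt(docx_path, txt_path, extract=None):
    """Extract text from docx_path and write it to txt_path as UTF-8."""
    if extract is None:
        # Imported lazily so docx2txt is only required when it is actually used
        import docx2txt
        extract = docx2txt.process
    # The extractor receives the docx path and returns the text as a string
    text = extract(str(docx_path))
    Path(txt_path).write_text(text, encoding="utf-8")
    return text
```

With docx2txt installed, calling docx_to_txt("test.docx", "output.txt") should produce the same kind of output file as the example above. The optional extract argument is only there so another extraction function can be swapped in, for instance when testing.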
Below is a picture of test.docx and the resulting output.txt file.
From the pictures of test.docx and output.txt above, it is clear that the docx2txt package does a good job of extracting the text inside the docx file into plain txt format. However, the output does not keep the indentation and formatting of the original docx file.
If you are interested, you can check the docx2txt package source code on GitHub: Python Docx2txt Github Source Code