Working with Word Documents in Python Using the python-docx Library


The python-docx library lets you create, modify, and read Microsoft Word (.docx) documents programmatically, without needing Microsoft Word or similar software installed. This tutorial covers installing the library, creating a new document, adding elements such as paragraphs and tables, modifying document styles, and reading text from an existing document.

Installing python-docx

Before you can start using python-docx, you need to install it. This can be done via pip. Open your command line (terminal, command prompt, etc.) and run the following command:

pip install python-docx

Creating a New Document

Once python-docx is installed, you can start by creating a new Word document:

from docx import Document
# Create a new Document
doc = Document()
doc.add_paragraph('Hello, world!')
doc.save('helloworld.docx')

This code creates a new document, adds a paragraph with "Hello, world!" as its text, and saves the document to helloworld.docx.

Working with Paragraphs

A paragraph can be given a style when it is created, and extended with runs (segments of text that share the same character formatting):

# Add a paragraph with a defined style
paragraph = doc.add_paragraph('This is a heading', style='Heading 1')
# Add a run to an existing paragraph
run = paragraph.add_run(' and this is some additional text.')
run.bold = True
# Save the updated document
doc.save('updated_document.docx')

Inserting Tables

Tables can be created and filled with data:

# Add a table with a single header row
table = doc.add_table(rows=1, cols=3)
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Quantity'
hdr_cells[1].text = 'ID'
hdr_cells[2].text = 'Description'
# Add more rows to the table
for item in range(2):  # example for two additional rows
    row_cells = table.add_row().cells
    row_cells[0].text = str(item + 1)
    row_cells[1].text = f'ID{item + 1}'
    row_cells[2].text = 'Description here'
# Save the document
doc.save('document_with_table.docx')

Modifying Document Style

You can define and use styles throughout your document for consistent formatting:

from docx.shared import Pt
from docx.enum.style import WD_STYLE_TYPE
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
# Add a new paragraph style called 'CustomStyle'
style = doc.styles.add_style('CustomStyle', WD_STYLE_TYPE.PARAGRAPH)
style.font.name = 'Arial'
style.font.size = Pt(12)
style.paragraph_format.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER
# Apply the new style to a paragraph
paragraph = doc.add_paragraph('Centered Text', style='CustomStyle')
# Save the document
doc.save('styled_document.docx')

Reading from a Document

python-docx also allows you to read and extract data from documents:

# Open an existing document
doc = Document('existing_document.docx')
# Print the text of every paragraph in the document
for para in doc.paragraphs:
    print(para.text)
