
Python-docx: Find and replace all placeholder numbers in Word doc with random numbers

I’m having trouble finding and replacing all occurrences of several placeholders within paragraphs of a Word file. It’s for a gamebook, so I’m trying to sub random entry numbers for the placeholders used while drafting the book.

All placeholders begin with “#” (e.g. #1-5, #22-1, etc.). Set numbers, like the first entry (which will always be “1”), don’t have the “#” prefix. Placeholder entries are paired with random counterparts as tuples by zipping within a tuple for reference.

It all works great for the headings since it’s a straight up one-for-one swap of paragraph, in order. The trouble is when I’m iterating through the regular paragraphs (second to last bit of code). It only seems to replace the first eight numbers, then stops. I’ve tried setting up a loop but it doesn’t seem to help. Not sure what I’m missing. Code follows.

Edit: Here’s how the two lists and reference tuple are set up. In this test, only the first entry is set, with no in-paragraph reference back to it. All the others are to be randomized and replaced in-paragraph.

entryWorking: ['#1-1', '#1-2', '#1-3', '#1-4', '#1-5', '#1-6', '#1-7', '#1-8', '#2', '#2-1', '#2-2', '#2-3', '#2-4', '#2-5', '#2-6', '#2-7', '#16', '#17', '#3', '#3-1', '#3-2', '#3-3', '#3-4', '#3-5', '#3-6', '#3-8', '#3-9']

entryNumbers: ['2', '20', '12', '27', '23', '4', '11', '16', '26', '7', '25', '5', '3', '15', '17', '6', '18', '22', '10', '21', '19', '13', '28', '8', '14', '9', '24']

reference: (('#1-1', '2'), ('#1-2', '20'), ('#1-3', '12'), ('#1-4', '27'), ('#1-5', '23'), ('#1-6', '4'), ('#1-7', '11'), ('#1-8', '16'), ('#2', '26'), ('#2-1', '7'), ('#2-2', '25'), ('#2-3', '5'), ('#2-4', '3'), ('#2-5', '15'), ('#2-6', '17'), ('#2-7', '6'), ('#16', '18'), ('#17', '22'), ('#3', '10'), ('#3-1', '21'), ('#3-2', '19'), ('#3-3', '13'), ('#3-4', '28'), ('#3-5', '8'), ('#3-6', '14'), ('#3-8', '9'), ('#3-9', '24'))

Thanks for the assist.

import random
from docx import Document

entryWorking = [] # The placeholder entries created for the draft gamebook


# Identify all paragraphs with a specific heading style (e.g. 'Heading 2')
def iter_headings( paragraphs, heading ) :
    for paragraph in paragraphs :
        if paragraph.style.name.startswith( heading ) :
            yield paragraph


# Open the .docx file
document = Document( 'TestFile.docx' )


# Search document for unique placeholder entries (must have a unique heading style)
for heading in iter_headings( document.paragraphs, 'Heading 2' ) :
    entryWorking.append( heading.text )


# Create list of gamebook entry numbers as strings, '1' through 'N'
# (one per heading found above; shuffled further down)
entryNumbers = [ str( i ) for i in range( 1, len( entryWorking ) + 1 ) ]


# Identify pre-set entries (such as Entry 1), and remove from both lists
# This avoids pre-set numbers being replaced (i.e. they remain as is in the .docx)
# Pre-set entry numbers must _not_ have the "#" prefix in the .docx
# Build new lists rather than calling remove() while iterating, which skips items
preset = [ s for s in entryWorking if not s.startswith( '#' ) ]
entryWorking = [ s for s in entryWorking if s.startswith( '#' ) ]
entryNumbers = [ n for n in entryNumbers if n not in preset ]

# Shuffle new entry numbers
random.shuffle( entryNumbers )


# Create tuple list of placeholder entries paired with random entry
reference = tuple( zip( entryWorking, entryNumbers ) )


# Replace placeholder headings with assigned randomized entry
for heading in iter_headings( document.paragraphs, 'Heading 2' ) :
    for entry in reference :
        if heading.text == entry[ 0 ] :
            heading.text = entry[ 1 ]


# Search through paragraphs for placeholders and replace with randomized entry
for paragraph in document.paragraphs :
    for run in paragraph.runs :
        for entry in reference :
            if run.text == entry[ 0 ] :
                run.text = entry [ 1 ]

                        
# Save the new document with final entries
document.save('Output.docx')

Answer

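In Word, runs break at arbitrary locations in the text, so a placeholder such as #1-5 may be split across several runs or share a run with the surrounding words. The test run.text == entry[0] only succeeds when a single run contains exactly the placeholder and nothing else, which is why only some of the numbers get replaced.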

You might be interested in the links in this answer that demonstrate the (surprisingly complex) work required to do this sort of thing in the general case:

How to use python-docx to replace text in a Word document and save

There are a couple of paragraph-level functions that do a good job of this and can be found on the GitHub site for python-docx.

This one will replace a regex-match with a replacement str. The replacement string will appear formatted the same as the first character of the matched string.
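The linked function is not reproduced here. As a rough sketch of the same idea (not the python-docx project's code), the snippet below joins the run texts, finds each placeholder with a regex, and writes the replacement into the run that holds the first character of the match, so it inherits that run's formatting. The names replace_across_runs, PLACEHOLDER and mapping are made up for illustration, and the pattern assumes placeholders look like #12 or #12-3. The function only needs objects with a readable and writable .text attribute, so the demo uses stand-in objects instead of a real document.

```python
import re
from bisect import bisect_right
from types import SimpleNamespace

# Assumed placeholder shape: '#', digits, optional '-digits' (e.g. #2, #2-1)
PLACEHOLDER = re.compile(r"#\d+(?:-\d+)?")

def replace_across_runs(runs, pattern, repl):
    """Replace regex matches in the joined run text, even when a match
    straddles run boundaries. `repl(match)` returns the new string, or
    None to leave that match alone. Returns the number of replacements."""
    texts = [run.text for run in runs]
    starts, pos = [], 0
    for t in texts:                      # offset of each run in the joined text
        starts.append(pos)
        pos += len(t)
    full = "".join(texts)
    count = 0
    # Work right to left so offsets of earlier matches stay valid
    for m in reversed(list(pattern.finditer(full))):
        new = repl(m)
        if new is None or m.start() == m.end():
            continue
        first = bisect_right(starts, m.start()) - 1    # run holding first char
        last = bisect_right(starts, m.end() - 1) - 1   # run holding last char
        lo = m.start() - starts[first]
        hi = m.end() - starts[last]
        if first == last:
            texts[first] = texts[first][:lo] + new + texts[first][hi:]
        else:
            texts[first] = texts[first][:lo] + new
            for i in range(first + 1, last):
                texts[i] = ""                          # fully consumed runs
            texts[last] = texts[last][hi:]
        count += 1
    for run, t in zip(runs, texts):
        if run.text != t:                # only touch runs that changed
            run.text = t
    return count

# Demo with stand-in runs; with python-docx you would pass paragraph.runs
mapping = {"#1-5": "23", "#2-1": "7", "#2": "26"}
runs = [SimpleNamespace(text=t)
        for t in ["Turn to #", "1", "-5 or #2", "-1, else #2."]]
count = replace_across_runs(runs, PLACEHOLDER, lambda m: mapping.get(m.group(0)))
print("".join(r.text for r in runs))
```

Looking each match up in a dict in a single pass also avoids a prefix problem in the question's data: replacing '#2' before '#2-1' with plain substring replacement would corrupt the longer placeholder. In the question's script, something like replace_across_runs( paragraph.runs, PLACEHOLDER, lambda m: dict( reference ).get( m.group( 0 ) ) ) inside the paragraph loop would be the equivalent call.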

This one will isolate a run such that some formatting can be applied to that word or phrase, like highlighting each occurrence of “foobar” in the text or perhaps making it bold or appear in a larger font.
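To give a feel for what isolating involves, here is a text-only sketch of the bookkeeping, under the assumption that run contents are available as a list of strings. It is not the linked function: the name split_runs_at_matches is hypothetical, and in a real document each segment would also have to become its own run carrying a copy of the original run's formatting, which this sketch does not attempt.

```python
import re

def split_runs_at_matches(run_texts, pattern):
    """Return (text, is_match) segments. Existing run boundaries are kept,
    and extra cuts are made at match edges so that matched text never
    shares a segment with unmatched text."""
    full = "".join(run_texts)
    cuts = {0, len(full)}
    pos = 0
    for t in run_texts:                  # existing run boundaries
        pos += len(t)
        cuts.add(pos)
    spans = [m.span() for m in pattern.finditer(full)]
    for start, end in spans:             # cut at both edges of every match
        cuts.update((start, end))
    edges = sorted(cuts)
    segments = []
    for a, b in zip(edges, edges[1:]):
        inside = any(s <= a and b <= e for s, e in spans)
        segments.append((full[a:b], inside))
    return segments

segments = split_runs_at_matches(["see foo", "bar now"], re.compile("foobar"))
print(segments)
```

Segments flagged True are the ones that would receive the highlight or bold formatting; here "foobar" still spans two segments because it crossed an original run boundary, and both would be formatted.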

Fortunately it’s usually copy-pastable with good results :)

User contributions licensed under: CC BY-SA