Generating Word Documents - Part 2: Simple Databinding
In Part 1 we covered the problem of generating Word documents from a high-level perspective. In this instalment, I'll walk through some lower-level concepts and show how Content Controls can be used to inject data into our templates.
A Simple Template
Depending on the source document, building the template can be either the most tedious or the most pleasurable part of the pipeline. Upon receiving a 20-year-old document that was probably converted from WordStar 1.0 back in the summer of '87, you'll need to weigh up cleaning it up against starting from scratch. (More on automating that in a future episode.)
So what are Content Controls? Basically, they are handy placeholders that serve a dual purpose. You can validate manual input into them through schema binding (handy for SharePoint documents), and you can bind data to them using embedded XML and XPath.
Creating Content Controls is easy: first make sure the Developer tab is turned on and, for extra visibility, turn on Design Mode.
[Figure: The Office Ribbon, showing the Developer tab. The 'Rich Text Content Control' and 'Design Mode' buttons are highlighted.]
You can click the 'Rich Text Content Control' icon (highlighted in the above picture) to create placeholders for data in your document.
Personally, I like to use keyboard shortcuts where possible; in this case Alt+L, Alt+Q will create a custom control at the current selection, and Alt+L, Alt+L will open the selected control's properties, shown below.
[Figure: The Property Page for the Rich Text Content Control.]
After a few minutes you can end up with a pretty rich document. For this post, my example document looks like this:
Have Data. Will Databind.
Now that we have placeholders for data, we need to get data into them. The easiest way to do this is with the Content Control Toolkit. Close the template in Word and open it in the toolkit. You will be presented with a list of the content controls you created in the template.
Notice that the right-hand pane is largely dedicated to adding and editing Custom XML parts. We will be inserting our data into the document via this tool. For starters, let's hand-craft some XML to satisfy this document's needs. Click the 'Create a new Custom XML Part' link under Actions, switch to the edit tab and enter the following XML:
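<!-- the root element name is an assumption; a custom XML part needs a single root element -->
<document>
    <title>Hello Content Controls</title>
    <subtitle>Testing content controls in MS Office Word</subtitle>
    <name>Cpt. James T. Kirk</name>
    <address>No fixed address</address>
</document>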
Now switch to the bind tab, and start dragging the data points across to the relevant content controls with your mouse. The toolkit works out the XPath required to bind to the data and updates each content control's properties. Alternatively, if you dream in XPath, you can double-click each content control on the left and enter the XPath directly. The finished result will look like this:
Upon saving and exiting, we can reopen our template and find that our content controls are now bound to our design time data:
That really is the simplest example I can show you. Unfortunately, doing anything more complicated than this requires code. As you have probably surmised, that really means that to do anything useful, you need to write code.
However, we now have a usable template that not only has 'design time' data embedded in it, but also has XPath expressions attached to our content controls, so we can throw runtime data at it.
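Before touching the document itself, it can help to check what a given XPath binding would select from a piece of XML. The following is a minimal sketch using `XPathSelectElement` from `System.Xml.XPath`. The `<document>` root, the `<name>` element and the positional shape of the XPath expressions are assumptions for illustration; use whatever the toolkit actually generated for your template.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

static class BindingPreview
{
    // Design-time data. The <document> root and <name> element are assumed names.
    public const string DesignTimeXml = @"
<document>
  <title>Hello Content Controls</title>
  <subtitle>Testing content controls in MS Office Word</subtitle>
  <name>Cpt. James T. Kirk</name>
  <address>No fixed address</address>
</document>";

    // Returns the text the XPath selects, or null if it selects nothing.
    public static string Resolve(string xml, string xpath)
    {
        var doc = XDocument.Parse(xml);
        return doc.XPathSelectElement(xpath)?.Value;
    }

    static void Main()
    {
        // Hypothetical binding expressions; the last one deliberately matches nothing.
        var xpaths = new[]
        {
            "/document[1]/title[1]",
            "/document[1]/name[1]",
            "/document[1]/phone[1]"
        };

        foreach (var xpath in xpaths)
        {
            Console.WriteLine(xpath + " => " + (Resolve(DesignTimeXml, xpath) ?? "(no match)"));
        }
    }
}
```

A path that selects nothing here is a sign that the element names in the data and the binding have drifted apart, which is easier to spot in a console than in Word.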
The point of the exercise is to create these things in a programmatic fashion. For various reasons, including revision history, template management and automated testing, I decided to store my templates in a class library and write a small framework to manage the life-cycle of a document: from being sourced on the filesystem, to being injected with XML data, to being rendered to a file. In Part 1 I alluded to a DocumentBuilder class.
To keep things simple for now, I'll show you the bare minimum, using the Open XML API. Note that I've referenced WindowsBase.dll and DocumentFormat.OpenXml.dll.
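With those references in place, one quick sanity check is to list the content controls in a template along with the XPath each one is bound to. This is only a sketch: the `BindingLister` class and `ListBindings` method are hypothetical names, it assumes the template is called Example.docx, and it only looks at the main document part (controls in headers or footers are ignored).

```csharp
using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class BindingLister
{
    // Returns one "title -> xpath" line per content control in the main document part.
    public static List<string> ListBindings(WordprocessingDocument doc)
    {
        var lines = new List<string>();
        foreach (var sdt in doc.MainDocumentPart.Document.Descendants<SdtElement>())
        {
            var props = sdt.GetFirstChild<SdtProperties>();
            if (props == null) continue;

            // The alias is the control's title from the property page.
            var alias = props.GetFirstChild<SdtAlias>();
            // The data binding holds the XPath, if the control is bound at all.
            var binding = props.GetFirstChild<DataBinding>();

            var name = alias?.Val?.Value ?? "(untitled)";
            var xpath = binding?.XPath?.Value ?? "(unbound)";
            lines.Add(name + " -> " + xpath);
        }
        return lines;
    }

    static void Main()
    {
        // Open read-only; nothing is modified.
        using (var doc = WordprocessingDocument.Open("Example.docx", false))
        {
            foreach (var line in ListBindings(doc))
            {
                Console.WriteLine(line);
            }
        }
    }
}
```

Any control reported as unbound was missed on the toolkit's bind tab.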
The following example code injects some static XML into the Word document at runtime:
using System.IO;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

class Program
{
    static void Main(string[] args)
    {
        var template = "Example.docx";
        var outputFile = "HelloContentControls.docx";
        //copy the file to the output location
        File.Copy(template, outputFile, true);
        //open the 'package'
        using (var wpd = WordprocessingDocument.Open(outputFile, true))
        {
            //get at the document part of the package
            var mainDoc = wpd.MainDocumentPart;
            //get rid of design time data
            mainDoc.DeleteParts(mainDoc.CustomXmlParts);
            //create runtime data
            //(the root and subtitle element names are assumed; they must match the XPath bindings in the template)
            var data = XElement.Parse(@"
                <document>
                    <subtitle>The creation of TPS reports is SERIOUS!</subtitle>
                    <name>Mr. John Smith</name>
                    <address>12 Capital Hill, Canberra</address>
                </document>
            "); //this could just as easily be a serialized object..
            //create a new customXmlpart
            var xmlPart = mainDoc.AddCustomXmlPart(CustomXmlPartType.CustomXml);
            //stream data into the part
            using (var partStream = xmlPart.GetStream(FileMode.Create, FileAccess.Write))
            using (var outputStream = new StreamWriter(partStream))
            {
                //XElement.ToString() yields the XML markup
                outputStream.Write(data);
            }
        }
    }
}
This console app copies our template to a destination, cracks it open, fills it with our runtime data and stitches things back up again. Our output looks like a 'real' document:
The code is, in my opinion, a lot simpler than the COM Interop code for Office, and the fact that it doesn't leave instances of Word hanging around in the background is a real bonus.
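The comment in the sample mentions that the runtime data could just as easily be a serialized object. Here is a minimal sketch of that idea using `XmlSerializer`. The `ReportData` class, the `ReportXml` helper and the element names (including the `document` root) are all hypothetical; the names would need to match the XPath bindings in your own template.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.Serialization;

// Hypothetical data class; element names are assumptions and must match the template's bindings.
[XmlRoot("document")]
public class ReportData
{
    [XmlElement("title")] public string Title { get; set; }
    [XmlElement("subtitle")] public string Subtitle { get; set; }
    [XmlElement("name")] public string Name { get; set; }
    [XmlElement("address")] public string Address { get; set; }
}

public static class ReportXml
{
    // Serializes the object into an XElement that could be written to a custom XML part.
    public static XElement ToXElement(ReportData data)
    {
        var serializer = new XmlSerializer(typeof(ReportData));

        // Suppress the default xsi/xsd namespace declarations on the root element.
        var namespaces = new XmlSerializerNamespaces();
        namespaces.Add(string.Empty, string.Empty);

        var doc = new XDocument();
        using (var writer = doc.CreateWriter())
        {
            serializer.Serialize(writer, data, namespaces);
        }
        return doc.Root;
    }

    public static void Main()
    {
        var element = ToXElement(new ReportData
        {
            Name = "Mr. John Smith",
            Address = "12 Capital Hill, Canberra"
        });
        Console.WriteLine(element);
    }
}
```

Properties left as null are simply omitted from the XML, so a control bound to a missing element has nothing to select. Whether that is acceptable depends on the template.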
I will be following this post up with examples of how to tackle trickier topics like tables, repeating sections and showing/hiding sections of content, and how to make this all happen without violating every OO design principle in the book.