In previous instalments I covered the use of XML data binding. This time around I thought I'd concentrate purely on manipulating content controls. It should be noted that, technically, the days of Custom XML parts are numbered, and I want to show how to sidestep this potential problem.

In this instalment, we'll begin to form an infrastructure around Content Controls that we can use for all kinds of handy tricks. It will include:

  • A notion of a 'DocumentModel' - data to drive our document
  • The DocumentBuilder class - this class is responsible for constructing the document at large
  • Replacing Content Controls with content, without XML data binding
  • Using altChunk tags to compose templates together

At the end of this post we will use these four concepts to build a simple mail merge engine that takes some data and turns it into a print run of letters. It will not require Word to be installed to run (only to read the output), nor will it use any Custom XML.

If you want to skip the chat and go straight to the code, here 'tis: http://code.google.com/p/word-merge/

So we've looked at the Content Control from a high level, but what is it exactly? Quite simply, it is a cluster of XML inside a WordprocessingML document. Specifically, a Content Control is a w:sdt (structured document tag) element, which in turn has a properties element (w:sdtPr) and a content element (w:sdtContent).
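To make that concrete, here is a cut-down block-level content control as it might appear in word/document.xml, plus a few lines of C# that read its tag and text using nothing but System.Xml.Linq. This is an illustrative sketch, not part of the word-merge code; the class name SdtAnatomy and the sample markup are made up for the example. Note that the Title you set in Word is stored as w:alias.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class SdtAnatomy
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // A cut-down block-level content control, as it might appear in word/document.xml.
    public const string SampleXml =
        "<w:body xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
        "<w:sdt>" +
        "<w:sdtPr><w:alias w:val='Salutation'/><w:tag w:val='Salutation'/></w:sdtPr>" +
        "<w:sdtContent><w:p><w:r><w:t>Dear Sir or Madam</w:t></w:r></w:p></w:sdtContent>" +
        "</w:sdt>" +
        "</w:body>";

    // Returns the tag and the concatenated text of every w:sdt in the fragment.
    public static (string Tag, string Text)[] Read(string xml)
    {
        var root = XElement.Parse(xml);
        return root.Descendants(W + "sdt")
            .Select(sdt => (
                Tag: (string)sdt.Element(W + "sdtPr")?.Element(W + "tag")?.Attribute(W + "val"),
                Text: string.Concat(sdt.Elements(W + "sdtContent").Descendants(W + "t").Select(t => t.Value))))
            .ToArray();
    }
}
```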

Unfortunately, the Open XML SDK does not provide a simple Content Control object. To make working with Content Controls easy, we need to roll our own.

Introducing the ContentControl Abstraction

We want to get at content controls quickly and easily, and to do that it would be nice to address them by the tag or title we set on them in Word (I chose tag). It would also be nice to say "Hey Word document, gimme all your content controls!". As it turns out, that requires a bit of code. Content controls are represented by several different types in the Open XML SDK, such as SdtBlock, SdtRun, SdtRow and SdtCell, depending on their placement in the DOM. Luckily, all of these types derive from a common base, OpenXmlCompositeElement:

[caption id="attachment_827302" align="aligncenter" width="273" caption="All Content Controls derive from OpenXmlCompositeElement"][/caption]

Of course, a Word document contains plenty of other composite elements; the distinguishing feature of a content control is that its first child is a set of properties (Tag, Title, et al.). We can leverage this fact to provide a useful abstraction.
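Here is a minimal sketch of that idea. The real ContentControlInfo wraps Open XML SDK elements (see the repository for the actual class); to keep this self-contained I've assumed plain System.Xml.Linq instead, and the SdtInfo name and its FindAll method are hypothetical. Anything whose first child is w:sdtPr counts as a content control, so block-level and run-level controls are found the same way.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for ContentControlInfo, built on XElement instead of the SDK.
public sealed class SdtInfo
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public XElement Self { get; }
    public SdtInfo(XElement self) => Self = self;

    // FindAll guarantees the first child is <w:sdtPr>.
    XElement Properties => Self.Elements().First();

    // Null when the control has no tag / title set.
    public string Tag => (string)Properties.Element(W + "tag")?.Attribute(W + "val");
    public string Title => (string)Properties.Element(W + "alias")?.Attribute(W + "val");

    // Any element whose first child is <w:sdtPr> is treated as a content control,
    // whether it sits at block, run, row or cell level. Results are in document order.
    public static IEnumerable<SdtInfo> FindAll(XElement root) =>
        root.Descendants()
            .Where(e => e.Elements().FirstOrDefault()?.Name == W + "sdtPr")
            .Select(e => new SdtInfo(e));
}
```

Keying on the shape of the element rather than on its name mirrors the post's approach of treating any composite element with a properties child as a content control.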



Building a document

Building a single data-driven document is pretty easy, as long as we follow some conventions. To begin with, I use what roughly equates to an MVC pattern. For each type of document I want to build, I have a triad of elements: a Word document that forms the template, a class that represents all the data that will go into the template (I call it a ViewModel so I can hang with the cool kids), and a builder class that takes the data, throws it at the template and saves the result.

Where the convention kicks in is that I make sure the names of the properties on the ViewModel are the same as the tag names in the template. Can you see where I'm going with this?
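Because the convention is just "property name equals tag", it is cheap to check. The sketch below is a hypothetical helper (ConventionCheck.MissingTags is not part of the word-merge code) that uses reflection to list ViewModel properties with no matching tag, given the tags found in a template. The ArrearsModel class from this post is repeated so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class ConventionCheck
{
    // Names of TModel's public instance properties with no same-named tag in
    // templateTags. Matching is ordinal (case-sensitive), like cc.Tag == name.
    public static List<string> MissingTags<TModel>(IEnumerable<string> templateTags)
    {
        var tags = new HashSet<string>(templateTags.Where(t => t != null), StringComparer.Ordinal);
        return typeof(TModel)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Select(p => p.Name)
            .Where(name => !tags.Contains(name))
            .ToList();
    }
}

// The ViewModel from this post, repeated so the block compiles on its own.
public class ArrearsModel
{
    public string Address { get; set; }
    public string Salutation { get; set; }
    public string ArrearsAmount { get; set; }
}
```

For example, `ConventionCheck.MissingTags<ArrearsModel>(new[] { "Address", "Salutation", "Page" })` returns a list containing only "ArrearsAmount", flagging a template that someone forgot to tag.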

So let's have a look at our template:

[caption id="attachment_827306" align="aligncenter" width="500" caption="A simple template"][/caption]

Here is the matching ViewModel...

[sourcecode language="csharp"]
public class ArrearsModel
{
    public string Address { get; set; }
    public string Salutation { get; set; }
    public string ArrearsAmount { get; set; }
}
[/sourcecode]

As you can imagine, the document builder can now use reflection to map the properties of a view model onto the content controls in the template. This is where our ContentControlInfo abstraction comes in handy: we can now find content controls really easily. Imagine something like this inside your builder class:

[sourcecode language="csharp"]
// requires: using System; using System.Linq;
private void BindModel(ArrearsModel model)
{
    //more on this later ;)
    // Materialise once so the document is only scanned a single time.
    var allContent = Document.GetContentControls().ToList();

    var modelProperties = typeof (ArrearsModel).GetProperties();
    foreach (var modelProperty in modelProperties)
    {
        // Convert.ToString copes with non-string properties and turns null into "".
        var value = Convert.ToString(modelProperty.GetValue(model, null));
        var name = modelProperty.Name;

        // The convention: property name == content control tag.
        var matchingContentControls = allContent.Where(cc => cc.Tag == name);

        foreach (var contentControl in matchingContentControls)
            contentControl.OverwriteText(value); //see code for details
    }
}
[/sourcecode]
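The real OverwriteText lives in the repository. For illustration, here is one way such a method could work, sketched over System.Xml.Linq rather than the Open XML SDK. The SdtText class and its behaviour (keep the first run's formatting, collapse everything into a single run) are my assumptions, not necessarily what the project does.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SdtText
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical OverwriteText: replaces everything in <w:sdtContent> with a
    // single run holding `text`, copying the first run's <w:rPr> (formatting).
    // Block-level controls keep one paragraph (with the first <w:pPr>, if any).
    public static void OverwriteText(XElement sdt, string text)
    {
        var content = sdt.Element(W + "sdtContent")
            ?? throw new ArgumentException("Element has no w:sdtContent.", nameof(sdt));

        var firstPara = content.Elements(W + "p").FirstOrDefault();
        var runProps = content.Descendants(W + "r").FirstOrDefault()?.Element(W + "rPr");
        var paraProps = firstPara?.Element(W + "pPr");

        // new XElement(other) takes copies before the old nodes are removed.
        var run = new XElement(W + "r",
            runProps == null ? null : new XElement(runProps),
            new XElement(W + "t",
                new XAttribute(XNamespace.Xml + "space", "preserve"),
                text ?? string.Empty));

        // Paragraph children mean a block-level control; otherwise assume run-level.
        object replacement = firstPara != null
            ? new XElement(W + "p", paraProps == null ? null : new XElement(paraProps), run)
            : run;
        content.ReplaceNodes(replacement);
    }
}
```

Keeping the control itself and only swapping its content means the tag survives, so the same template could be bound again.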

The hard part here is cracking open a document and looking at its content controls. For that I wrote this simple extension method on the WordprocessingDocument object, which represents the entry point into the ContentControlInfo abstraction:

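[sourcecode language="csharp"]
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Extension methods must live in a static class.
// Scans the main document part only (not headers or footers).
public static IEnumerable<ContentControlInfo> GetContentControls(this WordprocessingDocument doc)
{
    var rootElement = doc.MainDocumentPart.Document;

    // SdtBlock, SdtRun, SdtRow and SdtCell all derive from OpenXmlCompositeElement;
    // what marks an element out as a content control is its w:sdtPr child.
    var contentControls =
        from sdt in rootElement.Descendants<OpenXmlCompositeElement>()
        let properties = sdt.GetFirstChild<SdtProperties>()
        where properties != null
        select new ContentControlInfo(sdt);

    return contentControls;
}
[/sourcecode]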
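As an aside, the Open XML SDK is doing nothing magic here: a .docx is a ZIP package, and the body lives in an XML part. The sketch below performs the same kind of scan with only System.IO.Compression and System.Xml.Linq. It assumes the main part is at word/document.xml, which is where Word puts it; strictly, the location should be resolved through the package's _rels/.rels. DocxPeek is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxPeek
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Lists the tags of all content controls in the main document part.
    // Assumption: the main part lives at word/document.xml (Word's default);
    // a robust reader would resolve it through _rels/.rels instead.
    public static List<string> ListTags(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            var entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("No word/document.xml in package.");
            using (var part = entry.Open())
            {
                var doc = XDocument.Load(part);
                return doc.Descendants(W + "sdt")
                    .Select(sdt => (string)sdt.Element(W + "sdtPr")?.Element(W + "tag")?.Attribute(W + "val"))
                    .Where(tag => tag != null) // skip untagged controls
                    .ToList();
            }
        }
    }
}
```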

From this simple extension we can now leverage LINQ to give us a variety of ways of collecting Content Controls. For instance:

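[sourcecode language="csharp"]
//get a single content control (throws unless exactly one matches)
var myControl = Document.GetContentControls().Single(cc => cc.Tag == "MyControl");
//get a bunch of like controls (an untagged control may have a null Tag)
var fooTables = Document.GetContentControls()
    .Where(cc => cc.Tag != null && cc.Tag.StartsWith("FooTable"));
[/sourcecode]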
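One design note on those queries: BindModel filters the full list once per property, which is fine for a letter but wasteful for a large template. A minimal sketch, assuming you can project each control to its tag, builds a lookup once instead. TagQueries and its methods are hypothetical helpers, generic over the control type so they work with ContentControlInfo or anything similar.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagQueries
{
    // Builds the tag index once; index["X"] returns an empty sequence
    // (rather than throwing) when no control carries tag X.
    public static ILookup<string, T> IndexByTag<T>(IEnumerable<T> controls, Func<T, string> tagOf) =>
        controls.Where(c => tagOf(c) != null).ToLookup(tagOf, StringComparer.Ordinal);

    // Prefix query, e.g. all "FooTable..." controls, skipping untagged ones.
    public static IEnumerable<T> WithTagPrefix<T>(IEnumerable<T> controls, Func<T, string> tagOf, string prefix) =>
        controls.Where(c => tagOf(c)?.StartsWith(prefix, StringComparison.Ordinal) == true);
}
```

With ContentControlInfo this would read something like `TagQueries.IndexByTag(allContent, cc => cc.Tag)`, after which each property lookup is a hash lookup rather than a scan.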

Composing documents with altChunk

Our next move is to insert one document into another. Once we have built a document, complete with data, we can plonk it into another document using the altChunk tag. In essence, the altChunk tag acts as a placeholder: you stream the actual data into a separate part of the document package, the altChunk tag references that part, and when the document is opened the consuming application (usually Word) expands the altChunk tag with the imported content.
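In the main document part, the altChunk itself is tiny. The sketch below builds with System.Xml.Linq roughly the element that the GetAltChunk method further down produces through the SDK: an r:id pointing at a relationship, plus an optional w:altChunkPr/w:matchSrc asking the consumer to keep the source formatting. AltChunkMarkup is an illustrative name; the namespaces are the standard WordprocessingML and relationships ones.

```csharp
using System.Xml.Linq;

public static class AltChunkMarkup
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // <w:altChunk r:id="..."> carries no content itself; r:id names a relationship
    // from the main document part to the part holding the imported document.
    // An on/off element such as <w:matchSrc/> with no w:val means "true".
    public static XElement Build(string relationshipId) =>
        new XElement(W + "altChunk",
            new XAttribute(R + "id", relationshipId),
            new XElement(W + "altChunkPr",
                new XElement(W + "matchSrc")));
}
```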

Since there is no way to insert altChunk tags from within Word itself, I use a content control to mark where I want the altChunk tag to go. The following code puts an altChunk tag in place of a placeholder content control:

[sourcecode language="csharp"]
var doc = placeholder.Ancestors<Document>().Single();
var altChunk = GetAltChunk(doc, altChunkId, pathToContent);
placeholder.InsertAfterSelf(altChunk);
placeholder.Remove();

private static AltChunk GetAltChunk(Document doc, string altChunkId, string pathToContent)
{
var mainPart = doc.MainDocumentPart;
var chunk =
mainPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.WordprocessingML,
altChunkId
);
using (var fs = File.OpenRead(pathToContent)) chunk.FeedData(fs);

var altChunk = new AltChunk() { Id = altChunkId };
altChunk
.AppendChild(new AltChunkProperties())
.AppendChild(new MatchSource() { Val = true });
return altChunk;
}
[/sourcecode]
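On the package side, AddAlternativeFormatImportPart stores the file as a new part and adds a relationship from the main document part, recorded in word/_rels/document.xml.rels. Roughly, that entry looks like the one built below. The aFChunk relationship type is the standard one for alternative format imports; the target name in the usage line is hypothetical, since the SDK picks the actual part name.

```csharp
using System.Xml.Linq;

public static class AltChunkRelationship
{
    static readonly XNamespace Pkg = "http://schemas.openxmlformats.org/package/2006/relationships";

    // Relationship type used for altChunk (alternative format import) parts.
    public const string AfChunkType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk";

    // One <Relationship> entry as it would appear in word/_rels/document.xml.rels.
    // Its Id must match the r:id on the corresponding <w:altChunk>.
    public static XElement Build(string id, string target) =>
        new XElement(Pkg + "Relationship",
            new XAttribute("Id", id),
            new XAttribute("Type", AfChunkType),
            new XAttribute("Target", target));
}
```

For example, `AltChunkRelationship.Build("arrears0", "afchunk0.docx")` (the target is made up). This is also why each altChunk in the merge needs its own id: the id is the key that ties the body element to its part.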

Bringing it all together

So the final step in the process is to create a container document for our merge, and have its builder iterate over a list of ViewModels to create each document in the mail merge. The result is one great big Word document containing all of the merged letters. Alternatively, you could skip this step and just pump out individual documents.

Here is roughly how I did this for our arrears scenario:

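[sourcecode language="csharp"]
// The container template holds one content control tagged "Page"
// (block-level, since altChunk must sit at block level).
var placeHolder = Document.GetContentControls().Single(cc => cc.Tag == "Page");

for (int index = 0; index < Model.Count; index++)
{
    var arrearsModel = Model[index];
    // Build each letter in the temp folder rather than the working directory.
    var tempfile = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());

    // Dispose the builder (closing the file) before importing it as an altChunk.
    using (var builder = new ArrearsBuilder(arrearsModel))
    {
        builder.Build(tempfile);
    }

    // Each altChunk needs a unique relationship id.
    placeHolder.Self.ReplaceContent(tempfile, "arrears" + index);
    File.Delete(tempfile);
}

// All the letters are in; the placeholder has done its job.
placeHolder.Self.Remove();
[/sourcecode]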

[caption id="attachment_827326" align="aligncenter" width="500" caption="The final output"][/caption]
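Because every letter in the merge loop is inserted relative to the same placeholder, the direction of insertion matters. The sketch below mimics the loop with System.Xml.Linq's AddBeforeSelf and AddAfterSelf (the SDK's InsertBeforeSelf and InsertAfterSelf behave analogously): inserting before the placeholder keeps the letters in order, while inserting after it reverses them. It also shows why the placeholder is removed once, after the loop, rather than on every call. InsertOrder is a made-up name for the demo.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class InsertOrder
{
    // Simulates the merge loop: each chunk is inserted relative to one fixed
    // placeholder, which is removed at the end. Returns the resulting order.
    public static string[] Merge(bool insertBefore, params string[] chunks)
    {
        var body = new XElement("body", new XElement("placeholder"));
        var placeholder = body.Element("placeholder");

        foreach (var name in chunks)
        {
            var chunk = new XElement("chunk", name);
            if (insertBefore) placeholder.AddBeforeSelf(chunk);
            else placeholder.AddAfterSelf(chunk);
        }

        placeholder.Remove();
        return body.Elements("chunk").Select(c => c.Value).ToArray();
    }
}
```

`InsertOrder.Merge(true, "a", "b", "c")` yields a, b, c, whereas `InsertOrder.Merge(false, "a", "b", "c")` yields c, b, a.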