Open a Microsoft Word File with Code

Text to Speech: For C# and VB NET Students

 

We've opened text files and PDF files in previous lessons, and had our Speech Program read them out. We'll open up a Word file in this lesson, grab the text, and have our Synthesizer read it out.

To open up a Word file, you need to add a reference to the Microsoft Word Object Library. We did something similar for the Excel Charts project.

Right-click on References again in the Solution Explorer and select Add Reference. From the Reference Manager dialog box, click on COM on the left. Scroll down and locate the Microsoft Word Object Library. Check the box for the latest version you have, which is 16.0 in the image below:

The Reference Manager dialog box in Visual Studio with the Word Object Library highlighted

Click OK to add the reference. You should see items called Microsoft.Office.Core and Microsoft.Office.Interop.Word appear in your list of references:

The Solution Explorer in Visual Studio showing Office.Core and Office.Interop.Word highlighted

In your coding window, for C# coders, add this using statement to the top of your code:

using Microsoft.Office.Interop.Word;

VB Net coders add a new Imports line at the top. Add this:

Imports Microsoft.Office.Interop.Word

We can open up Word now and read the text of a document.

Add a new Sub/method to your code. In C# add this method:

private void GetWorDFile(string filePath)
{
}

And add this Sub in VB Net:

Private Sub GetWorDFile(filePath As String)
End Sub

As the first line of code for this new Sub/method, clear the text box with this line (delete the semicolon on the end in VB):

txtSpeechText.Text = "";

For the next line, we need to create a new Word object. In C#, you need to add this very long line:

Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();

In VB Net, however, you can just add this shorter version of the C# line:

Dim word As Application = New Application()

The reason you can't use the shorter line in C# is a clash of namespaces. If you tried to add this:

Application word = new Application();

Application would be underlined in red. Both Microsoft.Office.Interop.Word and System.Windows.Forms contain a class called Application, and C# doesn't know which one it's supposed to use. Seems strange that this is not an error in VB!
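If the very long line bothers you, a using alias is one way round the clash. The sketch below is not from the lesson: it uses two made-up namespaces, Demo.Word and Demo.Forms, as stand-ins for the real libraries so that it runs without Word installed. In the real project the equivalent alias line would be using WordApp = Microsoft.Office.Interop.Word.Application; (the name WordApp is just an illustration).

```csharp
using System;
// Alias directive: "WordApp" now names exactly one of the two Application classes.
using WordApp = Demo.Word.Application;

namespace Demo.Word
{
    // Hypothetical stand-in for Microsoft.Office.Interop.Word.Application.
    public class Application
    {
        public string Describe() => "Word application";
    }
}

namespace Demo.Forms
{
    // Hypothetical stand-in for System.Windows.Forms.Application.
    public class Application
    {
        public string Describe() => "Forms application";
    }
}

namespace Demo
{
    using Demo.Word;
    using Demo.Forms;

    public static class AliasDemo
    {
        public static string Create()
        {
            // Writing "new Application()" here would be ambiguous (compiler error CS0104),
            // because both imported namespaces define a class with that name.
            WordApp word = new WordApp();
            return word.Describe();
        }

        public static void Main()
        {
            Console.WriteLine(Create()); // prints "Word application"
        }
    }
}
```

With the alias in place, the declaration shrinks to WordApp word = new WordApp(); while the rest of the code stays the same.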

We need to create a new document object now. This object will be created from the Word document that you open from the file dialog box. In C#, add this line:

Document doc = word.Documents.Open(filePath);

And this one in VB Net:

Dim doc As Document = word.Documents.Open(filePath)

At this stage, you could just dump the entire file into the text box with this line:

txtSpeechText.Text = doc.Content.Text;

There is a problem with this, however. It does work, but examine the image below:

A Word file in a Windows Forms text box with the paragraph indents left in

The gaps in the text are where the paragraphs are. It's kept all of the indents.

If you want to delete all of the indents then you can count the paragraphs in the document, then loop round and trim the text.

To do this, add this line in C#:

int paras = doc.Paragraphs.Count;

And this one in VB Net:

Dim paras As Integer = doc.Paragraphs.Count

We can use the paras variable to loop round and grab each paragraph. Add this loop to your code in C#:

for (int i = 1; i <= paras; i++)
{
}

And this in VB Net:

For i = 1 To paras
Next

Here's the first line of code to place in your loop:

C#:

string temp = doc.Paragraphs[i].Range.Text.Trim();

VB Net:

Dim temp As String = doc.Paragraphs(i).Range.Text.Trim()

Paragraphs, after the equal sign, is a collection that holds all of the paragraphs in the document. You can use your loop counter as an index into it. The loop counter, i, starts at 1 because Word starts counting paragraphs at 1. Range.Text then gets the text of that paragraph, and Trim() gets rid of the indents by stripping the white space from both ends of the string.

The final line of the loop to add puts the newly trimmed paragraph into the text box:

C#:

txtSpeechText.Text += temp + "\r\n";

VB Net:

txtSpeechText.Text += temp + vbNewLine
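To see what the loop does without opening Word, here is a minimal sketch that runs the same trim-and-append logic over a plain string array. The class name, method name and sample strings are made up for illustration. It assumes each paragraph arrives with leading white space and a trailing carriage return, which is how Word normally hands back a paragraph's text. It also collects the text in a StringBuilder rather than adding to the text box on every pass, which is a design choice of this sketch and not something the lesson requires.

```csharp
using System;
using System.Text;

public static class ParagraphDemo
{
    // Trims each paragraph and joins them with Windows line breaks.
    public static string CleanParagraphs(string[] paragraphs)
    {
        var builder = new StringBuilder();
        foreach (string paragraph in paragraphs)
        {
            // Trim() strips leading and trailing white space:
            // tabs, spaces and the "\r" on the end of each paragraph.
            builder.Append(paragraph.Trim()).Append("\r\n");
        }
        return builder.ToString();
    }

    public static void Main()
    {
        // Hypothetical sample data standing in for the paragraphs of a Word file.
        string[] sample = { "\tFirst paragraph.\r", "    Second paragraph.\r" };
        Console.Write(CleanParagraphs(sample));
    }
}
```

In the real method you would assign the finished string to txtSpeechText.Text once, after the loop has ended.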

If you did the section on Excel charts you'll know that COM objects like Word need to be cleaned up properly. Add the following using statement to the top of your code in C#:

using System.Runtime.InteropServices;

And this Imports statement to the top of your code in VB Net:

Imports System.Runtime.InteropServices

Back in your GetWorDFile Sub/method, and just after the loop, add this cleanup code in C#:

if (doc != null)
{
    doc.Close();
    Marshal.ReleaseComObject(doc);
}

if (word != null)
{
    word.Quit();
    Marshal.ReleaseComObject(word);
}

And this in VB Net:

If doc IsNot Nothing Then
    doc.Close()
    Marshal.ReleaseComObject(doc)
End If

If word IsNot Nothing Then
    word.Quit()
    Marshal.ReleaseComObject(word)
End If

Now add the calling line to the if statement of your Open File button. When you're done, your code should look like this in C#:

C# code to open up a Word file and strip out the paragraph indents
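For reference, here is the C# method pieced together from the individual lines in this lesson. Treat it as a sketch: it assumes the Word reference has been added to the project and that your text box is called txtSpeechText. The calling line in your Open File button is not included.

```csharp
using System.Runtime.InteropServices;
using Microsoft.Office.Interop.Word;

// Place this method inside your form class.
private void GetWorDFile(string filePath)
{
    // Clear out any text left over from a previous file
    txtSpeechText.Text = "";

    // Start Word and open the chosen document
    Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
    Document doc = word.Documents.Open(filePath);

    // Word numbers its paragraphs from 1
    int paras = doc.Paragraphs.Count;
    for (int i = 1; i <= paras; i++)
    {
        // Trim each paragraph, then add it to the text box on its own line
        string temp = doc.Paragraphs[i].Range.Text.Trim();
        txtSpeechText.Text += temp + "\r\n";
    }

    // Close the document and Word, then release the COM objects
    if (doc != null)
    {
        doc.Close();
        Marshal.ReleaseComObject(doc);
    }

    if (word != null)
    {
        word.Quit();
        Marshal.ReleaseComObject(word);
    }
}
```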

And this in VB Net:

VB Net code to open up a Word file and strip out the paragraph indents

If you were to run the code now, the text in the text box would look like this:

A Word file in a Windows Forms text box with the paragraph indents stripped out

All those indents are gone!

Have a go for yourself, though. Open up a Word file on your computer. Select a voice from the dropdown list and click your Speak button. Your Word file will be read out to you.

The next thing we'll do is get the Pronounce Highlighted Word button working.
