Parsing the Email

Receive Email Walkthrough: for C# and VB NET Students

 

In the previous lesson, we downloaded an email and placed the whole thing in a text box. We had this:

An email retrieved with the RETR command

In the email above, we placed both the email Header and the email Body in the same text box. You can, however, parse the email and separate the two. A full parse can get quite complicated, which is why email parsers such as MimeKit and EmailArchitect have been written to do the job. We're going to do something a lot easier.

What we'll do is to search for a blank line in the email. The headers end at the first blank line, and everything after it is the body. A blank line shows up in the text as two carriage return and newline pairs, one straight after the other. In VB Net, you can find this by searching for vbCrLf & vbCrLf. In C#, you can search for "\r\n\r\n". Here's some code to add, just after your while loop in your GetEmail Sub/method:

VB Net:

Dim lineFeeds As Integer
lineFeeds = InStr( 1, TextLine, vbCrLf & vbCrLf )

C#:

int lineFeeds = textLine.IndexOf("\r\n" + "\r\n");

If you remember your string manipulation techniques, InStr is short for "In String", and is used to find the position of one string inside another. In the code above we're searching the TextLine string for two carriage return and newline pairs in a row. In VB Net, we start the search at position 1. If the string is found, lineFeeds will contain its position, counting from 1. If the string is not found then 0 will be returned. The C# IndexOf method does the same job, but it counts from 0 and returns -1 when the string is not found. We can use this to separate the Headers from the Body. Add the highlighted lines in the images below to your GetEmail code:

VB Net:

VB NET code to separate an email header from the body

C#:

C# code to separate an email header from the body
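
In case the images above are not visible, here is a minimal C# sketch of the same idea. It is not necessarily the exact code from the lesson. It assumes the whole message is already in a string (called textLine here, as in the earlier code). The class and method names, EmailSplitter and SplitHeaderAndBody, are made up for illustration.

```csharp
using System;

public static class EmailSplitter
{
    // Splits a raw message at the first blank line (CRLF followed by CRLF).
    // If no blank line is found, the whole text is treated as the header
    // and the body is returned as an empty string.
    public static (string Header, string Body) SplitHeaderAndBody(string textLine)
    {
        const string separator = "\r\n\r\n";

        // Ordinal comparison matches the exact characters, whatever the culture settings.
        int lineFeeds = textLine.IndexOf(separator, StringComparison.Ordinal);

        if (lineFeeds < 0)
        {
            // IndexOf returns -1 when the separator is not found.
            return (textLine, string.Empty);
        }

        // Everything before the blank line is the header.
        string emailHeader = textLine.Substring(0, lineFeeds);

        // Everything after the blank line is the body.
        string emailBody = textLine.Substring(lineFeeds + separator.Length);

        return (emailHeader, emailBody);
    }
}
```

You could call it with var (emailHeader, emailBody) = EmailSplitter.SplitHeaderAndBody(textLine); and then place the two strings in your text boxes.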

If you wanted to, you could add another text box to your form. Then display the header as well as the body:

txtHeader.Text = emailHeader
txtEmail.Text = emailBody

If you just want to get the Header then there is another POP3 command available on some, if not most, servers: TOP.

The TOP command returns the Header of the email plus a set number of lines from the Body. It is used like this:

TOP 1 2

The first number after TOP is the message number. The second number is how many lines from the Body of the email you want. In the command above, we want the Header from email 1, plus 2 lines from the Body. Both numbers are required. If you don't want any lines from the body, you'd have this:

TOP 1 0

That's not TOP ten, it's TOP followed by a 1 and then a 0. A space separates the two. You can use your PopCommand function to try it out.
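
If you'd rather build the command in code than type it out, something like the following C# sketch would do. The Pop3Commands class and BuildTop method are hypothetical names, not part of the lesson's code. Whether you need the trailing "\r\n" depends on whether your own PopCommand function already adds it, so treat that part as an assumption.

```csharp
using System;

public static class Pop3Commands
{
    // Builds the text of a TOP command, for example "TOP 1 0".
    // POP3 commands end with a carriage return and newline when sent to the server.
    public static string BuildTop(int messageNumber, int bodyLines)
    {
        // POP3 message numbers start at 1.
        if (messageNumber < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(messageNumber), "Message numbers start at 1.");
        }

        // The line count can be 0 (headers only) but not negative.
        if (bodyLines < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(bodyLines), "Line count cannot be negative.");
        }

        return $"TOP {messageNumber} {bodyLines}\r\n";
    }
}
```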

One thing you may want to do is to get at the Subject line of the Header. You can then decide if it's likely to be spam, and delete it. By using TOP, you're not sifting through the entire email, but just the Header. (To get at the Subject, search each line for "Subject:" and place the returned value into a text box. But this adds another layer of complexity, so we've left it out.)
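
If you do want to try it, here is a rough C# sketch of the search described above. It assumes the header is in a single string with lines separated by "\r\n". The HeaderReader class and FindSubject method are hypothetical names. It does not handle Subject lines that are folded over several lines, or subjects that use encoded text such as =?UTF-8?...?=, which is the kind of thing a proper parser takes care of.

```csharp
using System;

public static class HeaderReader
{
    // Returns the text after "Subject:" from the first matching header line,
    // or an empty string if there is no Subject line.
    public static string FindSubject(string emailHeader)
    {
        string[] lines = emailHeader.Split(new[] { "\r\n" }, StringSplitOptions.None);

        foreach (string line in lines)
        {
            // Header names are not case-sensitive, so ignore case when matching.
            if (line.StartsWith("Subject:", StringComparison.OrdinalIgnoreCase))
            {
                return line.Substring("Subject:".Length).Trim();
            }
        }

        return string.Empty;
    }
}
```

The returned string could then go straight into a text box, or be checked for words you associate with spam.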

Parsing emails can be very difficult, which is why it's a good idea to use something like MimeKit.

One other thing you can do with the whole of the email in the text box, headers and body together, is to save it to a file, not as a text file but with the extension .eml. That way, it's an email file that can be opened and parsed by other software.
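
Saving the file could look something like this in C#. It's only a sketch: EmailSaver and SaveAsEml are made-up names, and the file path is whatever you choose. If your text still includes the server's +OK status line at the top, or the final line containing only a full stop, you may want to remove those before saving.

```csharp
using System.IO;

public static class EmailSaver
{
    // Writes the raw email text (headers, blank line, body) to a file.
    // Pass a path ending in .eml so that other email software recognises it.
    public static void SaveAsEml(string rawEmail, string filePath)
    {
        // File.WriteAllText creates the file, or overwrites it if it already exists.
        File.WriteAllText(filePath, rawEmail);
    }
}
```

For example, EmailSaver.SaveAsEml(txtEmail.Text, "message1.eml"); would save the contents of the text box to a file called message1.eml in the program's working folder.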

 

In the next lesson, you'll see how to delete an email from the server.
