Get Text from URL

Text to Speech: For C# and VB NET Students

 

A nice touch would be the opportunity to download a web page and have the text on that web page appear in the text box. Once it's in the text box, it can be read out to you.

Unfortunately, grabbing just the text from a web page is tricky stuff (not to mention chancy from a copyright perspective). But it can be done. There's some code here by a guy called Jake Drew:

https://www.codeproject.com/Articles/587458/GettingplusOnlyplusTheplusTextplusDisplayedplusOnp

(The code's a bit old, now, and Microsoft might be moving away from the WebBrowser control. But it's still worth looking at.)

I've made a few changes to Jake's code and written a VB Net version. Here's the C# version:

(Image: C# code to get text from a URL for a Text to Speech program)

And here it is in VB Net:

(Image: Code to get text from a URL for a Text to Speech program using Visual Basic Net)

Try it out for yourself. Add a button and a text box to your form. Change the text box name to txtUrl. Copy and paste the following web address as the Text property for the text box:

https://www.homeandlearn.co.uk/NET/nets11p1.html

Double click your button and add the code from the image above. Test it out and you should find it works OK. You can load the text of a web page and have it read out.
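
If you'd like something to type in rather than copy from an image, here's a rough sketch of the same idea. It is not Jake's code and it doesn't use the WebBrowser control. Instead, it assumes you're happy to download the raw HTML with HttpClient and strip the tags with regular expressions, which is a cruder technique. The class and method names (PageText, StripHtml, GetTextFromUrlAsync) are made up for this example.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class PageText
{
    // Reduce an HTML string to roughly the text a reader would see.
    public static string StripHtml(string html)
    {
        // Remove script and style blocks, including their contents.
        string text = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Remove every remaining tag, leaving the text between the tags.
        text = Regex.Replace(text, @"<[^>]+>", " ");

        // Turn entities such as &amp; back into ordinary characters.
        text = WebUtility.HtmlDecode(text);

        // Collapse runs of whitespace into single spaces.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    // Download a page and return its text (hypothetical helper).
    public static async Task<string> GetTextFromUrlAsync(string url)
    {
        using (var client = new HttpClient())
        {
            string html = await client.GetStringAsync(url);
            return StripHtml(html);
        }
    }
}
```

To use it, you could mark your button's click handler as async, call await PageText.GetTextFromUrlAsync(txtUrl.Text), and assign the result to the Text property of the text box your program reads from. On older .NET Framework projects you may need to add a reference to System.Net.Http first.
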
There are problems, however. Text from links in the page still appears. For example, the A HREF tags around the Twitter links are nicely removed, but the text that goes between the tags is still there. So if we had this in the HTML:

<A HREF="twitter_link.html">Twitter</A>

All the tag parts would be removed, but you'd still have the word Twitter left over.
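
One possible way round the leftover link text is to delete each link as a whole, text included, before the other tags are stripped. The sketch below does that with a regular expression. It's an illustration under that assumption rather than part of the code above, and the names LinkStripper and RemoveLinks are invented for the example.

```csharp
using System.Text.RegularExpressions;

public static class LinkStripper
{
    // Remove whole <a>...</a> elements, including the text between the tags.
    public static string RemoveLinks(string html)
    {
        // IgnoreCase catches <A HREF=...> as well as <a href=...>.
        // Singleline lets the dot match line breaks inside a link.
        return Regex.Replace(html, @"<a\b[^>]*>.*?</a\s*>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

Bear in mind that this throws away every link's text, including links that sit in the middle of a sentence, so the text that gets read out may end up with gaps in it.
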

If you want to explore further, try opening up the NuGet package manager again from the Tools menu in Visual Studio. Make sure you're on the Browse page and enter HTML Parse as the search text. There are plenty of parsers about, but the documentation is hit and miss. And it's not exactly an easy subject.
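
To give a flavour of what a parser buys you, here's a short sketch using HtmlAgilityPack, one of the packages a search like that is likely to turn up. Picking that package is an assumption made for illustration, and the names ParserExample and GetVisibleText are invented. The code only runs once the package is installed.

```csharp
using HtmlAgilityPack;

public static class ParserExample
{
    // Return the text of an HTML string, minus scripts, styles and links.
    public static string GetVisibleText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches the XPath query.
        var unwanted = doc.DocumentNode.SelectNodes("//script|//style|//a");
        if (unwanted != null)
        {
            foreach (var node in unwanted)
            {
                // Removing a node also removes everything inside it.
                node.Remove();
            }
        }

        // InnerText joins the remaining text; DeEntitize decodes &amp; and friends.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }
}
```

If you put this in a Windows Forms file, note that System.Windows.Forms has its own HtmlDocument class, so you may need to write HtmlAgilityPack.HtmlDocument in full.
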

 

But that's it for the Text to Speech program. Hope you enjoyed it!
