I am working on a .NET application in C# for web scraping (also known as web harvesting, web data extraction, or screen scraping). For parsing HTML, I am trying to incorporate the HTML Agility Pack, but it is not as easy as I thought it would be. I have included the specifications of what I am trying to do below, and I would appreciate your opinions on how I could proceed.

Specifications:

My goal is to make a very user-friendly point-and-click application for downloading data and images from the web. I would like to load HTML pages using the web browser and output the parsed data and image links into the text box. The user can specify which HTML tags they want and then download the data into the grid. Finally, they can export the data into whatever format they need.

1. Make HTTP requests to the website and pull down the markup from the given URL.
  • Class WebClient
  • Class HttpWebRequest
  • Class HttpWebResponse
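
For step 1, here is a minimal sketch of what I have in mind, using HttpWebRequest and HttpWebResponse. The class name PageDownloader, the user agent string, and the 15-second timeout are placeholders I made up for illustration, and the code assumes the page can be read with StreamReader's default UTF-8 decoding. On newer .NET versions these classes are marked obsolete in favour of HttpClient, so treat this as a starting point rather than the definitive way to do it.

```csharp
using System;
using System.IO;
using System.Net;

public static class PageDownloader
{
    // Builds a GET request for the given URL. Nothing is sent until
    // GetResponse() is called on the returned request.
    public static HttpWebRequest CreateRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.UserAgent = "MyHarvester/1.0"; // placeholder value, not a real product
        request.Timeout = 15000;               // milliseconds; arbitrary choice
        return request;
    }

    // Sends the request and returns the response body (the page markup) as a string.
    // Throws WebException on network errors and on HTTP error status codes.
    public static string DownloadMarkup(string url)
    {
        HttpWebRequest request = CreateRequest(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream)) // assumption: UTF-8 content
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage would be something like `string html = PageDownloader.DownloadMarkup(url);` wrapped in a try/catch for WebException. WebClient.DownloadString does the same job in a single call if no control over the request is needed.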


2. Parse HTML and output data and image links into text editor
  • HTML Agility Pack
  • XPath
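
For step 2, this is roughly how I understand HTML Agility Pack and XPath fitting together. It is a sketch, not tested against real pages: PageParser and its method names are hypothetical, and it assumes the HtmlAgilityPack NuGet package is referenced. The XPath passed to ExtractText would come from whatever tags the user picks in the UI.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class PageParser
{
    // Returns the src attribute of every <img> element that has one.
    public static List<string> ExtractImageLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();
        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//img[@src]");
        if (nodes == null) return links;

        foreach (HtmlNode node in nodes)
        {
            links.Add(node.GetAttributeValue("src", string.Empty));
        }
        return links;
    }

    // Returns the decoded, trimmed inner text of every node matching the XPath,
    // for example "//h2" or "//td[@class='price']".
    public static List<string> ExtractText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var values = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null) return values;

        foreach (HtmlNode node in nodes)
        {
            // DeEntitize turns entities such as &amp; back into plain characters.
            values.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return values;
    }
}
```

The image links come back exactly as written in the page, so relative paths would still need to be resolved against the page URL (for instance with `new Uri(baseUri, src)`) before downloading.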


3. Store data in different formats
  • Microsoft Excel and Access
  • Databases (MySQL)
  • Text
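
For step 3, the simplest case I can think of is plain text export as CSV, which Excel can also open. This does not cover native Excel files, Access, or MySQL; it is only a sketch of the text option. CsvExporter and its methods are names I made up, and the quoting follows the usual CSV convention of doubling embedded quotes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

public static class CsvExporter
{
    // Wraps a field in quotes when it contains a comma, a quote, or a line break,
    // and doubles any embedded quotes.
    public static string EscapeField(string field)
    {
        if (field == null) return string.Empty;
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        string escaped = field.Replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Converts rows of grid cells into CSV text, one line per row.
    public static string ToCsv(IEnumerable<string[]> rows)
    {
        var sb = new StringBuilder();
        foreach (string[] row in rows)
        {
            sb.Append(string.Join(",", row.Select(EscapeField)));
            sb.Append("\r\n");
        }
        return sb.ToString();
    }

    // Writes the rows to a UTF-8 encoded file at the given path.
    public static void Save(string path, IEnumerable<string[]> rows)
    {
        File.WriteAllText(path, ToCsv(rows), Encoding.UTF8);
    }
}
```

The rows would be read out of the grid, for example one string array per grid row, and passed to `CsvExporter.Save`.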


Thanks in advance for your ideas! I have some screenshots of the application but unfortunately I can't post them since I'm new.