Visual Basic Programming
 
  #1  
December 21st, 2003, 12:11 PM
milty456
VB6 HTML parse

I want to make an app that can search for specific text in an HTML file and then delete it, replace it, or append something to it.

I have no clue how to work with these files, or how to read from and write to them line by line.

Can anyone help me?

  #2  
December 22nd, 2003, 12:52 AM
cleverpig
You can use the FileSystemObject in VB. Look it up on MSDN to learn how: http://msdn.microsoft.com/library/d...ystemobject.asp It's easy to use!
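In VB6 the FileSystemObject (OpenTextFile, then ReadAll and Write on the returned TextStream) together with the built-in Replace function would cover the read, edit and write-back steps. The flow is sketched below in Python rather than VB6, purely to show the steps. It assumes the file is UTF-8 and small enough to hold in memory. `edit_html_text` and its `action` values are hypothetical names, not part of any library.

```python
from pathlib import Path


def edit_html_text(path, target, action, new_text=""):
    """Hypothetical helper: read an HTML file as plain text, apply one edit to
    every occurrence of `target`, write the result back and return it.

    action: "delete"  -> remove target
            "replace" -> replace target with new_text
            "append"  -> insert new_text right after target
    """
    p = Path(path)
    # Read the whole file in one go rather than line by line.
    text = p.read_text(encoding="utf-8")
    if action == "delete":
        text = text.replace(target, "")
    elif action == "replace":
        text = text.replace(target, new_text)
    elif action == "append":
        text = text.replace(target, target + new_text)
    else:
        raise ValueError(f"unknown action: {action!r}")
    # Overwrite the original file with the edited text.
    p.write_text(text, encoding="utf-8")
    return text
```

This is a plain literal match, so it only finds text that appears in the file exactly as typed. It does not see through the masking tricks discussed further down the thread.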

  #3  
January 3rd, 2004, 09:30 PM
robbage (Perth, Western Australia)
You can open an HTML file as plain text (since that's all it is anyway), but finding 'words' will be a nightmare, because a word can be represented in a few ways and disguised in many more (e.g. the many tricks spam HTML uses, especially for rude words and for keywords that anti-spam programs look for).

You can use InStr() to search for the word "rude", but its letters can be written as character references (&#xxx;), or the word can be split up with comments, like this:

ru<!--comment-->de

This displays in a browser as 'rude', because browsers don't render comments. Line breaks in the source don't show up either: a browser treats a carriage return like an ordinary space, so the text you are looking for may be split across lines in the file.

You will need to take every method of word masking into account. You can preprocess the text first by stripping out comments, normalising carriage returns, converting &#xxx; references to the characters they represent, and undoing the other tricks spammers come up with, e.g. inserting a 1x1-pixel image in the middle of a word, or converting the text to an image (there's no way around that except removing all images!).

The list goes on and on...
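A minimal sketch of that preprocessing step, again in Python rather than VB6. It covers only three of the tricks mentioned: comments, line breaks and character references. Tags, images and CSS are left untouched, so something like ru<b></b>de still gets through. `normalize_html_text` and `contains_word` are hypothetical names. Comments are removed before entities are decoded, so an escaped `&lt;!--` in the page can't turn into a fake comment. Whitespace is collapsed last, roughly the way a browser displays text.

```python
import html
import re

# Comments may span several lines, so DOTALL lets '.' match newlines too.
_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
_WHITESPACE = re.compile(r"\s+")


def normalize_html_text(raw: str) -> str:
    """Hypothetical preprocessing step: undo a few common ways of masking words."""
    text = _COMMENT.sub("", raw)          # ru<!--comment-->de -> rude
    text = html.unescape(text)            # &#114; / &#x72; / &amp; -> characters
    text = _WHITESPACE.sub(" ", text)     # line breaks and tabs -> a single space
    return text


def contains_word(raw: str, word: str) -> bool:
    """Case-insensitive search on the normalised text instead of the raw file."""
    return word.lower() in normalize_html_text(raw).lower()


print(normalize_html_text("ru<!--comment-->de"))   # rude
print(normalize_html_text("&#114;&#117;de"))       # rude
```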

Also, you shouldn't work with HTML files line by line, since line breaks are not required in HTML. A whole web page with images and text can be written on one (very long) line.
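The line-by-line pitfall can be shown in a few lines of Python (used here instead of VB6 only for illustration; both function names are hypothetical). A phrase that straddles a line break in the source is invisible to a per-line search. Searching the whole text after collapsing whitespace finds it.

```python
import re


def found_line_by_line(html_text: str, phrase: str) -> bool:
    # Searches each physical line separately, so a phrase that
    # straddles a line break is missed.
    return any(phrase in line for line in html_text.splitlines())


def found_in_whole_text(html_text: str, phrase: str) -> bool:
    # Searches the whole file at once after collapsing runs of whitespace
    # (including line breaks) to single spaces, roughly as a browser shows text.
    flat = re.sub(r"\s+", " ", html_text)
    return phrase in flat


page = "<p>Hello\nworld</p>"
print(found_line_by_line(page, "Hello world"))   # False
print(found_in_whole_text(page, "Hello world"))  # True
```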

© 2003-2008 by Developer Shed. All rights reserved.