Scripts
 
Dev Shed Forums > Web Site Management > Scripts

  #1  
Old January 2nd, 2007, 06:44 PM
Porky
Join Date: Jan 2004
Location: England
Posts: 278
Screen-scraping certain websites?!

Hi guys and thanks for reading my thread.

I wasn't sure where to post this, but here goes. I've been looking at using VBScript in ASP pages to parse web content.

So far I have created scripts, for a project that I am working on, that can parse the data I require out of the following (example) web addresses:

http://site.forum.betfair.com/jive3/betex/forums.jspa?forumID=25&schatname=#forumanchornull

and

http://horses.sportinglife.com/Meetings/

I've been using the xmlHTTP object to get a response text, which is then parsed using a few VBScript functions to pick out the text that I'd like to store. The scripts work because the data (text, numbers etc.) that I'd like is written in the source code for the page (i.e. right-click the page ---> View Source in IE).
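For anyone following along, the fetch-then-parse approach described above might look roughly like this. It is only a sketch in Python rather than the original VBScript, and the helper names (`fetch_html`, `extract_between`) and the sample markup are made up for illustration; the real pages will need their own markers.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download a page and return its body as text (similar to xmlHTTP's responseText)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_between(text, start_marker, end_marker):
    """Return every substring found between start_marker and end_marker."""
    results = []
    position = 0
    while True:
        start = text.find(start_marker, position)
        if start == -1:
            break
        start += len(start_marker)
        end = text.find(end_marker, start)
        if end == -1:
            break
        results.append(text[start:end].strip())
        position = end + len(end_marker)
    return results


# Hypothetical markup, used here so the example runs without a network call.
sample_html = '<td class="runner">Red Rum</td><td class="runner">Arkle</td>'
runners = extract_between(sample_html, '<td class="runner">', "</td>")
print(runners)
```

This only works when the wanted text is literally present in the downloaded source, which is exactly the limitation described in the rest of this post.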

Where I have come unstuck is that there are other pages that I would like data from, but I have absolutely no idea how the page is generated. Take a look at this site (unfortunately I can't provide a link straight to the right page...):

http://sports.betfair.com/

On the menu on the left, navigate to the following:

All Markets --> Soccer --> English Soccer --> Barclays Premiership ---> Winner 2006/2007

It should be possible to see a betting market with premiership teams down the left side, with a table of odds/money in blue and pink squares.

I have viewed the source code for pages like this and I can't seem to work out how the page is created. There are programs and other sites "out there" that I know are getting data from this website, but my xmlHTTP response-text method obviously isn't going to work.

I wonder if someone could advise me as to what sort of method or programming language I could use to extract data from a site like this, and also how much effort it would take to write a script that would be able to read the text and numbers from the betting markets.

Thanks to anyone who can offer any help or advice on this and thanks for reading such a lengthy post.

Regards,

Porky

  #2  
Old January 4th, 2007, 07:53 AM
Axweildr
CPAN medic ...
Join Date: Mar 2003
Location: Location: Location:
Posts: 11,533
They're certainly going out of their way to hide the content, but then again it's their job.

Have you looked for RSS feeds with this information? Any reason you're targeting this particular site?
--Ax

  #3  
Old January 4th, 2007, 08:09 AM
Porky
Join Date: Jan 2004
Location: England
Posts: 278
Yes, they are. I think the reason is that they are trying to sell people "API applications": small programs that retrieve and display the data in a different way, and that you have to pay for.

The reason that I am targeting this particular site is that it's a betting exchange rather than a bookie's. That's important for the project I am working on, as I will need to be able to both back (i.e. bet that something will win) and lay (bet that something will lose) any selection, and this is not possible on a standard betting site. The odds on betting exchanges are also always slightly different to those at a bookie's.

The Betfair website has an RSS feed, but it's for market results (as in football matches, completed horse races etc.) rather than for current market prices, which is the bit that I need.
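For reference, reading a results-style RSS feed is the easy part. Below is a rough Python sketch using only the standard library; the feed content is an invented RSS 2.0 sample (not Betfair's real feed) and `parse_rss_items` is just an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document; the titles and links are invented for illustration.
sample_rss = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example market results</title>
    <item><title>Result A</title><link>http://example.com/a</link></item>
    <item><title>Result B</title><link>http://example.com/b</link></item>
  </channel>
</rss>"""


def parse_rss_items(rss_text):
    """Return a list of {'title', 'link'} dicts, one per <item> in the feed."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


results = parse_rss_items(sample_rss)
for entry in results:
    print(entry["title"], entry["link"])
```

A feed like this only carries whatever the publisher chooses to put in it, so it is no help for live prices if those are not included.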

I think I'm down to finding someone on the Betfair developers' API forum who could help me out, and finding out how much money they would expect to make it for me :/

Hmmmmmm.

