#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2011
    Posts
    2
    Rep Power
    0

    Extracting data from an XML website


    Sorry, this may be a FAQ.
    Anyway I didn't find an answer using the search function.


    What would be the best way to extract dynamically changing information from a web page that holds XML?

    I have to extract some data from multiple web pages.
    These sites contain XML code that dynamically reloads (and changes) parts of the page.
    Since only small parts get a refresh (perhaps 200 bytes every 5 seconds) it wouldn't be efficient (and also might lead to negative reactions from the server) to reload the whole page (about 40k) every time.
    How could I best determine these updated contents and send them to the program that further processes them?


    Please point me to the best language (Perl?) or tool for this task.

    I am familiar with several programming languages but unfortunately have very little knowledge of internet programming.
  2. #2
  3. Did you steal it?
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    13,993
    Rep Power
    9397
    Are you sure it's XML? Not JavaScript? Because XML doesn't do stuff: it merely contains data.
    There may be JavaScript which retrieves XML, and updates the page according to that, but it probably isn't XML that's doing the update directly.
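    If that is the case, one option is to skip the 40k page and poll the small XML resource that the JavaScript requests. Below is a minimal Python sketch of that idea, not a definitive implementation. The URL is a placeholder (the real one would have to be found by watching the page's requests in the browser), and `fetch_bytes` and `poll_changes` are names made up for this example. It hashes each response and only passes a payload on when the content has changed.

```python
import hashlib
import time
import urllib.request
from typing import Callable, Iterator, Optional

# Hypothetical endpoint: the real URL must be found in the browser's
# network inspector while the page is refreshing.
FEED_URL = "http://example.com/updates.xml"


def fetch_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Download the small XML document the page's JavaScript would request."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def poll_changes(
    fetch: Callable[[], bytes], interval: float, max_polls: int
) -> Iterator[bytes]:
    """Call fetch() up to max_polls times and yield only changed payloads."""
    last_digest: Optional[str] = None
    for attempt in range(max_polls):
        payload = fetch()
        digest = hashlib.sha256(payload).hexdigest()
        if digest != last_digest:
            # Content differs from the previous poll, so hand it on.
            last_digest = digest
            yield payload
        if attempt < max_polls - 1:
            time.sleep(interval)


# Usage (assumes FEED_URL really serves the XML; polls every 5 seconds):
# for xml_bytes in poll_changes(lambda: fetch_bytes(FEED_URL), 5.0, 100):
#     process(xml_bytes)  # process() is whatever consumes the data
```

    Taking the fetch step as a parameter keeps the change detection separate from the network code, so the same loop works with any way of getting the bytes.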
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2011
    Posts
    2
    Rep Power
    0
    Sure, I was imprecise in my description.

    It is JavaScript code that reloads XML content (which you obviously understood already).


    Any suggestions on how the XML content could be extracted?
  6. #4
  7. Did you steal it?
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    13,993
    Rep Power
    9397
    Moved from XML.


    Depending on how you got the XML (AJAX, I assume) you can inspect it just like you would an HTML document.
    What does the XML look like and what are you trying to get?
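    As a rough illustration of inspecting the XML like a document tree, here is a short Python sketch using the standard library's `xml.etree.ElementTree`. The payload, its element names (`updates`, `item`, `name`, `value`) and the `extract_items` helper are all invented for the example, since the real XML has not been posted yet.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: the real structure is not known from this thread.
SAMPLE = """<updates>
  <item id="1"><name>alpha</name><value>10</value></item>
  <item id="2"><name>beta</name><value>20</value></item>
</updates>"""


def extract_items(xml_text: str) -> dict:
    """Map each item's id attribute to the text of its child elements."""
    root = ET.fromstring(xml_text)
    return {
        item.get("id"): {
            "name": item.findtext("name"),
            "value": item.findtext("value"),
        }
        for item in root.iter("item")
    }


items = extract_items(SAMPLE)
```

    The element and attribute names would need to be swapped for whatever the actual response contains.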
