#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2006
    Posts
    13
    Rep Power
    0

    Extracting information from an HTML page


    Hello guys

    Please can I ask for some tips on how to approach this issue? I have stored an HTML page
    in a MySQL database. I am able to view this page when I use htmlentities(). What I need is to extract
    information from this page. For example, there is product information on this page, like cost, size etc.

    One idea was to have an array and then check whether each item exists in the HTML page, but I am not sure how,
    once I find a product, I find its corresponding price. For example, if I find product one,
    how can I find the first price and assign it to this product, and likewise for the 2nd, 3rd, etc.?

    I hope I am making sense guys.

    Thanks in advance
    KImmy
  2. #2
  3. Maddening Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,458
    Rep Power
    9645
    Use DOM to walk through the HTML document and find the pieces you need. For example, if you need a particular table cell in a particular table, you would use DOM methods to find the table, find the cell, and get the contents of the cell.

    Need to see the HTML for more specific advice.
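
    As a rough illustration of that idea, here is a minimal PHP sketch. It assumes the page is already in a string and that each table row holds a product name followed by its price. The sample markup, the $html and $prices variables, and the two-column layout are made up for illustration; the real page will differ.

```php
<?php
// Hypothetical stand-in for the HTML string fetched from the database.
$html = '<table><tr><td>Product one</td><td>9.99</td></tr>'
      . '<tr><td>Product two</td><td>14.50</td></tr></table>';

// Collect parser warnings internally instead of printing them;
// loadHTML() still builds a tree from sloppy markup.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$prices = [];
// Assumed layout: first cell is the product name, second cell is its price.
foreach ($doc->getElementsByTagName('tr') as $row) {
    $cells = $row->getElementsByTagName('td');
    if ($cells->length >= 2) {
        $name = trim($cells->item(0)->textContent);
        $prices[$name] = trim($cells->item(1)->textContent);
    }
}

print_r($prices);
```

    Reading both cells from the same row is what ties a price to its product, so there is no need to match the first price to the first product by counting.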
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2006
    Posts
    13
    Rep Power
    0
    Here is some sample HTML. What I am trying to extract is the condition names, Backache and Bladder Infection. I first need to make sure they fall under Major Conditions.

    Code:
    <tr>
        <td align="center" style="font-family: Arial, Helvetica, sans-serif; font-size:24px "><strong>Major Conditions</strong></td></tr>
    	  <tr>
        <td align="center"><br></td></tr>
    	<tr>
    	
    	  <tr>
        <td align="left" style="font-family: Arial, Helvetica, sans-serif; font-size:14px ">The following dietary and lifestyle changes and the suggested nutritional supplements are known to be beneficial when taken for the major conditions indicated. </td></tr>
    	  <tr>
        <td align="center"><br></td></tr>
    	<tr>
    	
        <td><table width="99%" border="1" bordercolor="#FFFFFF" cellpadding="5" cellspacing="5">
      
      	<tr align="center" bgcolor="#007B4C"><td colspan="3"><strong><font color="#FFFFFF">Backache</font></strong></td>
    	</tr>
    	<tr bgcolor="#D6DF94"><td align="center" valign="middle" width="33%"><strong>Description</strong></td>
    	<td align="center" valign="middle" width="43%"><strong>Lifestyle Changes</strong></td>
        <td align="center" valign="middle"><strong>Recommended Products</strong></td>
    	</tr>
    	<tr bgcolor="#E8EFB2"><td align="left" valign="top">Backache or lumbago is back pain, most often in the lower back.</td>
    	<td align="left" valign="top">When pain strikes, immediately consume two large glasses of filtered water; muscular pain and backache is often associated with dehydration - try to drink at least 8 glasses of water a day. Avoid animal protein and meats until the pain subsides; these contain uric acid which may put undue stress on the kidneys, causing the back pain.	  </td>
        <td align="left" valign="top"><ul type='disc'>	
    	<li>Harpago</li>
    		
    	<li>Harpago Cream</li>
    		
    	<li>All Omega</li>
    		
    	<li>Mig Defence</li>
    		</ul></td>
    	</tr>	<tr align="center" bgcolor="#007B4C"><td colspan="3"><strong><font color="#FFFFFF">Bladder Infection</font></strong></td>
    	</tr>
    	<tr bgcolor="#D6DF94"><td align="center" valign="middle" width="33%"><strong>Description</strong></td>
    	<td align="center" valign="middle" width="43%"><strong>Lifestyle Changes</strong></td>
        <td align="center" valign="middle"><strong>Recommended Products</strong></td>
    	</tr>
    	<tr bgcolor="#E8EFB2"><td align="left" valign="top">Bladder infections are characterized by an urgent desire to empty the bladder.</td>
    	<td align="left" valign="top">Drinking plenty fresh cranberry juice (pure, unsweetened) will inhibit bacterial growth in the bladder and prevent bacteria from adhering to the lining of the bladder.  Consume at least 8 glasses of quality water a day.	  </td>
        <td align="left" valign="top"><ul type='disc'>	
    	<li>Mega B&amp;C</li>
    		
    	<li>Super Concentrated Garlic</li>
    		
    	<li>Aloe Concentrate</li>
    		
    	<li>Candidex</li>
    		</ul></td>
    	</tr></table></td></tr>
    		  <tr>
        <td align="center"><br></td></tr>
  6. #4
  7. Maddening Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,458
    Rep Power
    9645
    That markup is horrible.

    All I can tell is that you have a table inside a table, and the inner table repeats a group of three rows per condition: a heading row whose single cell contains the symptom, a row of column titles, and a row of details. The one empty row I saw is outside the inner table. So basically you have to figure out what it takes to locate the correct outer table (you didn't post that markup), get its inner table, then I'm thinking get all the TRs inside and look at the first cell of every third one (0, 3, 6, etc.), skipping any with empty content.

    Combined that's
    - a DOMDocument with the markup
    - some process you need to get the outer table
    - getElementsByTagName on the outer table to get the inner table
    - getElementsByTagName on the inner table to get the table rows
    - a loop to look at the first cell of every third row
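
    Those steps could be sketched roughly as follows. This is only an illustration under assumptions: the markup is a trimmed copy of the posted sample, the outer table wrapper is invented because it was not posted, and the outer table is taken to be the first table in the document. The loop steps through rows rather than individual cells, because in the posted sample each condition occupies three rows (a heading row, a column-title row, and a detail row).

```php
<?php
// Trimmed copy of the posted sample; the outer <table> wrapper is an assumption.
$html = <<<'HTML'
<table>
  <tr><td><strong>Major Conditions</strong></td></tr>
  <tr><td><table>
    <tr><td colspan="3"><strong>Backache</strong></td></tr>
    <tr><td>Description</td><td>Lifestyle Changes</td><td>Recommended Products</td></tr>
    <tr><td>Back pain.</td><td>Drink water.</td><td><ul><li>Harpago</li><li>All Omega</li></ul></td></tr>
    <tr><td colspan="3"><strong>Bladder Infection</strong></td></tr>
    <tr><td>Description</td><td>Lifestyle Changes</td><td>Recommended Products</td></tr>
    <tr><td>Urgent desire.</td><td>Cranberry juice.</td><td><ul><li>Mega B&amp;C</li><li>Candidex</li></ul></td></tr>
  </table></td></tr>
</table>
HTML;

libxml_use_internal_errors(true); // tolerate messy markup without printing warnings
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Assumption: the outer table is the first <table> in the document.
$outer = $doc->getElementsByTagName('table')->item(0);
// getElementsByTagName() returns descendants only, so item(0) is the inner table.
$inner = $outer->getElementsByTagName('table')->item(0);

$conditions = [];
// Only continue if the outer table carries the "Major Conditions" heading.
if (strpos($outer->textContent, 'Major Conditions') !== false) {
    $rows = $inner->getElementsByTagName('tr');
    // Rows come in groups of three: heading, column titles, details.
    for ($i = 0; $i + 2 < $rows->length; $i += 3) {
        $name = trim($rows->item($i)->textContent);
        if ($name === '') {
            continue; // skip headings with empty content
        }
        $products = [];
        foreach ($rows->item($i + 2)->getElementsByTagName('li') as $li) {
            $products[] = trim($li->textContent);
        }
        $conditions[$name] = $products;
    }
}

print_r($conditions);
```

    With the real page, the part most likely to need changing is how the outer table is located, since that markup was not posted.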
    Last edited by requinix; October 8th, 2017 at 06:30 PM. Reason: the one empty row I saw is outside the table
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2006
    Posts
    13
    Rep Power
    0
    Thank you, I am going to try something out now

    Thanks again
