#1

    PHP preg_match regex for HTML I am fetching with cURL


    I need to build a regex to get everything inside the main_text div.

    Code:
    </div>
    </div>
    <div class="main_text">
    <p>There were no concrete decisions in Thursday's meetings in the ongoing discussion on how to cut costs for Metrorail's Phase II of rail to Dulles.</p>
    <p>However, the idea has been proposed to transfer some of the financial burden back to the local governments.</p>
    
    <p>The latest estimate for Phase II of the Silver Line, which will run from Reston to Dulles International Airport and into Loudoun County, is $3.5 billion. More than $300 million of those costs would be for the Metropolitan Washington Airports Authority's &nbsp;(MWAA) proposed underground station at the airport, which brings the and total cost estimate in about $1 billion more than expected.</p>
    <p>At this week's meeting between U.S. Transportation Secretary Ray LaHood, Loudoun and Fairfax County Supervisors and MWAA officials, LaHood asked stakeholders to discuss with their boards the idea of putting public-private partnerships in place to take some of the burden off of the Dulles Toll Road.</p>
    <p>Toll Road fees are predicted to be $10 or more in future years if some of the bill is not split</p>
    <p><em>The Washington Post </em>reported that an Federal Transit Administration official proposed transferring responsibility for the planned&nbsp;<a href="http://www.dullesmetro.com/stations/route28.cfm">Route 28 rail station</a>&nbsp;to Fairfax County to save an estimated $136&nbsp;million.</p>
    <p>Under the proposal, Fairfax and Loudoun counties would also take charge of building five commuter parking lots for a savings of $235&nbsp;million.</p>
    <p><a href="http://www.washingtonpost.com/local/dulles-rail-talks-in-flux-with-airport-station-location-on-the-table/2011/06/30/AGGhhisH_story.html">Read the entire Post story here.</a></p>
    
    <p>The parties will further discuss the new ideas later this month.</p>
    </div>
    <div class="legroom headroom">
    
    </div>
    I am running preg_match on this in PHP with no success.
    I thought the line breaks might be the problem, but I tried matching \n explicitly and it still doesn't capture anything.

    I have tried the following:

    Code:
    "@<div.*class=" . "'". "main_text" . "'" ".*?>\n(.*)\n</div>@"

#2
    The regular expression would be something along the lines of
    Code:
    #<div class="main_text">(.*?)</div>#s
    This works as long as there are no nested divs and the text between the start and end tags isn't too long. The s modifier makes . match newlines too, so the capture can span multiple lines.
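
As a minimal sketch of how that pattern could be used with preg_match: the `$html` string below is a shortened, hypothetical stand-in for the cURL response, and `$mainText` is just an illustrative variable name. Note that the posted markup quotes the class name with double quotes, so the pattern has to match double quotes rather than single ones.

```php
<?php
// Hypothetical sample input standing in for the cURL response body.
$html = <<<HTML
</div>
<div class="main_text">
<p>First paragraph.</p>
<p>Second paragraph.</p>
</div>
<div class="legroom headroom">

</div>
HTML;

// s modifier: . also matches newlines.
// (.*?) is lazy, so the capture stops at the first closing </div>.
$pattern = '#<div class="main_text">(.*?)</div>#s';

$mainText = null;
if (preg_match($pattern, $html, $matches) === 1) {
    // $matches[1] holds the first capture group; trim the surrounding line breaks.
    $mainText = trim($matches[1]);
    echo $mainText, "\n";
} else {
    // preg_match returns 0 when nothing matches and false on error.
    echo "no match\n";
}
```

Because the capture is lazy, a nested div inside main_text would end the match early at the inner closing tag, which is the limitation mentioned above.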

    For PHP, though, I'd recommend using DOMDocument to parse HTML input rather than a regex.
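
A rough sketch of the DOMDocument approach, assuming the page has been fetched into a string already. The `$html` sample, the XPath expression and the `$inner` variable are illustrative choices, not something taken from the original page.

```php
<?php
// Hypothetical sample input standing in for the cURL response body.
$html = <<<HTML
<html><body>
<div class="wrapper">
<div class="main_text">
<p>First paragraph.</p>
<p>Second <em>paragraph</em>.</p>
</div>
<div class="legroom headroom"></div>
</div>
</body></html>
HTML;

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select divs whose class attribute contains the token "main_text".
$nodes = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " main_text ")]'
);

$inner = '';
if ($nodes !== false && $nodes->length > 0) {
    // Serialize each child of the first match to get the div's inner HTML.
    foreach ($nodes->item(0)->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
}

echo trim($inner), "\n";
```

Since the parser tracks element nesting itself, this keeps working when main_text contains further divs, which is where the regex version breaks down.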

    Regards, Jens
