Thread: Regex help!

  1. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2011


    Hey guys, I am writing a web crawler to pick out URLs on a page. The page is [URL censored], and the URLs I am looking for are the download URLs, such as [URL censored]. I already have a regex, which is:
     $expr = '!href="*?)/(\d+)-(.*?).html!i';
    but that's picking out too much. Any help would be greatly appreciated.

    Thank you

    Last edited by requinix; September 14th, 2011 at 05:41 PM.
  2. Reversible Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Washington, USA
    URLs partially censored. No linking to warez sites.

    Also, locked. If it were just a generic web crawler, that would be one thing, but you're specifically coding for one site, and even for a specific URL structure, and it's all very obviously for the purpose of indexing the illegal materials they reference.

    I'm fine with your 400 Bad Request thread in PHP because that can be a problem regardless of what sites your bot crawls. But anything having anything whatsoever to do with the afore-censored site or its subject matter is off-limits. Keep it very generic.
