October 4th, 2008, 09:10 PM
Need to get video file URLs + linked images
I'm very new to regex. I've spent a few hours reading regex tutorials and examples, but I'm afraid I need some help, since I have to get this script working ASAP.
I'm trying to build a video search engine that parses the content of a $url and extracts the video URLs along with the URLs of the images that link to each video. Possible video formats are (wmv|avi|mov|mpg|mpeg).
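A minimal sketch of checking a link against those five extensions, assuming a URL counts as a video when its path (ignoring any query string) ends in one of them, in any letter case. `is_video_url()` is a hypothetical helper name, not something built into PHP:

```php
<?php
// Hypothetical helper: true if the URL's path ends in one of the listed video extensions.
function is_video_url(string $url): bool
{
    // Look only at the path, so "clip.wmv?start=10" still counts as a .wmv file.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // parse_url() failed or the URL has no path component
    }
    // Compare the extension case-insensitively against the allowed formats.
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, ['wmv', 'avi', 'mov', 'mpg', 'mpeg'], true);
}

var_dump(is_video_url('video1.wmv'));                      // bool(true)
var_dump(is_video_url('clips/video2.MPG?start=10'));       // bool(true)
var_dump(is_video_url('video1.jpg'));                      // bool(false)
var_dump(is_video_url('http://www.unimportantsite1.com')); // bool(false)
```

Checking the parsed path rather than the raw string keeps query strings from hiding the extension.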
Example of a remote page containing video and image URLs:
Code:
<html>
<body>
<div>Header</div>
<div>Unimportant links<br>
<a href="http://www.unimportantsite1.com">unimportant site 1</a><br>
<a href="http://www.unimportantsite2.com">unimportant site 2</a><br>
</div>
Video 1:
<a href="video1.wmv"><img src="video1.jpg" width="320" height="240"></a>
<br>
Video 2:
<a href="video2.wmv"><img src="video2.jpg" width="320" height="240"></a>
Video 3:
<a href="video3.wmv"><img src="video3.jpg" width="320" height="240"></a>
</body>
</html>
This is as far as I've got with the PHP file:
PHP Code:
<?php
$contents = file_get_contents($url);
//GET VIDEOS
//get filenames/URLs of video files (could be a full URL or a path to the file)
//attributes could be in single quotes href='video2.wmv',
//double quotes href="video2.wmv" or no quotes href=video2.wmv
//the video URLs end up in $matches[1]
preg_match_all('/href\s*=\s*["\']?([^"\'\s>]+\.(?:wmv|avi|mov|mpg|mpeg))(?=["\'\s>])/i', $contents, $matches);
//GET IMAGES
//get filenames/URLs of image files (could be a full URL or a path to the file)
//between the anchor tags: the src="" of the <img> tag
//attributes could be in single quotes src='video2.jpg',
//double quotes src="video2.jpg" or no quotes src=video2.jpg
//image types/extensions aren't important here since I just need what's
//in the img src attribute (height, width and border don't matter either)
//echo the video and image URLs;
//in the example it would be: video1.wmv + video1.jpg,
//video2.wmv + video2.jpg, video3.wmv + video3.jpg
?>
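A minimal sketch of doing both halves in one pass, so each video stays paired with its image. It assumes the `<img>` is the first tag directly inside the video's `<a>`, as in the example page, and that the URLs contain no spaces or quote characters; `extract_video_pairs()` is a hypothetical name. Regex on HTML stays fragile (it won't cope with `>` inside attribute values, for instance), so PHP's `DOMDocument` may be a better fit for pages that are less regular:

```php
<?php
// Hypothetical helper: returns a list of ['video' => ..., 'image' => ...] pairs.
function extract_video_pairs(string $html): array
{
    // Groups 1 and 3: optional opening quote (", ' or nothing), reused via \1 and \3 as the closing quote.
    // Group 2: the href value, which must end in a video extension (the lookahead checks the value ends there).
    // Group 4: the src value of the <img> that directly follows the opening <a> tag.
    $pattern = '~<a\b[^>]*?\bhref\s*=\s*(["\']?)([^"\'\s>]+\.(?:wmv|avi|mov|mpg|mpeg))\1(?=[\s>])[^>]*>'
             . '\s*<img\b[^>]*?\bsrc\s*=\s*(["\']?)([^"\'\s>]+)\3~i';

    $pairs = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $pairs[] = ['video' => $m[2], 'image' => $m[4]];
        }
    }
    return $pairs;
}

// Cut-down copy of the example page, with the three quoting styles mixed in.
$html = '<a href="video1.wmv"><img src="video1.jpg" width="320" height="240"></a>'
      . "<a href='video2.wmv'><img src='video2.jpg'></a>"
      . '<a href=video3.wmv><img src=video3.jpg width=320></a>'
      . '<a href="http://www.unimportantsite1.com">unimportant site 1</a>';

foreach (extract_video_pairs($html) as $pair) {
    echo $pair['video'], ' + ', $pair['image'], "\n";
}
// video1.wmv + video1.jpg
// video2.wmv + video2.jpg
// video3.wmv + video3.jpg
```

The backreferences `\1` and `\3` reuse whatever quote opened the attribute (or nothing, for unquoted values), which is what covers the single, double and no-quote cases listed in the comments.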
October 7th, 2008, 03:51 AM
Try this one:
<(a|img).*?(href|src)="(.*?\.(wmv|avi|mov|mpg|mpeg|jpg|jpeg))".*?>
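The links end up in the 3rd group ($matches[3] with preg_match_all); the 4th group only holds the file extension. Note that it only matches double-quoted attribute values.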
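A quick check of that pattern in PHP, assuming it is wrapped in `/` delimiters and run with `preg_match_all()` against a trimmed-down copy of the example page plus a single-quoted variant (the single-quoted `video2` tags are made up for the test):

```php
<?php
// The pattern from the reply, unchanged apart from the PHP delimiters.
$pattern = '/<(a|img).*?(href|src)="(.*?\.(wmv|avi|mov|mpg|mpeg|jpg|jpeg))".*?>/';

$html = '<a href="video1.wmv"><img src="video1.jpg" width="320" height="240"></a>'
      . "<a href='video2.wmv'><img src='video2.jpg'></a>";

preg_match_all($pattern, $html, $matches);

echo implode(', ', $matches[3]), "\n"; // group 3 holds the full URLs: video1.wmv, video1.jpg
echo implode(', ', $matches[4]), "\n"; // group 4 holds only the extensions: wmv, jpg
```

Only the double-quoted tags match. Also, because `.` matches a quote character, the lazy `.*?` inside group 3 can run past a closing quote when several tags share a line, so the single-quote and no-quote cases from the original post still need separate handling.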