Discuss Regex backreference problem in the JavaScript Development forum on Dev Shed.
Posts: 562
Time spent in forums: 1 Week 5 Days 3 h 4 m 2 sec
Reputation Power: 69
Regex backreference problem
Hi
I'm trying to use a regular expression to grab 'a' tags from HTML returned by XMLHttpRequest. It grabs the tags correctly, but I don't really want the entire tag; I just want the http path. I tried using a backreference, and this gives me both the tag and the backreferenced text.
Here's the expression:
(assume 'scr' contains valid html)
Posts: 665
Time spent in forums: 2 Weeks 31 m 38 sec
Reputation Power: 153
Is there a good reason for you to return the HTML as a string rather than XML? If you passed it as XML you'd be able to use getElementsByTagName("a") and collect the hrefs using the usual DOM calls. You are after all "[assuming] 'scr' contains valid html". Better the absence of an answer than a wrong answer, no?
Last edited by Joseph Taylor : April 5th, 2008 at 02:21 PM.
Posts: 562
Time spent in forums: 1 Week 5 Days 3 h 4 m 2 sec
Reputation Power: 69
Quote:
Originally Posted by Joseph Taylor
Is there a good reason for you to return the HTML as a string rather than XML? If you passed it as XML you'd be able to use getElementsByTagName("a") and collect the hrefs using the usual DOM calls. You are after all "[assuming] 'scr' contains valid html". Better the absence of an answer than a wrong answer, no?
Not sure if this is a 'good reason' but I'm processing the raw html so I don't need to create an invisible div to put stuff in.
I know the files I'm processing will be valid html because they are created and returned by the web server. Processing the html in string form targets the 'a' tag search to only this string. If I put this html in a dummy item, 'getElementsByTagName' would include every tag in the document not just the ones in the html block I'm processing. This would require adding a filter to make sure I'm only getting the paths from the dummy item.
The basic idea here is that I'm writing some experimental development tools to make it possible to design really cool Web 2.0 applications, and this little part is in a JavaScript resource mapper. I use document.URL to find the root directory, append the name of a special shared folder (Resources), and recursively scan this folder from within JavaScript by using the following function at each level of recursion.
Code:
function DirectoryContentsAtPath(inPath)
{
    var http = (window.XMLHttpRequest) ? new XMLHttpRequest() : new ActiveXObject('Microsoft.XMLHTTP');
    // third argument false = synchronous request
    http.open("get", inPath, false);
    http.send(null);
    if (http.status == 200)
    {
        // get the 'a' tags from the html (match() returns null if there are none)
        var tags = http.responseText.match(/<a href=["']([0-9a-z._\/-]+)["']>/gi);
        if (!tags)
            return null;
        // get the quoted hrefs from the 'a' tags
        var result = tags.toString().match(/['"]([0-9a-z_.\/-]+)["']/gi).toString();
        // strip the quotes from the hrefs
        result = result.match(/[0-9a-z_.\/-]+/gi);
        if (result && result.length > 0)
        {
            // Apache includes the directory as the first item
            // so slice array to get a list of just the files
            return result.slice(1);
        }
    }
    return null;
}
It's still a little rough, but I have the resource mapper working on my home Apache web server. Surprisingly fast too.
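The recursive scan described above might look roughly like the sketch below. It is not the poster's mapper: mapResources and its listDir parameter are hypothetical names, listDir stands in for a function such as DirectoryContentsAtPath, and the sketch assumes that entries are relative names and that a trailing slash marks a subdirectory.

```javascript
// Hypothetical sketch: builds a nested map of a directory tree.
// listDir(path) is assumed to return an array of relative names or null.
function mapResources(path, listDir) {
    var map = {};
    var entries = listDir(path) || [];
    for (var i = 0; i < entries.length; i++) {
        var name = entries[i];
        if (name.charAt(name.length - 1) === '/') {
            // assumed convention: a trailing slash marks a subdirectory
            map[name] = mapResources(path + name, listDir);
        } else {
            // plain file: store its full path
            map[name] = path + name;
        }
    }
    return map;
}
```

Passing the lister in as a parameter keeps the recursion separate from the HTTP request, so it can be tried against a fake listing before pointing it at a server.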
I may be wrong here, but my gut instinct is that extracting the path info from a string is faster and more memory efficient than actually adding the http.responseText to a dummy element in the DOM, searching the dummy element, filtering the search results, and deleting its contents.
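On the original backreference question: when a pattern has the g flag, String.match() returns only the whole matches and drops the captured groups, and without g it returns the first whole match followed by its groups. One way to collect just the captured path in a single pass is to call exec() in a loop. The sketch below is one possible approach, not the poster's code; extractHrefs is a hypothetical name, and it reuses the character class from the first pattern in the function above.

```javascript
// Hypothetical helper: returns the href values of simple
// <a href="..."> tags found in an HTML string.
function extractHrefs(html) {
    // group 1 captures the path between the quotes
    var pattern = /<a href=["']([0-9a-z._\/-]+)["']>/gi;
    var hrefs = [];
    var m;
    // with the g flag, each exec() call resumes after the previous match
    while ((m = pattern.exec(html)) !== null) {
        hrefs.push(m[1]); // m[0] is the whole tag, m[1] the captured path
    }
    return hrefs;
}
```

This returns an empty array when nothing matches, so the caller does not need a null check before using the result.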
Posts: 665
Time spent in forums: 2 Weeks 31 m 38 sec
Reputation Power: 153
Quote:
I may be wrong here, but my gut instinct is that extracting the path info from a string is faster and more memory efficient than actually adding the http.responseText to a dummy element in the DOM, searching the dummy element, filtering the search results, and deleting its contents.
What do you think?
I think I'd still use the invisible div method--but given your situation, the regex should be fine. I wasn't aware that invisibleDiv.getElementsByTagName("a") would return all the links in the document... Nevertheless, it's entirely possible that this is a quirk of Internet Explorer. The fact that you don't have direct control of the HTML generation does rule out the responseXML idea.
Posts: 19,834
Time spent in forums: 6 Months 1 Day 21 h 3 m 17 sec
Reputation Power: 4192
It seems ktoz forgot that getElementsByTagName() is a method of Document and of Element, so you can use it on a <div> or any other element when needed.
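A minimal sketch of that point, assuming a browser environment: the div is created but never attached to the page, and the search is run on the div itself, so only links inside the fragment are returned. The name hrefsFromFragment and the optional doc parameter are hypothetical, added only so the function can be tried outside a page.

```javascript
// Hypothetical helper: collects href attributes from an HTML fragment
// without attaching anything to the live page.
function hrefsFromFragment(html, doc) {
    // doc falls back to the page's document when running in a browser
    doc = doc || document;
    var holder = doc.createElement('div'); // detached: never added to the page
    holder.innerHTML = html;
    // called on the element, so only its descendants are searched
    var anchors = holder.getElementsByTagName('a');
    var hrefs = [];
    for (var i = 0; i < anchors.length; i++) {
        hrefs.push(anchors[i].getAttribute('href'));
    }
    return hrefs;
}
```

In a page this would be called as hrefsFromFragment(http.responseText), and no filter is needed because the rest of the document is never searched.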