#1
CFHTTP looping problems
I have parsed an index page for specific job links and stored those links in a one-dimensional array.
Using cfhttp, I then loop over the links in the array so that I can do more detailed parsing on the pages the links point to. However, when I output the results of the cfhttp calls, I somehow get the index page back several times, as opposed to the actual pages that the links point to. Am I using cfhttp correctly here?

#2
So is this where the problem is happening?

<!--- Store the list of found links in the links array --->
<cfset LinksArray = ListToArray(StripLinks)>
<cfloop from="1" to="#arrayLen(LinksArray)#" index="i">
    <cfhttp method="get" url="#LinksArray[i]#" resolveurl="yes">
    <cfoutput>#cfhttp.FileContent#</cfoutput>
</cfloop>

Can you confirm that the listToArray() call is generating an array with the elements you are expecting?

#3
Yes, that is where the problem is happening. I ran <cfdump var="#LinksArray#"> and each URL appears in the array as an element.

#4
If you output each element of the array as you loop over it (before the cfhttp call), are you seeing the correct URLs? And if so, are you saying that the immediately following cfhttp call's output is not for that URL?
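
A minimal sketch of that check, assuming a LinksArray filled with placeholder URLs (the example.com addresses are made up for illustration). It prints each URL in brackets so stray whitespace is visible, then reports the status and size of the response from the request that immediately follows:

```cfml
<!--- Hypothetical test data; in the real page LinksArray comes from ListToArray(StripLinks) --->
<cfset LinksArray = ArrayNew(1)>
<cfset ArrayAppend(LinksArray, "http://www.example.com/")>
<cfset ArrayAppend(LinksArray, "http://www.example.com/jobs")>

<cfloop from="1" to="#ArrayLen(LinksArray)#" index="i">
    <!--- Show exactly which URL is about to be requested --->
    <cfoutput>Requesting [#HTMLEditFormat(LinksArray[i])#]<br></cfoutput>
    <cfhttp method="get" url="#Trim(LinksArray[i])#">
    <!--- Summarise the response instead of printing the whole page --->
    <cfoutput>Status: #cfhttp.StatusCode#, length: #Len(cfhttp.FileContent)#<br></cfoutput>
</cfloop>
```

If the bracketed URL and the response do not belong together, the problem is in the request; if the URL itself is wrong, the problem is earlier, in how the array was built.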

#5
I output each element of the array as I looped over it and the results were strange. From website 1, only 1 of 27 links was output.
From website 2, all 7 of 7 links were output. From website 3, 1 of 1 links was output. So I am not seeing all the correct URLs.

#6
Then the problem must be in the regular expression somewhere in here:

<cfset StartPos = 1>
<cfloop condition="True">
    <!--- Parse the site index pages for job links --->
    <cfset Match = REFindNoCase(#Trim(xmlObj.xmlRoot.site[i].parse.xmlAttributes.re)#, cfhttp.FileContent, StartPos, True)>
    <cfif Match.pos[1] EQ 0>
        <cfbreak>
    <cfelse>
        <cfset StartPos = Match.pos[1] + Match.len[1]>
        <!---<cfset Foundlinks = Mid(cfhttp.FileContent, Match.pos[1], Match.len[1])>--->
        <cfset StripLinks = #REReplaceNoCase(#Mid(cfhttp.FileContent, Match.pos[1], Match.len[1])#,"\s*HREF=\W", "", "ALL")#>

Unfortunately I don't know much about regex (I don't need to use them much). If this is indeed where the problem is, you might ask in a regex forum or search around on the net for regular expressions that do what you need.

#7
You're probably right, Kiteless. I'll have a look around the web. Thanks for your time, though.

#8
Kiteless, just a general question: do you think there is a more efficient way of holding all the URLs for three different sites? At present they are all stored in a one-dimensional array.

#9
I'd store them in a two-dimensional array, where the first dimension is the site (so right now you'd have 3 elements in the first dimension) and the second dimension holds the links within each site (so each of the 3 sites could have 1 or more links within it).
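
A rough sketch of that layout, assuming made-up placeholder URLs and the hypothetical variable name SiteLinks (neither comes from your code). The first index picks the site and the second picks a link within that site:

```cfml
<!--- Hypothetical data: two sites, with two links and one link respectively --->
<cfset SiteLinks = ArrayNew(2)>
<cfset SiteLinks[1][1] = "http://www.example.com/jobs/1">
<cfset SiteLinks[1][2] = "http://www.example.com/jobs/2">
<cfset SiteLinks[2][1] = "http://www.example.org/vacancies/a">

<!--- Outer loop walks the sites, inner loop walks each site's links --->
<cfloop from="1" to="#ArrayLen(SiteLinks)#" index="siteIndex">
    <cfloop from="1" to="#ArrayLen(SiteLinks[siteIndex])#" index="linkIndex">
        <cfoutput>Site #siteIndex#, link #linkIndex#: #SiteLinks[siteIndex][linkIndex]#<br></cfoutput>
    </cfloop>
</cfloop>
```

Using a different index variable for each loop keeps the site counter and the link counter from interfering with each other.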

#10
Kiteless, sorry to bother you, but I need you to tell me where I'm going wrong.
I've rewritten some of my code and this is my problem: I fixed the regex and it works fine. I get the links from the first website, store them in an array, and when I output the links they appear as the correct URLs (fine so far). I then loop over the array and cfhttp each URL from the array. When I output the contents of cfhttp.FileContent, only one web page is displayed, which is the first URL in the array (there should be 16). Are there limits on how many pages can be output? With cfhttp.FileContent I have then tried to use one of my four detailed page parsers (regex), which extract the specific info I need from each URL. The code I'm using to do this is below. Will I have to keep repeating this code for each detailed page parser? I hope I'm making sense.

#11
No, I'm not understanding what you're trying to do.

#12
What I'm trying to say is that I only get output for the first URL in the array when I perform a cfhttp on the array elements; there should be several. My loop looks fine, but there is obviously something wrong.

#13
The first thing I would do is make sure that the URLs in the array are what you expect. If you dump the array, do you see URL values that look correct?
Then make sure that the inner CFHTTP call is working. No, there is no limit on the number of pages you can output. Instead of all that regex stuff, can you just output each inner CFHTTP call and see if it is doing what you expect?

<cfset LinksArray = ListToArray(StripLinks)>
<cfloop from="1" to="#arrayLen(LinksArray)#" index="i">
    <cfhttp method="get" url="#LinksArray[i]#">
    <cfdump var="#cfhttp#">
</cfloop>

#14
The URLs are what I expected; they are all complete and each can be fetched with cfhttp manually. I can see the URLs in the array when I dump it and they are correct.
The inner cfhttp only calls the first URL in the array and then stops. I have tried it without the regex stuff and it's still the same. It looks as though there is a problem with the loop? I haven't had any error messages either.

#15
So this is where it is failing?

<cfset StripLinks = #REReplaceNoCase(#Mid(cfhttp.FileContent, Match.pos[1], Match.len[1])#,"=[""]", "http://www.rspb.org.uk/vacancies/index.asp", "ALL")#>
<cfset LinksArray = ListToArray(StripLinks)>
<cfloop from="1" to="#arrayLen(LinksArray)#" index="i">
    <cfhttp method="get" url="#LinksArray[i]#">
    ...

If so, then the problem is that LinksArray doesn't have the URLs that you need. More to the point, I think this is not doing what you think it should:

<cfset StripLinks = #REReplaceNoCase(#Mid(cfhttp.FileContent, Match.pos[1], Match.len[1])#,"=[""]", "http://www.rspb.org.uk/vacancies/index.asp", "ALL")#>

REReplaceNoCase just replaces each match of the pattern with the replacement text and returns a single string. I think if you do a <cfdump var="#StripLinks#"> you'll see that it doesn't have the array of links that you are expecting.
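
One way to end up with a real array of links is to append each match as you find it, rather than assigning to a single string variable. This is only a sketch under assumptions: PageContent is a made-up sample standing in for cfhttp.FileContent, and LinkPattern is a hypothetical regex for double-quoted href values, not the one from your XML config:

```cfml
<!--- Hypothetical sample HTML; in the real page this would be cfhttp.FileContent --->
<cfset PageContent = '<a href="http://www.example.com/job1">One</a> <a href="http://www.example.com/job2">Two</a>'>
<!--- Hypothetical pattern: the parentheses capture the URL inside the quotes --->
<cfset LinkPattern = 'href="([^"]+)"'>
<cfset LinksArray = ArrayNew(1)>
<cfset StartPos = 1>

<cfloop condition="true">
    <cfset Match = REFindNoCase(LinkPattern, PageContent, StartPos, true)>
    <!--- pos[1] is 0 when there are no more matches --->
    <cfif Match.pos[1] EQ 0>
        <cfbreak>
    </cfif>
    <!--- pos[2]/len[2] describe the first captured subexpression, i.e. the URL --->
    <cfset ArrayAppend(LinksArray, Mid(PageContent, Match.pos[2], Match.len[2]))>
    <!--- Continue searching just past the end of this match --->
    <cfset StartPos = Match.pos[1] + Match.len[1]>
</cfloop>

<cfdump var="#LinksArray#">
```

With the sample string above, the dump should show two elements, one per link. You could then loop over LinksArray with cfhttp as before.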