#1
Generate possible sitemaps
Hello,
I'm trying to make a program that scans a site and collects all the links. From these links, the program has to generate all possible sitemaps. Generating all the sitemaps is the problem: if there are 4 pages on a site, there are 100 possible sitemaps. I'm still thinking about how to do this. Does anyone have tips or solutions? I hope I'm clear enough.
Greetings from the Netherlands,
Johie
#2
How do you generate 100 sitemaps out of 4 links? Assuming we have 4 links, linkA, linkB, linkC and linkD, what is the sitemap supposed to look like?
#3
If I have the 4 links, there are several possible tree structures:

Root----LinkA-LinkB-LinkC-LinkD
Root----LinkA-LinkC-LinkB-LinkD

or

Root----LinkA-LinkB
    |--LinkC-LinkD
Root----LinkD-LinkB
    |--LinkC-LinkA

or

Root----LinkA-LinkB--LinkC
    |-LinkD

If you count all possible combinations, you get 100 results. I hope this answers your question and makes clear what I mean.
Greetings,
Johie
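One way to make the counting question concrete is to enumerate the trees directly. The sketch below rests on an assumed interpretation, not one stated in the thread: a sitemap is a tree hanging from a fixed Root, every link appears exactly once with exactly one parent, and the order of siblings does not matter. The names `ROOT`, `reaches_root` and `all_sitemaps` are hypothetical. The code tries every parent assignment and keeps only those without cycles.

```python
from itertools import product

ROOT = "Root"  # hypothetical name for the site's home page


def reaches_root(link, parent):
    """Follow parent pointers from `link`; False if they loop before reaching ROOT."""
    seen = set()
    while link != ROOT:
        if link in seen:
            return False  # cycle, so this assignment is not a tree
        seen.add(link)
        link = parent[link]
    return True


def all_sitemaps(links):
    """Yield each sitemap as a dict mapping every link to its parent page.

    Assumption: a sitemap is a tree rooted at ROOT, each link appears once,
    and sibling order is ignored.
    """
    # Each link may hang under ROOT or under any *other* link.
    choices = [[ROOT] + [p for p in links if p != link] for link in links]
    for parents in product(*choices):
        parent = dict(zip(links, parents))
        if all(reaches_root(link, parent) for link in links):
            yield parent


if __name__ == "__main__":
    links = ["LinkA", "LinkB", "LinkC", "LinkD"]
    maps = list(all_sitemaps(links))
    print(len(maps))  # 125 under the assumptions above
    print(maps[0])    # the first one found: every link directly under Root
```

Under these assumptions the 4 links give 125 sitemaps, which is what Cayley's formula predicts ((n + 1)^(n - 1) labeled trees on the root plus n links), not 100. If sibling order counted, the number would be higher still. Either way the loop tries n^n parent assignments, so it slows down very quickly as links are added.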
#4
Sorry, I'm lost. How do you get 100 possible combinations from LinkA, LinkB, LinkC and LinkD? That seems like a lot to me, but I might be mistaken. Just out of interest, why do you need to generate EVERY possible sitemap combination?

As for getting the links in the first place, you might want to take a look at urlopen() in the urllib module (urllib.request in Python 3), which lets you read a webpage like any other file. You'll then have to extract the links from the page, which you can do pretty easily with Python's re (regular expressions) module.
Have fun,
Mark.
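Mark's suggestion could look roughly like this in current Python, where urlopen lives in urllib.request. The regular expression is a rough assumption that only catches quoted href attributes on <a> tags; real-world HTML is messier, and the standard library's html.parser is the sturdier choice. The function names `extract_links` and `fetch_links` are hypothetical.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Rough pattern for href="..." or href='...' inside an <a ...> tag.
# Regex-based HTML scraping is fragile; this is only a quick sketch.
HREF_RE = re.compile(r"""<a\s[^>]*?href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def extract_links(html, base_url=""):
    """Return absolute URLs for every quoted <a href> found in `html`."""
    # urljoin turns relative links such as "about.html" into full URLs.
    return [urljoin(base_url, href) for href in HREF_RE.findall(html)]


def fetch_links(url):
    """Download `url` and return the links found on that page."""
    with urlopen(url) as response:
        # Assumes UTF-8; a real crawler should use the charset from the headers.
        html = response.read().decode("utf-8", errors="replace")
    return extract_links(html, url)
```

Calling `fetch_links("http://example.com/")` on each page you discover, and queueing the links that stay on the same host, gives the basic crawl loop the original question describes.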
#5
Hi,
After a few trials I found that this is practically impossible: it takes far too long once there are many more links. But I have another question. I'd like to parse a website (that's not the problem), but it doesn't work when the site uses frames. Does anyone know how to parse the frames for links? I hope this is clear.
Greetings,
Johie
#6
Probably the best way to parse a frameset is to get the page references from the main frameset page (manually or as part of the program) and then read and parse each of the pages loaded into the frames. Not too hard.
Mark.
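A sketch of that approach, using the standard library's html.parser rather than a regular expression (a substitution, not something from the thread): collect the src of every <frame> and <iframe> on the frameset page, resolve each against the page's URL, and then fetch and scan those pages for links like any other page. `FrameSourceParser` and `frame_urls` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSourceParser(HTMLParser):
    """Collect the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <FRAME SRC=...> matches.
        # Self-closing <frame ... /> tags also end up here via handle_startendtag.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_urls(frameset_html, base_url):
    """Return the absolute URLs of the pages loaded into the frames."""
    parser = FrameSourceParser()
    parser.feed(frameset_html)
    parser.close()
    return [urljoin(base_url, src) for src in parser.sources]
```

Each URL returned can then go through the same link-extraction step as an ordinary page. Frames can nest, so a crawler would repeat this on any frame page that is itself a frameset.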
#7
sitemap
With only 4 links you are limited to 24 possible sitemaps.
At 5 links you would have 120. The "root" is always in the first position, as you described it, so it is not affected by this. Unless you left out information, such as each page linking to every other page... then you would have 108 possibilities. This wouldn't be too hard as a loop, but with additional links this kind of program would get slower. Perhaps if we knew how and why you need every possible sitemap, and more about the site structure, something simpler could be suggested.
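The figures of 24 and 120 are n!, the number of ways to order n links in a single chain below a fixed root. If branching is allowed, as in the tree diagrams earlier in the thread, Cayley's formula gives (n + 1)^(n - 1) trees when sibling order is ignored. The small sketch below compares how fast the two interpretations grow; the function names are hypothetical, and neither model is necessarily what the original poster meant.

```python
from math import factorial


def chain_count(n):
    """Sitemaps that are one chain of n links below a fixed root: n! orderings."""
    return factorial(n)


def tree_count(n):
    """Sitemaps that are any tree of n links below a fixed root.

    Cayley's formula: (n + 1) ** (n - 1) labeled trees on the root plus n links,
    treating the order of siblings as irrelevant.
    """
    return (n + 1) ** (n - 1)


for n in range(1, 11):
    print(f"{n:2d} links: {chain_count(n):>10,} chains, {tree_count(n):>14,} trees")
```

Both counts explode well before a site reaches a dozen pages, which is why enumerating every sitemap stops being practical almost immediately.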