October 7th, 2003, 07:38 AM
Generate possible sitemaps
I'm trying to write a program that scans a site and collects all of its links. From these links the program has to generate every possible sitemap.
Generating all the sitemaps is the problem. If there are 4 pages on a site, there are 100 possible sitemaps.
I'm still thinking about how to do this. Does anyone have any tips or solutions?
I hope I'm clear enough.
Greetings from the Netherlands,
October 7th, 2003, 04:27 PM
How do you generate 100 sitemaps out of 4 links? Assuming we have 4 links, linkA, linkB, linkC and linkD, what is the sitemap supposed to look like?
October 7th, 2003, 04:42 PM
With the 4 links, several tree structures are possible.
If you count all possible combinations, you get 100 results.
I hope that answers your question. Maybe now you can see what I mean.
October 7th, 2003, 08:26 PM
Sorry, I'm lost. How do LinkA, LinkB, LinkC and LinkD yield 100 possible combinations? It seems like a lot to me, but I might be mistaken. Just out of interest, why do you need to generate EVERY possible sitemap combination?
As for getting the links in the first place, you might want to take a look at urlopen() in the urllib module, which lets you read a web page like any other file. You'll then have to extract the links from it, which you can do pretty easily with Python's re (regular expressions) module.
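A minimal sketch of that suggestion, assuming a current Python 3 where urlopen() lives in urllib.request (in 2003 it was urllib.urlopen). The names HREF_RE, extract_links and fetch_links are made up for illustration, and the regular expression is only a rough match for quoted href attributes, not a full HTML parser.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Rough pattern for href="..." or href='...' inside <a> tags.
# Assumption: attribute values are quoted; unquoted hrefs are missed.
HREF_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_links(html, base_url):
    """Return absolute URLs of all <a href> links in html, in order, without duplicates."""
    links = []
    for href in HREF_RE.findall(html):
        absolute = urljoin(base_url, href)  # resolve relative links against the page URL
        if absolute not in links:
            links.append(absolute)
    return links


def fetch_links(url):
    """Download a page and extract its links (needs network access)."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_links(html, url)
```

extract_links() works on any HTML string, so it can be tried without a network connection; fetch_links() is the part that actually reads the page "like any other file".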
October 14th, 2003, 08:41 AM
After a few trials I saw that this is impossible in practice. It takes a very long time once there are a lot more links.
But I have another question.
I'd like to parse a website (that's not the problem), but it doesn't work when the site uses frames.
Does anyone know how to parse the frames for links?
I hope this is clear.
October 14th, 2003, 08:49 AM
Probably the best way to parse a frameset would be to get the page references from the main frameset page (manually or as part of the program) and then read and parse every page connected to the frames. It's not too hard.
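A rough sketch of that approach, assuming the frames are declared with quoted src attributes. FRAME_RE, frame_sources and pages_in_frameset are hypothetical names, and fetch is a function you supply that takes a URL and returns the page's HTML (for example a thin wrapper around urlopen()).

```python
import re
from urllib.parse import urljoin

# Matches <frame src="..."> and <iframe src="...">, but not <frameset ...>.
# Assumption: src values are quoted.
FRAME_RE = re.compile(r'<i?frame\s[^>]*?src\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def frame_sources(html, base_url):
    """Return the absolute URLs of the pages loaded by frames in html."""
    return [urljoin(base_url, src) for src in FRAME_RE.findall(html)]


def pages_in_frameset(url, fetch, seen=None):
    """Return url plus every page reachable through (nested) frames.

    fetch(url) -> html is provided by the caller; seen prevents endless
    loops when frames refer back to a page that was already visited.
    """
    if seen is None:
        seen = []
    if url in seen:
        return seen
    seen.append(url)
    for src in frame_sources(fetch(url), url):
        pages_in_frameset(src, fetch, seen)
    return seen
```

Each URL returned by pages_in_frameset() can then be parsed for links in the same way as a page without frames.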
October 14th, 2003, 09:33 AM
With only 4 links you are limited to 24 possible sitemaps (4! = 24 orderings).
At 5 links you would have 120, but the "root" is always in the first position, as you described it, so it is not affected by this.
Unless you left out information, such as each page linking to every other page... then you would have 108 possibilities.
This wouldn't be too hard to write as a loop...
but with additional links this kind of program would get slower.
Perhaps if we knew how and why you need every possible sitemap, and more about the site structure, something simpler could be suggested.
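To illustrate the 24 and 120 figures, here is a small sketch that treats a sitemap as nothing more than an ordering of the links. That is an assumption about what is being counted; it does not cover tree-shaped sitemaps, and it does not reproduce the 100 or 108 figures. The function names are invented for the example.

```python
from itertools import permutations
from math import factorial


def orderings(links):
    """Return every possible ordering of links (n! of them)."""
    return list(permutations(links))


def orderings_with_fixed_root(root, links):
    """Return every ordering with root pinned to the first position."""
    return [(root,) + rest for rest in permutations(links)]


four = ["linkA", "linkB", "linkC", "linkD"]
print(len(orderings(four)))                          # 24
print(len(orderings_with_fixed_root("root", four)))  # 24: the root adds no orderings
print(factorial(5), factorial(10))                   # 120 3628800
```

The last line shows why the loop gets slow: the count grows as n!, so 10 links already give more than 3.6 million orderings.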