#1
  1. No Profile Picture
    Python/RDF Freak
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2003
    Posts
    14
    Rep Power
    0

    Generate possible sitemaps


    Hello,

    I'm trying to write a program that scans a site and collects all of its links. From these links, the program has to generate all possible sitemaps.
    Generating all the sitemaps is the problem. If there are 4 pages on a site, there are 100 possible sitemaps.
    I'm still thinking about how to do this. Does anyone have any tips or solutions?

    I hope I'm clear enough.
    grtz from the Netherlands,

    Johie
  2. #2
  3. Banned ;)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Nov 2001
    Location
    Woodland Hills, Los Angeles County, California, USA
    Posts
    9,643
    Rep Power
    4248
    How do you generate 100 sitemaps out of 4 links? Assuming we have 4 links, linkA, linkB, linkC and linkD, what is the sitemap supposed to look like?
    Up the Irons
    What Would Jimi Do? Smash amps. Burn guitar. Take the groupies home.
    "Death Before Dishonour, my Friends!!" - Bruce Dickinson, Iron Maiden Aug 20, 2005 @ OzzFest
    Down with Sharon Osbourne

    "I wouldn't hire a butcher to fix my car. I also wouldn't hire a marketing firm to build my website." - Nilpo
  4. #3
  5. No Profile Picture
    Python/RDF Freak
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2003
    Posts
    14
    Rep Power
    0
    If I have the 4 links, there are several possible tree structures.

    Root ----LinkA-LinkB-LinkC-LinkD
    Root ----LinkA-LinkC-LinkB-LinkD

    Or

    Root----LinkA-LinkB
    |--LinkC-LinkD

    Root----LinkD-LinkB
    |--LinkC-LinkA

    Or

    Root -----LinkA-LinkB--LinkC
    |-LinkD

    If you count all possible combinations you get 100 results.
    I hope I have answered your question. Maybe now you know what I mean.
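One way to sanity-check the count is to enumerate the trees by brute force. The sketch below is only one possible reading of the problem: it assumes every page appears exactly once, each page hangs under either the root or another page, and the order of siblings does not matter. The names `count_sitemaps` and `_reaches_root` are made up for this illustration. Under those assumptions the count for 4 links comes out as 125 rather than 100, so the 100 figure presumably rests on a different counting rule.

```python
from itertools import product


def _reaches_root(page, parent_of, root):
    """Follow parent pointers from `page`; False if we loop before hitting root."""
    seen = set()
    while page != root:
        if page in seen:
            return False  # cycle: this page never connects to the root
        seen.add(page)
        page = parent_of[page]
    return True


def count_sitemaps(pages, root="Root"):
    """Count trees rooted at `root` in which every page has exactly one parent.

    Assumption (not from the thread): sibling order is ignored.
    """
    # Each page may hang under the root or under any other page.
    choices = [[root] + [p for p in pages if p != page] for page in pages]
    count = 0
    for parents in product(*choices):
        parent_of = dict(zip(pages, parents))
        if all(_reaches_root(page, parent_of, root) for page in pages):
            count += 1
    return count


print(count_sitemaps(["LinkA", "LinkB", "LinkC", "LinkD"]))  # 125
```

The loop tries every parent assignment, so the work grows very quickly with the number of links; it is meant for checking small cases, not for real sites.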
    grtz
    Johie
  6. #4
  7. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    Sorry, I'm lost: how do you get LinkA, LinkB, LinkC and LinkD to yield 100 possible combinations? It seems like a lot to me, but I might be mistaken. Just out of interest, why do you need to generate EVERY possible sitemap combination?

    As for getting the links in the first place, you might want to take a look at urlopen() in the urllib module, which lets you read a web page like any other file. You'll then have to extract the links from the page, which you can do pretty easily with Python's re (regular expressions) module.
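A minimal sketch of that idea, assuming a current Python 3 where urlopen() lives in urllib.request (in the Python 2 of the time it was urllib.urlopen). The names `HREF_RE`, `extract_links` and `fetch_links` are made up for this example, and the regular expression only handles quoted href attributes on `<a>` tags, so treat it as a starting point rather than a robust HTML parser.

```python
import re
from urllib.request import urlopen

# Matches <a ... href="..."> or <a ... href='...'>, case-insensitively.
# Assumption: href values are quoted; unquoted values are not matched.
HREF_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_links(html):
    """Return the href values of all anchor tags found in an HTML string."""
    return HREF_RE.findall(html)


def fetch_links(url):
    """Download a page and return the links found in it."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_links(html)
```

For example, `extract_links('<a href="about.html">About</a>')` returns `['about.html']`; `fetch_links` does the same for a page fetched over the network.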

    Have fun,
    Mark.
    programming language development: www.netytan.com Hula

  8. #5
  9. No Profile Picture
    Python/RDF Freak
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2003
    Posts
    14
    Rep Power
    0
    Hi,

    After a few trials I saw that this is not feasible: it takes far too long once there are more links.

    But I have another question.
    I'd like to parse a website (that's not the problem), but it doesn't work when the site uses frames.

    Does anyone know how to parse the frames for links?

    I hope this is clear

    grtz
    johie
  10. #6
  11. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    Probably the best way to parse a frameset would be to get the page references from the main page (manually or as part of the program) and then read and parse all the pages connected to the frames. Not too hard.
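Roughly, that could look like the sketch below. It uses html.parser and urllib.parse from the standard library; the class `FrameAndLinkParser`, the function `collect_links` and the `fetch` callback (any function that takes a URL and returns the page's HTML as a string) are names invented for this example. Passing the fetcher in as a parameter is just a design choice so the logic can be tried without a network connection.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameAndLinkParser(HTMLParser):
    """Collect frame/iframe sources and anchor hrefs as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frames = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("frame", "iframe") and attrs.get("src"):
            self.frames.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))


def collect_links(url, fetch, seen=None):
    """Return links on `url` plus links on every page it loads in a frame."""
    seen = set() if seen is None else seen
    if url in seen:
        return []  # avoid fetching the same frame twice
    seen.add(url)
    parser = FrameAndLinkParser(url)
    parser.feed(fetch(url))
    links = list(parser.links)
    for frame_url in parser.frames:
        links.extend(collect_links(frame_url, fetch, seen))
    return links
```

With a real site, `fetch` could be a small wrapper around urlopen() that decodes the response body to a string.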

    Mark.
    programming language development: www.netytan.com Hula

  12. #7
  13. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2003
    Location
    Tucson AZ
    Posts
    29
    Rep Power
    0

    sitemap


    With only 4 links you are limited to 24 possible sitemaps.
    At 5 links you would have 120, but the "root" is always in the first position, as you described it, so it is not affected by this.

    Unless you left out information such as each page links to every other page... then you would have 108 possibilities.

    This wouldn't be too hard as a loop....
    but with additional links this kind of program would get slower.
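The 24 and 120 figures are the number of orderings of 4 and 5 links (4! and 5!) with the root fixed in first place. A short sketch using itertools.permutations shows both the loop and how fast the count grows; the variable names `links` and `chains` are just for illustration.

```python
from itertools import permutations
from math import factorial

links = ["LinkA", "LinkB", "LinkC", "LinkD"]

# Every ordering of the links, with the root always in first position.
chains = [("Root",) + order for order in permutations(links)]
print(len(chains))  # 24

# The number of orderings is n!, which grows very quickly with n.
for n in range(4, 11):
    print(n, factorial(n))
```

This only covers sitemaps that are a single chain; allowing branches, as in the earlier tree examples, makes the total larger still.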

    Perhaps if we knew how/why you would need every possible sitemap and more about the site structure... something simpler can be suggested.
