#1
    Generating nested trees from string data


    Hey folks,

    I'm at a bit of a loss here. I'm trying to work out the best way to build a nested tree from a list of strings: I want to convert a bunch of classification strings into a Newick-format phylogenetic tree representation.

    So - these standardised classifications of languages:
    1) Altaic, Turkic, Northern
    2) Altaic, Turkic, Western, Aralo-Caspian
    3) Altaic, Turkic, Western, Uralian
    4) Andamanese, Great Andamanese, Central
    5) Austronesian, Formosan, Paiwanic
    6) Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, Oceanic, Central-Eastern Oceanic, Remote Oceanic, Central Pacific, East Fijian-Polynesian, Polynesian, Nuclear, Samoic-Outlier, Samoan
    ...etc

    which represents a hierarchical tree something like this:
    Code:
                      ----------- 2
            ---------|
      -----|         ------------ 3
      |    -----------------------1
      |
       ----------------------------4
      |
      |               --------------6
      ---------------|
                      -------------5
    Which needs to be translated into this ("Newick format") shorthand:
    Code:
    ( ( 2, 3 ) 1 ) ( 4 ) ( 5 , 6 )
    I don't quite know the best way to do this, and would love to hear suggestions. What I'm thinking of at the moment is splitting the classification strings into their tokens and then parsing backwards down each string?

    --Simon
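
    One way the split-and-nest idea above could be sketched in Python, offered as a rough illustration rather than a definitive answer. The function names build_tree and to_newick are hypothetical. It assumes each classification is a comma-separated path from the root, that levels with only one child should be collapsed, and that strict Newick output is wanted (one outer pair of parentheses, commas between siblings, a trailing semicolon). Siblings come out in input order, so the Turkic clade prints as (1,(2,3)), which is the same topology as ((2,3),1).

```python
def build_tree(classifications):
    """Nest each comma-separated classification into a dict-of-dicts trie.

    Language labels are stored as a list under the reserved key None.
    """
    root = {}
    for label, path in classifications.items():
        node = root
        for token in (t.strip() for t in path.split(",")):
            node = node.setdefault(token, {})
        node.setdefault(None, []).append(str(label))
    return root


def to_newick(node):
    """Serialise a trie node, collapsing levels that have a single child."""
    parts = list(node.get(None, []))
    parts += [to_newick(child) for key, child in node.items() if key is not None]
    if len(parts) == 1:
        return parts[0]
    return "(" + ",".join(parts) + ")"


classifications = {
    1: "Altaic, Turkic, Northern",
    2: "Altaic, Turkic, Western, Aralo-Caspian",
    3: "Altaic, Turkic, Western, Uralian",
    4: "Andamanese, Great Andamanese, Central",
    5: "Austronesian, Formosan, Paiwanic",
    6: "Austronesian, Malayo-Polynesian, Central-Eastern, Eastern Malayo-Polynesian, "
       "Oceanic, Central-Eastern Oceanic, Remote Oceanic, Central Pacific, "
       "East Fijian-Polynesian, Polynesian, Nuclear, Samoic-Outlier, Samoan",
}

newick = to_newick(build_tree(classifications)) + ";"
print(newick)  # ((1,(2,3)),4,(5,6));
```

    Building the tree forwards from the root avoids having to parse each string backwards: shared prefixes such as "Altaic, Turkic" simply land on the same nested dictionary.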
  2. #2
    I think you could make an array of all the classification names, then use a dictionary to pull out the different classifications, so something like:

    Code:
    >>> types = ["Altaic", "Turkic", "Northern", "Western", "Aralo-Caspian"]
    >>> # each classification is a list of indices into types
    >>> classes = {1: [0, 1, 2], 2: [0, 1, 3, 4]}
    >>> for k, v in sorted(classes.items()):
    ...     print(k, ":", " ".join(types[i] for i in v))
    ...
    1 : Altaic Turkic Northern
    2 : Altaic Turkic Western Aralo-Caspian
    >>>
    Hopefully this helps. Sorry if you needed something else; I just thought this would be a bit simpler.
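
    The types array and classes dictionary above are written out by hand; a small sketch of how they might be derived from the raw strings instead is shown here. The helper name index_classifications is hypothetical, and it assumes tokens are separated by commas. Note that a token is indexed by name alone, so the same name appearing under two different parents would share one index.

```python
def index_classifications(raw):
    """Return (types, classes): unique tokens in first-seen order, plus
    one list of indices into types for each classification label."""
    types = []
    position = {}  # token -> its index in types
    classes = {}
    for label, text in raw.items():
        indices = []
        for token in (t.strip() for t in text.split(",")):
            if token not in position:
                position[token] = len(types)
                types.append(token)
            indices.append(position[token])
        classes[label] = indices
    return types, classes


raw = {
    1: "Altaic, Turkic, Northern",
    2: "Altaic, Turkic, Western, Aralo-Caspian",
}
types, classes = index_classifications(raw)
print(types)    # ['Altaic', 'Turkic', 'Northern', 'Western', 'Aralo-Caspian']
print(classes)  # {1: [0, 1, 2], 2: [0, 1, 3, 4]}
```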

    Comments on this post

    • SimonGreenhill agrees : Thanks Cyber.
