#1
  1. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2003
    Location
    California
    Posts
    6
    Rep Power
    0

    Challenging algorithm problem.


    Hi all,
    I'll try to describe the problem with my poor English.
    I have N sets of consecutive values, for example:

    Set 1 = { 1,2,3,4,5 }
    Set 2 = { 4,5,6 }
    Set 3 = { 3,4,5,6,7,8,9 }
    Set 4 = { 0,1,2,3 }

    I need to construct all the possible sets of N values, taking one value from each of the N sets, for example:

    { 1,4,3,0 } { 1,4,3,1 } { 1,4,3,2 } { 1,4,3,3 }
    { 1,4,4,0 } { 1,4,4,1 } { 1,4,4,2 } { 1,4,4,3 }

    and so on...

    There would be 5x3x7x4 = 420 possible sets, but:
    two sets are equivalent if they contain the same values (by the definition of a SET), for example:

    { 3,4,4,0 } is equivalent to { 4,4,3,0 } and one of these can be "discarded"

    What's the best algorithm to build all the possible sets (without duplicate sets)?

    The average case will be N=20 and set size of 10.

    Challenging huh?

    Antonio
    Last edited by banton; April 15th, 2003 at 01:59 PM.
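
    A minimal brute-force sketch of what is being asked, in Java since the later posts use Java (the class and method names are made up for illustration): it simply walks the full Cartesian product of the four example sets and prints all 5x3x7x4 = 420 combinations, without yet handling the duplicate question discussed below.

    // Illustrative brute force: enumerate every way of picking one value
    // from each input set (the full Cartesian product, duplicates included).
    public class BruteForceEnum {
        static void enumerate(int[][] sets, int[] pick, int depth) {
            if (depth == sets.length) {
                System.out.println(java.util.Arrays.toString(pick)); // one combination
                return;
            }
            for (int v : sets[depth]) {           // try each value of the current set
                pick[depth] = v;
                enumerate(sets, pick, depth + 1); // recurse into the next set
            }
        }

        public static void main(String[] args) {
            int[][] sets = {
                {1, 2, 3, 4, 5},          // Set 1
                {4, 5, 6},                // Set 2
                {3, 4, 5, 6, 7, 8, 9},    // Set 3
                {0, 1, 2, 3}              // Set 4
            };
            enumerate(sets, new int[sets.length], 0); // prints 5x3x7x4 = 420 lines
        }
    }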
  2. #2
  3. not a fan of fascism (n00b)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Feb 2003
    Location
    ct
    Posts
    2,756
    Rep Power
    95
    I'm thinking here... maybe a method to use would be to build a set, then compare it to the sets already built; if there is an identical one, discard it. Surely there are other ways that may be more efficient, though... are you concerned with performance here?
    edit: I was thinking of using an array of pointers to arrays, or perhaps a vector of pointers to arrays.
  4. #3
  5. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2003
    Location
    California
    Posts
    6
    Rep Power
    0
    Yes, I'm concerned about performance.
    If you have 20 sets of 10 values you need a loop of 10^20 (100,000,000,000,000,000,000) iterations, while the actual number of final sets will be much lower.

    A small improvement can be using steps like this:
    1. I get the first set and build a set for each element
    2. To each one-value set I add the second set values
    3. Discard the duplicated two-values sets
    4. Add the third set values
    5. Discard the 3-values duplicate sets
    and so on...

    For example:
    {1,2,3}
    {2,3}
    {4,5}

    1. {1} {2} {3}
    2. {12} {13} {22} {23} {32} {33}
    3. {12} {13} {22} {23} {33}
    4. {124} {125} {134} {135} {224} {225} {234} {235} {334} {335}
    5. nothing to do in this case

    There must be a better algorithm to solve this problem in less than O(x^N) time :-\

    Antonio.

    ( the symbol ^ means power )
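
    A minimal sketch of that stepwise build, assuming Java (the class name is illustrative): each partial combination is kept in sorted order so that a HashSet can drop the order-only duplicates after every step. For the three example sets it ends with the same 10 sets as step 4 above.

    import java.util.*;

    // Level-by-level build: extend every partial combination with the values of
    // the next set, storing each partial as a sorted list so order is ignored,
    // and letting the Set drop duplicates after every level.
    public class StepwiseBuild {
        public static void main(String[] args) {
            int[][] sets = { {1, 2, 3}, {2, 3}, {4, 5} };

            Set<List<Integer>> partials = new HashSet<>();
            partials.add(new ArrayList<>());              // start from the empty combination

            for (int[] set : sets) {
                Set<List<Integer>> next = new HashSet<>();
                for (List<Integer> partial : partials) {
                    for (int v : set) {
                        List<Integer> extended = new ArrayList<>(partial);
                        extended.add(v);
                        Collections.sort(extended);       // canonical (order-free) form
                        next.add(extended);               // the HashSet discards duplicates
                    }
                }
                partials = next;
            }

            System.out.println(partials.size() + " unique sets: " + partials);
        }
    }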
  6. #4
  7. not a fan of fascism (n00b)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Feb 2003
    Location
    ct
    Posts
    2,756
    Rep Power
    95
    Heh... it seems as though you're probably better equipped to solve this than I am, but I always enjoy a challenge, so I will try to help.

    edit: first let me get a few things straight.
    1) Is the minimum set length 1?
    2) Is there a maximum length?
    3) Can you use only one element from each set when creating a new one?
    Last edited by infamous41md; April 10th, 2003 at 09:47 PM.
  8. #5
  9. Muhhnnn !!
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Apr 2003
    Posts
    1,530
    Rep Power
    89
    You can gain a bit of performance by trading memory space for it.

    1- You generate your sets one by one (which is not especially complex)
    2- Associated with each set, you generate a second set with the elements sorted: {1,2,5,3} -> {1,2,3,5} (it can even be the single number 1235, which is even better but implies that your elements are between 0 and 9)
    3- You insert those second sets into a sorted list (Shell sort would fit)
    4- You generate the next set and its associated second set (the key), and search for the key in the sorted key list.

    In this way you will decrease your average complexity while keeping roughly the same maximum complexity...

    Don't really have time to calculate the complexity at the moment...
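
    A minimal sketch of that sorted-key idea, assuming Java; a TreeSet of string keys stands in for the sorted key list, and the class and method names are only illustrative.

    import java.util.*;

    // Sorted-key filter: for every generated combination, build a canonical key
    // by sorting a copy of its values; keep the keys in an ordered structure and
    // accept a combination only if its key has not been seen before.
    public class SortedKeyFilter {
        private final NavigableSet<String> seen = new TreeSet<>(); // ordered key list

        // Returns true if this combination is new (its sorted key was not stored yet).
        boolean accept(int[] combo) {
            int[] sorted = combo.clone();
            Arrays.sort(sorted);                      // {1,2,5,3} -> {1,2,3,5}
            String key = Arrays.toString(sorted);     // e.g. "[1, 2, 3, 5]"
            return seen.add(key);                     // add() returns false if already present
        }

        public static void main(String[] args) {
            SortedKeyFilter filter = new SortedKeyFilter();
            System.out.println(filter.accept(new int[]{3, 4, 4, 0})); // true  - first time
            System.out.println(filter.accept(new int[]{4, 4, 3, 0})); // false - same multiset
        }
    }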
  10. #6
  11. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2003
    Location
    UK
    Posts
    13
    Rep Power
    0
    A better algorithm would be to check your initial sets for repeated numbers, e.g.
    {1,2,3}
    {2,3} <--- 2 repeats
    {1,5} <--- 1 repeat
    and, if the number of repeats in any set is >= 2, to cut the stem accordingly.

    Looking for repeated pairs (or more) in your method would take a really long time with large sets.

    This way you can ignore {3,2} at step 3, and you know that the total number of sets will be the product of the number of elements in each set, minus the sum of (the product of the number of elements in each set after a repeated pair) = 3x2x2 - 2 = 10 here.
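
    A quick brute-force check of that count, assuming Java (the class name is illustrative): it enumerates the 3x2x2 = 12 combinations of {1,2,3} x {2,3} x {1,5}, sorts each one so order is ignored, and counts the distinct results, which gives 10.

    import java.util.*;

    // Sanity check of the 3x2x2 - 2 = 10 count above.
    public class CountCheck {
        public static void main(String[] args) {
            int[][] sets = { {1, 2, 3}, {2, 3}, {1, 5} };
            Set<List<Integer>> unique = new HashSet<>();
            for (int a : sets[0])
                for (int b : sets[1])
                    for (int c : sets[2]) {
                        List<Integer> combo = new ArrayList<>(Arrays.asList(a, b, c));
                        Collections.sort(combo);   // ignore ordering
                        unique.add(combo);
                    }
            System.out.println(unique.size());      // prints 10
        }
    }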

    What do you need it for?
  12. #7
  13. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2003
    Location
    California
    Posts
    6
    Rep Power
    0
    Yes, hiFidelity.
    I understand your logic but how do you generate the result sets once you know that there will be duplicate sets?
    Do you already have an algorithm in mind? If so, please explain it to me.

    I agree with you: generating all the possible sets and then looking for duplicates is not the best idea, but it is the first algorithm that popped into my mind.

    What do I need it for? It's an application that simulates a complex behaviour from a set of input parameters. Each parameter can have a value within a range.

    Antonio.
  14. #8
  15. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2003
    Location
    Seattle, WA
    Posts
    14
    Rep Power
    0
    Arrange the sets in ascending order based on the first number in each set. Now, for each set, store its duplicates: the values that also appear in an earlier set. For example, given the following 3 initial sets:

    A {1,2,3}
    B {2,3,4,5}
    C {3,4}

    You would store these duplicates in separate sets:

    dA {} - always leave first set empty
    dB {2,3}
    dC {3,4}

    Note how each set only cares about duplicates in the sets prior to it. For example, set B doesn't care about the contents of set C or any set that might come after it.

    Now we start building the new sets. As we build them, we check the stored duplicates to see if they're needed.

    From A: {1} - don't know if it's a duplicate yet
    From B: {1,2} - RED FLAG - 2 is in the duplicate set for B

    If any elements prior to this one are also in the duplicate set and are greater than this element, we can skip this set, as it is/will be a duplicate.* This is not the case (1 is not in the duplicate set for B), so we continue.

    From C: {1,2,3} - ANOTHER RED FLAG - 3 is in the duplicate set for C

    We check to see if any prior elements (1 or 2) are also in the duplicate set for C, which they aren't. This set is good.

    Continuing on, we come up with the following sets:

    {1,2,3} - from above
    {1,2,4}
    {1,3,3}
    {1,3,4}

    Let's take a look at the next candidate, {1,4,3}, more closely.

    From A {1} - we never know with the first element
    From B {4} - no red flag this time (4 is not in the duplicate set for B)
    From C {3} - RED FLAG - 3 is in the duplicate set for C

    Upon checking for prior elements that are also in the duplicate set for C, we find that the second element (4) is in the set as well, and is also greater than the third element. This is a duplicate set and can be skipped.

    Let's continue.

    {1,4,4}
    {1,5,3}
    {1,5,4}
    {2,2,3}
    {2,2,4}
    {2,3,3}
    {2,3,4}
    {2,4,3} - Duplicate - 3 and 4 are both in duplicate set for C and 4 is greater than 3 - ignored
    {2,4,4}
    {2,5,3}
    {2,5,4}
    {3,2,3} - Duplicate - ignored
    {3,2,4} - Duplicate - ignored
    {3,3,3}
    {3,3,4}
    {3,4,3} - Duplicate - ignored
    {3,4,4}
    {3,5,3}
    {3,5,4}

    That's it. We're left with only the unique sets. I don't know if this makes sense or if it even helps you at all. You'll of course want to check my logic, as I've not tested this on any significant data. It seems like it should work, though.

    * You could also ignore sets where the prior elements are less than the current element. The goal is, given the two sets {1, 2} and {2, 1}, one of them needs to be deleted. Which one is arbitrary.

    -Nag
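
    A minimal, untested sketch of the duplicate-set check described above, assuming Java (the names are illustrative, and the input sets must already be ordered by their first element, as in the post). For the A/B/C example it should print the same 19 unique sets listed above.

    import java.util.*;

    // Duplicate-set check: precompute, for each set, the values that also occur
    // in an earlier set; during enumeration, skip a value if some earlier chosen
    // element is in the same duplicate set and is greater (the swapped version
    // of that combination is the one that gets kept).
    public class DupSetEnum {
        static int[][] sets;
        static Set<Integer>[] dup;   // dup[i] = values of set i that also occur in an earlier set

        @SuppressWarnings("unchecked")
        public static void main(String[] args) {
            sets = new int[][]{ {1, 2, 3}, {2, 3, 4, 5}, {3, 4} };   // A, B, C from the post

            dup = new HashSet[sets.length];
            Set<Integer> earlier = new HashSet<>();
            for (int i = 0; i < sets.length; i++) {
                dup[i] = new HashSet<>();
                for (int v : sets[i]) if (earlier.contains(v)) dup[i].add(v);
                for (int v : sets[i]) earlier.add(v);
            }

            build(new int[sets.length], 0);
        }

        static void build(int[] combo, int depth) {
            if (depth == sets.length) {
                System.out.println(Arrays.toString(combo));   // one unique combination
                return;
            }
            next:
            for (int v : sets[depth]) {
                if (dup[depth].contains(v)) {
                    // Red flag: skip if an earlier element is also in this duplicate
                    // set and is greater than v.
                    for (int j = 0; j < depth; j++)
                        if (dup[depth].contains(combo[j]) && combo[j] > v) continue next;
                }
                combo[depth] = v;
                build(combo, depth + 1);
            }
        }
    }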
  16. #9
  17. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2003
    Location
    California
    Posts
    6
    Rep Power
    0

    Thumbs up


    Onag, this is a great algorithm.
    I implemented it and it seems to work perfectly.
    The Java code is attached if anyone wants to try it out.

    Thank you.
    Antonio.
    Attached Files
  18. #10
  19. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2003
    Location
    California
    Posts
    6
    Rep Power
    0

    Thumbs up


    The following algorithm is roughly 200 times faster, if anyone is interested. For instance, for 8 sets of 10 values with an overlap rate of 75%, it generates the same output (about 3.3 million results) in 300 milliseconds, while the algorithm from Nag takes about 50 seconds.

    public class TestComb {
        int len;
        int[] vals;     // the current set
        int[][] data;   // data[i] = {first value, last value} of set i

        TestComb(int[][] data) {
            this.data = data;
            len = data.length;
            vals = new int[len];
            for (int x = 0; x < len; x++) {
                vals[x] = data[x][0];   // initialize first set
            }

            int x;
            long count = 0;
            int millions = 0;
            long time = System.currentTimeMillis();
            do {
                count++;
                if (count == 1000000) {
                    count = 0;
                    millions++;
                    System.out.print((System.currentTimeMillis() - time) / millions + " ");
                }

                //print(vals);
                x = len;                // current index to update
                while (x > 0) {         // stop when no value can be updated
                    x--;
                    if (++vals[x] <= data[x][1]) {      // is vals[x]+1 still valid?
                        while (++x < len) {
                            setMin(x);  // set all following indexes to their min allowed values
                        }
                        break;
                    }
                }
            } while (x > 0);
            System.out.println((count + millions * 1000000) + " valid sets");
        }

        // Lowest allowed value for index idx: its own minimum, raised to the largest
        // earlier value that still fits within this set's range.
        void setMin(int idx) {
            vals[idx] = data[idx][0];
            for (int x = 0; x < idx; x++) {
                if (vals[x] <= data[idx][1]) {
                    vals[idx] = Math.max(vals[x], vals[idx]);
                }
            }
        }

        void print(int[] val) {
            for (int x = 0; x < val.length; x++) {
                System.out.print(x == 0 ? "{" : ",");
                System.out.print(val[x]);
            }
            System.out.println("}");
        }

        public static void main(String[] args) {
            // Since the sets hold consecutive values, only the first and last value of
            // each set are needed; the sets must be pre-sorted by their first value.
            int[][] data = { {0, 0}, {0, 1}, {0, 4}, {1, 5}, {1, 4} };
            new TestComb(data);
        }
    }

    Antonio
  20. #11
  21. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2003
    Location
    California
    Posts
    6
    Rep Power
    0
    I am trying to calculate the number of valid results that the input sets will generate before actually starting processing, and that's the next 'challenge'. I know this is going to be strictly a math problem, and I guess I am not good at math!! :-)
    For those of you who are, if anyone has an idea how to work that out, please let me know.

    Why do I need that?
    In the user interface it would be nice to let the user know how long it will take to process his/her request before actually starting to process it. Also, a progress bar will display the total and the current number of results already processed.

    Antonio.
  22. #12
  23. No Profile Picture
    md_doc
    Guest
    Devshed Newbie (0 - 499 posts)
    Ugh, never mind; I just reread what you wrote and my post makes no sense now.
