April 10th, 2003, 05:09 PM

Challenging algorithm problem.
Hi all,
I'll try to describe the problem with my poor english
I have N sets of consecutive values, for example:
Set 1 = { 1,2,3,4,5 }
Set 2 = { 4,5,6 }
Set 3 = { 3,4,5,6,7,8,9 }
Set 4 = { 0,1,2,3 }
I need to construct all the possible sets of N values, each of which is taken from each of the N sets, for example:
{ 1,4,3,0 } { 1,4,3,1 } { 1,4,3,2 } { 1,4,3,3 }
{ 1,4,4,0 } { 1,4,4,1 } { 1,4,4,2 ] { 1,4,4,3 }
and so on...
There would be 5x3x7x4=420 possible sets, but:
two sets are equivalent if they contain the same values (as definition of SET), for example:
{ 3,4,4,0 } is equivalent to { 4,4,3,0 } and one of these can be "discarded"
What's the best algorithm to build all the possible sets (without duplicate sets)?
The average case will be N=20 and set size of 10.
Challenging huh?
Antonio
Last edited by banton; April 15th, 2003 at 12:59 PM.
April 10th, 2003, 05:40 PM

im thinking here.... and maybe a method to use would be to build a set, then compare it to the sets built, if there is an identical one, discard it. surely there are other ways that may be more efficient tho... are you concerened with performance here?
edit: i was thinking of using an array of pointers to arrays, or perhaps a vector of pointers to arrays.
April 10th, 2003, 07:25 PM

Yes, I'm concerned about performance.
If you have 20 sets of 10 values you need to have a loop of 7,766,279,631,452,241,920 (10^20) iterations when the actual final set number will be much lower.
A small improvement can be using steps like this:
1. I get the first set and build a set for each element
2. To each onevalue set I add the second set values
3. Discard the duplicated twovalues sets
4. Add the third set values
5. Discard the 3values duplicate sets
and so on...
For example:
{1,2,3}
{2,3}
{4,5}
1. {1} {2} {3}
2. {12} {13} {22} {23} {32} {33}
3. {12} {13} {22} {23} {33}
4. {124} {125} {134} {135} {224} {225} {234} {235} {334} {335}
5. nothing to do in this case
There must be a better algorithm to solve this problem in less then O(x^N) :\
Antonio.
( the symbol ^ means power )
April 10th, 2003, 08:43 PM

heh... it seems as tho ur prolly better equipped to solve this then myself, but i always enjoy a challenge so i will try to help.
edit: first let me get a few things straight.
1)is the minimun set length 1?
2)is there a maximum length?
3)can u use only one element from each set when creating a new one?
Last edited by infamous41md; April 10th, 2003 at 08:47 PM.
April 11th, 2003, 01:53 AM

You can gain a bit on performances, taking on memory space.
1 You are generating your sets one by one (wich is not especially complex)
2 associated to your set, you generate a second set with elements sorted: {1,2,5,3} > {1,2,3,5} (Can even be 1235 which is even better but implies that your elements are 0<x<9)
3 You insert those seconds set in a sorted list (trishell would fit)
4 You generate the next set and his associated second set (key) and search for the key in the sorted key list.
In this way, you will decrease your Average complexity while keeping quite the same Max complexity...
Don't really have time to calculate the complexity atm...
April 12th, 2003, 03:29 AM

A better algorithm would be to check your initial sets eg.
{1,2,3}
{2,3} <2 repeats
{1,5} <1 repeat
for repeated numbers and if the number of repeats in any set is >=2 to cut the stem accordingly.
Looking for repeated pairs (or more) in your method would take a really long time with large sets.
This way you you can ignore {3,2} at step 3, and you know that the total number of sets will be the product of the number of elements in each set minus the sum of (the product of the number of elements in each set after a repeated pair) = 3x2x2  2 = 10 here.
What do you need it for?
April 15th, 2003, 12:56 PM

Yes, hiFidelity.
I understand your logic but how do you generate the result sets once you know that there will be duplicate sets?
Do you already have an algorithm in mind? If so, please explain it to me.
I agree with you, generating all the possible set and then looking for duplicate it's not the best idea but only the first algorithm that pops up in mind.
What do I need it for? It's an application that simualates a complex behaviour from a set of input parameter. Each parameter can have a value within a range.
Antonio.
April 16th, 2003, 04:23 AM

Arrange the sets in ascending order based on the first number in each set. Now store each set of 2+ duplicates for each set. For example, given the following 3 initial sets:
A {1,2,3}
B {2,3,4,5}
C {3,4}
You would store these duplicates in separate sets:
dA {}  always leave first set empty
dB {2,3}
dC {3,4}
Note how each set only cares about duplicates in the sets prior to it. For example, set B doesn't care about the contents of set C or any set that migh come after it.
Now we start building the new sets. As we build them, we check the stored duplicates to see if they're needed.
From A: {1}  don't know if it's a duplicate yet
From B: {1,2}  RED FLAG  2 is in the duplicate set for B
If any elements prior to this one are also in the duplicate set and are greater than this element, we can skip this set, as it is/will be a duplicate.* This is not the case (1 is not in the duplicate set for B), so we continue.
From C: {1,2,3}  ANOTHER RED FLAG  3 is in the duplicate set for C
We check to see if any prior elements (1 or 2) are also in the duplicate set for 3, which they aren't. This set is good.
Continuing on, we come up with the following sets:
{1,2,3}  from above
{1,2,4}
{1,3,3}
{1,3,4}
Let's take a look at the next set more closely.
From A {1}  we never know with the first element
From B {4}  RED FLAG  but everything checks out
From C {3}  RED FLAG  3 is in the duplicate set for C
Upon checking for prior elements that are also in the duplicate set for C, we find that the second element (4) is in the set as well, and is also greater than the third element. This is a duplicate set and can be skipped.
Let's continue.
{1,4,4}
{1,5,3}
{1,5,4}
{2,2,3}
{2,2,4}
{2,3,3}
{2,3,4}
{2,4,3}  Duplicate  3 and 4 are both in duplicate set for C and 4 is greater than 3  ignored
{2,4,4}
{2,5,3}
{2,5,4}
{3,2,3}  Duplicate  ignored
{3,2,4}  Duplicate  ignored
{3,3,3}
{3,3,4}
{3,4,3}  Duplicate  ignored
{3,4,4}
{3,5,3}
{3,5,4}
That's it. We're left with only the unique sets. I don't know if this makes sense or if it even helps you at all. You'll of course want to check my logic, as I've not tested this on any significant data. It seems like it should work, though.
* You could also ignore sets where the prior elements are less than the current element. The goal is, given the two sets {1, 2} and {2, 1}, one of them needs to be deleted. Which one is arbitrary.
Nag
April 16th, 2003, 05:02 PM

Onag, this is a great algorithm.
I implemented it and it seems working perfectly.
Attached there is the java code if anyone wants to try it out.
Thank you.
Antonio.
April 22nd, 2003, 12:26 PM

The following algorithm is 200 times faster, if anyone is interested. For instance, for 8 set of 10 values, with an overlapping rate of 75%, it generates the same output (about 3.3 million results) in 300 milliseconds, while the algorithm from Nag it takes about 50 seconds.
public class TestComb {
int len; int[] vals;
// the current set
int[][] data;
TestComb(int[][] data)
{
this.data = data;
len = data.length;
vals = new int[len];
for (int x = 0; x < len; x++)
{ // initialize first set
vals[x] = data[x][0];
}
int x;
long count = 0; int millions = 0;
long time = System.currentTimeMillis();
do {
count++;
if (count == 1000000)
{
count = 0; millions++; System.out.print((System.currentTimeMillis()time)/millions + " ");
}
//print(vals);
x = len;
// current index to update
while (x > 0)
{ // stop when no value can be updated
x;
if (++vals[x] <= data[x][1])
{ // is val[x]+1 still valid
while (++x < len)
{ // set all other indexes to min allowed values
setMin(x);
}
break;
}
}
}
while (x > 0);
System.out.println((count + millions*1000000) + " valid sets");
}
void setMin(int idx)
{
vals[idx] = data[idx][0];
for (int x = 0; x < idx; x++)
{
if (vals[x] <= data[idx][1])
{
vals[idx] = Math.max(vals[x], vals[idx]);
}
}
}
void print(int[] val)
{
for (int x = 0; x < val.length; x++)
{
if (x == 0)
{ System.out.print("{"); }
else { System.out.print(','); }
System.out.print(val[x]); } System.out.println("}");
}
public static void main(String[] args)
{ int[][] data = {{0,0}, {0,1}, {0,4}, {1,5}, {1,4}}; // since using consecutive values, only need first and last values
new TestComb(data); // data must be pre sorted by first value
}
}
Antonio
April 23rd, 2003, 02:11 PM

I am trying to calculate the number of valid results that input sets will generate before actually starting processing, and that's the next 'challange'. I know this is going to be strictly a math problem and I guess I am not good in math !! :)
For those of you who are, if anyone has an idea how to find that out please let me know.
Why do I need that?
In the user interface it will be nice to let the user know how long it will take to process his/her request before actually starting processing it. Also a progress bar will display the total and current number of results already processed.
Antonio.

ugh never mind just reread what you wrote and my post makes no sense now