#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2009
    Posts
    59
    Rep Power
    6

    Record matching with String Grid components - Seeking better algorithm for this task


    I'm working on a project that is supposed to be generating a list of matched items off of data drawn from a database. Each record in the table I'm drawing from consists of two records from a previous table matched together on one or more of a group of certain criteria (via database join), and contains two key values for this stage of the project - ID1 and ID2.

    I'm trying to match all possible matching sets, not just sets of two. As an example, suppose I've got five records R1-R5 that match as follows:

    R1 and R2 match on criterion C1. (ID1:R1 and ID2:R2)
    R3 and R4 match on criterion C1.
    R2 and R3 match on criterion C2.
    R4 and R5 match on criterion C2.

    This gives me, from the database, eight records I have to look at - one for each of the matches above, and one for their mirror inverse versions (R1-R2<=>R2-R1). By the instructions I'm working under, all five of my original records should be grouped together, as each group is supposed to receive a unique group id value in the next step.

    I've developed code that seems to do the job, in a brute-force manner using a pair of string grids - the code is listed below. Grid 'Mimic' is directly loaded out with the database information, while grid 'Meta' is the one I'm trying to do my actual matching in.
    Code:
    var
      x:  integer;
      y:  integer;
      a:  integer;
      b:  integer;
      ln_edge:  integer;
      ln_courier: string;
    begin
      //Set Meta grid = Mimic grid.
      strngrdMeta.RowCount := strngrdMimic.RowCount;
      for x := 0 to strngrdMimic.RowCount-1 do
      begin
        for y := 0 to strngrdMimic.ColCount-1 do
        begin
          strngrdMeta.Cells[y,x] := strngrdMimic.Cells[y,x];
        end;
      end;
    
      //Add items to additional fields of mimic grid based on meta grid.
      for x := 1 to strngrdMimic.RowCount-1 do
      begin
        for a := 1 to strngrdMeta.RowCount-1 do
        begin
          for b := 0 to strngrdMeta.ColCount-1 do
          begin
            if ((strngrdMimic.Cells[0,x] = strngrdMeta.Cells[b,a]) and (strngrdMimic.Cells[1,x] <> '')) then
            begin
              strngrdMeta.Cells[b+1,a] := strngrdMimic.Cells[1,x];
            end;
          end;
        end;
      end;
    
      //Filter out instances where cells appear more than once in a row in meta grid.
      for a := 1 to strngrdMeta.RowCount-1 do
      begin
        for b := 0 to strngrdMeta.ColCount-1 do
        begin
          for y := 0 to b do
          begin
            if ((strngrdMeta.Cells[y,a] = strngrdMeta.Cells[b,a]) and (b <> y) and (strngrdMeta.Cells[b,a] <> '')) then
            begin
              strngrdMeta.Cells[b,a] := '';
            end;
          end;
        end;
      end;
    
      //Clean out entries in meta grid that appear in a previous row.
      for x := 1 to strngrdMeta.RowCount-1 do
      begin
        for y := 0 to strngrdMeta.ColCount-1 do
        begin
          ln_courier := strngrdMeta.Cells[y,x];
          for a := 1 to strngrdMeta.RowCount-1 do
          begin
            if ((strngrdMeta.Cells[0,a] = ln_courier) and (a <> x)) then
            begin
              for b := 0 to strngrdMeta.ColCount-1 do
              begin
                strngrdMeta.Cells[b,a] := '';
              end;
            end;
          end;
        end;
      end;
    
      //Remove empty rows at end of meta grid.
      while (strngrdMeta.Cells[0,strngrdMeta.RowCount-1] = '') do
      begin
        strngrdMeta.RowCount := strngrdMeta.RowCount-1;
      end;
    For my little five-record sample above, this works - but I'm not working with a simple five samples, I'm working with a dataset of some 30,000 records, and no particular way of knowing what matches what or how many connected matches a group might have in it. The only time I've attempted to run that code block above against the full data set, it took well over twenty hours to run. Given that my original data set can change on a daily basis, this isn't an acceptable run time.

    Does anyone have any suggestions for anything I could do to speed this process up so as to obtain a more reasonable runtime? I'm certain my current version is very poor and inefficient, but I'm also uncertain as to how to make it more efficient at this point.
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2008
    Posts
    354
    Rep Power
    7
    It may help to provide a short version of what your original data looks like (as in the structure of the database tables the data originates from), and what it looks like after the database Join to produce your ID1 and ID2 key values.

    As the question stands now, all we know is that you have some information in two StringGrids, not even how many columns it has, and I'm not sure how you get criterian C1, C2, ..., Cn

    If your original data is sitting within a database, then the way to make it more efficient is to use the power of the database to reduce the number of records to just those you want to see and in the format you want to see it in. To do so, you need to use SQL which is a SET based approach as opposed to a ROW based approach you are attempting to solve your problem with.

    Comments on this post

    • clivew agrees

IMN logo majestic logo threadwatch logo seochat tools logo