#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2010
    Posts
    6
    Rep Power
    0

    Help a master's thesis student: converting a 70+ line Ruby script to PHP


    Hi everyone,

    I'm currently working on my master's thesis, which, in a few words, concentrates on NoSQL databases and on performance and tuning tests of MongoDB in particular.

    I'm trying to create some dummy test data so I can run some initial tests and establish a baseline for comparison. I've found a MovieSet which should contain 10 million records, and a parser to get the data from the movie files into MongoDB.
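    Each data file is plain text with one record per line and `::` between fields, which is what the Ruby script's `split("::")` relies on; its `field_map` names the fields for each file. A minimal PHP sketch of turning one line into a document under those assumptions (`line_to_document` and the sample line are made up for illustration and are not part of either script):

```php
<?php
// Hypothetical helper: turns one "::"-separated line into an associative
// array keyed by $fieldNames, similar to the Ruby script's Array#to_h.
function line_to_document(string $line, array $fieldNames): array
{
    // Drop the line ending, then split on the "::" field separator.
    $values = explode("::", rtrim($line, "\r\n"));
    $doc = [];
    foreach ($fieldNames as $i => $name) {
        // Missing trailing fields become "" so every key is still present.
        $doc[$name] = isset($values[$i]) ? trim($values[$i]) : "";
    }
    return $doc;
}

// Made-up sample line in the users layout (_id, gender, age, occupation, zip_code).
$doc = line_to_document("1::F::1::10::48067\n", ["_id", "gender", "age", "occupation", "zip_code"]);
print_r($doc);
```

    Like the Ruby helper, it pads missing trailing fields with an empty string so every key exists; it does not cast integers or split the genre list, which the full loader also does.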

    My only problem is that the script is in Ruby, which I have absolutely no knowledge of whatsoever. Because I'm working under a deadline, I'd rather not spend too much time studying the Ruby programming language. I am, however, fairly familiar with PHP, so I was wondering if there is any Ruby & PHP expert who would very graciously convert the 70+ line Ruby script to PHP?

    Thank you

    Sincerely
    - Mestika

    The Ruby script, in its original form (from Wrox.com Code Library):


    Code:
    require 'rubygems' #can skip this line in Ruby 1.9
    require 'mongo'
    
    field_map = {
        "users" => %w(_id gender age occupation zip_code),
        "movies" => %w(_id title genres),
        "ratings" => %w(user_id movie_id rating timestamp)
    }
    
    db = Mongo::Connection.new.db("mydb")
    collection_map = {
        "users" => db.collection("users"),
        "movies" => db.collection("movies"),
        "ratings" => db.collection("ratings")
    }
    
    unless ARGV.length == 1
        puts "Usage: movielens_dataloader data_filename"
        exit(0)
    end
    
    class Array
      def to_h(key_definition)
        result_hash = Hash.new()
        
        counter = 0
        key_definition.each do |definition|
          if not self[counter] == nil then
              if self[counter].is_a? Array or self[counter].is_a? Integer then
                  result_hash[definition] = self[counter]
              else
                  result_hash[definition] = self[counter].strip
              end
          else
            # Insert the key definition with an empty value.
            # Because we probably still want the hash to contain the key.
            result_hash[definition] = ""
          end
          # For some reason counter.next didn't work here....
          counter = counter + 1
        end
        
        return result_hash
      end
    end
    
    if File.exists?(ARGV[0])
        file = File.open(ARGV[0], 'r')
        data_set = ARGV[0].chomp.split(".")[0]
        file.each { |line|
            field_names = field_map[data_set] 
            field_values = line.split("::").map { |item|
                if item.to_i.to_s == item
                    item = item.to_i
                else
                    item
                end
            }
            puts "field_values: #{field_values}"
            #last_field_value = line.split("::").last
            last_field_value = field_values.last
            puts "last_field_value: #{last_field_value}"
            if last_field_value.split("|").length > 1
               field_values.pop 
               field_values.push(last_field_value.split().join('\n').split("|"))
            end
            field_values_doc = field_values.to_h(field_names)
            collection_map[data_set].insert(field_values_doc)
        }
        puts "inserted #{collection_map[data_set].count()} records into the #{collection_map[data_set].to_s} collection"
    end
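    The least obvious line in the Ruby script is probably `item.to_i.to_s == item`: a field becomes an Integer only if converting it to an integer and back yields exactly the same string, so "42" becomes 42 while "02460" or "Toy Story (1995)" stay strings. In PHP a strict `===` comparison gives the same behavior; the loose `==` can compare numeric strings by value (for example "1000" == "1e3" is true). A minimal sketch, where `cast_like_ruby` is a hypothetical name:

```php
<?php
/**
 * Hypothetical helper mirroring Ruby's `item.to_i.to_s == item` test:
 * returns an int only when the string round-trips exactly, else the string.
 *
 * @return int|string
 */
function cast_like_ruby(string $item)
{
    $asInt = (int) $item;
    // Strict comparison: "02460" -> 2460 -> "2460" does not match, so it stays a string.
    return ((string) $asInt === $item) ? $asInt : $item;
}

var_dump(cast_like_ruby("42"));               // int(42)
var_dump(cast_like_ruby("02460"));            // string(5) "02460"
var_dump(cast_like_ruby("Toy Story (1995)")); // string(16) "Toy Story (1995)"
```

    Keeping values such as zip codes with leading zeros as strings avoids silently changing the data on import.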
#2
    Transforming Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,112
    Rep Power
    9398
    It's 4:30am so I apologize in advance if I make a mistake or three.

    First, you need the Mongo extension for PHP; it's not built in. You can find the source in the PECL repository, but for precompiled versions you'll have to look elsewhere, and I don't know where.
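    A quick way to confirm the extension is actually loaded before running the loader, assuming the legacy PECL package, whose extension is named `mongo` and which provides the `Mongo` and `MongoCollection` classes used in the code below (it has since been superseded by the newer `mongodb` extension, which has a different API). `mongo_extension_available` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: checks whether the legacy PECL "mongo" extension is
// loaded before the loader tries to use its Mongo / MongoCollection classes.
function mongo_extension_available(): bool
{
    // The second argument false disables autoloading for the class check.
    return extension_loaded('mongo') && class_exists('MongoCollection', false);
}

if (!mongo_extension_available()) {
    fwrite(STDERR, "The PECL mongo extension is not loaded; check `php -m`.\n");
}
```

    Running `php -m` from the command line also lists the loaded extensions.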

    PHP Code:
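    <?php
    // Requires the legacy PECL "mongo" extension (Mongo, MongoCollection classes).

    // added: mirrors Ruby's `item.to_i.to_s == item`; only strings that
    // round-trip exactly become integers (strict === so "1e3" is not 1000)
    function castintegers($a) {
        if (strval(intval($a)) === $a) {
            return intval($a);
        } else {
            return $a;
        }
    }
    // /added

    $field_map = array(
        "users" => array("_id", "gender", "age", "occupation", "zip_code"),
        "movies" => array("_id", "title", "genres"),
        "ratings" => array("user_id", "movie_id", "rating", "timestamp")
    );

    $mongo = new Mongo(); // added
    $db = $mongo->selectDB("mydb");
    $collection_map = array(
        "users" => new MongoCollection($db, "users"),
        "movies" => new MongoCollection($db, "movies"),
        "ratings" => new MongoCollection($db, "ratings")
    );

    // $argv[0] is the script name, so one file argument means $argc == 2
    if ($argc != 2) {
        echo "Usage: movielens_dataloader data_filename\n"; // or something like that
        exit(0);
    }

    if (file_exists($argv[1])) {
        $data_set = strtok($argv[1], "."); // different means, same end
        $field_names = $field_map[$data_set];
        // FILE_IGNORE_NEW_LINES drops the trailing "\n" (Ruby strips it later in to_h)
        foreach (file($argv[1], FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
            $field_values = array_map("castintegers", explode("::", $line));
            echo "field_values: ", implode(",", $field_values), "\n";
            $last_field_value = end($field_values);
            echo "last_field_value: {$last_field_value}\n";
            if (count(explode("|", $last_field_value)) > 1) {
                array_pop($field_values);
                // array_push() with an array adds it as ONE element, like Ruby's push;
                // trim() stands in for Ruby's split().join for lists without spaces
                array_push($field_values, explode("|", trim($last_field_value)));
            }
            // like Ruby's to_h: ignore extra values, pad missing keys with ""
            // (array_combine() fails when the two arrays differ in length)
            $field_values = array_pad(array_slice($field_values, 0, count($field_names)), count($field_names), "");
            $field_values_doc = array_combine($field_names, $field_values);
            $collection_map[$data_set]->insert($field_values_doc);
        }
        echo "inserted {$collection_map[$data_set]->count()} records into the {$collection_map[$data_set]} collection\n";
    }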
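    The Ruby line `last_field_value.split().join('\n').split("|")` is the trickiest to translate. `split()` with no argument breaks on whitespace and discards leading and trailing whitespace (including the newline at the end of the line); `join('\n')` uses a single-quoted Ruby string, so it joins pieces with a literal backslash and `n` rather than a newline; and the final `split("|")` produces the genre list. For pipe-separated values with no spaces inside, that reduces to trimming and splitting on `|`. A minimal sketch under that assumption, where `split_genres` is a hypothetical name:

```php
<?php
// Hypothetical helper: turns "Animation|Children's|Comedy\n" into a list of
// genres. Assumes the value has no internal spaces, in which case Ruby's
// split().join('\n') only strips the surrounding whitespace.
function split_genres(string $value): array
{
    return explode("|", trim($value));
}

print_r(split_genres("Animation|Children's|Comedy\n"));
```

    Both scripts only replace the last field when the split yields more than one element, so a movie with a single genre keeps it as a plain string rather than a one-element array.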
