#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2012
    Posts
    2
    Rep Power
    0

    Data aggregation - need advice


    Hi all,
    I'm looking for a way to store and extract useful information from the data gathered during a search process, such as the most popular search keywords. In the simplest case, each record is (search keyword, number of occurrences, time), where time is just the last occurrence of the keyword. I know this can be done with a relational DBMS like MySQL, but I was thinking it may be better to use a NoSQL store for performance reasons. What tool/structure do you recommend? I was thinking of dropping the time element and using a persistent key-value store such as Redis, but I'm not sure about that.
    Thanks in advance.
    P.S.: I'm using Java (with Spring as the framework) for my application.
#2
    Any opinions?
#3
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2012
    Location
    Haifa, Israel
    Posts
    17
    Rep Power
    0
    Hi @procfs. It could make sense to use a NoSQL tool for the use case you mention. I know many users choose Redis for its simplicity, so it could be a good fit. However, you should take into account that Redis is fully in-memory, so depending on the size of your data set it could require a lot of memory to run. I'm not sure what search keywords you are analyzing, but if it's a public data set on the scale of Google's, it can get very big. You might find you don't have an adequate machine to hold all the data in memory, and I know clustering/sharding in Redis (to make it work on two or more machines) can be complex. If this is an issue, you should consider a hosted Redis solution - see the link for a commercial solution by Garantia Data. There used to be a competing service called Redis2Go, but they have shut down recently.
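
    To make the data model concrete, here is a rough sketch in plain Java of the (keyword, occurrence count, last occurrence) record from the original question, kept in an in-memory map. It is not Redis code: the class name KeywordStats and its methods are made up for illustration, and the comments only point at the Redis commands that would play a similar role, assuming counts are kept in a sorted set and last-seen times in a hash.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of any library: tracks per-keyword
// occurrence counts and the time each keyword was last seen.
public class KeywordStats {

    /** Immutable snapshot of one keyword's statistics. */
    public static final class Entry {
        public final long count;
        public final long lastSeenMillis;

        Entry(long count, long lastSeenMillis) {
            this.count = count;
            this.lastSeenMillis = lastSeenMillis;
        }
    }

    private final Map<String, Entry> stats = new ConcurrentHashMap<>();

    /**
     * Records one search for the keyword. merge() is atomic on a
     * ConcurrentHashMap, so concurrent searches do not lose updates.
     * Rough Redis analogue (assumed layout): ZINCRBY on a sorted set
     * for the count, HSET on a hash for the last-seen time.
     */
    public void record(String keyword, long timestampMillis) {
        stats.merge(keyword, new Entry(1, timestampMillis),
                (old, fresh) -> new Entry(
                        old.count + 1,
                        Math.max(old.lastSeenMillis, fresh.lastSeenMillis)));
    }

    /** Returns the stats for a keyword, or null if it was never recorded. */
    public Entry get(String keyword) {
        return stats.get(keyword);
    }

    /**
     * Returns up to n keywords ordered by descending count.
     * Rough Redis analogue (assumed layout): ZREVRANGE on the sorted set.
     */
    public List<String> top(int n) {
        List<Map.Entry<String, Entry>> entries = new ArrayList<>(stats.entrySet());
        entries.sort(Comparator.comparingLong(
                (Map.Entry<String, Entry> e) -> e.getValue().count).reversed());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

    Calling record("redis", System.currentTimeMillis()) on every search and top(10) when you need a report covers the "popular keywords" case. Sorting the whole map on each top() call is fine for a small keyword set; for a large one, a store that keeps the ranking for you is the better fit.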

    In general, many of the NoSQL solutions can be a bit difficult to install and maintain once you go beyond one machine. As a new user this could be an issue for you, so here are two more NoSQL databases which are available as a hosted service:
    * Amazon DynamoDB - high performance and very easy to use, but it is a key-value store with less functionality than Redis.
    * MongoLab - MongoDB as a service. MongoDB uses a document model (unlike Redis, which is a key-value store) and supports more complex queries. Considered a bit "heavier" than Redis.

    HTH
