April 9th, 2012, 06:38 AM
Data aggregation - need advice
I'm thinking of a way to save and extract useful information from the data gathered by a search process, such as the most popular search keywords. In the simplest form, each record has three elements: the search keyword, its number of occurrences, and a time (in the simplest case, the keyword's last occurrence). I know this can be done with a relational DBMS like MySQL, but I was thinking it may be better to use a NoSQL approach instead, for performance reasons. What tool or structure do you recommend? I was also thinking of dropping the time element and using a persistent key-value store such as Redis, but I'm not sure about that.
Thanks in advance.
P.S. I'm using Java (with the Spring framework) for my application.
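A minimal sketch of the three-element record described in the question, written in Java since that is the stack mentioned. It is purely in-memory and only meant to make the data shape concrete; the class and method names (`KeywordStats`, `record`, `top`) are hypothetical, and so is the normalization rule (trim and lower-case). It keeps one immutable entry per keyword and uses `ConcurrentHashMap.merge` so concurrent searches update the count and last-occurrence time without lost increments.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory aggregator for (search keyword, number of occurrences, last occurrence).
class KeywordStats {

    // One aggregated entry per keyword; immutable, so merge() can replace it in one step.
    static final class Stat {
        final String keyword;
        final long count;
        final Instant lastSeen;

        Stat(String keyword, long count, Instant lastSeen) {
            this.keyword = keyword;
            this.count = count;
            this.lastSeen = lastSeen;
        }
    }

    private final ConcurrentHashMap<String, Stat> stats = new ConcurrentHashMap<>();

    // Assumed normalization: "Redis " and "redis" share one counter.
    private static String normalize(String keyword) {
        return keyword.trim().toLowerCase(Locale.ROOT);
    }

    // Records one search. ConcurrentHashMap.merge runs the remapping function atomically per key,
    // so concurrent callers never lose an increment; lastSeen keeps the latest time even if
    // events arrive out of order.
    void record(String keyword, Instant when) {
        String key = normalize(keyword);
        stats.merge(key, new Stat(key, 1, when),
                (old, inc) -> new Stat(key, old.count + 1,
                        inc.lastSeen.isAfter(old.lastSeen) ? inc.lastSeen : old.lastSeen));
    }

    // Returns the entry for a keyword, or null if it was never searched.
    Stat get(String keyword) {
        return stats.get(normalize(keyword));
    }

    // Returns up to n entries, most frequent first; ties are broken alphabetically.
    List<Stat> top(int n) {
        List<Stat> all = new ArrayList<>(stats.values());
        all.sort((a, b) -> a.count != b.count
                ? Long.compare(b.count, a.count)
                : a.keyword.compareTo(b.keyword));
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }
}
```

Because this lives in a single JVM's heap, it loses everything on restart; the same (keyword, count, lastSeen) triple maps directly onto a MySQL table row or a key-value entry once persistence is needed.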
April 10th, 2012, 03:16 AM
October 9th, 2013, 04:50 AM
Hi @procfs. It could make sense to use a NoSQL tool for the use case you mention. I know many users choose Redis for its simplicity, so it could be a good fit. However, keep in mind that Redis holds the entire data set in memory, so depending on the size of your data it could need a lot of RAM. I'm not sure which search keywords you are analyzing, but if it's a public-scale data set like Google's, it can get very big. You might find that you don't have a machine large enough to hold all the data in memory, and clustering/sharding Redis (to spread it over two or more machines) can be complex. If this is an issue, consider a hosted Redis service - see the link for a commercial solution by Garantia Data. There used to be a competing service called Redis2Go, but it has shut down recently.
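If you do go with Redis, one common layout (an assumption; the thread doesn't prescribe one) is a sorted set of keyword counts, updated with ZINCRBY on every search and read with ZREVRANGE to list the most popular keywords, plus a hash (HSET) mapping each keyword to its last-occurrence timestamp, so the time element need not be dropped. The hypothetical `SortedSetSketch` class below imitates only the sorted-set part in plain Java, with no Redis client involved, to show why that layout suits this use case: the ranking is maintained on every increment instead of being computed by sorting at query time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical in-process stand-in for a Redis sorted set, only to illustrate the
// ZINCRBY / ZREVRANGE access pattern; it is not a Redis client and not thread-safe.
class SortedSetSketch {

    // Index entry ordered like a Redis sorted set: by score, then by member.
    private static final class Entry implements Comparable<Entry> {
        final double score;
        final String member;

        Entry(double score, String member) {
            this.score = score;
            this.member = member;
        }

        @Override
        public int compareTo(Entry o) {
            int c = Double.compare(score, o.score);
            return c != 0 ? c : member.compareTo(o.member);
        }
    }

    private final Map<String, Double> scores = new HashMap<>(); // member -> current score
    private final TreeSet<Entry> index = new TreeSet<>();         // ordered view of all members

    // Like ZINCRBY key increment member: a new member starts from 0.
    // The old index entry is removed first so each member appears exactly once.
    double incrBy(String member, double increment) {
        Double old = scores.get(member);
        double updated = (old == null ? 0.0 : old) + increment;
        if (old != null) {
            index.remove(new Entry(old, member));
        }
        index.add(new Entry(updated, member));
        scores.put(member, updated);
        return updated;
    }

    // Like ZREVRANGE key 0 (n - 1): highest score first; equal scores in
    // descending lexicographic order, as Redis documents for ZREVRANGE.
    List<String> revRange(int n) {
        List<String> out = new ArrayList<>();
        Iterator<Entry> it = index.descendingIterator();
        while (it.hasNext() && out.size() < n) {
            out.add(it.next().member);
        }
        return out;
    }

    // Like ZSCORE: the member's score, or null when it is absent.
    Double score(String member) {
        return scores.get(member);
    }
}
```

For larger sorted sets Redis pairs a skip list with a hash table; the `TreeSet` and `HashMap` here play those same two roles.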
In general, many NoSQL solutions can be somewhat difficult to install and maintain once you go beyond one machine, which could be an issue for you as a new user. Here are two more NoSQL databases that are available as hosted services:
* Amazon DynamoDB - high performance and very easy to use, but with less functionality than Redis; it is primarily a key-value store.
* MongoLab - MongoDB as a service. Unlike Redis, MongoDB uses a document model and supports more complex queries. It is considered a bit "heavier" than Redis.