MySQL Help
#1
March 11th, 2013, 05:03 PM
ETbo

De-duplicate 'smart' random row query?

Hi guys,

Following some research, I found a query that suits my needs: it returns random IDs from the table. The ID column is an auto-increment column with no holes, so every value from 1 to COUNT(*) exists.

Code:
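-- Draws 2,000,000 ids at random (with replacement) from a gap-free range 1..N, N = COUNT(*).
-- myTable is also used as a row generator here, so it needs at least 2,000,000 rows.
SELECT `mydb`.`myTable`.id
FROM   (SELECT FLOOR(1 + RAND() * (SELECT COUNT(*)
                                   FROM   `mydb`.`myTable`)) num,  -- 1..N; FLOOR(RAND() * N) alone gives 0..N-1
               @num := @num + 1
        FROM   (SELECT @num := 0) a,
               `mydb`.`myTable`
        LIMIT  2000000) b,
       `mydb`.`myTable`
WHERE  b.num = `mydb`.`myTable`.id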


The problem is that the target table (myTable) holds between 30M and 400M rows, depending on the situation. With the LIMIT I want to retrieve 2M randomly selected IDs, but I get a lot of duplicates (which is expected, since each ID is drawn independently of the others).
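Why duplicates are expected, as a standard back-of-the-envelope estimate (not part of the original post): if each of k draws picks one of N ids uniformly and independently, the expected number D of distinct ids among them is

$$\mathbb{E}[D] = N\left(1-\left(1-\tfrac{1}{N}\right)^{k}\right) \approx N\left(1-e^{-k/N}\right)$$

With k = 2,000,000 this gives about 1.935 million distinct ids for N = 30,000,000 (roughly 65,000 duplicates, about 3.3%) and about 1.995 million for N = 400,000,000 (roughly 5,000 duplicates, about 0.25%). Solving for k, reaching m distinct ids on average takes

$$k = -N \ln\left(1 - \tfrac{m}{N}\right)$$

draws, i.e. about 2.07 million for N = 30M and about 2.005 million for N = 400M when m = 2M.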

Is it possible to de-duplicate the result and still get exactly 2M records? I thought of inserting the IDs into a table with a UNIQUE key and letting it reject the duplicates, but then again I would end up with fewer rows than I asked for.
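The UNIQUE-table idea can still reach an exact count if the insert is simply repeated for the shortfall until the target is met. The sketch below models that loop in Python rather than SQL; the function name `unique_sample_topped_up` is hypothetical, ids are assumed to run 1..N with no holes, and the Python set stands in for the UNIQUE key.

```python
import random


def unique_sample_topped_up(n_rows: int, wanted: int, rng: random.Random) -> set[int]:
    """Hypothetical model of the UNIQUE-table idea: keep adding random ids
    (1..n_rows, no holes assumed) to a set until it holds `wanted` of them."""
    if not 0 < wanted <= n_rows:
        raise ValueError("need 0 < wanted <= n_rows")
    chosen: set[int] = set()
    while len(chosen) < wanted:
        # Draw only as many ids as are still missing; the set silently
        # ignores duplicates, much as a UNIQUE key would reject them.
        missing = wanted - len(chosen)
        chosen.update(rng.randint(1, n_rows) for _ in range(missing))
    return chosen


ids = unique_sample_topped_up(30_000, 2_000, random.Random(1))
print(len(ids))
```

Each round is much smaller than the one before it, so when 2M is small relative to the table only a handful of rounds are needed.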

Any thoughts? Many thanks!

#2
March 13th, 2013, 05:33 AM
sr

If you want to do it this fast way, by first selecting IDs and then running a DISTINCT on them, the only suggestion I can give is to use two LIMITs.
One on the inner SELECT, set slightly higher than the number of rows you want, to compensate for the "loss" of duplicates (how much higher depends on how many duplicates you usually get in the subset, which depends on your data, mainly on how many rows the table holds compared with the number of IDs you draw).

Then the outer LIMIT on the SELECT DISTINCT picks the exact number of rows you want returned.
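A minimal Python model of the two-LIMIT approach, assuming ids run 1..N with no holes. `inner_limit` and `two_limit_sample` are hypothetical helpers: the first picks the inner LIMIT from the expected-duplicate estimate plus an arbitrary 1% safety margin, and the second mimics oversampling with replacement followed by SELECT DISTINCT ... LIMIT.

```python
import math
import random


def inner_limit(n_rows: int, wanted: int, margin: float = 0.01) -> int:
    """Draws needed so the expected number of distinct ids reaches `wanted`.

    Solves n * (1 - exp(-k / n)) = wanted for k, then adds a safety margin
    (1% by default, an arbitrary choice) because the actual number of
    distinct values fluctuates around its expectation.
    """
    if not 0 < wanted < n_rows:
        raise ValueError("need 0 < wanted < n_rows")
    k = -n_rows * math.log1p(-wanted / n_rows)
    return math.ceil(k * (1 + margin))


def two_limit_sample(n_rows: int, wanted: int, rng: random.Random) -> list[int]:
    """Model of the two-LIMIT idea on ids 1..n_rows (no holes assumed)."""
    # Inner LIMIT: draw more ids than needed, with replacement,
    # like FLOOR(1 + RAND() * n) evaluated once per generated row.
    draws = [rng.randint(1, n_rows) for _ in range(inner_limit(n_rows, wanted))]
    # Outer SELECT DISTINCT ... LIMIT: keep first occurrences, then trim.
    distinct = list(dict.fromkeys(draws))
    return distinct[:wanted]


print(inner_limit(30_000_000, 2_000_000))
```

For the thread's numbers this suggests an inner LIMIT of roughly 2.09 million for a 30M-row table and roughly 2.025 million for a 400M-row table. The result can still come up short in rare cases (the function then returns fewer than `wanted` ids), so a caller would rerun or raise the margin.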
__________________
/Stefan

© 2003-2013 by Developer Shed. All rights reserved.