C Programming
 
#1
December 26th, 2003, 12:29 AM
infamous41md
Code Benchmarking Tests

What this is:
-A small C/asm source file that helps you benchmark a function that iterates over a data set.

What it does:
-Computes the number of cycles your function takes to run
-Computes the CPE (cycles per element) of your function

How it Works:
-It uses Pentium-specific assembly instructions, chiefly RDTSC, to read the processor's time-stamp counter, a 64-bit value that holds the number of clock cycles elapsed since the processor was reset.
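A minimal sketch of reading that counter, assuming GCC-style inline assembly on an x86/x86-64 CPU (the name read_tsc is hypothetical, not taken from the original source). RDTSC returns the low 32 bits in EAX and the high 32 bits in EDX, so both halves have to be combined into one 64-bit value:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read the 64-bit time-stamp counter.
 * RDTSC puts the low 32 bits in EAX and the high 32 bits in EDX;
 * keeping only EAX would wrap around every 2^32 cycles. */
static inline uint64_t read_tsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    /* volatile: the optimizer must not delete the read or merge two reads */
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    /* Non-x86 placeholder so the sketch still compiles:
     * a plain incrementing counter, NOT a cycle count. */
    static uint64_t ticks = 0;
    return ++ticks;
#endif
}
```

Typical use is `uint64_t start = read_tsc(); /* work */ uint64_t cycles = read_tsc() - start;`. On its own RDTSC is not a serializing instruction, so out-of-order execution can blur the start and end points slightly; that is what the sfence/cpuid discussion later in the thread is about.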

What you do:
-You write a function that takes a void pointer as its argument and returns a void pointer. In main(), you pack the arguments your function needs into a structure (or whatever), and then unpack them inside your function. You then call the function I wrote, test_it(), passing it a pointer to your function and your packed argument structure. test_it() runs your function with that argument and benchmarks its performance.
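A minimal sketch of that calling convention. The post doesn't show test_it()'s exact signature, so every name below (bench_fn, bench_run, bench_result, sum_args, sum_array) is hypothetical, and a small RDTSC reader is included so the block stands alone. The overhead step is only a guess at how a figure like "overhead is 37 cycles" might be obtained: time an empty call and subtract it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The function under test takes and returns a void pointer. */
typedef void *(*bench_fn)(void *arg);

/* Read the 64-bit time-stamp counter (x86 only; GCC-style inline asm). */
static inline uint64_t read_tsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    static uint64_t ticks = 0;   /* placeholder counter, not cycles */
    return ++ticks;
#endif
}

static void *empty_fn(void *arg) { return arg; }

struct bench_result {
    uint64_t cycles;   /* cycles for one call, timing overhead removed */
    double cpe;        /* cycles per element */
};

/* Hypothetical stand-in for test_it(): time one call of fn(arg) over n elements. */
static struct bench_result bench_run(bench_fn fn, void *arg, size_t n)
{
    bench_fn volatile empty = empty_fn;  /* volatile: keep the empty call */
    struct bench_result r;
    uint64_t t0, t1, overhead, total;

    assert(fn != NULL && n > 0);

    t0 = read_tsc();            /* overhead: timer reads + an empty call */
    empty(arg);
    t1 = read_tsc();
    overhead = t1 - t0;

    t0 = read_tsc();            /* the real measurement */
    fn(arg);
    t1 = read_tsc();
    total = t1 - t0;

    r.cycles = total > overhead ? total - overhead : 0;
    r.cpe = (double)r.cycles / (double)n;
    return r;
}

/* Example function under test: its arguments are packed into a struct. */
struct sum_args {
    const int *data;
    size_t n;
    long sum;          /* result is written back here */
};

static void *sum_array(void *p)
{
    struct sum_args *a = p;    /* unpack the arguments */
    long s = 0;
    size_t i;

    for (i = 0; i < a->n; i++)
        s += a->data[i];
    a->sum = s;
    return p;
}
```

The numbers will vary between calls. As the output further down shows, the first run is usually slower because code and data are not yet in the cache, so it makes sense to call bench_run several times and look at the later results.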

Does it work:
-Yes, as far as I can tell, but that doesn't guarantee that it actually does.

Is it annoyingly complex?
-I hope not. I think the example I provided is pretty clear; if you feel otherwise, tell me so.

Code is available in zip and tar:
http://www.infsec.net/code_bench.tar.gz
And here it is if you just want to look:
Code:
/*	12/25/03 - Merry Xmas!!!
 *	This code is meant to provide reasonably accurate benchmarking of functions
 *	that iterate over a set of data. It is meant to be as modular as possible,
 *	so that testing different functions goes as fast as possible. It
 *	calculates the total number of cycles required for a function to run, and
 *	also the CPE of a function that iterates over a data set. What is
 *	the CPE, you ask?
 *	CPE - cycles per element: the number of cycles required to process one element
 *	of a data set. This is a term I stole from this book (which I highly recommend):
 *	http://csapp.cs.cmu.edu/
 *	In my opinion (and theirs) it's a good way of benchmarking code, because it lets
 *	you see clearly how the processor is performing. Example: the Intel Pentiums have
 *	an integer arithmetic unit that can execute an addition with a latency of 1 cycle,
 *	and it can also start a new instruction every cycle. If you just time your code
 *	using something like times, you have no idea how close your code comes to reaching
 *	the processor's maximum capabilities. If you instead measure how many cycles it
 *	takes to process one element, there is a much clearer relationship between what
 *	is going on in the processor and where the delays are occurring.
 *
 *	BUGS:
 *		I have compared all my tests to the benchmarks in the above book, and my results
 *		using their code are nearly the same as theirs, so I'm fairly confident that
 *		this works correctly. I emailed my code to the author and asked him to check it
 *		out; I will update anything as necessary and repost in the thread where
 *		this was posted.
 *	TESTED ON:
 *		This code was written and compiled on Red Hat 8 and Debian 4. I've come to learn
 *		(unfortunately) that some of the code I've written compiles fine on one version
 *		of gcc and then has several errors on others, so I can only hope that you'll be
 *		able to compile this. I'd like to make it work on as many platforms as possible,
 *		so if you fix it to work on a different one, please let me see it.
 *	BUILD:
 *		Due to the idiocy of the gcc inline assembler, the asm CANNOT be inlined or it
 *		breaks as soon as it is optimized, so I had to put the asm routines in a separate
 *		assembly file. I compile it like so:
 *		gcc -Wall this_source_file.c the_assembly_source.s
 *
 *		Feel free to do whatever you like with this, but if you make it better you
 *		have to share it with me.
 *  UPDATED:
 *      02/04/03 - Fixed it so that it uses __u64 unsigned ints to store the stamp values.
 *      Before, I wasn't checking for an overflow in the low 32 bits of the counter; now it does :).
 *			-sean larsson	*/
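In formula form, the CPE is the measured cycle count C for one pass divided by the number of elements n. Checking it against the first line of the run shown below, and assuming n = 1024 with the array holding 0 through 1023 (inferred from the output rather than stated: 4167 / 4.069336 is about 1024, and the reported sum matches 0 + 1 + ... + 1023):

```latex
\mathrm{CPE} = \frac{C}{n}, \qquad
\frac{4167}{1024} = 4.0693359375 \approx 4.069336, \qquad
\sum_{i=0}^{1023} i = \frac{1023 \cdot 1024}{2} = 523776
```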



And some output:
Code:
[n00b@highjack3d] ./a.out 
loop is not unrolled
overhead is 37 cycles
+-function took 4167 cycles     That's a CPE of 4.069336
+-function took 3243 cycles     That's a CPE of 3.166992
+-function took 2926 cycles     That's a CPE of 2.857422
+-function took 2983 cycles     That's a CPE of 2.913086
+-function took 3028 cycles     That's a CPE of 2.957031
sum was 523776
unrolling the loop by 6
overhead is 37 cycles
+-function took 2990 cycles     That's a CPE of 2.919922
+-function took 1972 cycles     That's a CPE of 1.925781
+-function took 1998 cycles     That's a CPE of 1.951172
+-function took 2025 cycles     That's a CPE of 1.977539
+-function took 1978 cycles     That's a CPE of 1.931641
sum was 523776

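From the output you can clearly see the difference between code/data being in the cache or not: the first run of each version (4167 and 2990 cycles) is noticeably slower than the runs that follow. Unrolling the loop by 6 also brings the steady-state CPE down from roughly 3 to just under 2.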
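The summing code itself isn't shown in the post. Here is a minimal sketch of what unrolling a sum by 6 typically looks like (the function name sum_unroll6 and its signature are hypothetical): the loop body handles six elements per iteration, so the compare, branch and index update run once per six elements instead of once per element, and a short cleanup loop handles the leftovers when n isn't a multiple of 6 (1024 = 6 * 170 + 4).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum n ints with the loop unrolled by 6. */
static long sum_unroll6(const int *data, size_t n)
{
    long s = 0;
    size_t i = 0;

    /* Main loop: six additions per compare/branch/increment. */
    for (; i + 6 <= n; i += 6) {
        s += data[i];
        s += data[i + 1];
        s += data[i + 2];
        s += data[i + 3];
        s += data[i + 4];
        s += data[i + 5];
    }
    /* Cleanup: the last n % 6 elements. */
    for (; i < n; i++)
        s += data[i];
    return s;
}
```

With a single accumulator every addition still depends on the previous one, so the gain here comes from the reduced loop overhead; splitting the sum across several independent accumulators is a further step this sketch doesn't take.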

P.S. This information was gleaned from the following:
-The Intel manuals, primarily volumes 2 and 3, and the optimization manual
-The above-mentioned book
-The link about proper use of the rdtsc instruction, posted by jc (peenie) in one of the performance threads here

Last edited by infamous41md : November 28th, 2004 at 02:24 PM.

#2
December 26th, 2003, 08:50 AM
mitakeet
It is going to take me a while to digest this, plus I will have to figure out how to get it to compile with VC++. When (if) I get it to compile with VC, I will post (but don't hold your breath, it may take me all day).

#3
December 26th, 2003, 03:21 PM
infamous41md
I got this to work on MSVC++ 6. I had to change a lot of crap: I had to remove the far superior 'sfence' instruction entirely and replace it with 'cpuid' to serialize. I also couldn't get some of the pointer handling right in the asm functions, so I had to use temporary local variables. This stuff is ugly, and there is more overhead as well. It works, but it is _NOT_ as precise as the gcc/Linux code. Still, it's somewhere to start from, at least. This is a zip of the whole workspace folder.
Attached Files
benchmarking.zip (196.0 KB)
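For reference, a minimal sketch of the cpuid-then-rdtsc idea in GCC inline-assembly syntax rather than MSVC's (the attached MSVC code isn't reproduced here, and read_tsc_serialized is a hypothetical name). Intel documents CPUID as a serializing instruction, so all earlier instructions complete before RDTSC reads the counter. CPUID itself costs a fair number of cycles and overwrites EAX, EBX, ECX and EDX, which is part of why this approach adds overhead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: serialize with CPUID, then read the time-stamp counter. */
static inline uint64_t read_tsc_serialized(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi, ebx, ecx;
    /* Both instructions sit in one asm statement so the compiler cannot
     * separate them. CPUID (leaf 0, passed in EAX/ECX) writes all four
     * registers; RDTSC then overwrites EAX and EDX with the counter. */
    __asm__ __volatile__("cpuid\n\t"
                         "rdtsc"
                         : "=a"(lo), "=d"(hi), "=b"(ebx), "=c"(ecx)
                         : "a"(0), "c"(0)
                         : "memory");
    (void)ebx;   /* the CPUID results themselves are not needed */
    (void)ecx;
    return ((uint64_t)hi << 32) | lo;
#else
    static uint64_t ticks = 0;   /* non-x86 placeholder, not a cycle count */
    return ++ticks;
#endif
}
```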

#4
December 26th, 2003, 03:37 PM
mitakeet
I haven't looked at your code yet (I've been working on work, if you can believe that), but isn't there a way to define instructions in all assemblers (macros or something)? If so (and I feel quite strongly that there is), you could define your own fence instruction and, presuming you did so correctly, the results should be exactly as you desire. The disassembly may look a little screwed up, since a disassembler that lacks the appropriate mnemonic will probably do something screwy, but so what?

Remember, despite all noise to the contrary, computers are tools that obey our instructions (sometimes that is the problem)!
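A minimal sketch of that suggestion in GCC syntax (sfence_bytes is a hypothetical name): SFENCE is encoded as the three bytes 0F AE F8, so an assembler that doesn't know the mnemonic can still be handed the raw opcode. MSVC's inline assembler has the _emit pseudo-instruction for the same purpose, which would be the closer fit for the VC++ port; this sketch shows only the GCC form.

```c
#include <assert.h>

/* Hypothetical helper: emit SFENCE by its opcode bytes (0F AE F8) instead of
 * its mnemonic. Requires a CPU with SSE; every x86-64 CPU has it. */
static inline void sfence_bytes(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(".byte 0x0f, 0xae, 0xf8" ::: "memory");
#endif
    /* On other architectures this does nothing. */
}
```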

#5
December 27th, 2003, 05:15 PM
infamous41md
You're probably right, but I'd rather figure out why in the hell it doesn't even accept the sfence instruction. I tried a one-line inline asm with just sfence, and it gives me some BS compile error about a 'newline in asm'. I'm gonna have another look at it tonight, but if anyone else has any ideas about this, I'd like to hear them.
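One way to sidestep the inline assembler entirely: the compiler intrinsic _mm_sfence(), declared in <xmmintrin.h>, emits SFENCE with no asm block at all. Modern GCC, Clang and MSVC provide it; whether a particular VC++ 6 installation ships that header is an assumption to check. The guard macros are the usual "SSE is enabled" tests (__SSE__ on GCC/Clang, _M_X64 or _M_IX86_FP on MSVC), and store_fence is a hypothetical name.

```c
#include <assert.h>

#if defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>     /* declares _mm_sfence */
#define HAVE_MM_SFENCE 1
#endif

/* Hypothetical helper: a store fence without writing any inline asm. */
static void store_fence(void)
{
#ifdef HAVE_MM_SFENCE
    _mm_sfence();          /* compiles to the SFENCE instruction */
#endif
    /* Without SSE support this is a no-op. */
}
```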

© 2003-2008 by Developer Shed. All rights reserved.