Tuesday 29 November 2011

Profiling time elapse of array access using rdtscll in a kernel module

Profiling time elapse of array access using rdtscll in a kernel module

Hi,
I am trying to profile the time elapse of accessing an integer array using rdtscll.
I wrote a user-mode program and a kernel module doing the same work. The
results from rdtsc however look pretty odd.
The machine is an Intel xeon 3.1Ghz box.

The user-mode program is

Code:

unsigned long long ini, end;
int dummy;

/* Initialize a 6144KB integer array */
.. ...
gettimeofday(&start_tv, &tz);
ini = rdtsc();
for (i = 0; i < array_element; i++) { dummy = int_array[j]; }
end = rdtsc();
gettimeofday(&end_tv, &tz1);

.. ...
printf("Time elapse %llu cycles\n", end-ini);
printf("Time elapse in %lf usec\n", ....);

So I got the output of
ini: 2032056902618536, end: 2032056914493232
Time elapse 11874696 cycles
Time elapse in 3769.000000 usec
These number looks reasonable.

However, when I use the same logic in my kernel module which has the following
code

Code:

data_array = (int *)vmalloc(6144KB);
for(i = 0; i < 6144KB/sizeof(int); i++) { data_array[i] = random32() % 1000;}

rdtscll(ini);
for(i = 0; i < nr_elements; i++) {
    val = data_array[i];
}
rdtscll(end);
printk("ini: %llu, end: %llu\n", ini, end);
printk("TIme elapse %llu\n", end - ini);

The outputs are
ini: 2031745377590838, end: 2031745377591152
data array access is 314.
So it only takes about 314 CPU cycles to traverse the 6144KB size array, which
should not be correct. Assuming that in an ideal case the CPU uses 1 cycle to load
the integer from on-chip cache, and the total number of integer of 6144KB array is
1572864, which is much greater than 314. As the user-mode time elapse (11874696
cycles) looks reasonable, I am wondering is there anything wrong in my kernel
module. Any suggestion is welcome. Thanks in advance.

No comments:

Post a Comment