Sunday, 15 September 2013

performance - Why is getting data from a reference slower than calculating it in C#?


I wrote two kinds of tests that obtain my data inside a loop of 65,000,000 iterations. The difference between them is 0.5-0.6 seconds.

Test 1:

  ushort e = myFunction(a, b, c);
  ushort d = (ushort)(e + (f << 13));

  private static ushort myFunction(ushort a, int b, int c)
  {
      if (a < (ushort)b) return 0;
      if (a > (ushort)c) return (ushort)(c - b);
      return (ushort)(a - b);
  }

Test 2:

  ushort d = arr2Dimension[i, j].data;

The elements of arr2Dimension are of a class type, and data is a public data member of that class.

I timed each test (separately) three times, and the first test was always faster by 0.5-0.6 seconds.

Why is that so? (Why is getting data from a reference slower?)

You did not post code that we can actually try to run ourselves. But yes, there is a pattern to code like this. Basically, it shows that accessing memory is one of the slowest things a processor can do.

This is a problem related to distance: the farther an electrical signal has to travel physically, the slower it has to be to ensure that it does not get corrupted. This is a basic "no free lunch" principle in electronic engineering, and it is the reason that newer processors built with smaller silicon feature sizes are automatically faster.

Your myFunction() method uses processor resources that are very close. The a, b and c variables and the method's return value are stored in registers, a very small amount of storage that sits very close to the processor's execution engine. It is as close as it can be, so the code runs at full bore, executing several instructions per clock cycle simultaneously. The if() statements are only fast when the processor has good branch prediction data available, which depends on how consistently the method branched in its previous executions. Or in other words, on how random the a, b and c values are.
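To see the branch prediction point in practice, here is a rough sketch rather than a rigorous benchmark. It runs the question's logic over the same values twice, once in random order and once sorted, so that only the predictability of the branches changes. The class name BranchDemo, the helper TimeRun, the bounds 20000 and 40000 and the array size are all assumptions made for illustration. The inputs are read sequentially from an array in both runs, so memory access costs are roughly equal. The JIT may also compile the comparisons into branch-free code, in which case the two timings will be close.

```csharp
using System;
using System.Diagnostics;

public static class BranchDemo
{
    // Same logic as myFunction in the question.
    public static ushort MyFunction(ushort a, int b, int c)
    {
        if (a < (ushort)b) return 0;
        if (a > (ushort)c) return (ushort)(c - b);
        return (ushort)(a - b);
    }

    // Hypothetical helper: calls MyFunction for every input, returns the
    // elapsed milliseconds and hands back a checksum so the loop is not dead code.
    public static long TimeRun(ushort[] inputs, int b, int c, out long checksum)
    {
        long sum = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < inputs.Length; i++)
        {
            sum += MyFunction(inputs[i], b, c);
        }
        sw.Stop();
        checksum = sum;
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int n = 1 << 24;          // assumed size, smaller than the question's loop
        Random rng = new Random(42);    // fixed seed so runs are repeatable

        ushort[] randomOrder = new ushort[n];
        for (int i = 0; i < n; i++)
        {
            randomOrder[i] = (ushort)rng.Next(0, 65536);
        }

        // Same values, sorted: each branch now goes the same way for long stretches.
        ushort[] sortedOrder = (ushort[])randomOrder.Clone();
        Array.Sort(sortedOrder);

        long sumRandom, sumSorted;
        TimeRun(randomOrder, 20000, 40000, out sumRandom);   // warm-up, triggers JIT
        long msRandom = TimeRun(randomOrder, 20000, 40000, out sumRandom);
        long msSorted = TimeRun(sortedOrder, 20000, 40000, out sumSorted);

        Console.WriteLine("random order: {0} ms (checksum {1})", msRandom, sumRandom);
        Console.WriteLine("sorted order: {0} ms (checksum {1})", msSorted, sumSorted);
    }
}
```

Both runs process the same set of values, so the two checksums should match. Any timing gap then comes from how the values are ordered, not from the work done.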

The array has to be fetched from memory, which is not close to the processor. The array element may be present in the L1 cache, which takes 3 cycles. If the prefetcher predicted the access accurately, it may take only a single cycle. When the array is large, the processor may only find the element in the L2 or L3 cache, and that is where speed starts to tank. When it has to be fetched from RAM, it gets gruesome: the processor will be stalled for 150 cycles while it waits for the RAM to supply the data. The distance is the giveaway here: you can pop the case open and see the wires that connect the RAM to the processor.
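The cache effect described above can be made visible with a small experiment. It is a sketch under stated assumptions, not a reconstruction of the question's code. It sums the same large array twice, once visiting elements in sequential order (friendly to the caches and the prefetcher) and once in a shuffled order (many accesses miss the caches). The names CacheDemo, SumInOrder and Shuffle and the array size are hypothetical. The actual timings depend on the machine's cache sizes.

```csharp
using System;
using System.Diagnostics;

public static class CacheDemo
{
    // Sums data[], visiting the elements in the order given by order[].
    public static long SumInOrder(ushort[] data, int[] order)
    {
        long sum = 0;
        for (int i = 0; i < order.Length; i++)
        {
            sum += data[order[i]];
        }
        return sum;
    }

    // Fisher-Yates shuffle, used to produce a cache-unfriendly visiting order.
    public static void Shuffle(int[] items, Random rng)
    {
        for (int i = items.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);    // 0 <= j <= i
            int tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
    }

    public static void Main()
    {
        const int n = 1 << 24;          // assumed size: 32 MB of ushort data
        ushort[] data = new ushort[n];
        int[] sequential = new int[n];
        for (int i = 0; i < n; i++)
        {
            data[i] = (ushort)(i & 0xFFFF);
            sequential[i] = i;
        }

        int[] shuffled = (int[])sequential.Clone();
        Shuffle(shuffled, new Random(1));

        SumInOrder(data, sequential);   // warm-up, triggers JIT

        Stopwatch sw = Stopwatch.StartNew();
        long sumSequential = SumInOrder(data, sequential);
        long msSequential = sw.ElapsedMilliseconds;

        sw.Restart();
        long sumShuffled = SumInOrder(data, shuffled);
        long msShuffled = sw.ElapsedMilliseconds;

        Console.WriteLine("sequential: {0} ms (sum {1})", msSequential, sumSequential);
        Console.WriteLine("shuffled:   {0} ms (sum {1})", msShuffled, sumShuffled);
    }
}
```

Both passes do the same arithmetic on the same elements, so any difference comes from where the data is found when it is requested. An array of class instances, as in Test 2, adds one more step on top of this: the array holds references, so reading .data means loading the reference first and then following it to the object.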

