In-depth study of the string function strlen

I still remember that when I first learned C, one class of functions left a particularly deep impression. Exams kept asking for implementations of strlen, strcpy and similar functions, and when I graduated and went job hunting, many companies' written tests also asked for strlen and strcpy. Clearly, string manipulation functions are favorites of both teachers and employers. So this article studies the function strlen!

Maybe you are already scoffing: what is there to study in something like this? I can write it in an instant. And so you write this code:

int strlen( const char* str )
{
    int length = 0;
    while ( *str++ )
        ++length;
    return ( length );
}

Wow! You really are fast: you wrote this simple, refined strlen in a flash. You pass the C exam, you pass the company's written test, congratulations. But if the problem is solved this quickly, how can the article go on? So let us analyze the strlen you just dashed off. It looks perfect, it is exactly what the MS engineers wrote, and the whole thing is only a few lines. But is a problem solved in a few lines really finished? Is there a better solution? Given a moment, you come up with another one:

int strlen( const char* str )
{
    const char* ptr = str;
    while ( *str++ )
        ;
    return ( str - ptr - 1 );
}

Short code is not necessarily optimal code (software engineering concerns aside, of course). In both implementations str++ advances one byte at a time, so the time complexity is O(n). strlen can be written this simply, but what would a better solution look like? Imagine jumping several bytes at a time: the complexity stays O(n), but couldn't the length be found faster? Let's wait and see.

This series analyzes the functions in the intel directory of the CRT library, so let us look for the implementation of strlen there. And there it is, in VC/crt/src/intel/strlen.asm. Open it and it is a little dizzying at first. The most eye-catching thing is that, in the leading comments, the MS engineers wrote a "commented-out" strlen that is exactly the same as the one you implemented earlier. It is only a comment, though, and is not compiled into the program. The assembly implementation follows it:

        CODESEG

        public  strlen

strlen  proc \
        buf:ptr byte

        OPTION PROLOGUE:NONE, EPILOGUE:NONE

        .FPO    ( 0, 1, 0, 0, 0, 0 )

string  equ     [esp + 4]

        mov     ecx,string              ; ecx -> string
        test    ecx,3                   ; test if string is aligned on 32 bits
        je      short main_loop

str_misaligned:
        ; simple byte loop until string is aligned
        mov     al,byte ptr [ecx]
        add     ecx,1
        test    al,al
        je      short byte_3
        test    ecx,3
        jne     short str_misaligned

        add     eax,dword ptr 0         ; 5 byte nop to align label below

        align   16                      ; should be redundant

main_loop:
        mov     eax,dword ptr [ecx]     ; read 4 bytes
        mov     edx,7efefeffh
        add     edx,eax
        xor     eax,-1
        xor     eax,edx
        add     ecx,4
        test    eax,81010100h
        je      short main_loop
        ; found zero byte in the loop
        mov     eax,[ecx - 4]
        test    al,al                   ; is it byte 0
        je      short byte_0
        test    ah,ah                   ; is it byte 1
        je      short byte_1
        test    eax,00ff0000h           ; is it byte 2
        je      short byte_2
        test    eax,0ff000000h          ; is it byte 3
        je      short byte_3
        jmp     short main_loop         ; taken if bits 24-30 are clear and bit
                                        ; 31 is set

byte_3:
        lea     eax,[ecx - 1]
        mov     ecx,string
        sub     eax,ecx
        ret

byte_2:
        lea     eax,[ecx - 2]
        mov     ecx,string
        sub     eax,ecx
        ret

byte_1:
        lea     eax,[ecx - 3]
        mov     ecx,string
        sub     eax,ecx
        ret

byte_0:
        lea     eax,[ecx - 4]
        mov     ecx,string
        sub     eax,ecx
        ret

strlen  endp

        end

With the main assembly code in front of us, let us study it step by step.

First come the declaration of the public symbol strlen and of its parameter. The OPTION line tells the assembler not to generate prologue and epilogue code (see the relevant literature; it is not explained in detail here). The next line, .FPO, relates to frame pointer omission (Frame Pointer Omission). MSDN explains it as follows:

FPO (cdwLocals, cdwParams, cbProlog, cbRegs, fUseBP, cbFrame)

cdwLocals : Number of local variables, an unsigned 32 bit value.

cdwParams : Size of the parameters in DWORDS, an unsigned 16 bit value.

cbProlog : Number of bytes in the function prolog code, an unsigned 8 bit value.

cbRegs : Number of registers saved.

fUseBP : Indicates whether the EBP register has been allocated. either 0 or 1.

cbFrame : Indicates the frame type.

Here we only need to pay attention to the second parameter, which is 1: the parameters occupy one DWORD, and strlen does take exactly one parameter. The other parameters are easy to understand from the English descriptions above and are not explained here; the MSDN page for .FPO has the details.

Keep going down and pay attention to the next few lines:

string  equ     [esp + 4]

        mov     ecx,string              ; ecx -> string
        test    ecx,3                   ; test if string is aligned on 32 bits
        je      short main_loop

In the first line, esp+4 is simple. It is explained in detail in the article "The Alloca Insider of Dynamic Allocation Stack Memory"; here is a short version. esp+4 is the address of strlen's parameter, which lies in stack memory, and [esp+4] reads the value stored there, which is the address the parameter points to (the parameter is a const char*). If the code is like this:

char szName[] = "masefee";
strlen( szName );

Then the address obtained from [esp+4] is the starting address of the szName array. The preceding string equ [esp + 4] generates no code; string is merely the equivalent of a macro definition (why it is needed will become clear later; trust that it is there for a reason, which is part of the fun of this kind of research). So mov ecx, string is equivalent to mov ecx, [esp+4]: it loads the address the parameter points to into ecx, so ecx now holds the starting address of the string. The next line, test ecx, 3, tests whether the address in ecx is 4-byte (32-bit) aligned, that is, whether its two lowest bits are 0. If it is, execution jumps to main_loop; otherwise it continues downward. Let's look at the unaligned case first, which is the str_misaligned section that follows:

str_misaligned:
        mov     al,byte ptr [ecx]
        add     ecx,1
        test    al,al
        je      short byte_3
        test    ecx,3
        jne     short str_misaligned

        add     eax,dword ptr 0         ; 5 byte nop to align label below

        align   16                      ; should be redundant

Let's set this code aside for a moment and reason first. Memory handed out by the operating system or the allocator is generally aligned, yet strlen still checks for alignment. So when is a string not aligned? For example:

char szName[] = "masefee";
char* p = szName;
p++; // Move p forward one byte. Assuming szName is 4-byte aligned, p no longer is.
strlen( p );

Of course, I wrote it this way on purpose; other cases occur in practice. For example, when a string sits inside a structure with one-byte alignment, its position is not fixed and its starting address may not be 4-byte aligned. To continue the reasoning: if the address is not aligned, strlen first advances to an aligned address and then goes on finding the length, and if it meets the terminator while getting aligned, it stops and returns the length immediately. That is the end of the reasoning. Looking at the assembly code above, that is indeed what it does.
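To make the structure case concrete, here is a small sketch. The struct `packed_record` and the helper `name_is_misaligned` are hypothetical names chosen for illustration, and the sketch assumes a compiler that honors `#pragma pack` (MSVC, GCC and Clang do):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, packed to 1-byte alignment as in the scenario above. */
#pragma pack(push, 1)
struct packed_record {
    char    tag;        /* offset 0 */
    int32_t id;         /* offset 1: packing removes the padding before it */
    char    name[8];    /* offset 5: not a multiple of 4 */
};
#pragma pack(pop)

/* Nonzero when the string member does not start on a 4-byte boundary. */
static int name_is_misaligned(const struct packed_record *r)
{
    assert(r != NULL);
    return ((uintptr_t)r->name & 3u) != 0;
}
```

Without the pack pragma, `id` would be padded out to offset 4 and `name` would start at offset 8, so a record placed at a 4-byte-aligned address would hand strlen an aligned string. With packing, the same record hands it a string that starts at offset 5.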

First, one byte is loaded into al from the memory ecx points to, and ecx is incremented by 1 to move forward one byte. Then al is tested: if it is 0, execution jumps to byte_3. Otherwise the address now in ecx is tested for alignment; if it is still not aligned, another byte is fetched and ecx is incremented again, until the address is aligned or the terminator is hit. When no terminator has been met and the address in ecx is aligned, execution falls through to add eax, dword ptr 0, which, as its comment says, does no real work: it is a 5-byte nop. Together with align 16 it pads the code so that main_loop starts at a 16-byte-aligned address (once again, the MS engineers are smart and very thorough).

Next comes main_loop. As the name says, this is the main loop and the core of strlen, and it contains a clever algorithm. First, the front half of the code:
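As a rough C sketch of the bookkeeping in this prologue (the helper names `is_aligned4` and `bytes_to_alignment4` are made up for illustration and do not appear in the CRT), the alignment test and the number of bytes the byte loop may have to step over could look like this:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when the address is a multiple of 4, i.e. its two low bits are
   clear. This mirrors `test ecx, 3` followed by `je`. */
static int is_aligned4(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}

/* Upper bound on the single bytes the byte loop steps over before it reaches
   a 4-byte boundary (0 when already aligned). The loop stops earlier if it
   meets the terminator first. */
static unsigned bytes_to_alignment4(const void *p)
{
    unsigned skip = (unsigned)((4u - ((uintptr_t)p & 3u)) & 3u);
    assert(is_aligned4((const char *)p + skip));
    return skip;
}
```

For an address one byte past a 4-byte boundary, for instance, the byte loop would run at most three times before handing over to the main loop.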

        mov     eax,dword ptr [ecx]     ; read 4 bytes
        mov     edx,7efefeffh
        add     edx,eax
        xor     eax,-1
        xor     eax,edx
        add     ecx,4
        test    eax,81010100h
        je      short main_loop

The first line reads 4 bytes from the memory ecx points to into eax; clearly the idea is to process 4 bytes at once. The second line loads 0x7efefeff into edx. The number seems to follow a pattern, but what is it for? Look at its binary form:

01111110 11111110 11111110 11111111

There are four 0 bits in it: bits 8, 16 and 24, each sitting just to the left of a byte boundary (they are the lowest bits of byte1, byte2 and byte3), and bit 31 at the very top. What are they for? Think about when a bit in such a position gets modified: when a carry arrives from the right, or when the other operand has a 1 there. Before analyzing further, look at the next line, add edx, eax, which adds the 4 bytes just read to 0x7efefeff. What is the point of this addition? Think carefully and the surprise comes: the addition reveals whether any of the 4 bytes is 0, and finding a 0 byte is precisely the job of strlen, which looks for the terminator and returns the length. The purpose of the addition is to let carries change those four 0 bits. If one of them was not changed by a carry, then one of the 4 bytes is 0 (with one exception at bit 31, which the code after the loop takes care of). These 0 bits can be called holes, which is a vivid image. For example:

  byte3    byte2    byte1    byte0
  ???????? 00000000 ???????? ????????    // eax
+ 01111110 11111110 11111110 11111111    // edx = 0x7efefeff

Suppose these two numbers are added. A question mark stands for 0 or 1, with the byte as a whole not all 0, while byte2 of eax is all 0. When it is added to byte2 of edx (0xfe), whatever happens in byte1 and byte0, the carry coming in is at most 1, so the sum is at most 0xff and the lowest bit of byte3 never receives a carry. By analogy, if byte0 is 0, the lowest bit of byte1 never receives a carry; only when byte0 has at least one 1 bit does byte0 + 0xff carry into the lowest bit of byte1, which is why byte0 of edx is 0xff. Every byte is judged by its carry: a hole that receives no carry from its right signals a 0 byte.

Moving on: xor eax, -1 inverts eax (the 4 bytes read from the string). Then xor eax, edx combines the inverted value with the sum left in edx by add edx, eax; in the result, a bit is 1 exactly where the sum and the original value agree, that is, where the addition left the bit unchanged. Next, add ecx, 4 moves ecx forward 4 bytes for the next round. Then comes test eax, 81010100h. 0x81010100 is the bitwise negation of 0x7efefeff, so it has a 1 exactly at each hole, and the test looks only at the hole positions. If the result is 0, every hole bit in the sum differs from the corresponding bit of the original 4 bytes, which means every hole received a carry and none of the bytes is 0, so the loop continues. If the result is not 0, at least one hole was left unchanged, which indicates that one of the bytes is 0.

Once a 0 byte is indicated, the 4 bytes just read are checked one at a time to determine which one it is:
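The whole test can be tried out on individual words with a few lines of C. This is only a sketch: the names `MAGIC`, `HOLES` and `unfilled_holes` are invented here, and it relies on `uint32_t` arithmetic wrapping around the way the 32-bit add instruction does.

```c
#include <assert.h>
#include <stdint.h>

#define MAGIC 0x7efefeffu   /* zero bits (holes) at positions 8, 16, 24, 31 */
#define HOLES 0x81010100u   /* ~MAGIC: one bits exactly at the holes */

/* Returns the set of holes that did NOT receive a carry. A nonzero result is
   what makes `test eax, 81010100h` leave the loop. */
static uint32_t unfilled_holes(uint32_t word)
{
    assert((MAGIC ^ HOLES) == 0xffffffffu);   /* the two masks are complements */
    uint32_t sum = word + MAGIC;              /* add edx, eax */
    return (sum ^ ~word) & HOLES;             /* xor eax, -1 ; xor eax, edx ; test */
}
```

For the bytes 's', 'a', 'f', 'e' read as the little-endian word 0x65666173, every hole receives a carry and the function returns 0. For 'f', 'e', 'e', '\0' (0x00656566), the hole at bit 31 stays empty and the function returns 0x80000000.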

        mov     eax,[ecx - 4]
        test    al,al                   ; is it byte 0
        je      short byte_0
        test    ah,ah                   ; is it byte 1
        je      short byte_1
        test    eax,00ff0000h           ; is it byte 2
        je      short byte_2
        test    eax,0ff000000h          ; is it byte 3
        je      short byte_3
        jmp     short main_loop         ; taken if bits 24-30 are clear and bit
                                        ; 31 is set
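The final jmp in this listing covers a false alarm: a byte3 equal to 0x80 leaves the hole at bit 31 without a carry even though no byte is 0. A small C sketch of the same checks shows this; the function names `flagged` and `first_zero_byte` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The loop's test: nonzero when some hole did not receive a carry. */
static uint32_t flagged(uint32_t word)
{
    return ((word + 0x7efefeffu) ^ ~word) & 0x81010100u;
}

/* Mirrors the checks after the loop. Byte 0 is the least significant byte,
   which on little-endian x86 is the byte at the lowest address.
   Returns the index of the first zero byte, or -1 for a false alarm. */
static int first_zero_byte(uint32_t word)
{
    assert(flagged(word) != 0);                 /* only reached after the test fires */
    if ((word & 0x000000ffu) == 0) return 0;    /* test al, al */
    if ((word & 0x0000ff00u) == 0) return 1;    /* test ah, ah */
    if ((word & 0x00ff0000u) == 0) return 2;    /* test eax, 00ff0000h */
    if ((word & 0xff000000u) == 0) return 3;    /* test eax, 0ff000000h */
    return -1;                                  /* jmp short main_loop */
}
```

The word 0x80414243 is flagged by the hole test, yet `first_zero_byte` returns -1 for it, which corresponds to jumping back into the main loop.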

In the assembly listing above, the first line uses [ecx - 4] because ecx has already been advanced by 4, so subtracting 4 gets back to the start of the 4 bytes; they are then checked one by one to see which is 0. The code is simple and not covered in detail here. If a byte is found to be 0, execution jumps to the corresponding tail section:

byte_3:
        lea     eax,[ecx - 1]
        mov     ecx,string
        sub     eax,ecx
        ret

byte_2:
        lea     eax,[ecx - 2]
        mov     ecx,string
        sub     eax,ecx
        ret

byte_1:
        lea     eax,[ecx - 3]
        mov     ecx,string
        sub     eax,ecx
        ret

byte_0:
        lea     eax,[ecx - 4]
        mov     ecx,string
        sub     eax,ecx
        ret

Take byte_3 as an example: among the four bytes read, the 4th byte is 0 and the first 3 are not, so eax is set to ecx - 1, the address of the terminator. Then ecx is reloaded with the starting address of the string (now you see why the string macro exists). Finally, sub eax, ecx yields the length of the string directly, and ret returns to the caller. That is the whole of strlen.

Through the analysis above we now know how strlen works and have a deeper appreciation of the beauty of the algorithm. The assembly version can be translated into C as follows:
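The arithmetic of the four tail sections can be written out in C as a quick check. The function `length_from_cursor` and its parameter names are illustrative only; `cursor` stands for the value of ecx after `add ecx, 4`.

```c
#include <assert.h>
#include <stddef.h>

/* `cursor` points one word past the word that holds the terminator, and
   `zero_index` (0..3) is the position of the terminator inside that word. */
static size_t length_from_cursor(const char *start, const char *cursor,
                                 int zero_index)
{
    assert(zero_index >= 0 && zero_index <= 3);
    const char *terminator = cursor - (4 - zero_index);  /* lea eax, [ecx - k] */
    return (size_t)(terminator - start);                 /* sub eax, ecx */
}
```

With an aligned "masefee", the terminator is byte 3 of the second word, so the cursor ends up 8 bytes past the start and the function returns 8 - 1 = 7, as byte_3 does.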

size_t strlen( const char* str )
{
    const char* ptr = str;

    // byte loop until ptr is 4-byte aligned
    for ( ; ( ( size_t )ptr & 0x03 ) != 0; ++ptr )
    {
        if ( *ptr == '\0' )
            return ptr - str;
    }

    const unsigned int* ptr_d = ( const unsigned int* )ptr;
    unsigned int magic = 0x7efefeff;

    while ( 1 )
    {
        unsigned int bits32 = *ptr_d++; // read 4 bytes

        if ( ( ( ( bits32 + magic ) ^ ( bits32 ^ -1 ) ) & ~magic ) != 0 ) // bits32 ^ -1 is equivalent to ~bits32
        {
            // some hole received no carry: check the 4 bytes one by one
            ptr = ( const char* )( ptr_d - 1 );
            if ( ptr[ 0 ] == 0 )
                return ptr - str;
            if ( ptr[ 1 ] == 0 )
                return ptr - str + 1;
            if ( ptr[ 2 ] == 0 )
                return ptr - str + 2;
            if ( ptr[ 3 ] == 0 )
                return ptr - str + 3;
            // no byte is 0 (byte 3 was 0x80): keep looping
        }
    }
}

Ok, strlen is almost done. The final C version could still be adapted, for example to a particular character encoding, but that is generally unnecessary and a general-purpose version is better. I ran a test comparing the plain C version from the beginning of this article, the translated C version above, and the CRT assembly version: each finds the length of the same string 10,000,000 times with /O2 optimization enabled. The average times are:

Plain C version: 723 milliseconds

Translated C version: 315 milliseconds

CRT assembly version: 218 milliseconds

As you can see, the latter two perform noticeably better. One thing needs explaining here: the CRT's strlen is an intrinsic function. An intrinsic function (also called a built-in function) is similar to an inline function but is not the same thing. inline is only a request, and compilers treat it differently. For an intrinsic, the compiler decides at compile time, based on context, whether to expand the function as inline instructions at the assembly level, and it optimizes the code while inlining it, which removes the function call overhead. Because the compiler knows exactly what an intrinsic does, it can integrate and optimize it better, with a single goal: choosing the best solution for the specific context. Take strlen as an example, with a piece of code like this:

#include <stdio.h>
#include <string.h>

int main( int argc, char** argv )
{
    int len = strlen( argv[ 0 ] );
    printf( "%d", len );
    return 0;
}

With optimization disabled (in a debug or release build), or with minimize size (/O1) in a release build, you can still force intrinsics on with the /Oi option. Once it is on, the strlen call above no longer calls the CRT's assembly version; the code is embedded directly in main, as follows (optimization disabled, /Oi enabled, debug or release):

int len = strlen( argv[ 0 ] );
0042D8DE  mov   eax,dword ptr [argv]
0042D8E1  mov   ecx,dword ptr [eax]
0042D8E3  mov   dword ptr [ebp-0D0h],ecx
0042D8E9  mov   edx,dword ptr [ebp-0D0h]
0042D8EF  add   edx,1
0042D8F2  mov   dword ptr [ebp-0D4h],edx
0042D8F8  mov   eax,dword ptr [ebp-0D0h]     // loop start
0042D8FE  mov   cl,byte ptr [eax]            // |
0042D900  mov   byte ptr [ebp-0D5h],cl       // |  byte by byte
0042D906  add   dword ptr [ebp-0D0h],1       // |
0042D90D  cmp   byte ptr [ebp-0D5h],0        // |
0042D914  jne   main+38h (42D8F8h)           // back to loop start
0042D916  mov   edx,dword ptr [ebp-0D0h]
0042D91C  sub   edx,dword ptr [ebp-0D4h]
0042D922  mov   dword ptr [ebp-0DCh],edx
0042D928  mov   eax,dword ptr [ebp-0DCh]
0042D92E  mov   dword ptr [len],eax

With minimize size (/O1) and intrinsics (/Oi) both enabled in a release build, the compiled code is:

int len = strlen( argv[ 0 ] );
00401000  mov   eax,dword ptr [esp+8]
00401004  mov   eax,dword ptr [eax]
00401006  lea   edx,[eax+1]
00401009  mov   cl,byte ptr [eax]            // loop start
0040100B  inc   eax                          // |  byte by byte
0040100C  test  cl,cl                        // |
0040100E  jne   main+9 (401009h)             // back to loop start
00401010  sub   eax,edx

The code is much more concise and there is no function call overhead. (In fact, you may be surprised to find that this is the disassembly of the second C version of strlen from the beginning of this article, in optimized form and without the call overhead. When a higher optimization level is on, the compiler also inlines the two strlen functions from the beginning of this article, just as it does with the intrinsic. The compiler is accommodating: whenever the conditions for an optimization are met, it applies it without hesitation.) The generated code is the same whether you enable /O1 together with /Oi, or enable maximize speed (/O2) or full optimization (/Ox). With /O2 or /Ox in a release build, the compiler produces the code above for strlen even if /Oi is not enabled. This depends on the optimization level: the higher the level, the more comprehensive the optimization, whether or not you set individual options. That too is a considerate design. To enable intrinsic treatment for a particular function, you can do it in code:

#pragma intrinsic( strlen )

What can be turned on can naturally be turned off as well:

#pragma function( strlen )

This forces the intrinsic treatment of strlen off, so that even with maximize speed (/O2) or full optimization (/Ox) the CRT's strlen function is still called. For a detailed description of both pragmas, see MSDN.

MSDN has a detailed and accurate explanation of the intrinsic pragma; the original English text conveys the intent best:
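A minimal sketch of how the pragma might sit in a source file follows. The wrapper `name_length` is hypothetical, and the `_MSC_VER` guard is an addition made here so that other compilers simply skip the MSVC-specific line:

```c
#include <assert.h>
#include <string.h>

#ifdef _MSC_VER
/* MSVC only: request a real call to the CRT strlen from here on in this
   file, even under /O2 or /Oi. */
#pragma function(strlen)
#endif

/* Hypothetical wrapper; with the pragma in effect, MSVC emits a call to the
   CRT routine here instead of expanding strlen inline. */
static size_t name_length(const char *name)
{
    assert(name != NULL);
    return strlen(name);
}
```

The result is the same either way (`name_length("masefee")` is 7); only the generated code differs, which can be confirmed in the disassembly window.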

The intrinsic pragma tells the compiler that a function has known behavior. The compiler may call the function and not replace the function call with inline instructions, if it will result in better performance. ...

Programs that use intrinsic functions are faster because they do not have the overhead of function calls but may be larger due to the additional code generated.

By the way, don't try to use these two pragmas to force intrinsic (/Oi) treatment on or off for an ordinary function. Intrinsics are, by definition, a set of functions known to the compiler, and the treatment can be regarded as an optional optimization for those functions only. If you don't believe it, try, and you will get a warning:

warning C4163: 'xxxxx' : not available as an intrinsic function

The compiler handles intrinsic-related optimization flexibly, which is to say it is not mandatory. If SSE is enabled, the compiler will also consider SSE-based optimizations. Knowing the principle is enough here; the focus of this article is how to dig into and think about the details. For the full list of intrinsic functions and their detailed descriptions, please refer to MSDN. I will not go through them all here; this article is long enough already.

Finally, I once again admire the MS engineers: they take great care over the details. That is worth reflecting on amid the restless atmosphere of the domestic IT industry.
