How does the std::string constructor handle char[] of fixed size?


How does the string constructor handle char[] of a fixed size when the actual sequence of characters in that char[] could be smaller than the maximum size?

char foo[64];                    // can hold up to 64
const char* bar = "0123456789";  // much less than 64 chars, terminated with '\0'
strcpy(foo, bar);                // copy shorter into longer
std::string banz(foo);           // make a large string

In this example, will the size of the banz object's string be based on the original char* length or on the size of the char[] it is copied into?

 


First you have to remember (or know) that char strings in C++ are really called null-terminated byte strings. The null terminator is a special character ('\0') that marks the end of the string.

The second thing you have to remember (or know) is that arrays naturally decay to pointers to their first element. In the case of foo from your example, when you use foo the compiler really uses &foo[0].

Finally, if we look at e.g. the std::string constructor reference, you will see that there is an overload (number 5) that accepts a const CharT* (with CharT being char for normal char strings).

Putting it all together, with

std::string banz(foo); 

you pass a pointer to the first character of foo, and the std::string constructor will treat it as a null-terminated byte string. It determines the length of the string by scanning for the null terminator. The actual size of the array is irrelevant and not used.

If you want to set the size of the std::string object, you need to explicitly do it by passing a length argument (variant 4 in the constructor reference):

std::string banz(foo, sizeof foo); 

This will ignore the null-terminator and set the length of banz to the size of the array. Note that the null-terminator will still be stored in the string, so if you pass a pointer to its data (as retrieved by e.g. the c_str function) to a function which expects a null-terminated string, the string will seem shorter than its size. Also note that the data after the null-terminator will be uninitialized and have indeterminate contents. You must initialize that data before you use it, otherwise you will have undefined behavior (and in C++ even reading indeterminate data is UB).


As mentioned in a comment from MSalters, the UB from reading uninitialized and indeterminate data also goes for the construction of the banz object using an explicit size. It will typically work and not lead to any problems, but it does break the rules set out in the C++ specification.

Fixing it is easy though:

char foo[64] = { 0 };//can hold up to 64 

The above will initialize all of the array to zero. The subsequent strcpy call will not touch the bytes of the array beyond the terminator, so the remainder of the array stays zero-initialized.
