Derived data types

In this section, we're going to observe D's take on pointers, arrays, strings, and associative arrays. Much of what we'll cover here is very different from other C-family languages.

Pointers

As in other languages that support them, pointers in D are special variables intended to hold memory addresses. Take a moment to compile and run the following:

int* p;
writeln("p's value is ", p);
writeln("p's type is ", typeid(p));
writeln("p's size is ", p.sizeof);

First, look at the declaration. It should look very familiar to many C-family programmers. All pointer declarations are default initialized to null, so here the first call to writeln prints "null" as the value. The type of p printed in the second writeln is int*. The last line will print 4 in 32-bit and 8 in 64-bit.

So far so good. Now look at the following line and guess what type b is:

int* a, b;

No, b is not an int, it is an int*. The equivalent C or C++ code would look like this:

int *x, *y;

In D, x would be interpreted as int* and y as int**, causing a compiler error. Every symbol in a declaration must have the same type. No matter how many identifiers are in a pointer declaration, only one * is needed and it applies to each of them. As such, it's considered best practice to put the * next to the type, as in the first declaration, rather than next to the identifiers. Otherwise, pointers in D function much as they do elsewhere. The unary & operator can be used to take the address of any variable and the * operator can be used to dereference a pointer to fetch the value it's pointing at. Pointer types can be inferred like any other type. Changing the value to which a pointer points will be reflected when the pointer is next dereferenced.

auto num = 1;
auto numPtr = &num;
writefln("The value at address %s is %s", numPtr, *numPtr);
num = 2;
writefln("The value at address %s is %s", numPtr, *numPtr);

Here, the address of num is assigned to numPtr. Since num is inferred to be int, the type of numPtr is inferred as int*. Both calls to writefln first print the value of numPtr, which is the address of num, then dereference numPtr to print the value of num. Memory addresses are printed as hexadecimal numbers by default. The following is the output:

The value at address 18FE34 is 1
The value at address 18FE34 is 2

void pointers are used to represent pointers to any type, but it's rare to use them in D except when interfacing with C APIs. Dereferencing a void pointer directly is an error; it must first be cast to the appropriate pointer type. Pointers to other types can be implicitly converted to void*, though the reverse is not allowed.

auto num = 1;        // int
void* voidPtr = &num;  // OK: int* converts to void*
writeln(*voidPtr); // Error: void has no value
writeln(*cast(int*)voidPtr); // OK: dereferencing int*.

All of the pointers we've seen so far point to values on the stack. Pointers can also point to blocks of heap memory. We can allocate heap memory using the new expression (you'll learn how to allocate multiple values with new when we take a look at arrays).

int* intPtr = new int;  // Allocate memory for a single int
*intPtr = 10;

The heap and the stack work as they do in C, except that D has a garbage collector involved. Memory allocated with new is managed by the GC. Additionally, using certain language features can implicitly cause GC memory to be allocated. We'll discuss those cases when we come across them.

It's also possible to bypass the garbage collector completely and use alternative allocators such as C's malloc.

import core.stdc.stdlib : malloc, free;
int* intsPtr = cast(int*)malloc(int.sizeof * 10); // Ten ints
free(intsPtr);
auto dontDoThis = malloc(int.sizeof);
auto thisIsOK = cast(int*)malloc(int.sizeof);

The variable dontDoThis is inferred to be void*, which usually isn't what you want. Always pay attention when using type inference. Another point of note is that allocating memory in this manner loses the benefit of default initialization. Any memory allocated through malloc should be treated just as it would be in C. It's also worth noting here that D supports pointer arithmetic, which you could use to iterate intsPtr. You can also use the array index operator, [], to access elements of intsPtr. Both approaches are frowned upon in D, however. It's much safer to convert intsPtr to an array.
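
For completeness, here is a minimal sketch of the two discouraged approaches just mentioned, indexing a raw pointer and walking it with pointer arithmetic. The helper name sumViaPointer is hypothetical and exists only for illustration; note that every element has to be set by hand, since malloc does no default initialization.

```d
import core.stdc.stdlib : malloc, free;
import std.stdio : writeln;

// Hypothetical helper: fills `count` ints with 1..count and sums them.
int sumViaPointer(size_t count)
{
    int* ints = cast(int*) malloc(int.sizeof * count);
    if (ints is null)
        return -1;                      // malloc can fail (or return null for 0 bytes)

    // malloc'd memory is not default initialized; set every element by hand.
    foreach (i; 0 .. count)
        ints[i] = cast(int)(i + 1);     // index operator on a raw pointer

    int total = 0;
    for (int* p = ints; p < ints + count; ++p)  // pointer arithmetic
        total += *p;

    free(ints);                         // not GC-managed, so we free it ourselves
    return total;
}

void main()
{
    writeln(sumViaPointer(10)); // 55
}
```

Nothing here stops us from reading or writing past the end of the block, which is exactly why converting the pointer to an array is the safer route.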

Arrays

Arrays in D are a popular feature, slices in particular. They aren't your grandpa's arrays, though. D does things a bit differently than elsewhere in the C family. We're going to spend a few pages digging into them so that you can avoid common beginner mistakes.

Array basics

The first thing to understand about arrays in D is that they are fat pointers; each array carries around both a length and a pointer to the memory block where its elements are stored. Conceptually, you can think of an array as a struct that looks like this:

struct Array(T) {
  size_t length;
  T* ptr;
}

T is the type of the array elements. On every array, both .length and .ptr are accessible as properties.
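
As a small sketch of those two properties in action, the following reads the first element of an array through .ptr rather than through an index. The helper firstViaPtr is a hypothetical name, and the array literal used to create the array is covered later in this section.

```d
import std.stdio : writeln;

// Hypothetical helper: reads the first element through .ptr instead of [0].
// Returns 0 for an empty array, where .ptr may be null.
int firstViaPtr(int[] arr)
{
    return arr.length > 0 ? *arr.ptr : 0;
}

void main()
{
    int[] values = [7, 8, 9];
    writeln(values.length);        // 3
    writeln(values.ptr);           // address of the first element
    writeln(firstViaPtr(values));  // 7
}
```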

Static arrays are allocated on the stack. They have a fixed length that does not change.

int[3] stat1;
writeln(stat1);

Compile this snippet and the writeln will print [0, 0, 0]. Three int values were allocated on the stack and have all been initialized to int.init. A dynamic array can grow and shrink as required. The syntax of a dynamic array declaration looks like this:

int[] dynArray1;

Unlike stat1, this array is empty. No space is allocated for any elements, only enough stack space to hold the metadata. The default initializer for a dynamic array is the empty array, []. Its .length will be 0 and its .ptr will be null. We can allocate space for the array elements using new.

dynArray1 = new int[3];
int[] dynArray2 = new int[10];

Note

Some D users think the syntax auto arr = new int[3] is too similar to the static array declaration int[3] arr. D now supports an alternative syntax, new int[](3). This new syntax is recommended, but old habits die hard. There is a large body of D code that uses the older syntax.

The first array will now have three int values, the second will have ten, and all of the values will be default initialized to int.init. Actually, the runtime will probably have allocated more room than necessary. You can see this with the .capacity property.

writeln("#1:", dynArray1.capacity);
writeln("#2:", dynArray2.capacity);

.capacity returns the maximum length the array can grow to before reallocation is needed. The two writeln calls above print 3 and 15 for me. This first number tells us that new int[3] allocated exactly enough space for three int values. If we append a new value to dynArray1, a reallocation will take place. The second number tells us that new int[10] allocated enough space for fifteen int values. Since we only used ten, there's still space for five more elements to be appended before a reallocation is needed. The allocation algorithm is an implementation detail, so you can't rely on fifteen elements always being allocated when you request ten. What you can rely on is that enough space will be allocated for the number of elements you requested.

This default behavior is fine in many situations, but when you know you're going to be appending numerous items to an array, you can use the reserve function to be more efficient.

int[] dynArray3;
dynArray3.reserve(20);
writefln("%s, %s", dynArray3.length, dynArray3.capacity);

We've asked the runtime to reserve enough space for twenty int values, but none of that space is being used. This is an important difference between new and reserve. The former will allocate the space and return an array containing the number of elements you requested, that is, the new memory is not empty. The latter only allocates the space if the current capacity is smaller than the size requested, but the newly allocated space is empty. You can see this when writefln prints 0, 31 to the screen. There are no elements in the array, but a total of 31 can be appended before a reallocation is needed.

This brings us to the append operator. Using this, you can append individual elements to an array. dynArray3 is empty, so let's give it some values.

dynArray3 ~= 2;
dynArray3 ~= 10;
writeln(dynArray3);

This will print [2, 10]. Now let's combine dynArray3 with dynArray1 to create a new array. To do this, we can use the concatenation operator, ~.

auto dynArray4 = dynArray3 ~ dynArray1;
writeln(dynArray4);

Remember that dynArray1 contains three int values that were initialized to 0, so the writeln in this snippet will print [2, 10, 0, 0, 0]. Since both operands of the concatenation operator are of type int[], the type inferred for dynArray4 is int[]. We can also add elements to a dynamic array by manipulating .length directly.

dynArray1.length += 10;

If there is enough capacity to hold all ten values, no reallocation takes place. Otherwise, more space is allocated. If the current memory block cannot be extended, then a new block is allocated and the existing elements are copied to it. Finally, ten new elements are default initialized in the newly allocated space. Conversely, you can shrink the array by decreasing the length. Be aware that, when you do so, you're causing the capacity to be reset to 0. This has a special significance that will be explained soon in this chapter.
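
The following sketch shows both directions, growing an array through .length and then shrinking it again. The helper name grow is hypothetical.

```d
import std.stdio : writeln;

// Hypothetical helper: grows an array by `extra` default-initialized elements.
int[] grow(int[] arr, size_t extra)
{
    arr.length += extra;   // new elements are set to int.init, which is 0
    return arr;
}

void main()
{
    int[] nums = [1, 2, 3];
    nums = grow(nums, 2);
    writeln(nums);           // [1, 2, 3, 0, 0]

    nums.length -= 3;        // shrink down to two elements
    writeln(nums);           // [1, 2]
    writeln(nums.capacity);  // 0 after shrinking
}
```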

To get at a specific value in any array, use the index operator, []. Arrays use zero-based indexes. The special operator $ is a synonym for array.length; it's only defined inside the brackets and always applies to the array being indexed.

writeln(dynArray4[0]);  // Print the first element
writeln(dynArray4[2]);  // Print the third element
writeln(dynArray4[$-1]); // Print the last element

The index operator works on both static and dynamic arrays. By default, D will do bounds checking at runtime to make sure that you don't read or write past either end of the array, a common source of exploits and other bugs in C and C++ software. Doing so will result in a runtime error reporting a range violation. You can turn bounds checking off by passing -boundscheck=off to the compiler.

Rectangular arrays

A rectangular array (sometimes called a jagged array) is an array of arrays. As we'll soon see, new D programmers often find them confusing. The thing to keep in mind is that they are no different from normal arrays. Declaring them has the same form of elementType[numberOfElements] that is used with any array. It's just that, in a rectangular array, the type of the array elements happens to be another array. Consider the following declaration of a normal array:

int[3] arr;

The arr array is a static array of three int elements, visually clarified by putting parentheses around the type:

(int)[3] arr;

Now look at the following declaration of a rectangular array:

int[3][2] ra1;

The ra1 array is a static array of two int[3] elements. Again, putting parentheses around the type makes it clear.

(int[3])[2] ra1;

Fetching the element at any index in arr, such as arr[0], returns an int. In the same manner, ra1[0] returns an int[3]. We can, in turn, get at its first element with [0], which when written as a single expression looks like ra1[0][0]. I want to stress that none of this is special syntax; we have two pairs of brackets in the declaration solely because the element type of ra1 is itself an array type. Since ra1[0] returns an array, the additional [0] indexes into the returned array.

Now, about that confusion I mentioned. Many programmers are familiar with C's multidimensional arrays. There's a major difference in how they are declared in C and how rectangular arrays are declared in D. To help illustrate this, consider the following grid:

One way to describe this is as a grid of three rows and four columns. In C, this could be expressed in code like so:

int cgrid[3][4];
cgrid[1][0] = 10; // Set the first element of the second row

In D, we have to look at it a bit differently. In order to access the array elements the same way as in C, where [1][0] is the second row and first column, we have to envision each row as an array of four elements. Given that the [4] is part of the array type, the order of the indexes in the declaration will be the reverse of those in C.

int[4][3] dgrid;
dgrid[1][0] = 10;

To be clear, the declaration is not creating a column-major array; it's still row-major exactly like the C array, so that [1][0] is the second row and first column in both. The only difference is that the [4] is part of the array type. Keep that in mind and you should have no trouble keeping things straight.
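
To make the ordering concrete, here is a short sketch that fills a three-row, four-column grid and then queries it. The helper makeGrid and the row * 10 + column fill pattern are assumptions chosen only so that each cell's value reveals its position.

```d
import std.stdio : writeln;

// Hypothetical helper: builds a 3-row, 4-column grid in which each cell
// holds row * 10 + column.
int[4][3] makeGrid()
{
    int[4][3] grid;                 // three elements, each an int[4]
    foreach (row; 0 .. 3)
        foreach (col; 0 .. 4)
            grid[row][col] = row * 10 + col;
    return grid;
}

void main()
{
    auto dgrid = makeGrid();
    writeln(dgrid.length);     // 3: the number of rows
    writeln(dgrid[0].length);  // 4: the number of elements in each row
    writeln(dgrid[1][0]);      // 10: second row, first column
    writeln(dgrid[2]);         // [20, 21, 22, 23]
}
```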

Here's another example of a rectangular array:

int[][3] ra2 = [
    [0, 1],
    [2, 3, 4, 5],
    [6, 7, 8]
];
writeln(ra2[0].length);
writeln(ra2[1].length);
writeln(ra2[2].length);

This is a static array of three int[]s, where each element array has a different length. In a C multidimensional array, all of the elements are stored in a contiguous block of memory. In D, this is true when all parts of a rectangular array are static, such as int[3][3]. Any dynamic component in a rectangular array can point to its own separate block of memory, in which case you can't rely on it being contiguous. It's possible to create a dynamic array of dynamic arrays: int[][]. It's also possible to have more than two components, such as int[][][3].
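
As a sketch of the fully dynamic case, the following builds an int[][] in which every row is allocated separately and has its own length. The helper name makeTriangle is hypothetical.

```d
import std.stdio : writeln;

// Hypothetical helper: builds `rows` rows, where row i has i + 1 elements.
int[][] makeTriangle(size_t rows)
{
    int[][] result = new int[][](rows);  // `rows` empty int[] elements
    foreach (i, ref row; result)
        row = new int[](i + 1);          // each row gets its own memory block
    return result;
}

void main()
{
    auto tri = makeTriangle(3);
    writeln(tri);            // [[0], [0, 0], [0, 0, 0]]
    writeln(tri[2].length);  // 3
}
```

Because each row comes from its own allocation, nothing guarantees that the rows sit next to one another in memory.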

Slices

When thinking about slices, it helps to consider that dynamic arrays are slices and slices are dynamic arrays.

auto tenArray = [5,10,15,20,25,30,35,40,45,50];
auto sliced = tenArray[0 .. 5];

Here, tenArray is an array of ints. It's initialized with an array literal, a feature we'll examine shortly. I've taken a slice from tenArray and assigned it to a variable. The slice operator looks like this: [m .. n], where the first element of the slice is source[m] and the last is source[n-1]. So the first value of sliced is tenArray[0] and the last is tenArray[4]. Pass it to writeln and you'll see [5, 10, 15, 20, 25]. Print the length of sliced and you'll see 5, but the capacity may surprise you.

writeln(sliced.capacity);

This will print 0. When a slice begins life, no new memory is allocated. Instead, it is backed by the source array. Continuing from the preceding snippet:

tenArray[0] = 10;
writeln(sliced);
writeln(tenArray);

Running this will show that tenArray[0] and sliced[0] are both set to 10. The same thing works the other way; any changes made to sliced will be reflected in tenArray. To reinforce this point, add the following lines to the example:

writeln(sliced.ptr);
writeln(tenArray.ptr);

Both pointers are pointing to the same memory block. Now, what do you think would happen if we were to append a new item to sliced, either by increasing the .length or through the ~= operator? The answer lies in that .capacity of 0.

The zero capacity indicates that appending to this slice in place may overwrite the existing elements in memory, that is, those belonging to the original array. In order to avoid any potential overwrite, attempting to append will cause the relationship between the two arrays to be severed. A new memory block will be allocated, which is large enough to hold the existing elements plus the appended one, and all of the elements copied over. Then the .ptr property of the slice will be set to the address of the new memory and its .capacity to a non-zero value.

sliced ~= 55;
writefln("Pointers: %s %s", tenArray.ptr, sliced.ptr);
writefln("Caps: %s %s", tenArray.capacity, sliced.capacity);

Running this code will print two different memory addresses. sliced is no longer backed by the memory of tenArray and now has a capacity of 7. We can say that sliced has become its own array. Sometimes, this isn't the desired behavior. I mentioned earlier that decreasing the .length of an array will reset its capacity to 0. To demonstrate, here's a little slicing trick that has the same effect as decreasing the array length:

auto shrink = [10, 20, 30, 40, 50];
shrink = shrink[0 .. $-1];
writeln(shrink);

Four elements are sliced from shrink and then the slice is assigned back to shrink. This is the same as decreasing shrink.length by one and also results in a zero capacity. Either way, the last element in the original array, the number 50, still exists at the same location in memory. The reason .capacity gives us a 0 here is that, if we were to append to shrink, we would overwrite the 50. If another slice is still pointing to the same memory block, it would be affected by any overwrites. To avoid any unintended consequences, D will play it safe and reallocate if we append.

Sometimes it doesn't matter if anything is overwritten. In that case, it's wasteful to reallocate each time the slice shrinks. That's where assumeSafeAppend comes in.

assumeSafeAppend(shrink);

Calling this after decreasing the length will maintain the original capacity, allowing all appends to use the existing memory block. Decreasing the length again will also reset the capacity to 0, requiring another call to assumeSafeAppend if we want to continue reusing the same memory block.
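
Here is a minimal sketch of that sequence: shrink, call assumeSafeAppend, then append. The helper replaceLastInPlace is a hypothetical name. It also reports whether the array still starts at the same address afterwards, which is what we would expect when the existing block is reused, though the allocation strategy remains an implementation detail.

```d
import std.stdio : writeln;

// Hypothetical helper: drops the last element and appends `value` in its
// place, reporting whether the array still starts at the same address.
bool replaceLastInPlace(ref int[] arr, int value)
{
    if (arr.length == 0)
        return false;            // nothing to replace

    auto before = arr.ptr;
    arr = arr[0 .. $ - 1];       // shrink: the capacity is now 0
    assumeSafeAppend(arr);       // promise that overwriting the old tail is fine
    arr ~= value;                // append after the promise
    return arr.ptr is before;
}

void main()
{
    int[] shrink = [10, 20, 30, 40, 50].dup;
    bool reused = replaceLastInPlace(shrink, 99);
    writeln(shrink);   // [10, 20, 30, 40, 99]
    writeln(reused);   // whether the original memory block was reused
}
```

Only make this promise when no other slice still needs the elements that are about to be overwritten.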

It's possible to remove an element from the middle of an array by taking a slice from in front of it and another from behind it, then concatenating them together. As concatenation allocates a new array, this isn't the most efficient way to go about it. A much better alternative is a function from std.algorithm called remove. Let's say we want to remove the 30 from shrink above. It's at the index 2, so:

import std.algorithm : remove;
shrink = shrink.remove(2);

Assuming shrink still held its original five elements, [10, 20, 30, 40, 50], it now contains [10, 20, 40, 50]. We'll look at the details of remove in Chapter 7, Composing Functional Pipelines with Algorithms and Ranges.
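
For comparison, here is a sketch of both approaches side by side. The helper removeByConcat is a hypothetical name for the slice-and-concatenate technique described above.

```d
import std.algorithm : remove;
import std.stdio : writeln;

// Hypothetical helper: removes the element at `index` by concatenating the
// slices on either side of it. This allocates a new array.
int[] removeByConcat(int[] arr, size_t index)
{
    return arr[0 .. index] ~ arr[index + 1 .. $];
}

void main()
{
    auto original = [10, 20, 30, 40, 50];
    auto copy = removeByConcat(original, 2);
    writeln(copy);       // [10, 20, 40, 50]
    writeln(original);   // [10, 20, 30, 40, 50]: the source is untouched

    // remove shifts elements within the existing memory and returns the
    // shortened slice, so the result must be assigned back.
    original = original.remove(2);
    writeln(original);   // [10, 20, 40, 50]
}
```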

Sometimes, you want to slice an entire array. There's a shortcut for that. Instead of slicing with [0..$], you can use empty brackets, or no brackets at all.

auto aSlice = anArray[];
auto anotherSlice = anArray;

It's possible for static and dynamic arrays to be implicitly converted both ways. When going from dynamic to static, the lengths must match exactly. When going from static to dynamic, the compiler achieves the conversion by taking a slice of the static array:

int[] dyn = [1,2,3];
int[3] stat1 = dyn;     // OK: lengths match
int[4] stat2 = dyn;     // Error: mismatched array lengths
int[] sliced1 = stat1;  // OK: same as stat1[]

The memory for dyn is allocated on the heap, but stat1 lives on the stack. When we initialize stat1, the elements of dyn are copied over and we now have two distinct arrays. In the last line, sliced1 is just like any other slice we've seen so far, no matter that it's a slice of a static array. Its .ptr property will be identical to stat1.ptr and it will have a capacity of 0, so we can append to or expand it without worrying about any impact on stat1. However, if stat1 goes out of scope while sliced1 still points to its memory, bad things can happen. If you can't guarantee that stat1 is going to stick around, you can use the .dup property to copy it.

int[] sliced1 = stat1.dup;

This allocates memory for a dynamic array and copies into it all of the elements from stat1. A similar property, .idup, creates an immutable copy of an array. The details of immutable arrays will be discussed later in the chapter.
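
As a quick sketch of .idup, the following takes an immutable copy of a static array and then modifies the source. The helper name snapshot is hypothetical.

```d
import std.stdio : writeln;

// Hypothetical helper: returns an immutable snapshot of a mutable array.
immutable(int)[] snapshot(int[] source)
{
    return source.idup;   // allocates and copies; the elements become immutable
}

void main()
{
    int[3] stat = [1, 2, 3];
    auto frozen = snapshot(stat[]);
    stat[0] = 100;         // later changes to the source...
    writeln(frozen);       // [1, 2, 3]: ...do not affect the copy
    // frozen[0] = 5;      // Error: cannot modify immutable elements
}
```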

D arrays aren't the only things you can slice. Imagine that you've been given an array of integers, but as an int* and an associated length variable rather than an int[]. If you want to stay with your C roots, you can go ahead and use pointer arithmetic to your heart's content. If, on the other hand, you'd prefer the convenience of a D array, the language has got you covered: just slice the pointer. Assuming a C-style int* array called parray, the length of which is stored in a variable named len:

int[] array = parray[0 .. len];

How convenient is that? Be careful, though. As when slicing an array, the slice here is backed by the original pointer. In fact, array.ptr is exactly the same address as parray. This comes with the same potential consequences of slicing a static array. If parray is freed behind your back, or otherwise becomes an invalid memory location, array isn't going to be valid anymore, so the slice of parray should be .duped.

auto array = parray[0 .. len].dup;
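
Putting the pieces together, here is a sketch of the whole round trip. The function makeCArray is a hypothetical stand-in for whatever C API hands you the pointer, and adopt is likewise just an illustrative name.

```d
import core.stdc.stdlib : malloc, free;
import std.stdio : writeln;

// Hypothetical stand-in for a C function that returns a malloc'd buffer
// holding the squares 0, 1, 4, 9, ...
int* makeCArray(size_t len)
{
    int* p = cast(int*) malloc(int.sizeof * len);
    assert(p !is null, "malloc failed");
    foreach (i; 0 .. len)
        p[i] = cast(int)(i * i);
    return p;
}

// Hypothetical helper: copies the C buffer into GC memory so that the
// result can outlive the original allocation.
int[] adopt(int* parray, size_t len)
{
    int[] view = parray[0 .. len];  // no copy: view.ptr is parray
    return view.dup;                // independent, GC-managed copy
}

void main()
{
    enum len = 5;
    int* parray = makeCArray(len);
    int[] array = adopt(parray, len);
    free(parray);                   // the copy is still valid after this
    writeln(array);                 // [0, 1, 4, 9, 16]
}
```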

Array literals

Take a look at the following array declarations, all initialized with array literals:

auto arr1 = [1.0f,2.0f,3.0f];   // float[]
auto arr2 = [1.0f,2.0,3.0];     // double[]
auto arr3 = [1.0,2.0f,3.0f];    // double[]

This snippet demonstrates that array literals are inferred as dynamic arrays by default. It also shows how the base type of an array is inferred. We see that arr1 contains three floats, so it is of type float[] as one would reasonably expect. In the other arrays, we first see a float followed by two doubles, then a double followed by two floats, yet both arrays are inferred as double[]. The compiler looks at the type of each element and determines their common type. This type becomes the type of the array and all elements are implicitly converted. For example, given an array comprised of shorts and chars, the common type is int; in an array that contains one or more longs and a mix of smaller integer types, the common type is long.
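
One way to check what the compiler inferred is to ask for the type's name. The sketch below does that with a small template, typeName, which is a hypothetical helper and not part of the standard library.

```d
import std.stdio : writeln;

// Hypothetical helper: returns the name of the type inferred for `value`.
string typeName(T)(T value)
{
    return T.stringof;
}

void main()
{
    writeln(typeName([1.0f, 2.0f]));  // float[]
    writeln(typeName([1.0f, 2.0]));   // double[]: float and double
    writeln(typeName([1, 2L, 3]));    // long[]: int and long
}
```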

We can use array literals with the append and concatenation operators.

int[] buildMe;
buildMe ~= [1, 2, 3, 4] ~ 5;

Static arrays can also be initialized with array literals, as long as the lengths match.

int[3] arr4 = [1,2,3];      // OK
int[3] arr5 = [1,2,3,4];    // Error: mismatched array lengths

Arrays and void

We've seen that void can be used to turn off default initialization for a variable. This is true for static arrays as well. Normally, every element of a static array will be initialized at the point of declaration. If the array is in global scope, this isn't such a big deal, but if it's in a function that is frequently called, the time taken to initialize all of the array elements can be a performance drain if the array is large.

float[1024] lotsOfFloats = void;

This will allocate 1,024 float values on the stack, but they will not be initialized to nan. Normally, you shouldn't turn off default initialization unless profiling shows it helps.

Tip

Uninitialized dynamic arrays

Allocating a dynamic array with new, such as new float[10], will always initialize the elements with their .init value. Default initialization can be avoided in this case by allocating through std.array.uninitializedArray instead of calling new directly:

import std.array : uninitializedArray;
auto arr = uninitializedArray!(float[])(10);

It's also possible to declare arrays of type void[]. Like the universal pointer, this is the universal array. A couple of use cases can be found in the Phobos module std.file. The read function there returns a void[] array representing the bytes read from a file. The write function accepts a void[] buffer of bytes to write. You can cast from void[] to any array type and all array types are implicitly convertible to void[].
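
The conversions in both directions can be sketched as follows. One detail worth seeing for yourself is that the .length of a void[] counts bytes rather than elements. The helper name byteLength is hypothetical.

```d
import std.stdio : writeln;

// Hypothetical helper: reports how many bytes an int array occupies by
// viewing it as a void[].
size_t byteLength(int[] ints)
{
    void[] raw = ints;   // implicit conversion; no copy is made
    return raw.length;   // a void[] counts bytes, not elements
}

void main()
{
    int[] ints = [1, 2, 3];
    writeln(byteLength(ints));       // 12: three 4-byte ints

    void[] raw = ints;
    int[] back = cast(int[]) raw;    // an explicit cast is needed to go back
    writeln(back);                   // [1, 2, 3]
    writeln(back.ptr is ints.ptr);   // true: the same memory throughout
}
```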

Array operations

To close our array discussion, we're going to look at how arrays can serve as operands for many of the operators we discussed earlier. First up, let's take a look at a couple of special cases of the assignment operator.

We've seen .dup always allocates a new array, which is wasteful when the goal is to copy elements from one array into an existing one. In an assignment expression, if both arrays are the same type, or the right operand is implicitly convertible to the left, we can add empty brackets to the left operand.

int[] a1 = new int[10];
int[] a2 = [0,1,2,3,4,5,6,7,8,9];
a1[] = a2;

We know that the first line allocates ten integers and initializes them all to 0. We know that the array literal assigned to a2 allocates memory for ten integers and initializes them to the values in the bracket. We know that a1 = a2 would cause both arrays to share the same memory. By adding the empty index operator to the left operand, we're telling the compiler to do the equivalent of going through a2 and assigning each element to the corresponding position in a1. In other words, a1[0] = a2[0], a1[1] = a2[1], and so on. Although the end result looks the same as a1 = a2.dup, there is a major difference. Calling .dup will cause a1.ptr to point to the memory allocated by .dup; with the bracketed assignment, a1.ptr will be unchanged. The first two calls to writeln in this snippet will print the same address and the last one will print something different.

int[] a1 = new int[10];
writeln(a1.ptr);
int[] a2 = [0,1,2,3,4,5,6,7,8,9];
a1[] = a2;
writeln(a1.ptr);
a1 = a2.dup;
writeln(a1.ptr);

Two big caveats here. First, even when the target of the assignment is a dynamic array, the lengths of both arrays must exactly match. The second caveat is that the memory of the two arrays cannot overlap. Consider the following:

int[] a4 = [1,2,3,4,5];
int[] a5 = a4;
a5[] = a4;

This will give you a runtime error complaining about overlapping memory. This is an obvious case, as we know that the assignment of a4 to a5 will result in both arrays pointing to the same location. Where it isn't so obvious is with slices from pointers, or allocating your own array memory outside the GC via malloc. Vigilance is a virtue.

The empty index operator also allows us to assign a single value to every index in an array. Consider the following:

int[10] sa1 = 10;
sa1[] = 100;

Here, every element of sa1 is initialized to 10. In the next line, all ten elements are assigned the value 100. This also shows that the empty index operator works with static arrays equally as well as dynamic arrays. We can apply several other operators to arrays. For example:

int[] a = [2,3,4];
a[] ^^= 2;
writeln(a);
int[] b = [5,6,7];
int[3] c = a[] + b[];
writeln(c);

If you try something like writeln(a[] + b[]) you'll get a compiler error telling you that such operations on arrays require destination memory to write the result. There's no such thing as an implicit temporary for this sort of thing. I encourage you to experiment with the basic operators from earlier in this chapter to see what works with arrays and what doesn't. For example, the shift operators do not accept arrays as operands.
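
As a sketch of supplying that destination memory yourself, the following combines two operators in a single array expression. The helper name scaleAndAdd is hypothetical.

```d
import std.stdio : writeln;

// Hypothetical helper: computes a * scale + b, element by element.
int[] scaleAndAdd(const int[] a, const int[] b, int scale)
{
    assert(a.length == b.length, "operands must have equal lengths");
    auto result = new int[](a.length);  // destination memory for the result
    result[] = a[] * scale + b[];
    return result;
}

void main()
{
    writeln(scaleAndAdd([2, 3, 4], [5, 6, 7], 2)); // [9, 12, 15]
}
```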

Finally, let's talk about array equality. This is our first opportunity to see the difference between == and is with reference types. Examine the following snippet:

auto ea1 = [1,2,3];
auto ea2 = [1,2,3];
writeln(ea1 == ea2);
writeln(ea1 is ea2);

Here we have two dynamic arrays with identical elements. The first writeln contains an equality expression. This will do an element-by-element comparison to see if each is the same value and evaluate to true if so. The is operator in the second writeln tests if the two arrays have the same identity. It doesn't care about the elements. So what's our snippet going to print?

Given that both ea1 and ea2 have the same number of elements and each element has the same value, it's rather obvious that the first writeln will print true. The is operator in the second writeln is going to look at the pointer and length of each array. If they are pointing to the same place and the length is the same, the result is true. In this case, ea1 and ea2 are pointing to different memory blocks, so we have a result of false.

The gist of it is that == on an array is related to its elements, while is on an array is related to its metadata. Both can be used to compare an array to null: a == null will return true if a.length is 0; a is null will return true if a.length is 0 and a.ptr is null.
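
The difference between the two null comparisons shows up with an empty slice of a non-empty array, as in this sketch. The helpers equalsNull and isNull are hypothetical names that simply wrap the two expressions.

```d
import std.stdio : writeln;

// Hypothetical helpers wrapping the two comparisons.
bool equalsNull(int[] a) { return a == null; }  // true when a.length is 0
bool isNull(int[] a)     { return a is null; }  // also requires a.ptr to be null

void main()
{
    int[] neverAllocated;             // length 0, ptr null
    int[] full = [1, 2, 3];
    int[] emptySlice = full[0 .. 0];  // length 0, but ptr points into full

    writeln(equalsNull(neverAllocated)); // true
    writeln(isNull(neverAllocated));     // true
    writeln(equalsNull(emptySlice));     // true
    writeln(isNull(emptySlice));         // false
}
```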

Strings

There are three string types in D: string, wstring, and dstring. They aren't actual types, but rather are each an alias to an immutable array of char, wchar, and dchar respectively. The D documentation says that strings are a special case of arrays. We can sum it up by saying that the compiler has some awareness of the string types, but they are not built-in types and they are, at heart, simply arrays. They are also treated specially by many functions in the standard library.

String essentials

Because strings are arrays, any properties available on arrays are also available on strings, but the .length property doesn't tell you how many letters are in a string; it tells you the number of Unicode code units. Remember from our discussion on basic types that each character type in D represents a single Unicode code unit. One or more code units can form a code point. In UTF-8, where each code unit is only eight bits, it's common for code points to be composed of multiple code units. A code point that requires two code units in UTF-8 could be represented as one code unit in UTF-16. The following example introduces D's string literals to demonstrate the code unit/code point dichotomy:

string s = "soufflé";
wstring ws = "soufflé"w;
dstring ds = "soufflé"d;

String literals in D are indicated by opening and closing double quotes (""). By default, they are understood by the compiler to be of type string. Appending a w to the end makes a wstring and a d similarly forces a dstring. The word soufflé has seven letters, but if you query s.length, you'll find it returns 8. Both ws.length and ds.length return 7 as expected. This discrepancy is because the letter é requires two code units in UTF-8. In both UTF-16 and UTF-32, a single code unit is large enough to hold it. In fact, a single code unit in UTF-32 is always equivalent to a single code point, as 32 bits is large enough to hold any Unicode character.
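
If what you actually want is the number of code points, one option is to walk the string as a range, as in this sketch. It relies on std.range.walkLength iterating a string one code point at a time; the helper name codePoints is hypothetical, and the é is written as the escape \u00E9 so that the result doesn't depend on how the source file is encoded.

```d
import std.range : walkLength;
import std.stdio : writeln;

// Hypothetical helper: counts code points rather than code units.
size_t codePoints(string s)
{
    return s.walkLength;   // strings are iterated by code point
}

void main()
{
    string s = "souffl\u00E9";  // \u00E9 is the letter é
    writeln(s.length);          // 8 UTF-8 code units
    writeln(codePoints(s));     // 7 code points
}
```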

Note

Unicode is an important component in modern software development, yet is often misunderstood. For details, a great place to start is the Unicode FAQ at http://unicode.org/faq/. There are also a number of informative introductory articles that can be found through a quick web search.

Double-quoted string literals in D are always parsed for the standard escape sequences such as the end-of-line character ('\n'), the tab character ('\t'), and the null-terminator ('\0'). A single string literal declared over multiple lines will cause any newlines to be embedded in the string.

auto s1 = " Hi
I
  am a multi-line
    string";
writeln(s1);

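Multiple string literals placed in succession, with no terminating semicolon between them, are concatenated into a single string at compile time. Be aware that newer compiler releases deprecate this implicit form in favor of joining the literals explicitly with the ~ operator.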

auto s2 = "I am" " a string which is"
          " composed of multiple strings"
          " on multiple lines.";

Many of the operations you'll commonly perform on strings are scattered throughout Phobos. Most of what you want is found in the std.string module. Other modules include std.ascii, std.uni, std.utf, std.format, std.path, std.regex, std.encoding, and std.windows.charset. Because strings are dynamic arrays, you can also use them with the functions in std.array. Since dynamic arrays are also ranges, many functions in std.algorithm also accept strings.

Another useful module for strings is std.conv. There you'll find a number of functions for converting from one type to another. Two particularly useful functions are to and parse. The former takes almost any type and converts it to almost any other. For example, given an int that you want to convert to a string, you can do the following:

import std.conv : to;
int x = 10;
auto s = to!string(x);

We've already seen the template instantiation operator, !, when we looked at std.conv.octal. Here, we're telling the template to take the runtime parameter x and turn it into the type indicated by the compile-time parameter following the ! operator. We'll see more about the difference between runtime and compile-time parameters in Chapter 5, Generic Programming Made Easy.

Conversely, sometimes you have a string from which you want to extract another type. You can do this with to, but it will throw an exception if there are inconvertible characters in the string. An alternative that doesn't throw an exception is parse. When it encounters inconvertible characters, it will stop parsing and return whatever it has parsed so far:

import std.conv : to, parse;
string s1 = "10";
int x1 = to!int(s1);       // OK: x1 = 10
string s2 = "10and20";
int x2 = parse!int(s2);    // OK: x2 = 10
int x3 = to!int(s2);       // ConvException
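One detail worth seeing in action is that parse receives its argument by reference and advances it past the characters it consumed. The sketch below shows that, along with one possible way to wrap to when you would rather have a default value than an exception. The helper toIntOr is a hypothetical name, not something found in Phobos.

```d
import std.conv : ConvException, parse, to;
import std.stdio : writeln;

// Hypothetical helper: converts with `to`, returning `fallback`
// when the text is not a valid integer.
int toIntOr(string text, int fallback)
{
    try
    {
        return to!int(text);
    }
    catch (ConvException)
    {
        return fallback;
    }
}

void main()
{
    string input = "10and20";
    int first = parse!int(input);      // input is advanced past "10"
    writeln(first);                    // 10
    writeln(input);                    // and20

    writeln(toIntOr("42", -1));        // 42
    writeln(toIntOr("10and20", -1));   // -1
}
```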

If you have one or more values that you want to insert into specific places in a string, you can use std.format.format, one of several functions in the std.format module that assist in creating formatted strings.

import std.format : format;
auto height = 193;
auto weight = 95;
auto fs = format("Bob is %s cm and weighs %s kg", height, weight);

The syntax and specifiers are the same as those used with writef. The format function always allocates a new string from the GC heap, but another version, sformat, allows you to pass a reusable buffer as a parameter and returns a slice of that buffer.
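As a rough illustration of the buffer-based variant, the following sketch formats into a fixed-size stack buffer. The helper describe and the 64-character buffer size are assumptions made for the example; sformat will throw if the buffer is too small for the result.

```d
import std.format : sformat;
import std.stdio : writeln;

// Hypothetical helper: formats into buf and returns the part that was used.
char[] describe(char[] buf, int height, int weight)
{
    return sformat(buf, "Bob is %s cm and weighs %s kg", height, weight);
}

void main()
{
    char[64] buffer;                        // reusable, lives on the stack
    char[] text = describe(buffer[], 193, 95);
    writeln(text);                          // Bob is 193 cm and weighs 95 kg
    assert(text.ptr is buffer.ptr);         // the result is a slice of buffer
}
```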

The empty string, "", has a length of zero. Its .ptr property, however, is not null. For compatibility with C, all string literals in D are null-terminated, so even the empty string points to a piece of memory containing a '\0'. Going with what we know about arrays, this means "" == null is true and "" is null is false.
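If you'd like to verify these claims for yourself, a small sketch along the following lines should do. The variable names are arbitrary.

```d
import std.stdio : writeln;

void main()
{
    string empty = "";
    string unset;                     // default initialized to null

    writeln(empty.length);            // 0
    writeln(empty.ptr !is null);      // true: points at a '\0'
    writeln(*empty.ptr == '\0');      // true: the literal is null-terminated
    writeln(empty == null);           // true: both have no contents
    writeln(empty is null);           // false: the pointers differ
    writeln(unset is null);           // true
}
```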

Alternative string literals

In addition to the double-quoted string literals we've gone through, D supports the following string literals:

WYSIWYG strings

Any character in a WYSIWYG (What You See is What You Get) string is part of the string, including escape sequences. There are two WYSIWYG syntaxes, r"wysiwyg" and `wysiwyg` (these are backticks, not single quotes). Because the former uses double quotes and does not allow escape sequences, it's impossible to include double quotes in the string itself. The backtick syntax allows you to do that, but then you can't have backticks in the string.

writeln(r"I'm a WYSIWYG string'```'\t\n");
writeln(`me, too!\n\r"'''""`);
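A quick way to convince yourself that escape sequences are left alone is to compare lengths, as in this small sketch. The Windows-style path is just a made-up example of where WYSIWYG strings tend to be handy.

```d
import std.stdio : writeln;

void main()
{
    assert("\t\n".length == 2);    // two characters: a tab and a newline
    assert(r"\t\n".length == 4);   // four characters: \ t \ n
    assert(`\t\n` == r"\t\n");     // both WYSIWYG forms yield the same string

    // No need to double up the backslashes.
    auto path = `C:\temp\new`;
    writeln(path);                 // C:\temp\new
}
```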

Delimited strings

Delimited strings allow you to choose how to denote the beginning and end of a string, but there are some special rules. These literals begin with q" and end with ". Any delimiter you choose must immediately follow the opening quote and immediately precede the ending quote. There are a few nesting delimiters that are recognized by default: [], (), <>, and {}. Because these nest, you can use the same characters inside the string. Any nested delimiters must be balanced.

writeln(q"(A Delimited String (with nested parens))");  // OK
writeln(q"[An [Unbalanced nested delimiter]");          // Error
writeln(q"<Another unbalanced> nested delimiter>");     // Error
writeln(q"{And }{again!}");                             // Error

In the first line, the nested parentheses are balanced and so become part of the string. The second line has an opening delimiter, but no closing delimiter to balance it. The opposite is true in the third line, while the last line has an unbalanced pair. None of the last three lines will compile.

You can also use a custom identifier as the delimiter with the following guidelines:

The opening and closing identifier must be the same
A newline must immediately follow the opening identifier
A newline must immediately precede the closing identifier
The closing identifier must start at the beginning of the line

The first newline is not part of the string, but all subsequent newlines are, including the one preceding the closing identifier:

auto s = q"STR
I'm a string with a custom delimiter!
STR";

Strings delimited by identifiers are sometimes referred to as heredoc strings. Delimited strings are useful when long blocks of text need to be included in the source. They behave just like WYSIWYG strings, with the addition that they allow the use of both backticks and double quotes in the string.
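The newline rules are easier to see when the literal is compared against an ordinary double-quoted string. Here is a minimal sketch of that, where the helper banner and the delimiter EOS are arbitrary choices for the example.

```d
import std.stdio : write;

// Hypothetical helper returning a heredoc literal that contains
// both double quotes and backticks.
string banner()
{
    return q"EOS
He said "hi" and typed `ls`.
Second line.
EOS";
}

void main()
{
    // The newline after the opening EOS is dropped; the one
    // before the closing EOS is kept.
    assert(banner() == "He said \"hi\" and typed `ls`.\nSecond line.\n");
    write(banner());
}
```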

Token strings

Token strings are string literals that must contain valid D tokens. These are great to use when generating code at compile time. Text editors that can perform syntax highlighting on D can highlight the code inside the string. Token strings open with q{ and close with }. Any newlines inside the literal are part of the string and the nesting of braces is allowed.

auto code = q{
int x = 10;
int y = 1;
};
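To hint at why token strings suit compile-time code generation, the sketch below feeds one to a string mixin, which compiles the string's contents as if they had been written in place. String mixins haven't been introduced yet, so treat this only as a preview; the function name sumFromMixin is hypothetical.

```d
import std.stdio : writeln;

// A token string holding two declarations, stored as a compile-time constant.
enum code = q{
    int x = 10;
    int y = 1;
};

// Hypothetical helper: the mixin declares x and y inside this function.
int sumFromMixin()
{
    mixin(code);
    return x + y;
}

void main()
{
    writeln(sumFromMixin());   // 11
}
```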

Associative arrays

Associative arrays allow you to map keys of any type to values of any type.

int[string] aa1;  // int values, string keys
string[int] aa2;  // string values, int keys

The default initialization value for an associative array, if you print it, looks like the empty array, []. They may look the same, but they are two different beasts. Associative arrays have no .ptr or .capacity, though they do have a .length that is 0 by default. You can't call reserve, modify .length, or use the concatenation operator on an associative array. What you can do is add values like this:

aa1["Ten"] = 10;
aa2[10] = "Ten";

If the key already exists in the array, its existing value will be overwritten. If it doesn't exist, it will be added. Although aa2[10] looks like a regular array index operation, it is not. With an array, the indexes are sequential. If the index 10 does not exist, you've earned a range violation. With an associative array, you've added a new key and value pair. You can also initialize an associative array with literals, like this:

auto aa3 = ["x":22.0, "y":3.0f, "z":5.0f, "w":1.0];

Literals take the form of a bracketed sequence of comma-separated key : value pairs. In this particular declaration, the type of aa3 is inferred as double[string]. The value type, double, is inferred in the same way it would be with standard arrays; it's the common type of all of the values.

To remove an item from an associative array, use the remove function. This is not the same as std.algorithm.remove for arrays; no imports are required. If the key does not exist, it does nothing and returns false, otherwise it removes the key and its associated value and returns true.

aa3.remove("w");
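Putting insertion, overwriting, and removal together, a minimal sketch might look like the following. The array name stock and its contents are made up for the example.

```d
import std.stdio : writeln;

void main()
{
    int[string] stock;
    stock["apples"] = 3;                   // new key: added
    stock["apples"] = 5;                   // existing key: overwritten
    writeln(stock.length);                 // 1
    writeln(stock["apples"]);              // 5

    bool first = stock.remove("apples");   // key was present
    bool second = stock.remove("apples");  // nothing left to remove
    writeln(first);                        // true
    writeln(second);                       // false
    writeln(stock.length);                 // 0
}
```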

There are two options for reading values from associative arrays. The most obvious is to read a key index directly.

auto x = aa3["x"];  // OK: key exists
auto w = aa3["w"];  // We removed it -- range violation

If you want to avoid the range violation for a nonexistent key, you can use the in operator instead. This takes a key as the left operand and an associative array as the right operand. If the key exists, it returns a pointer to the value. Otherwise, it returns null.

auto px = "x" in aa3;   // type double*, valid address
auto pw = "w" in aa3;   // type double*, null

Once a pointer is received from the in operator, it needs to be dereferenced to get at the value. We haven't looked at if statements yet, but I assume you know what they are, so I'm going to show you a common D idiom. It's possible to combine the in operator with an if statement to perform an action on a value if a key is present.

if(auto px = "x" in aa3)
  writeln(*px);
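Because in hands back a pointer to the stored value, the same idiom can be used to update an entry in place. The following sketch counts word occurrences that way; countWords is a hypothetical helper written for this example.

```d
import std.stdio : writeln;

// Hypothetical helper: counts how often each word occurs.
int[string] countWords(string[] words)
{
    int[string] counts;
    foreach (w; words)
    {
        if (auto p = w in counts)
            ++*p;            // key exists: update through the pointer
        else
            counts[w] = 1;   // key missing: insert it
    }
    return counts;
}

void main()
{
    auto counts = countWords(["a", "b", "a"]);
    writeln(counts["a"]);    // 2
    writeln(counts["b"]);    // 1
}
```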

You can fetch all of the keys and values in two ways. The efficient way is to call .byKey and .byValue. Both of these return ranges without allocating any heap memory or making any copies. We will explore ranges later to understand how they can be efficient. Sometimes, you really do need to make a copy. For those situations, there are the .keys and .values properties. These each allocate a dynamic array and copy the keys and values respectively.
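As a rough sketch of when each approach fits, the example below sums the values lazily through .byValue, but copies the keys with .keys because sorting needs an array it can rearrange. The helper sortedKeys is a hypothetical name.

```d
import std.algorithm : sort, sum;
import std.stdio : writeln;

// Hypothetical helper: copies the keys into a new array and sorts it.
string[] sortedKeys(double[string] aa)
{
    auto keys = aa.keys;   // allocates a dynamic array
    sort(keys);
    return keys;
}

void main()
{
    auto aa = ["x": 22.0, "y": 3.0, "z": 5.0];

    // Lazy traversal: nothing is copied.
    writeln(aa.byValue.sum);     // 30

    // Iteration order over an associative array is unspecified.
    foreach (k; aa.byKey)
        writeln(k);

    writeln(sortedKeys(aa));     // ["x", "y", "z"]
}
```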