
Derived data types
In this section, we're going to observe D's take on pointers, arrays, strings, and associative arrays. Much of what we'll cover here is very different from other C-family languages.
Pointers
As in other languages that support them, pointers in D are special variables intended to hold memory addresses. Take a moment to compile and run the following:
int* p;
writeln("p's value is ", p);
writeln("p's type is ", typeid(p));
writeln("p's size is ", p.sizeof);
First, look at the declaration. It should look very familiar to many C-family programmers. All pointer declarations are default initialized to null
, so here the first call to writeln
prints "null"
as the value. The type of p
printed in the second writeln
is int*
. The last line will print 4
in 32-bit and 8
in 64-bit.
So far so good. Now look at the following line and guess what type b
is:
int* a, b;
No, b
is not an int
, it is an int*
. The equivalent C or C++ code would look like this:
int *x, *y;
In D, x
would be interpreted as int*
and y
as int**
, causing a compiler error. Every symbol in a declaration must have the same type. No matter how many identifiers are in a pointer declaration, only one *
is needed and it applies to each of them. As such, it's considered best practice to put the *
next to the type, as in the first declaration, rather than next to the identifiers. Otherwise, pointers in D function much as they do elsewhere. The unary &
operator can be used to take the address of any variable and the *
operator can be used to dereference a pointer to fetch the value it's pointing at. Pointer types can be inferred like any other type. Changing the value to which a pointer points will be reflected when the pointer is next dereferenced.
auto num = 1;
auto numPtr = &num;
writefln("The value at address %s is %s", numPtr, *numPtr);
num = 2;
writefln("The value at address %s is %s", numPtr, *numPtr);
Here, the address of num
is assigned to numPtr
. Since num
is inferred to be int
, the type of numPtr
is inferred as int*
. Both calls to writefln
first print the value of numPtr
, which is the address of num
, then dereference numPtr
to print the value of num
. Memory addresses are printed as hexadecimal numbers by default. The following is the output:
The value at address 18FE34 is 1 The value at address 18FE34 is 2
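To make the declaration rule and the dereferencing behavior concrete, here is a minimal sketch. The helper setThrough is a hypothetical name introduced only for this illustration; everything else uses the language features described above.

```d
import std.stdio : writeln;

// Hypothetical helper (not from the text): stores a value through a pointer.
void setThrough(int* target, int value)
{
    *target = value;
}

void main()
{
    int* a, b; // a single * applies to both identifiers
    static assert(is(typeof(a) == int*));
    static assert(is(typeof(b) == int*));
    assert(a is null && b is null); // pointers are default initialized to null

    auto num = 1;
    auto numPtr = &num; // inferred as int*
    setThrough(numPtr, 2);
    writeln(num, " ", *numPtr); // both expressions read the same memory
}
```

The static assert lines are checked at compile time, so the snippet will not build if either identifier has a type other than int*.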
void
pointers are used to represent pointers to any type, but it's rare to use them in D except when interfacing with C APIs. Dereferencing a void
pointer directly is an error; it must first be cast to the appropriate pointer type. Pointers to other types can be implicitly converted to void*
, though the reverse is not allowed.
auto num = 1; // int
void* voidPtr = &num; // OK: int* converts to void*
writeln(*voidPtr); // Error: void has no value
writeln(*cast(int*)voidPtr); // OK: dereferencing int*.
All of the pointers we've seen so far point to values on the stack. Pointers can also point to blocks of heap memory. We can allocate heap memory using the new
expression (you'll learn how to allocate multiple values with new
when we take a look at arrays).
int* intPtr = new int; // Allocate memory for a single int
*intPtr = 10;
The heap and the stack work as they do in C, except that D has a garbage collector involved. Memory allocated with new
is managed by the GC. Additionally, using certain language features can implicitly cause GC memory to be allocated. We'll discuss those cases when we come across them.
It's also possible to bypass the garbage collector completely and use alternative allocators such as C's malloc
.
import core.stdc.stdlib : malloc, free;
int* intsPtr = cast(int*)malloc(int.sizeof * 10); // Ten ints
free(intsPtr);
auto dontDoThis = malloc(int.sizeof); // Inferred as void*
auto thisIsOK = cast(int*)malloc(int.sizeof); // Inferred as int*
The variable dontDoThis
is inferred to be void*
, which usually isn't what you want. Always pay attention when using type inference. Another point of note is that allocating memory in this manner loses the benefit of default initialization. Any memory allocated through malloc
should be treated just as it would be in C. It's also worth noting here that D supports pointer arithmetic, which you could use to iterate intsPtr
. You can also use the array index operator, []
, to access elements of intsPtr
. Both approaches are frowned upon in D, however. It's much safer to convert intsPtr
to an array.
Arrays
Arrays in D are a popular feature, slices in particular. They aren't your grandpa's arrays, though. D does things a bit differently than elsewhere in the C family. We're going to spend a few pages digging into them so that you can avoid common beginner mistakes.
Array basics
The first thing to understand about arrays in D is that they are fat pointers; each array carries around both a length and a pointer to the memory block where its elements are stored. Conceptually, you can think of an array as a struct
that looks like this:
struct Array(T) {
    size_t length;
    T* ptr;
}
T
is the type of the array elements. On every array, both .length
and .ptr
are accessible as properties.
Static arrays are allocated on the stack. They have a fixed length that does not change.
int[3] stat1;
writeln(stat1);
Compile this snippet and the writeln
will print [0, 0, 0]
. Three int
values were allocated on the stack and have all been initialized to int.init
. A dynamic array can grow and shrink as required. The syntax of a dynamic array declaration looks like this:
int[] dynArray1;
Unlike stat1
, this array is empty. No space is allocated for any elements, only enough stack space to hold the metadata. The default initializer for a dynamic array is the empty array, []
. Its .length
will be 0
and its .ptr
will be null
. We can allocate space for the array elements using new
.
dynArray1 = new int[3];
int[] dynArray2 = new int[10];
Note
Some D users think the syntax auto arr = new int[3]
is too similar to the static array declaration int[3] arr
. D now supports an alternative syntax, new int[](3)
. This new syntax is recommended, but old habits die hard. There is a large body of D code that uses the older syntax.
The first array will now have three int
values, the second will have ten, and all of the values will be default initialized to int.init
. Actually, the runtime will probably have allocated more room than necessary. You can see this with the .capacity
property.
writeln("#1:", dynArray1.capacity);
writeln("#2:", dynArray2.capacity);
.capacity
returns the maximum length the array can grow to before reallocation is needed. The two writeln
calls above print 3
and 15
for me. This first number tells us that new int[3]
allocated exactly enough space for three int
values. If we append a new value to dynArray1
, a reallocation will take place. The second number tells us that new int[10]
allocated enough space for fifteen int
values. Since we only used ten, there's still space for five more elements to be appended before a reallocation is needed. The allocation algorithm is an implementation detail, so you can't rely on fifteen elements always being allocated when you request ten. What you can rely on is that enough space will be allocated for the number of elements you requested.
This default behavior is fine in many situations, but when you know you're going to be appending numerous items to an array, you can use the reserve
function to be more efficient.
int[] dynArray3;
dynArray3.reserve(20);
writefln("%s, %s", dynArray3.length, dynArray3.capacity);
We've asked the runtime to reserve enough space for twenty int
values, but none of that space is being used. This is an important difference between new
and reserve
. The former will allocate the space and return an array containing the number of elements you requested, that is, the new memory is not empty. The latter only allocates the space if the current capacity is smaller than the size requested, but the newly allocated space is empty. You can see this when writefln
prints 0, 31
to the screen. There are no elements in the array, but a total of 31 can be appended before a reallocation is needed.
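The practical payoff of reserve is that appends stay within one memory block. The following is a minimal sketch of that idea; appendsInPlace is a hypothetical helper written for this illustration, and it only checks the guarantee described above (capacity at least as large as requested), not any particular capacity value.

```d
import std.stdio : writeln;

// Hypothetical helper (not from the text): reserves room for `count` ints,
// appends that many values, and reports whether the array stayed in the
// same memory block the whole time. Expects count >= 1.
bool appendsInPlace(size_t count)
{
    int[] arr;
    arr.reserve(count); // space is allocated, but the array is still empty
    assert(arr.length == 0 && arr.capacity >= count);

    arr ~= 0;             // the first append uses the reserved block
    auto start = arr.ptr; // remember where the elements live
    foreach (i; 1 .. count)
        arr ~= cast(int) i;

    return arr.ptr is start && arr.length == count;
}

void main()
{
    writeln(appendsInPlace(20));
}
```

Without the call to reserve, the same loop may reallocate several times as the array grows.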
This brings us to the append operator. Using this, you can append individual elements to an array. dynArray3
is empty, so let's give it some values.
dynArray3 ~= 2;
dynArray3 ~= 10;
writeln(dynArray3);
This will print [2, 10]
. Now let's combine dynArray3
with dynArray1
to create a new array. To do this, we can use the concatenation operator, ~
.
auto dynArray4 = dynArray3 ~ dynArray1;
writeln(dynArray4);
Remember that dynArray1
contains three int
values that were initialized to 0
, so the writeln
in this snippet will print [2, 10, 0, 0, 0]
. Since both operands of the concatenation operator are of type int[]
, the type inferred for dynArray4
is int[]
. We can also add elements to a dynamic array by manipulating .length
directly.
dynArray1.length += 10;
If there is enough capacity to hold all ten values, no reallocation takes place. Otherwise, more space is allocated. If the current memory block cannot be extended, then a new block is allocated and the existing elements are copied to it. Finally, ten new elements are default initialized in the newly allocated space. Conversely, you can shrink the array by decreasing the length. Be aware that, when you do so, you're causing the capacity to be reset to 0
. This has a special significance that will be explained soon in this chapter.
To get at a specific value in any array, use the index operator, []
. Arrays use zero-based indexes. The special operator $
is a synonym for array.length
; it's only defined inside the brackets and always applies to the array being indexed.
writeln(dynArray4[0]); // Print the first element
writeln(dynArray4[2]); // Print the third element
writeln(dynArray4[$-1]); // Print the last element
The index operator works on both static and dynamic arrays. By default, D will do bounds checking at runtime to make sure that you don't read or write past either end of the array, a common source of exploits and other bugs in C and C++ software. Doing so will result in a runtime error reporting a range violation. You can turn bounds checking off by passing -boundscheck=off
to the compiler.
Rectangular arrays
A rectangular array (sometimes called a jagged array) is an array of arrays. As we'll soon see, new D programmers often find them confusing. The thing to keep in mind is that they are no different from normal arrays. Declaring them has the same form of elementType[numberOfElements]
that is used with any array. It's just that, in a rectangular array, the type of the array elements happens to be another array. Consider the following declaration of a normal array:
int[3] arr;
The arr
array is a static array of three int
elements, visually clarified by putting parentheses around the type:
(int)[3] arr;
Now look at the following declaration of a rectangular array:
int[3][2] ra1;
The ra1
array is a static array of two int[3]
elements. Again, putting parentheses around the type makes it clear.
(int[3])[2] ra1;
Fetching the element at any index in arr
, such as arr[0]
, returns an int
. In the same manner, ra1[0]
returns an int[3]
. We can, in turn, get at its first element with [0]
, which when implemented as a single statement looks like: ra1[0][0]
. I want to stress that none of this is special syntax; we have two index operators in the declaration solely because the element type of ra1
is itself an array type. Since ra1[0]
returns an array, the additional [0]
indexes into the returned array.
Now, about that confusion I mentioned. Many programmers are familiar with C's multidimensional arrays. There's a major difference in how they are declared in C and how rectangular arrays are declared in D. To help illustrate this, consider the following grid:

One way to describe this is as a grid of three rows and four columns. In C, this could be expressed in code like so:
int cgrid[3][4];
cgrid[1][0] = 10; // Set the first element of the second row
In D, we have to look at it a bit differently. In order to access the array elements the same way as in C, where [1][0]
is the second row and first column, we have to envision each row as an array of four elements. Given that the [4]
is part of the array type, the order of the indexes in the declaration will be the reverse of those in C.
int[4][3] dgrid;
dgrid[1][0] = 10;
To be clear, the declaration is not creating a column-major array; it's still row-major exactly like the C array, so that [1][0]
is the second row and first column in both. The only difference is that the [4]
is part of the array type. Keep that in mind and you should have no trouble keeping things straight.
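If you'd like the compiler to confirm how a declaration is read, a few static assert checks will do it. This is only a sketch for experimentation; it adds nothing beyond the dgrid declaration already shown.

```d
import std.stdio : writeln;

void main()
{
    int[4][3] dgrid; // three rows, each of which is an int[4]

    // The outer array has three elements and each element is an int[4].
    static assert(typeof(dgrid).length == 3);
    static assert(is(typeof(dgrid[0]) == int[4]));
    static assert(typeof(dgrid[0]).length == 4);

    dgrid[1][0] = 10; // second row, first column
    dgrid[2][3] = 7;  // last row, last column
    writeln(dgrid);
}
```

Swapping the two sizes in the declaration makes the static assert lines fail at compile time, which is a quick way to catch a C-style reading of the type.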
Here's another example of a rectangular array:
int[][3] ra2 = [
    [0, 1],
    [2, 3, 4, 5],
    [6, 7, 8]
];
writeln(ra2[0].length); // 2
writeln(ra2[1].length); // 4
writeln(ra2[2].length); // 3
This is a static array of three int[]
s, where each element array has a different length. In a C multidimensional array, all of the elements are stored in a contiguous block of memory. In D, this is true when all parts of a rectangular array are static, such as int[3][3]
. Any dynamic component in a rectangular array can point to its own separate block of memory, in which case you can't rely on it being contiguous. It's possible to create a dynamic array of dynamic arrays: int[][]
. It's also possible to have more than two components, such as int[][][3]
.
Slices
When thinking about slices, it helps to consider that dynamic arrays are slices and slices are dynamic arrays.
auto tenArray = [5,10,15,20,25,30,35,40,45,50];
auto sliced = tenArray[0 .. 5];
Here, tenArray
is an array of int
s. It's initialized with an array literal, a feature we'll examine shortly. I've taken a slice from tenArray
and assigned it to a variable. The slice operator looks like this: [m .. n]
, where the first element of the slice is source[m]
and the last is source[n-1]
. So the first value of sliced
is tenArray[0]
and the last is tenArray[4]
. Pass it to writeln
and you'll see [5, 10, 15, 20, 25]
. Print the length of sliced
and you'll see 5
, but the capacity may surprise you.
writeln(sliced.capacity);
This will print 0
. When a slice begins life, no new memory is allocated. Instead, it is backed by the source array. Continuing from the preceding snippet:
tenArray[0] = 10;
writeln(sliced);
writeln(tenArray);
Running this will show that tenArray[0]
and sliced[0]
are both set to 10
. The same thing works the other way; any changes made to sliced
will be reflected in tenArray
. To reinforce this point, add the following lines to the example:
writeln(sliced.ptr);
writeln(tenArray.ptr);
Both pointers are pointing to the same memory block. Now, what do you think would happen if we were to append a new item to sliced
, either by increasing the .length
or through the ~=
operator? The answer lies in that .capacity
of 0
.
The zero capacity indicates that appending to this slice in place may overwrite the existing elements in memory, that is, those belonging to the original array. In order to avoid any potential overwrite, attempting to append will cause the relationship between the two arrays to be severed. A new memory block will be allocated, which is large enough to hold the existing elements plus the appended one, and all of the elements copied over. Then the .ptr
property of the slice will be set to the address of the new memory and its .capacity
to a non-zero value.
sliced ~= 55;
writefln("Pointers: %s %s", tenArray.ptr, sliced.ptr);
writefln("Caps: %s %s", tenArray.capacity, sliced.capacity);
Running this code will print two different memory addresses. sliced
is no longer backed by the memory of tenArray
and now has a capacity of 7
. We can say that sliced
has become its own array. Sometimes, this isn't the desired behavior. I mentioned earlier that decreasing the .length
of an array will reset its capacity to 0
. To demonstrate, here's a little slicing trick that has the same effect as decreasing the array length:
auto shrink = [10, 20, 30, 40, 50];
shrink = shrink[0 .. $-1];
writeln(shrink);
Four elements are sliced from shrink
and then the slice is assigned back to shrink
. This is the same as decreasing shrink.length
by one and also results in a zero capacity. Either way, the last element in the original array, the number 50
, still exists at the same location in memory. The reason .capacity
gives us a 0
here is that, if we were to append to shrink
, we would overwrite the 50
. If another slice is still pointing to the same memory block, it would be affected by any overwrites. To avoid any unintended consequences, D will play it safe and reallocate if we append.
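Here is a small sketch of that safety net in action. The function appendToPrefix is a hypothetical helper invented for this example; it assumes the source array was allocated by the GC and that n is smaller than the source length.

```d
import std.stdio : writeln;

// Hypothetical helper (not from the text): slices the first `n` elements
// of `source` and appends `value` to the slice.
int[] appendToPrefix(int[] source, size_t n, int value)
{
    auto prefix = source[0 .. n]; // shares memory with source
    prefix ~= value;              // reallocates rather than overwrite source[n]
    return prefix;
}

void main()
{
    auto source = [10, 20, 30, 40, 50];
    auto result = appendToPrefix(source, 4, 99);

    writeln(source); // the original elements are untouched
    writeln(result); // the slice now lives in its own memory block
    assert(result.ptr !is source.ptr);
}
```

The append produces a five-element array ending in 99, while the 50 in the source array survives because the slice was moved to a new block first.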
Sometimes it doesn't matter if anything is overwritten. In that case, it's wasteful to reallocate each time the slice shrinks. That's where assumeSafeAppend
comes in.
assumeSafeAppend(shrink);
Calling this after decreasing the length will maintain the original capacity, allowing all appends to use the existing memory block. Decreasing the length again will also reset the capacity to 0
, requiring another call to assumeSafeAppend
if we want to continue reusing the same memory block.
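The effect is easiest to see when a second slice still refers to the original memory. The sketch below uses a hypothetical function, reuseAfterShrink, written only for this illustration; it shows that after assumeSafeAppend the append reuses the block and overwrites the old last element.

```d
import std.stdio : writeln;

// Hypothetical helper (not from the text): returns true if appending after
// assumeSafeAppend reused the original block and overwrote the old element.
bool reuseAfterShrink()
{
    auto original = [10, 20, 30, 40, 50];
    auto shrunk = original[0 .. $ - 1];
    assert(shrunk.capacity == 0); // appending now would reallocate

    shrunk.assumeSafeAppend();    // we promise the 50 may be overwritten
    shrunk ~= 99;                 // appended in place

    return shrunk.ptr is original.ptr && original[4] == 99;
}

void main()
{
    writeln(reuseAfterShrink());
}
```

This is exactly the overwrite that the zero capacity normally protects against, so only call assumeSafeAppend when no other slice depends on the old contents.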
It's possible to remove an element from the middle of an array by taking a slice from in front of it and another from behind it, then concatenating them together. As concatenation allocates a new array, this isn't the most efficient way to go about it. A much better alternative is a function from std.algorithm
called remove
. Let's say we want to remove the 30
from shrink
above. It's at the index 2
, so:
import std.algorithm : remove;
shrink = shrink.remove(2);
Now shrink
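contains the elements [10, 20, 40, 50], assuming it still held all five original values beforehand (if the last element had already been sliced off, the result would be [10, 20, 40])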
. We'll look at the details of remove
in Chapter 7, Composing Functional Pipelines with Algorithms and Ranges.
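As a quick sketch of remove in use, the hypothetical helper withoutIndex below works on a copy so that the caller's array is left alone. Copying first is a choice made for this illustration, not something remove requires.

```d
import std.algorithm : remove;
import std.stdio : writeln;

// Hypothetical helper (not from the text): returns a copy of `source`
// with the element at `index` removed.
int[] withoutIndex(const int[] source, size_t index)
{
    // remove shifts the later elements down in place and returns the
    // shortened slice, so we operate on a duplicate.
    return source.dup.remove(index);
}

void main()
{
    auto values = [10, 20, 30, 40, 50];
    writeln(withoutIndex(values, 2));
    writeln(values); // unchanged
}
```

Because remove works in place, assigning its result back to the same variable, as in shrink = shrink.remove(2), avoids any allocation at all.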
Sometimes, you want to slice an entire array. There's a shortcut for that. Instead of slicing with [0..$]
, you can use empty brackets, or no brackets at all.
auto aSlice = anArray[];
auto anotherSlice = anArray;
It's possible for static and dynamic arrays to be implicitly converted both ways. When going from dynamic to static, the lengths must match exactly. When going from static to dynamic, the compiler achieves the conversion by taking a slice of the static array:
int[] dyn = [1,2,3];
int[3] stat1 = dyn; // OK: lengths match
int[4] stat2 = dyn; // Error: mismatched array lengths
int[] sliced1 = stat1; // OK: same as stat1[]
The memory for dyn
is allocated on the heap, but stat1
lives on the stack. When we initialize stat1
, the elements of dyn
are copied over and we now have two distinct arrays. In the last line, sliced1
is just like any other slice we've seen so far, no matter that it's a slice of a static array. Its .ptr
property will be identical to stat1.ptr
and it will have a capacity of 0
, so we can append to or expand it without worrying about any impact on stat1
. However, if stat1
goes out of scope while sliced1
still points to its memory, bad things can happen. If you can't guarantee that stat1
is going to stick around, you can use the .dup
property to copy it.
int[] sliced1 = stat1.dup;
This allocates memory for a dynamic array and copies into it all of the elements from stat1
. A similar property, .idup
, creates an immutable copy of an array. The details of immutable arrays will be discussed later in the chapter.
D arrays aren't the only things you can slice. Imagine that you've been given an array of integers, but as an int*
and an associated length variable rather than an int[]
. If you want to stay with your C roots, you can go ahead and use pointer arithmetic to your heart's content. If, on the other hand, you'd prefer the convenience of a D array, the language has got you covered: just slice the pointer. Assuming a C-style int*
array called parray
, the length of which is stored in a variable named len
:
int[] array = parray[0 .. len];
How convenient is that? Be careful, though. As when slicing an array, the slice here is backed by the original pointer. In fact, array.ptr
is exactly the same address as parray
. This comes with the same potential consequences of slicing a static array. If parray
is freed behind your back, or otherwise becomes an invalid memory location, array
isn't going to be valid anymore, so the slice of parray
should be .dup
ed.
auto array = parray[0 .. len].dup;
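Putting the pieces together, the following sketch allocates a C-style array with malloc, views it as a slice, and copies it before the memory is freed. The helper copyFromC is a hypothetical name used only here.

```d
import core.stdc.stdlib : malloc, free;
import std.stdio : writeln;

// Hypothetical helper (not from the text): copies `len` ints from
// C-style memory into a new GC-managed array.
int[] copyFromC(const(int)* parray, size_t len)
{
    return parray[0 .. len].dup;
}

void main()
{
    enum len = 4;
    auto parray = cast(int*) malloc(int.sizeof * len);
    if (parray is null)
        return; // allocation failed

    int[] view = parray[0 .. len]; // backed by the malloc'ed block
    view[] = 7;                    // malloc does not initialize memory

    auto owned = copyFromC(parray, len);
    free(parray); // view is now invalid, owned is not
    writeln(owned);
}
```

After the call to free, only owned may be used safely; view still points at the released block.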
Array literals
Take a look at the following array declarations, all initialized with array literals:
auto arr1 = [1.0f,2.0f,3.0f]; // float[]
auto arr2 = [1.0f,2.0,3.0]; // double[]
auto arr3 = [1.0,2.0f,3.0f]; // double[]
This snippet demonstrates that array literals are inferred as dynamic arrays by default. It also shows how the base type of an array is inferred. We see that arr1
contains three float
s, so it is of type float[]
as one would reasonably expect. In the other arrays, we first see a float
followed by two double
s, then a double
followed by two float
s, yet both arrays are inferred as double[]
. The compiler looks at the type of each element and determines their common type. This type becomes the type of the array and all elements are implicitly converted. For example, given an array comprised of short
s and char
s, the common type is int
; in an array that contains one or more long
s and a mix of smaller integer types, the common type is long
.
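You don't have to take the inferred types on faith. A minimal sketch like the following lets the compiler verify them; the variable names are arbitrary.

```d
void main()
{
    auto floats = [1.0f, 2.0f, 3.0f];
    auto mixed = [1.0f, 2.0, 3.0];
    auto longs = [1, 2L, 3];

    // Each check is evaluated at compile time.
    static assert(is(typeof(floats) == float[]));
    static assert(is(typeof(mixed) == double[]));
    static assert(is(typeof(longs) == long[]));
}
```

If any literal were inferred differently, compilation would fail at the corresponding static assert.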
We can use array literals with the append and concatenation operators.
int[] buildMe;
buildMe ~= [1, 2, 3, 4] ~ 5;
Static arrays can also be initialized with array literals, as long as the lengths match.
int[3] arr4 = [1,2,3]; // OK
int[3] arr5 = [1,2,3,4]; // Error: mismatched array lengths
Arrays and void
We've seen that void
can be used to turn off default initialization for a variable. This is true for static arrays as well. Normally, every element of a static array will be initialized at the point of declaration. If the array is in global scope, this isn't such a big deal, but if it's in a function that is frequently called, the time taken to initialize all of the array elements can be a performance drain if the array is large.
float[1024] lotsOfFloats = void;
This will allocate 1,024 float
values on the stack, but they will not be initialized to nan
. Normally, you shouldn't turn off default initialization unless profiling shows it helps.
Tip
Uninitialized dynamic arrays
Allocating a dynamic array with new
, such as new float[10]
, will always initialize the elements with their .init
value. Default initialization can be avoided in this case by allocating through std.array.uninitializedArray
instead of calling new
directly:
import std.array : uninitializedArray;
auto arr = uninitializedArray!(float[])(10);
It's also possible to declare arrays of type void[]
. Like the universal pointer, this is the universal array. A couple of use cases can be found in the Phobos module std.file
. The read
function there returns a void[]
array representing the bytes read from a file. The write
function accepts a void[]
buffer of bytes to write. You can cast from void[]
to any array type and all array types are implicitly convertible to void[]
.
Array operations
To close our array discussion, we're going to look at how arrays can serve as operands for many of the operators we discussed earlier. First up, let's take a look at a couple of special cases of the assignment operator.
We've seen .dup
always allocates a new array, which is wasteful when the goal is to copy elements from one array into an existing one. In an assignment expression, if both arrays are the same type, or the right operand is implicitly convertible to the left, we can add empty brackets to the left operand.
int[] a1 = new int[10];
int[] a2 = [0,1,2,3,4,5,6,7,8,9];
a1[] = a2;
We know that the first line allocates ten integers and initializes them all to 0
. We know that the array literal assigned to a2
allocates memory for ten integers and initializes them to the values in the bracket. We know that a1 = a2
would cause both arrays to share the same memory. By adding the empty index operator to the left operand, we're telling the compiler to do the equivalent of going through a2
and assigning each element to the corresponding position in a1
. In other words, a1[0] = a2[0]
, a1[1] = a2[1]
, and so on. Although the end result looks the same as a1 = a2.dup
, there is a major difference. Calling .dup
will cause a1.ptr
to point to the memory allocated by .dup
; with the bracketed assignment, a1.ptr
will be unchanged. The first two calls to writeln
in this snippet will print the same address and the last one will print something different.
int[] a1 = new int[10];
writeln(a1.ptr);
int[] a2 = [0,1,2,3,4,5,6,7,8,9];
a1[] = a2;
writeln(a1.ptr);
a1 = a2.dup;
writeln(a1.ptr);
Two big caveats here. First, even when the target of the assignment is a dynamic array, the lengths of both arrays must exactly match. The second caveat is that the memory of the two arrays cannot overlap. Consider the following:
int[] a4 = [1,2,3,4,5];
int[] a5 = a4;
a5[] = a4;
This will give you a runtime error complaining about overlapping memory. This is an obvious case, as we know that the assignment of a4
to a5
will result in both arrays pointing to the same location. Where it isn't so obvious is with slices from pointers, or allocating your own array memory outside the GC via malloc
. Vigilance is a virtue.
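One way to keep the first caveat in view is to wrap the copy in a function that checks the lengths up front. The helper copyInto below is a hypothetical sketch, not a standard library function.

```d
import std.stdio : writeln;

// Hypothetical helper (not from the text): copies the elements of `src`
// into the existing memory of `dst` without allocating.
void copyInto(int[] dst, const int[] src)
{
    assert(dst.length == src.length, "lengths must match exactly");
    dst[] = src[]; // element-by-element copy; the two must not overlap
}

void main()
{
    auto target = new int[3];
    auto before = target.ptr;

    copyInto(target, [4, 5, 6]);
    writeln(target);
    assert(target.ptr is before); // still the same memory block
}
```

The explicit assert gives a clearer message than the default runtime error when the lengths differ.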
The empty index operator also allows us to assign a single value to every index in an array. Consider the following:
int[10] sa1 = 10;
sa1[] = 100;
Here, every element of sa1
is initialized to 10
. In the next line, all ten elements are assigned the value 100
. This also shows that the empty index operator works with static arrays just as well as with dynamic arrays. We can apply several other operators to arrays. For example:
int[] a = [2,3,4];
a[] ^^= 2;
writeln(a);
int[] b = [5,6,7];
int[3] c = a[] + b[];
writeln(c);
If you try something like writeln(a[] + b[])
you'll get a compiler error telling you that such operations on arrays require destination memory to write the result. There's no such thing as an implicit temporary for this sort of thing. I encourage you to experiment with the basic operators from earlier in this chapter to see what works with arrays and what doesn't. For example, the shift operators do not accept arrays as operands.
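Since the result always needs somewhere to live, a common pattern is to allocate the destination first and then assign the whole expression to it. The function scaleAndAdd below is a hypothetical example of that pattern.

```d
import std.stdio : writeln;

// Hypothetical helper (not from the text): computes a[i] * k + b[i]
// for every index and returns the results in a new array.
int[] scaleAndAdd(const int[] a, int k, const int[] b)
{
    assert(a.length == b.length, "operands must have the same length");
    auto result = new int[a.length]; // destination memory for the operation
    result[] = a[] * k + b[];
    return result;
}

void main()
{
    writeln(scaleAndAdd([2, 3, 4], 2, [5, 6, 7]));
}
```

The single assignment replaces an explicit loop over the indexes of both arrays.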
Finally, let's talk about array equality. This is our first opportunity to see the difference between ==
and is
with reference types. Examine the following snippet:
auto ea1 = [1,2,3];
auto ea2 = [1,2,3];
writeln(ea1 == ea2);
writeln(ea1 is ea2);
Here we have two dynamic arrays with identical elements. The first writeln
contains an equality expression. This will do an element-by-element comparison to see if each is the same value and evaluate to true
if so. The is
operator in the second writeln
tests if the two arrays have the same identity. It doesn't care about the elements. So what's our snippet going to print?
Given that both ea1
and ea2
have the same number of elements and each element has the same value, it's rather obvious that the first writeln
will print true
. The is
operator in the second writeln
is going to look at the pointer and length of each array. If they are pointing to the same place and the length is the same, the result is true
. In this case, ea1
and ea2
are pointing to different memory blocks, so we have a result of false
.
The gist of it is that ==
on an array is related to its elements, while is
on an array is related to its metadata. Both can be used to compare an array to null
: a == null
will return true if a.length
is 0
; a is null
will return true if a.length
is 0
and a.ptr
is null
.
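The following sketch gathers these rules into a handful of assertions. The helpers sameElements and sameSlice are hypothetical wrappers that exist only to give the two operators names.

```d
import std.stdio : writeln;

// Hypothetical helpers (not from the text).
bool sameElements(const int[] x, const int[] y) { return x == y; }
bool sameSlice(const int[] x, const int[] y) { return x is y; }

void main()
{
    auto ea1 = [1, 2, 3];
    auto ea2 = [1, 2, 3];

    assert(sameElements(ea1, ea2));      // equal values
    assert(!sameSlice(ea1, ea2));        // different memory blocks
    assert(sameSlice(ea1, ea1[0 .. $])); // same pointer and length

    auto empty = ea1[0 .. 0]; // zero length, but a non-null pointer
    assert(empty == null);
    assert(empty !is null);

    int[] unset;              // default initialized
    assert(unset is null);
    writeln("all checks passed");
}
```

The empty slice is the interesting case: it compares equal to null because it has no elements, yet it is not identical to null because its pointer still refers to ea1's memory.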
Strings
There are three string types in D: string
, wstring
, and dstring
. They aren't actual types, but rather are each an alias to a dynamic array of immutable char
, wchar
, and dchar
respectively. The D documentation says that strings are a special case of arrays. We can sum it up by saying that the compiler has some awareness of the string types, but they are not built-in types and they are, at heart, simply arrays. They are also treated specially by many functions in the standard library.
String essentials
Because strings are arrays, any properties available on arrays are also available on strings, but the .length
property doesn't tell you how many letters are in a string; it tells you the number of Unicode code units. Remember from our discussion on basic types that each character type in D represents a single Unicode code unit. One or more code units can form a code point. In UTF-8, where each code unit is only eight bits, it's common for code points to be composed of multiple code units. A code point that requires two code units in UTF-8 could be represented as one code unit in UTF-16. The following example introduces D's string literals to demonstrate the code unit/code point dichotomy:
string s = "soufflé";
wstring ws = "soufflé"w;
dstring ds = "soufflé"d;
String literals in D are indicated by opening and closing double quotes (""
). By default, they are understood by the compiler to be of type string
. Appending a w
to the end makes a wstring
and a d
similarly forces a dstring
. The word soufflé
has seven letters, but if you query s.length
, you'll find it returns 8
. Both ws.length
and ds.length
return 7
as expected. This discrepancy is because the letter é
requires two code units in UTF-8. In both UTF-16 and UTF-32, a single code unit is large enough to hold it. In fact, a single code unit in UTF-32 is always equivalent to a single code point, as 32 bits is large enough to hold any Unicode character.
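When you need the number of code points rather than code units, Phobos can count them for you. The sketch below uses std.utf.count and spells the é as the escape \u00E9 so that the result does not depend on how the source file is encoded; codePoints is a hypothetical wrapper name.

```d
import std.stdio : writeln;
import std.utf : count;

// Hypothetical helper (not from the text): number of code points in a
// UTF-8 string, as opposed to the number of code units in .length.
size_t codePoints(string s)
{
    return count(s);
}

void main()
{
    string s = "souffl\u00E9";
    wstring ws = "souffl\u00E9"w;
    dstring ds = "souffl\u00E9"d;

    assert(s.length == 8);  // UTF-8 code units
    assert(ws.length == 7); // UTF-16 code units
    assert(ds.length == 7); // UTF-32 code units
    writeln(codePoints(s)); // code points
}
```

Keep in mind that a code point is still not the same thing as a letter on screen when combining characters are involved.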
Note
Unicode is an important component in modern software development, yet is often misunderstood. For details, a great place to start is the Unicode FAQ at http://unicode.org/faq/. There are also a number of informative introductory articles that can be found through a quick web search.
Double-quoted string literals in D are always parsed for the standard escape sequences such as the end-of-line character ('\n'
), the tab character ('\t'
), and the null-terminator ('\0'
). A single string literal declared over multiple lines will cause any newlines to be embedded in the string.
auto s1 = "
Hi
I am a multi-line
string";
writeln(s1);
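Multiple string literals declared in succession with no terminating semicolon between them are concatenated into a single string at compile time. Note that newer compiler releases deprecate this implicit form in favor of joining the literals explicitly with the ~ operator.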
auto s2 = "I am"
    " a string which is"
    " composed of multiple strings"
    " on multiple lines.";
Many of the operations you'll commonly perform on strings are scattered throughout Phobos. Most of what you want is found in the std.string
module. Other modules include std.ascii
, std.uni
, std.utf
, std.format
, std.path
, std.regex
, std.encoding
, and std.windows.charset
. Because strings are dynamic arrays, you can also use them with the functions in std.array
. Since dynamic arrays are also ranges, many functions in std.algorithm
also accept strings.
Another useful module for strings is std.conv
. There you'll find a number of functions for converting from one type to another. Two particularly useful functions are to
and parse
. The former takes almost any type and converts it to almost any other. For example, given an int
that you want to convert to a string
, you can do the following:
import std.conv : to;
int x = 10;
auto s = to!string(x);
We've already seen the template instantiation operator, !
, when we looked at std.conv.octal
. Here, we're telling the template to take the runtime parameter x
and turn it into the type indicated by the compile time parameter following the !
operator. We'll see more about the difference between runtime and compile-time parameters in Chapter 5, Generic Programming Made Easy.
Conversely, sometimes you have a string from which you want to extract another type. You can do this with to
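, but it will throw an exception if there are inconvertible characters in the string. A more forgiving alternative is parse
. When it encounters inconvertible characters after a valid prefix, it will stop parsing and return whatever it has parsed so far, advancing the source string past the consumed characters (it still throws if nothing at the start of the string can be converted):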
import std.conv : to, parse;
string s1 = "10";
int x1 = to!int(s1); // OK
string s2 = "10and20";
int x2 = parse!int(s2); // OK: x2 = 10
int x3 = to!int(s2); // ConvException
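One detail worth seeing in action is that parse takes its argument by reference and consumes the characters it converts. The sketch below wraps both functions in hypothetical helpers, leadingInt and convertsFully, to show the difference.

```d
import std.conv : ConvException, parse, to;
import std.stdio : writeln;

// Hypothetical helper (not from the text): parses a leading integer and
// hands back whatever was left unparsed through `rest`.
int leadingInt(string text, out string rest)
{
    auto value = parse!int(text); // advances text past the digits
    rest = text;
    return value;
}

// Hypothetical helper (not from the text): true if the whole string
// converts to an int.
bool convertsFully(string text)
{
    try
    {
        cast(void) to!int(text);
        return true;
    }
    catch (ConvException)
    {
        return false;
    }
}

void main()
{
    string rest;
    writeln(leadingInt("10and20", rest), " / ", rest);
    writeln(convertsFully("10"), " ", convertsFully("10and20"));
}
```

Because the source string is modified, call parse on a copy if you still need the original text afterward.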
If you have one or more values that you want to insert into specific places in a string, you can use std.format.format
. The std.format
module includes several functions, such as format
, that assist in creating formatted strings.
import std.format : format; auto height = 193; auto weight = 95; auto fs = format("Bob is %s cm and weighs %s kg", height, weight);
The syntax and specifiers are the same as those used with writef
. The format
function always allocates a new string from the GC heap, but another version, sformat
, allows you to pass a reusable buffer as a parameter and returns a slice to the buffer.
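A minimal sketch of the buffer-based variant follows, assuming a fixed 64-character stack buffer is large enough for the output. The function name describeHeight is hypothetical.

```d
import std.format : sformat;
import std.stdio : writeln;

// Hypothetical helper: formats into the caller's buffer and returns
// the slice of that buffer which holds the result.
char[] describeHeight(char[] buf, int height)
{
    return sformat(buf, "Bob is %s cm", height);
}

void main()
{
    char[64] buf;  // assumed to be large enough for this message
    char[] text = describeHeight(buf[], 193);

    assert(text == "Bob is 193 cm");
    assert(text.ptr is buf.ptr);  // the result lives inside buf

    writeln(text);
}
```

Because the returned slice refers to the buffer, it is only valid for as long as the buffer is, and it will be overwritten by the next call that reuses the same buffer.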
The empty string, ""
, has a length of zero. Its .ptr
property, however, is not null
. For compatibility with C, all string literals in D are null-terminated, so even the empty string points to a piece of memory containing a '\0'
. Going with what we know about arrays, this means "" == null
is true
and ""
is null
is false
.
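The distinction between equality and identity for the empty string can be checked directly. This is only a small sketch of the behavior described above:

```d
void main()
{
    string empty = "";
    string nothing = null;

    assert(empty.length == 0);
    assert(empty.ptr !is null);    // literals point at real memory
    assert(*empty.ptr == '\0');    // the terminator added for C compatibility

    assert(empty == null);         // equality: both have zero elements
    assert(empty !is null);        // identity: pointer differs from null
    assert(nothing is null);
    assert(empty == nothing);
}
```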
Alternative string literals
In addition to the double-quoted string literals we've gone through, D supports the following string literals:
WYSIWYG strings
Any character in a WYSIWYG (What You See is What You Get) string is part of the string, including escape sequences. There are two WYSIWYG syntaxes, r"wysiwyg"
and `wysiwyg`
(these are backticks, not single quotes). Because the former uses double quotes and does not allow escape sequences, it's impossible to include double quotes in the string itself. The backtick syntax allows you to do that, but then you can't have backticks in the string.
writeln(r"I'm a WYSIWYG string'```'\t\n"); writeln(`me, too!\n\r"'''""`);
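To make the difference from ordinary double-quoted literals concrete, the following sketch compares lengths and contents. It assumes nothing beyond the three literal syntaxes already shown.

```d
void main()
{
    // In a double-quoted literal, \t is one tab character.
    assert("a\tb".length == 3);

    // In a WYSIWYG literal, the backslash and the 't' are kept as-is.
    assert(r"a\tb".length == 4);

    // Both WYSIWYG syntaxes produce the same string.
    assert(`a\tb` == r"a\tb");

    // Backticks allow double quotes without escaping.
    assert(`say "hi"` == "say \"hi\"");
}
```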
Delimited strings
Delimited strings allow you to choose how to denote the beginning and end of a string, but there are some special rules. These literals begin with q"
and end with "
. Any delimiter you choose must immediately follow the opening quote and immediately precede the ending quote. There are a few nesting delimiters that are recognized by default: []
, ()
, <>
, and {}
. Because these nest, you can use the same characters inside the string. Any nested delimiters must be balanced.
writeln(q"(A Delimited String (with nested parens))"); writeln(q"[An [Unbalanced nested delimiter]"); writeln(q"<Another unbalanced> nested delimiter>"); writeln(q"{And }{again!}");
In the first line, the nested parentheses are balanced and so become part of the string. The second line has an opening delimiter, but no closing delimiter to balance it. The opposite is true in the third line, while the last line has an unbalanced pair. Only the first line compiles; the other three are rejected by the compiler.
You can also use a custom identifier as the delimiter with the following guidelines:
- The opening and closing identifier must be the same
- A newline must immediately follow the opening identifier
- A newline must immediately precede the closing identifier
- The closing identifier must start at the beginning of the line
The first newline is not part of the string, but all subsequent newlines are, including the one preceding the closing identifier:
auto s = q"STR
I'm a string with a custom delimiter!
STR";
Strings delimited by identifiers are sometimes referred to as heredoc strings. Delimited strings are useful when long blocks of text need to be included in the source. They behave just like WYSIWYG strings, with the addition that they allow the use of both backticks and double quotes in the string.
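The newline rules are easier to see when a delimited string is compared against its double-quoted equivalent. In this sketch the identifier END and the constant names are arbitrary choices for illustration.

```d
import std.stdio : write;

// Nesting delimiter: the balanced inner parentheses are kept.
enum nested = q"(A (nested) pair)";

// Identifier delimiter: the newline after END is dropped, but the
// newline before the closing END is part of the string.
enum heredoc = q"END
first "line"
second `line`
END";

void main()
{
    assert(nested == "A (nested) pair");
    assert(heredoc == "first \"line\"\nsecond `line`\n");

    write(heredoc);
}
```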
Token strings
Token strings are string literals that must contain valid D tokens. These are great to use when generating code at compile time. Text editors that can perform syntax highlighting on D can highlight the code inside the string. Token strings open with q{
and close with }
. Any newlines inside the literal are part of the string and the nesting of braces is allowed.
auto code = q{ int x = 10; int y = 1; };
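One way a token string can feed compile-time code generation is through a string mixin, a feature not covered in this section and used here only as an assumption to show where such strings end up. The function name sum is hypothetical.

```d
import std.stdio : writeln;

// The token string must be known at compile time to be mixed in,
// so it is declared as an enum rather than with auto.
enum code = q{
    int x = 10;
    int y = 1;
};

int sum()
{
    mixin(code);   // compiles the string as if it were written here
    return x + y;
}

void main()
{
    assert(sum() == 11);
    writeln(sum());
}
```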
Associative arrays
Associative arrays allow you to map keys of any type to values of any type.
int[string] aa1; // int values, string keys
string[int] aa2; // string values, int keys
The default initialization value for an associative array, if you print it, looks like the empty array, []
. They may look the same, but they are two different beasts. Associative arrays have no .ptr
or .capacity
, though they do have a .length
that is 0
by default. You can't call reserve
, modify .length
, or use the concatenation operator on an associative array. What you can do is add values like this:
aa1["Ten"] = 10; aa2[10] = "Ten";
If the key already exists in the array, its existing value will be overwritten. If it doesn't exist, it will be added. Although aa2[10]
looks like a regular array index operation, it is not. With an array, the indexes are sequential. If the index 10
does not exist, you've earned a range violation. With an associative array, you've added a new key and value pair. You can also initialize an associative array with literals, like this:
auto aa3 = ["x":22.0, "y":3.0f, "z":5.0f, "w":1.0];
Literals take the form of a bracketed sequence of comma-separated KeyType : ValueType
pairs. In this particular declaration, the type of aa3
is inferred as double[string]
. The value type double
is inferred in the same way it would be with standard arrays; it's the common type of all of the values.
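The insertion, overwrite, and type inference rules described above can be verified with a few assertions. This is a small sketch using made-up variable names:

```d
void main()
{
    string[int] names;
    assert(names.length == 0);

    names[10] = "Ten";   // key absent: a new pair is added
    assert(names.length == 1);

    names[10] = "TEN";   // key present: the value is overwritten
    assert(names.length == 1);
    assert(names[10] == "TEN");

    // Mixed double and float values: the common type is double.
    auto coords = ["x": 22.0, "y": 3.0f];
    static assert(is(typeof(coords) == double[string]));
}
```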
To remove an item from an associative array, use the remove
function. This is not the same as std.algorithm.remove
for arrays; no imports are required. If the key does not exist, it does nothing and returns false
, otherwise it removes the key and its associated value and returns true
.
aa3.remove("w");
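The return value makes it possible to tell whether anything was actually removed, as this short sketch shows:

```d
import std.stdio : writeln;

void main()
{
    auto aa = ["x": 22.0, "w": 1.0];

    bool first = aa.remove("w");   // key exists, so it is removed
    bool second = aa.remove("w");  // key is already gone

    assert(first);
    assert(!second);
    assert(aa.length == 1);

    writeln(aa);
}
```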
There are two options for reading values from associative arrays. The most obvious is to read a key index directly.
auto x = aa3["x"]; // OK: key exists
auto w = aa3["w"]; // We removed it -- range violation
If you want to avoid the range violation for a nonexistent key, you can use the in
operator instead. This takes a key as the left operand and an associative array as the right operand. If the key exists, it returns a pointer to the value. Otherwise, it returns null
.
auto px = "x" in aa3; // type double*, valid address
auto pw = "w" in aa3; // type double*, null
Once a pointer is received from the in
operator, it needs to be dereferenced to get at the value. We haven't looked at if
statements yet, but I assume you know what they are. So I'm going to show you a common D idiom. It's possible to combine the in
operator with an if
statement to perform an action on a value if a key is present.
if(auto px = "x" in aa3) writeln(*px);
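Since the pointer returned by in refers to the stored value, the same idiom can be used to update an entry in place. The sketch below counts words this way; countWords is a name invented for this example, and the built-in get property, which the text above does not cover, is shown as another way to read a key with a fallback value.

```d
import std.stdio : writeln;

// Hypothetical helper: counts how often each word occurs.
int[string] countWords(string[] words)
{
    int[string] counts;
    foreach (w; words)
    {
        if (auto p = w in counts)
            ++*p;            // key present: update through the pointer
        else
            counts[w] = 1;   // key absent: insert a new pair
    }
    return counts;
}

void main()
{
    auto counts = countWords(["a", "b", "a"]);

    assert(counts["a"] == 2);
    assert(counts["b"] == 1);
    assert(("z" in counts) is null);

    // get returns the fallback instead of raising a range violation.
    assert(counts.get("z", 0) == 0);

    writeln(counts);
}
```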
You can fetch all of the keys and values in two ways. The efficient way is to call .byKey
and .byValue
. Both of these return ranges without allocating any heap memory or making any copies. We will explore ranges later to understand how they can be efficient. Sometimes, you really do need to make a copy. For those situations, there are the .keys
and .values
properties. These each allocate a dynamic array and copy the keys and values respectively.
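A minimal sketch contrasting the two approaches follows. It iterates lazily when the values only need to be read once, and takes a copy of the keys when they need to be sorted. The helper name sortedKeys is hypothetical, and because the iteration order of an associative array is unspecified, the example avoids depending on it.

```d
import std.algorithm : sort;
import std.stdio : writeln;

// Hypothetical helper: copies the keys into a new array and sorts it.
string[] sortedKeys(double[string] aa)
{
    auto ks = aa.keys;   // allocates a new dynamic array
    sort(ks);            // sorting the copy leaves aa untouched
    return ks;
}

void main()
{
    auto aa = ["z": 5.0, "x": 22.0, "y": 3.0];

    // Lazy iteration: no array of values is built.
    double total = 0;
    foreach (v; aa.byValue)
        total += v;
    assert(total == 30.0);

    assert(sortedKeys(aa) == ["x", "y", "z"]);
    assert(aa.values.length == 3);

    writeln(sortedKeys(aa));
}
```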