forked from hoanhan101/ultimate-go
-
Notifications
You must be signed in to change notification settings - Fork 0
/
array.go
195 lines (161 loc) · 8.71 KB
/
array.go
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
// ---------
// CPU CACHE
// ---------
// Cores DO NOT access main memory directly but their local caches.
// What store in caches are data and instruction.
// Cache speed from fastest to slowest: L1 -> L2 -> L3 -> main memory.
// Scott Meyers: "If performance matter then the total memeory you have is the total amount of
// caches" -> access to main memory is incredibly slow; practically speaking it might not even be there.
// How do we write code that can be sympathetic with the caching system to make sure that
// we don't have a cache miss or at least, we minimalize cache misses to our fullest potential?
// Processor has a Prefetcher. It predicts what data is needed ahead of time.
// There are different granularity depending on where we are on the machine.
// Our programming model uses a byte. We can read and write to a byte at a time. However, from the
// caching system POV, our granularity is not 1 byte. It is 64 bytes, called a cache line. All
// memory us junked up in this 64 bytes cache line.
// Since the caching mechanism is complex, Prefetcher tries to hide all the latency from us.
// It has to be able to pick up on predictable access pattern to data.
// -> We need to write code that create predictable access pattern to data
// One easy way is to create a contiguous allocation of memory and to iterate over them.
// The array data structure gives us ability to do so.
// From the hardware perspective, array is the most important data structure.
// From Go perspective, slice is. Array is the backing data structure for slice (like Vector in C++).
// Once we allocate an array, whatever it size, every element is equal distant from other element.
// As we iterate over that array, we begin to walk cache line by cache line. As the Prefetcher see
// that access pattern, it can pick it up and hide all the latency from us.
// For example, we have a big nxn matrix. We do LinkedList Traverse, Column Traverse, and Row Traverse
// and benchmark against them.
// Unsurprisingly, Row Traverse has the best performance. It walk through the matrix cache line
// by cache line and create a predictable access pattern.
// Column Traverse does not walk through the matrix cache line by cache line. It looks like random
// access memory pattern. That is why is the slowest among those.
// However, that doesn't explain why the LinkedList Traverse's performance is in the middle. We
// just think that it might perform as poorly as the Column Traverse.
// -> This leads us to another cache: TLB - Translation lookaside buffer. Its job is to maintain
// operating system page and offset to where physical memory is.
// ----------------------------
// Translation lookaside buffer
// ----------------------------
// Back to the different granularity, the caching system moves data in and out the hardware at 64
// bytes at a time. However, the operating system manages memory by paging its 4K (traditional page
// size for an operating system).
// TLB: For every page that we are managing, let's take our virtual memory addresses because that
// that we use (softwares run virtual addresses, its sandbox, that is how we use/share physical
// memory) and map it to the right page and offset for that physical memory.
// A miss on the TLB can be worse than just the cache miss alone.
// The LinkedList is somewhere in between is because the chance of multiple nodes being on the same
// page is probably pretty good. Even though we can get cache misses because cache lines aren't
// necessary in the distance that is predictable, we probably not have so many TLB cache misses.
// In the Column Traverse, not only we have cache misses, we probably have a TLB cache miss on
// every access as well.
// Data-oriented design matters.
// It is not enough to write the most efficient algorithm, how we access our data can have much
// more lasting effect on the performance than the algorithm itself.
package main
import "fmt"
func main() {
// -----------------------
// Declare and initialize
// -----------------------
// Declare an array of five strings that is initialized to its zero value.
// Recap: a string is a 2 word data structure: a pointer and a length
// Since this array is set to its zero value, every string in this array is also set to its
// zero value, which means that each string has the first word pointed to nil and
// second word is 0.
// -----------------------------
// | nil | nil | nil | nil | nil |
// -----------------------------
// | 0 | 0 | 0 | 0 | 0 |
// -----------------------------
var strings [5]string
// At index 0, a string now has a pointer to a backing array of bytes (characters in string)
// and its length is 5.
// -----------------
// What is the cost?
// -----------------
// The cost of this assignment is the cost of copying 2 bytes.
// We have two string values that have pointers to the same backing array of bytes.
// Therefore, the cost of this assignment is just 2 words.
// ----- -------------------
// | * | ---> | A | p | p | l | e | (1)
// ----- -------------------
// | 5 | A
// ----- |
// |
// |
// ---------------------
// |
// -----------------------------
// | * | nil | nil | nil | nil |
// -----------------------------
// | 5 | 0 | 0 | 0 | 0 |
// -----------------------------
strings[0] = "Apple"
strings[1] = "Orange"
strings[2] = "Banana"
strings[3] = "Grape"
strings[4] = "Plum"
// ---------------------------------
// Iterate over the array of strings
// ---------------------------------
// Using range, not only we can get the index but also a copy of the value in the array.
// fruit is now a string value; its scope is within the for statement.
// In the first iteration, we have the word "Apple". It is a string that has the first word
// also points to (1) and the second word is 5.
// So we now have 3 different string value all sharing the same backing array.
// What are we passing to the Println function?
// We are using value semantic here. We are not sharing our string value. Println is getting
// its own copy, its own string value. It means when we get to the Println call, there are now
// 4 string values all sharing the same backing array.
// We don't want to take an address of a string.
// We know the size of a string ahead of time.
// -> it has the ability to be on the stack
// -> not creating allocation
// -> not causing pressure on the GC
// -> the string has been designed to leverage value mechanic, to stay on the stack, out of the
// way of creating garbage.
// -> the only thing that has to be on the heap, if anything is the backing array, which is the
// one thing that being shared
fmt.Printf("\n=> Iterate over array\n")
for i, fruit := range strings {
fmt.Println(i, fruit)
}
// Declare an array of 4 integers that is initialized with some values using literal syntax.
numbers := [4]int{10, 20, 30, 40}
// Iterate over the array of numbers using traditional style.
fmt.Printf("\n=> Iterate over array using traditional style\n")
for i := 0; i < len(numbers); i++ {
fmt.Println(i, numbers[i])
}
// ---------------------
// Different type arrays
// ---------------------
// Declare an array of 5 integers that is initialized to its zero value.
var five [5]int
// Declare an array of 4 integers that is initialized with some values.
four := [4]int{10, 20, 30, 40}
fmt.Printf("\n=> Different type arrays\n")
fmt.Println(five)
fmt.Println(four)
// When we try to assign four to five like so five = four, the compiler says that
// "cannot use four (type [4]int) as type [5]int in assignment"
// This cannot happen because they have different types (size and representation).
// The size of an array makes up its type name: [4]int vs [5]int. Just like what we've seen
// with pointer. The * in *int is not an operator but part of the type name.
// Unsurprisingly, all array has known size at compiled time.
// -----------------------------
// Contiguous memory allocations
// -----------------------------
// Declare an array of 5 strings initialized with values.
six := [6]string{"Annie", "Betty", "Charley", "Doug", "Edward", "Hoanh"}
// Iterate over the array displaying the value and address of each element.
// By looking at the output of this Printf function, we can see that this array is truly a
// contiguous block of memory. We know a string is 2 word and depending on computer
// architecture, it will have x byte. The distance between two consecutive IndexAddr is exactly
// x byte.
// v is its own variable on the stack and it has the same address every single time.
fmt.Printf("\n=> Contiguous memory allocations\n")
for i, v := range six {
fmt.Printf("Value[%s]\tAddress[%p] IndexAddr[%p]\n", v, &v, &six[i])
}
}