tventi.8 - plan9port - [fork] Plan 9 from user space | |
tventi.8 (12887B) | |
--- | |
1 .TH VENTI 8 | |
2 .SH NAME | |
3 venti \- archival storage server | |
4 .SH SYNOPSIS | |
5 .in +0.25i | |
6 .ti -0.25i | |
7 .B venti/venti | |
8 [ | |
9 .B -Ldrs | |
10 ] | |
11 [ | |
12 .B -a | |
13 .I address | |
14 ] | |
15 [ | |
16 .B -B | |
17 .I blockcachesize | |
18 ] | |
19 [ | |
20 .B -c | |
21 .I config | |
22 ] | |
23 [ | |
24 .B -C | |
25 .I lumpcachesize | |
26 ] | |
27 [ | |
28 .B -h | |
29 .I httpaddress | |
30 ] | |
31 [ | |
32 .B -I | |
33 .I indexcachesize | |
34 ] | |
35 [ | |
36 .B -W | |
37 .I webroot | |
38 ] | |
39 .SH DESCRIPTION | |
40 .I Venti | |
41 is a SHA1-addressed archival storage server. | |
42 See | |
43 .MR venti (7) | |
44 for a full introduction to the system. | |
45 This page documents the structure and operation of the server. | |
46 .PP | |
47 A venti server requires multiple disks or disk partitions, | |
48 each of which must be properly formatted before the server | |
49 can be run. | |
50 .SS Disk | |
51 The venti server maintains three disk structures, typically | |
52 stored on raw disk partitions: | |
53 the append-only | |
54 .IR "data log" , | |
55 which holds, in sequential order, | |
56 the contents of every block written to the server; | |
57 the | |
58 .IR index , | |
59 which helps locate a block in the data log given its score; | |
60 and optionally the | |
61 .IR "bloom filter" , | |
62 a concise summary of which scores are present in the index. | |
63 The data log is the primary storage. | |
64 To improve robustness, it should be stored on | |
65 a device that provides RAID functionality. | |
66 The index and the bloom filter are optimizations | |
67 employed to access the data log efficiently and can be rebuilt | |
68 if lost or damaged. | |
69 .PP | |
70 The data log is logically split into sections called | |
71 .IR arenas , | |
72 typically sized for easy offline backup | |
73 (e.g., 500MB). | |
74 A data log may comprise many disks, each storing | |
75 one or more arenas. | |
76 Such disks are called | |
77 .IR "arena partitions" . | |
78 Arena partitions are filled in the order given in the configuration. | |
79 .PP | |
80 The index is logically split into block-sized pieces called | |
81 .IR buckets , | |
82 each of which is responsible for a particular range of scores. | |
83 An index may be split across many disks, each storing many buckets. | |
84 Such disks are called | |
85 .IR "index sections" . | |
86 .PP | |
87 The index must be sized so that no bucket is full. | |
88 When a bucket fills, the server must be shut down and | |
89 the index made larger. | |
90 Since scores appear random, each bucket will contain | |
91 approximately the same number of entries. | |
92 Index entries are 40 bytes long. Assuming that a typical block | |
93 being written to the server is 8192 bytes and compresses to 4096 | |
94 bytes, the active index is expected to be about 1% of | |
95 the active data log. | |
96 Storing smaller blocks increases the relative index footprint; | |
97 storing larger blocks decreases it. | |
98 To allow variation in both block size and the random distribution | |
99 of scores to buckets, the suggested index size is 5% of | |
100 the active data log. | |
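
The index sizing arithmetic above can be checked with a short calculation. This is a minimal sketch and not part of venti: the function names are hypothetical, while the 40-byte entry size, the 4096-byte compressed block, and the 5% suggestion come from the text.

```python
# Sketch of the index sizing arithmetic described above.
# Function and parameter names are illustrative, not part of venti.

INDEX_ENTRY_BYTES = 40  # on-disk index entry size, from the text


def index_fraction(compressed_block_bytes):
    """Expected active index size as a fraction of the active data log."""
    return INDEX_ENTRY_BYTES / compressed_block_bytes


def suggested_index_bytes(data_log_bytes, percent=5):
    """Suggested index size: 5% of the active data log by default."""
    return data_log_bytes * percent // 100


# 4096-byte compressed blocks give an index of roughly 1% of the data log.
print(f"{index_fraction(4096):.2%}")
# Suggested index size in bytes for a 100GB data log.
print(suggested_index_bytes(100 * 1024**3))
```

Halving the compressed block size doubles the fraction, which is why the suggested 5% leaves headroom for smaller blocks and uneven bucket fill.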
101 .PP | |
102 The (optional) bloom filter is a large bitmap that is stored on disk but | |
103 also kept completely in memory while the venti server runs. | |
104 It helps the venti server efficiently detect scores that are | |
105 .I not | |
106 already stored in the index. | |
107 The bloom filter starts out zeroed. | |
108 Each score recorded in the bloom filter is hashed to choose | |
109 .I nhash | |
110 bits to set in the bloom filter. | |
111 A score is definitely not stored in the index if any of its | |
112 .I nhash | |
113 bits are not set. | |
114 The bloom filter thus has two parameters: | |
115 .I nhash | |
116 (maximum 32) | |
117 and the total bitmap size | |
118 (maximum 512MB, 2\s-2\u32\d\s+2 bits). | |
119 .PP | |
120 The bloom filter should be sized so that | |
121 .I nhash | |
122 \(mu | |
123 .I nblock | |
124 \(<= | |
125 0.7 \(mu | |
126 .IR b , | |
127 where | |
128 .I nblock | |
129 is the expected number of blocks stored on the server | |
130 and | |
131 .I b | |
132 is the bitmap size in bits. | |
133 The false positive rate of the bloom filter when sized | |
134 this way is approximately 2\s-2\u\-\fInhash\fR\d\s+2. | |
135 Values of | |
136 .I nhash | |
137 less than 10 are not very useful; | |
138 values greater than 24 are probably a waste of memory. | |
139 .I Fmtbloom | |
140 (see | |
141 .MR venti-fmt (8) ) | |
142 can be given either | |
143 .I nhash | |
144 or | |
145 .IR nblock ; | |
146 if given | |
147 .IR nblock , | |
148 it will derive an appropriate | |
149 .IR nhash . | |
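
The sizing rule above can be turned into a small calculation. This is a sketch under stated assumptions, not the derivation used by fmtbloom: the function names are hypothetical, and the false positive estimate uses the standard Bloom filter result that a filter loaded to this ratio has a rate of about 2 to the power of minus nhash.

```python
# Sketch of the bloom filter sizing rule: nhash * nblock <= 0.7 * b.
# Names are illustrative; fmtbloom's own derivation may differ.

MAX_NHASH = 32      # maximum nhash, from the text
MAX_BITS = 2**32    # maximum bitmap size in bits (512MB), from the text


def min_bloom_bits(nhash, nblock):
    """Smallest bitmap size b (in bits) that satisfies the sizing rule."""
    if not 1 <= nhash <= MAX_NHASH:
        raise ValueError("nhash must be between 1 and 32")
    # Integer ceiling of nhash * nblock / 0.7, avoiding float rounding.
    bits = (nhash * nblock * 10 + 6) // 7
    if bits > MAX_BITS:
        raise ValueError("bitmap would exceed the 2**32-bit maximum")
    return bits


def false_positive_rate(nhash):
    """Approximate false positive rate for a filter sized by the rule."""
    return 2.0 ** -nhash


bits = min_bloom_bits(16, 100_000_000)
print(bits, "bits, about", bits // 8 // 1024**2, "MB")
print(false_positive_rate(16))
```

Because the bitmap is kept entirely in memory, the result of such a calculation is also the amount of memory the bloom filter will consume while the server runs.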
150 .SS Memory | |
151 Venti can make effective use of large amounts of memory | |
152 for various caches. | |
153 .PP | |
154 The | |
155 .I "lump cache | |
156 holds recently-accessed venti data blocks, which the server refers to as | |
157 .IR lumps . | |
158 The lump cache should be at least 1MB but can profitably be much larger. | |
159 The lump cache can be thought of as the level-1 cache: | |
160 read requests handled by the lump cache can | |
161 be served instantly. | |
162 .PP | |
163 The | |
164 .I "block cache | |
165 holds recently-accessed | |
166 .I disk | |
167 blocks from the arena partitions. | |
168 The block cache needs to be able to simultaneously hold two blocks | |
169 from each arena plus four blocks for the currently-filling arena. | |
170 The block cache can be thought of as the level-2 cache: | |
171 read requests handled by the block cache are slower than those | |
172 handled by the lump cache, since the lump data must be extracted | |
173 from the raw disk blocks and possibly decompressed, but no | |
174 disk accesses are necessary. | |
175 .PP | |
176 The | |
177 .I "index cache | |
178 holds recently-accessed or prefetched | |
179 index entries. | |
180 The index cache needs to be able to hold index entries | |
181 for three or four arenas, at least, in order for prefetching | |
182 to work properly. Each index entry is 50 bytes. | |
183 Assuming 500MB arenas of | |
184 128,000 blocks that are 4096 bytes each after compression, | |
185 the minimum index cache size is about 6MB. | |
186 The index cache can be thought of as the level-3 cache: | |
187 read requests handled by the index cache must still go | |
188 to disk to fetch the arena blocks, but the costly random | |
189 access to the index is avoided. | |
190 .PP | |
191 The size of the index cache determines how long venti | |
192 can sustain its `burst' write throughput, during which time | |
193 the only disk accesses on the critical path | |
194 are sequential writes to the arena partitions. | |
195 For example, if you want to be able to sustain 10MB/s | |
196 for an hour, you need enough index cache to hold entries | |
197 for 36GB of blocks. Assuming 8192-byte blocks, | |
198 you need room for almost five million index entries. | |
199 Since index entries are 50 bytes each, you need 250MB | |
200 of index cache. | |
201 If the background index update process can make a single | |
202 pass through the index in an hour, which is possible, | |
203 then you can sustain the 10MB/s indefinitely (at least until | |
204 the arenas are all filled). | |
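
The burst example above can be reproduced as follows. This is a minimal sketch with a hypothetical function name; the 50-byte entry size and the 10MB/s, one hour, 8192-byte figures come from the text. The exact result comes out somewhat below the rounded five million entries and 250MB quoted above.

```python
# Sketch of the burst-write index cache estimate described above.
# The function name is illustrative, not part of venti.

INDEX_CACHE_ENTRY_BYTES = 50  # in-memory index entry size, from the text


def burst_index_cache_bytes(rate_bytes_per_s, seconds, block_bytes):
    """Index cache needed to hold entries for every block written
    during a burst of the given rate and duration."""
    total_bytes = rate_bytes_per_s * seconds
    blocks = (total_bytes + block_bytes - 1) // block_bytes  # ceiling division
    return blocks * INDEX_CACHE_ENTRY_BYTES


# 10MB/s for one hour with 8192-byte blocks.
needed = burst_index_cache_bytes(10_000_000, 3600, 8192)
print(needed // 1_000_000, "MB of index cache")
```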
205 .PP | |
206 The | |
207 .I "bloom filter | |
208 requires memory equal to its size on disk, | |
209 as discussed above. | |
210 .PP | |
211 A reasonable starting allocation is to | |
212 divide memory equally (in thirds) between | |
213 the bloom filter, the index cache, and the lump and block caches; | |
214 the third of memory allocated to the lump and block caches | |
215 should be split unevenly, with more (say, two thirds) | |
216 going to the block cache. | |
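
The starting allocation above can be sketched as a small helper. The function name is hypothetical; the keys mem, bcmem, and icmem match the configuration parameters for the lump, block, and index caches, and the two-thirds share for the block cache is the example figure from the text, not a requirement.

```python
# Sketch of the suggested starting memory split.
# The function name is illustrative, not part of venti.


def split_memory(total_bytes):
    """Divide memory in thirds between the bloom filter, the index cache,
    and the lump and block caches; give the block cache two thirds of
    the last share."""
    third = total_bytes // 3
    bcmem = third * 2 // 3   # block cache: two thirds of the last third
    mem = third - bcmem      # lump cache: the remainder
    return {"bloom": third, "icmem": third, "bcmem": bcmem, "mem": mem}


# Print a starting allocation for 3GB of memory, in megabytes.
for name, size in split_memory(3 * 1024**3).items():
    print(name, size // 1024**2, "MB")
```

The bloom filter size is fixed when the filter is formatted, so in practice that share is chosen first and the cache sizes are adjusted around it.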
217 .SS Network | |
218 The venti server announces two network services, one | |
219 (conventionally TCP port | |
220 .BR venti , | |
221 17034) serving | |
222 the venti protocol as described in | |
223 .MR venti (7) , | |
224 and one serving HTTP | |
225 (conventionally TCP port | |
226 .BR http , | |
227 80). | |
228 .PP | |
229 The venti web server provides the following | |
230 URLs for accessing status information: | |
231 .TF "\fL/storage" | |
232 .PD | |
233 .TP | |
234 .B /index | |
235 A summary of the usage of the arenas and index sections. | |
236 .TP | |
237 .B /xindex | |
238 An XML version of | |
239 .BR /index . | |
240 .TP | |
241 .B /storage | |
242 Brief storage totals. | |
243 .TP | |
244 .B /set | |
245 Display the values of all variables. | |
246 Variables are: | |
247 .BR compress , | |
248 whether or not to compress blocks | |
249 (for debugging); | |
250 .BR logging , | |
251 whether to write entries to the debugging logs; | |
252 .BR stats , | |
253 whether to collect run-time statistics; | |
254 .BR icachesleeptime , | |
255 the time in milliseconds between successive updates | |
256 of megabytes of the index cache; | |
257 .BR arenasumsleeptime , | |
258 the time in milliseconds between reads while | |
259 checksumming an arena in the background. | |
260 The two sleep times should be (but are not) managed by venti; | |
261 they exist to provide more experience with their effects. | |
262 The other variables exist only for debugging and | |
263 performance measurement. | |
264 .TP | |
265 .BI /set?name= variable | |
266 Show the current setting of | |
267 .IR variable . | |
268 .TP | |
269 .BI /set?name= variable &value= value | |
270 Set | |
271 .I variable | |
272 to | |
273 .IR value . | |
274 .TP | |
275 .BI /graph?arg= name [&arg2= name] &graph= type &param= value \fR... | |
276 A PNG image graphing the | |
277 .I name | |
278 run-time statistic over time. | |
279 The details of names and parameters are mostly undocumented; | |
280 see the | |
281 .BR graphname | |
282 array in | |
283 .B httpd.c | |
284 in the venti code for a list of possible statistics. The | |
285 .IR type | |
286 of graph defaults to raw; see the | |
287 .BR xgraph | |
288 function for a list of types. Possible | |
289 .IR param | |
290 names include the time frame | |
291 .RB ( t0 | |
292 and | |
293 .BR t1 ), | |
294 the y limits | |
295 .RB ( min | |
296 and | |
297 .BR max ), | |
298 etc. | |
299 .TP | |
300 .B /log | |
301 A list of all debugging logs present in the server's memory. | |
302 .TP | |
303 .BI /log/ name | |
304 The contents of the debugging log with the given | |
305 .IR name . | |
306 .TP | |
307 .B /flushicache | |
308 Force venti to begin flushing the index cache to disk. | |
309 The request response will not be sent until the flush | |
310 has completed. | |
311 .TP | |
312 .B /flushdcache | |
313 Force venti to begin flushing the arena block cache to disk. | |
314 The request response will not be sent until the flush | |
315 has completed. | |
316 .PD | |
317 .PP | |
318 Requests for other files are served by consulting a | |
319 directory named in the configuration file | |
320 (see | |
321 .B webroot | |
322 below). | |
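
The /set URL forms listed above can be constructed as follows. This is a sketch only: the function name and the host venti.example.com are hypothetical, and the HTTP service must be enabled before any of these URLs will answer.

```python
# Sketch: building the three /set URL forms described above.
# The helper name and the example host are illustrative assumptions.
from urllib.parse import urlencode


def set_url(base, name=None, value=None):
    """Build a /set URL for all variables, one variable, or an assignment."""
    if name is None:
        return f"{base}/set"
    params = {"name": name}
    if value is not None:
        params["value"] = value
    return f"{base}/set?{urlencode(params)}"


base = "http://venti.example.com"  # hypothetical server address
print(set_url(base))
print(set_url(base, "stats"))
print(set_url(base, "icachesleeptime", 100))
```

Using urlencode keeps variable names and values correctly escaped if they ever contain characters that are special in a query string.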
323 .SS Configuration File | |
324 A venti configuration file | |
325 enumerates the various index sections and | |
326 arenas that constitute a venti system. | |
327 The components are indicated by the name of the file, typically | |
328 a disk partition, in which they reside. The configuration | |
329 file is the only place where file names are used. Internally, | |
330 venti uses the names assigned when the components were formatted | |
331 with | |
332 .I fmtarenas | |
333 or | |
334 .I fmtisect | |
335 (see | |
336 .MR venti-fmt (8) ). | |
337 In particular, only the configuration needs to be | |
338 changed if a component is moved to a different file. | |
339 .PP | |
340 The configuration file consists of lines in the form described below. | |
341 Lines starting with | |
342 .B # | |
343 are comments. | |
344 .TF "\fLindex\fI name " | |
345 .PD | |
346 .TP | |
347 .BI index " name | |
348 Names the index for the system. | |
349 .TP | |
350 .BI arenas " file | |
351 .I File | |
352 is an arena partition, formatted using | |
353 .IR fmtarenas . | |
354 .TP | |
355 .BI isect " file | |
356 .I File | |
357 is an index section, formatted using | |
358 .IR fmtisect . | |
359 .TP | |
360 .BI bloom " file | |
361 .I File | |
362 is a bloom filter, formatted using | |
363 .IR fmtbloom . | |
364 .PD | |
365 .PP | |
366 After formatting a venti system using | |
367 .IR fmtindex , | |
368 the order of arenas and index sections should not be changed. | |
369 Additional arenas can be appended to the configuration; | |
370 run | |
371 .I fmtindex | |
372 with the | |
373 .B -a | |
374 flag to update the index. | |
375 .PP | |
376 The configuration file also holds configuration parameters | |
377 for the venti server itself. | |
378 These are: | |
379 .TF "\fLhttpaddr\fI netaddr " | |
380 .TP | |
381 .BI mem " size | |
382 lump cache size | |
383 .TP | |
384 .BI bcmem " size | |
385 block cache size | |
386 .TP | |
387 .BI icmem " size | |
388 index cache size | |
389 .TP | |
390 .BI addr " netaddr | |
391 network address to announce venti service | |
392 (default | |
393 .BR tcp!*!venti ) | |
394 .TP | |
395 .BI httpaddr " netaddr | |
396 network address to announce HTTP service | |
397 (default is not to start the service) | |
398 .TP | |
399 .B queuewrites | |
400 queue writes in memory | |
401 (default is not to queue) | |
402 .TP | |
403 .BI webroot " dir | |
404 directory tree containing files for | |
405 .IR venti 's | |
406 internal HTTP server to consult for unrecognized URLs | |
407 .PD | |
408 .PP | |
409 The units for the various cache sizes above can be specified by appendin… | |
410 .LR k , | |
411 .LR m , | |
412 or | |
413 .LR g | |
414 (case-insensitive) | |
415 to indicate kilobytes, megabytes, or gigabytes respectively. | |
416 .PP | |
417 The | |
418 .I file | |
419 name in the configuration lines above can be of the form | |
420 .IB file : lo - hi | |
421 to specify a range of the file. | |
422 .I Lo | |
423 and | |
424 .I hi | |
425 are specified in bytes but can have the usual | |
426 .BR k , | |
427 .BR m , | |
428 or | |
429 .B g | |
430 suffixes. | |
431 Either | |
432 .I lo | |
433 or | |
434 .I hi | |
435 may be omitted. | |
436 This notation eliminates the need to | |
437 partition raw disks on non-Plan 9 systems. | |
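
The range notation above can be illustrated with a small parser. This is a sketch and not venti's own parsing code: the function names are hypothetical, the suffixes are assumed to be binary multiples (1024, 1024 squared, 1024 cubed), and an omitted bound is reported as None without deciding what venti substitutes for it.

```python
# Sketch: parsing the file:lo-hi notation described above.
# Names are illustrative; binary multiples for k, m, g are an assumption.

SUFFIXES = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text):
    """Parse a byte count with an optional k, m, or g suffix."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)


def parse_range(spec):
    """Split 'file:lo-hi' into (file, lo, hi); a missing bound is None."""
    file, sep, rng = spec.rpartition(":")
    if not sep or "-" not in rng:
        return spec, None, None  # plain file name, no range given
    lo, _, hi = rng.partition("-")
    return (file,
            parse_size(lo) if lo else None,
            parse_size(hi) if hi else None)


print(parse_range("/dev/sda:1g-2g"))
print(parse_range("/dev/sda:-512m"))
print(parse_range("/tmp/disks/arenas"))
```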
438 .SS Command Line | |
439 Many of the options to Venti duplicate parameters that | |
440 can be specified in the configuration file. | |
441 The command line options override those found in a | |
442 configuration file. | |
443 Additional options are: | |
444 .TF "\fL-c\fI config" | |
445 .PD | |
446 .TP | |
447 .BI -c " config | |
448 The server configuration file | |
449 (default | |
450 .BR venti.conf ) | |
451 .TP | |
452 .B -d | |
453 Produce various debugging information on standard error. | |
454 Implies | |
455 .BR -s . | |
456 .TP | |
457 .B -L | |
458 Enable logging. By default all logging is disabled. | |
459 Logging slows server operation considerably. | |
460 .TP | |
461 .B -r | |
462 Allow only read access to the venti data. | |
463 .TP | |
464 .B -s | |
465 Do not run in the background. | |
466 Normally, | |
467 the foreground process will exit once the Venti server | |
468 is initialized and ready for connections. | |
469 .PD | |
470 .SH EXAMPLE | |
471 A simple configuration: | |
472 .IP | |
473 .EX | |
474 % cat venti.conf | |
475 index main | |
476 isect /tmp/disks/isect0 | |
477 isect /tmp/disks/isect1 | |
478 arenas /tmp/disks/arenas | |
479 bloom /tmp/disks/bloom | |
480 mem 10M | |
481 bcmem 20M | |
482 icmem 30M | |
483 % | |
484 .EE | |
485 .PP | |
486 Format the index sections, the arena partition, | |
487 the bloom filter, and | |
488 finally the main index: | |
489 .IP | |
490 .EX | |
491 % venti/fmtisect isect0. /tmp/disks/isect0 | |
492 % venti/fmtisect isect1. /tmp/disks/isect1 | |
493 % venti/fmtarenas arenas0. /tmp/disks/arenas & | |
494 % venti/fmtbloom /tmp/disks/bloom & | |
495 % wait | |
496 % venti/fmtindex venti.conf | |
497 % | |
498 .EE | |
499 .PP | |
500 Start the server and check the storage statistics: | |
501 .IP | |
502 .EX | |
503 % venti/venti | |
504 % hget http://$sysname/storage | |
505 .EE | |
506 .SH SOURCE | |
507 .B \*9/src/cmd/venti/srv | |
508 .SH "SEE ALSO" | |
509 .MR venti (1) , | |
510 .MR venti (3) , | |
511 .MR venti (7) , | |
512 .MR venti-backup (8) , | |
513 .MR venti-fmt (8) | |
514 .br | |
515 Sean Quinlan and Sean Dorward, | |
516 ``Venti: a new approach to archival storage'', | |
517 .I "Usenix Conference on File and Storage Technologies" , | |
518 2002. | |
519 .SH BUGS | |
520 Setting up a venti server is too complicated. | |
521 .PP | |
522 Venti should not require the user to decide how to | |
523 partition its memory usage. | |
524 .PP | |
525 Users of shells other than | |
526 .MR rc (1) | |
527 will not be able to use the program names shown. | |
528 One solution is to define | |
529 .B "V=$PLAN9/bin/venti" | |
530 and then substitute | |
531 .B $V/ | |
532 for | |
533 .B venti/ | |
534 in the paths above. |