| tventi.8 - plan9port - [fork] Plan 9 from user space | |
| --- | |
| tventi.8 (12887B) | |
| --- | |
| 1 .TH VENTI 8 | |
| 2 .SH NAME | |
| 3 venti \- archival storage server | |
| 4 .SH SYNOPSIS | |
| 5 .in +0.25i | |
| 6 .ti -0.25i | |
| 7 .B venti/venti | |
| 8 [ | |
| 9 .B -Ldrs | |
| 10 ] | |
| 11 [ | |
| 12 .B -a | |
| 13 .I address | |
| 14 ] | |
| 15 [ | |
| 16 .B -B | |
| 17 .I blockcachesize | |
| 18 ] | |
| 19 [ | |
| 20 .B -c | |
| 21 .I config | |
| 22 ] | |
| 23 [ | |
| 24 .B -C | |
| 25 .I lumpcachesize | |
| 26 ] | |
| 27 [ | |
| 28 .B -h | |
| 29 .I httpaddress | |
| 30 ] | |
| 31 [ | |
| 32 .B -I | |
| 33 .I indexcachesize | |
| 34 ] | |
| 35 [ | |
| 36 .B -W | |
| 37 .I webroot | |
| 38 ] | |
| 39 .SH DESCRIPTION | |
| 40 .I Venti | |
| 41 is a SHA1-addressed archival storage server. | |
| 42 See | |
| 43 .MR venti (7) | |
| 44 for a full introduction to the system. | |
| 45 This page documents the structure and operation of the server. | |
| 46 .PP | |
| 47 A venti server requires multiple disks or disk partitions, | |
| 48 each of which must be properly formatted before the server | |
| 49 can be run. | |
| 50 .SS Disk | |
| 51 The venti server maintains three disk structures, typically | |
| 52 stored on raw disk partitions: | |
| 53 the append-only | |
| 54 .IR "data log" , | |
| 55 which holds, in sequential order, | |
| 56 the contents of every block written to the server; | |
| 57 the | |
| 58 .IR index , | |
| 59 which helps locate a block in the data log given its score; | |
| 60 and optionally the | |
| 61 .IR "bloom filter" , | |
| 62 a concise summary of which scores are present in the index. | |
| 63 The data log is the primary storage. | |
| 64 To improve robustness, it should be stored on | |
| 65 a device that provides RAID functionality. | |
| 66 The index and the bloom filter are optimizations | |
| 67 employed to access the data log efficiently and can be rebuilt | |
| 68 if lost or damaged. | |
| 69 .PP | |
| 70 The data log is logically split into sections called | |
| 71 .IR arenas , | |
| 72 typically sized for easy offline backup | |
| 73 (e.g., 500MB). | |
| 74 A data log may comprise many disks, each storing | |
| 75 one or more arenas. | |
| 76 Such disks are called | |
| 77 .IR "arena partitions" . | |
| 78 Arena partitions are filled in the order given in the configuration. | |
| 79 .PP | |
| 80 The index is logically split into block-sized pieces called | |
| 81 .IR buckets , | |
| 82 each of which is responsible for a particular range of scores. | |
| 83 An index may be split across many disks, each storing many buckets. | |
| 84 Such disks are called | |
| 85 .IR "index sections" . | |
| 86 .PP | |
| 87 The index must be sized so that no bucket is full. | |
| 88 When a bucket fills, the server must be shut down and | |
| 89 the index made larger. | |
| 90 Since scores appear random, each bucket will contain | |
| 91 approximately the same number of entries. | |
| 92 Index entries are 40 bytes long. Assuming that a typical block | |
| 93 being written to the server is 8192 bytes and compresses to 4096 | |
| 94 bytes, the active index is expected to be about 1% of | |
| 95 the active data log. | |
| 96 Storing smaller blocks increases the relative index footprint; | |
| 97 storing larger blocks decreases it. | |
| 98 To allow variation in both block size and the random distribution | |
| 99 of scores to buckets, the suggested index size is 5% of | |
| 100 the active data log. | |
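The index sizing arithmetic above can be sketched as follows. This is only an illustration of the numbers quoted in the text (40-byte entries, a 5% suggested index size); the function names are hypothetical and are not part of venti.

```python
INDEX_ENTRY_BYTES = 40  # on-disk index entry size quoted in the text


def index_fraction(compressed_block_bytes: int) -> float:
    """Expected index size as a fraction of the data log size."""
    return INDEX_ENTRY_BYTES / compressed_block_bytes


def suggested_index_bytes(data_log_bytes: int, percent: int = 5) -> int:
    """Suggested index size: 5% of the active data log by default."""
    return data_log_bytes * percent // 100


if __name__ == "__main__":
    # 4096-byte compressed blocks give roughly the 1% figure from the text.
    print(f"expected fraction: {index_fraction(4096):.2%}")
    # Suggested index size for a 100GB (100 * 2**30 bytes) data log.
    print(f"suggested bytes: {suggested_index_bytes(100 * 2**30)}")
```

Halving the typical compressed block size doubles the expected fraction, which is why the 5% suggestion leaves generous headroom.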
| 101 .PP | |
| 102 The (optional) bloom filter is a large bitmap that is stored on disk but | |
| 103 also kept completely in memory while the venti server runs. | |
| 104 It helps the venti server efficiently detect scores that are | |
| 105 .I not | |
| 106 already stored in the index. | |
| 107 The bloom filter starts out zeroed. | |
| 108 Each score recorded in the bloom filter is hashed to choose | |
| 109 .I nhash | |
| 110 bits to set in the bloom filter. | |
| 111 A score is definitely not stored in the index if any of its | |
| 112 .I nhash | |
| 113 bits is not set. | |
| 114 The bloom filter thus has two parameters: | |
| 115 .I nhash | |
| 116 (maximum 32) | |
| 117 and the total bitmap size | |
| 118 (maximum 512MB, 2\s-2\u32\d\s+2 bits). | |
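The set-and-test behaviour described above can be sketched with a small in-memory Bloom filter. The way bit positions are derived from a score here (SHA1 over a one-byte counter plus the score) is an assumption made for illustration; it is not taken from the venti source, and the class and method names are hypothetical.

```python
import hashlib


class Bloom:
    """Toy Bloom filter: nhash bits are set per recorded score."""

    def __init__(self, nbits: int, nhash: int):
        if not 1 <= nhash <= 32:
            raise ValueError("nhash must be between 1 and 32")
        self.nbits = nbits
        self.nhash = nhash
        self.bits = bytearray((nbits + 7) // 8)  # starts out zeroed

    def _positions(self, score: bytes):
        # Assumed hashing scheme, chosen only for this sketch.
        for i in range(self.nhash):
            digest = hashlib.sha1(bytes([i]) + score).digest()
            yield int.from_bytes(digest[:8], "big") % self.nbits

    def add(self, score: bytes) -> None:
        for p in self._positions(score):
            self.bits[p >> 3] |= 1 << (p & 7)

    def may_contain(self, score: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(score))


if __name__ == "__main__":
    bloom = Bloom(nbits=1 << 16, nhash=10)
    score = hashlib.sha1(b"some block").digest()
    print(bloom.may_contain(score))  # nothing recorded yet
    bloom.add(score)
    print(bloom.may_contain(score))
```

A negative answer lets the server skip the index lookup entirely; a positive answer still has to be confirmed against the index.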
| 119 .PP | |
| 120 The bloom filter should be sized so that | |
| 121 .I nhash | |
| 122 \(mu | |
| 123 .I nblock | |
| 124 \(<= | |
| 125 0.7 \(mu | |
| 126 .IR b , | |
| 127 where | |
| 128 .I nblock | |
| 129 is the expected number of blocks stored on the server | |
| 130 and | |
| 131 .I b | |
| 132 is the bitmap size in bits. | |
| 133 The false positive rate of the bloom filter when sized | |
| 134 this way is approximately 2\s-2\u\-\fInhash\fR\d\s+2. | |
| 135 Values of | |
| 136 .I nhash | |
| 137 less than 10 are not very useful; values | |
| 138 greater than 24 are probably a waste of memory. | |
| 139 .I Fmtbloom | |
| 140 (see | |
| 141 .MR venti-fmt (8) ) | |
| 142 can be given either | |
| 143 .I nhash | |
| 144 or | |
| 145 .IR nblock ; | |
| 146 if given | |
| 147 .IR nblock , | |
| 148 it will derive an appropriate | |
| 149 .IR nhash . | |
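The sizing rule above can be explored numerically. The sketch below picks the largest nhash satisfying nhash × nblock ≤ 0.7 × b and evaluates the standard Bloom filter false-positive estimate (1 − e^(−nhash·nblock/b))^nhash. It is not the algorithm used by fmtbloom, whose derivation may differ; the function names are hypothetical.

```python
import math

MAX_NHASH = 32        # limit quoted in the text
MAX_BITS = 2**32      # 512MB bitmap, the maximum quoted in the text


def derive_nhash(nblock: int, nbits: int) -> int:
    """Largest nhash with nhash * nblock <= 0.7 * nbits, clamped to 1..32."""
    nhash = int(0.7 * nbits / nblock)
    return max(1, min(MAX_NHASH, nhash))


def false_positive_rate(nhash: int, nblock: int, nbits: int) -> float:
    """Standard estimate for a Bloom filter holding nblock scores."""
    return (1.0 - math.exp(-nhash * nblock / nbits)) ** nhash


if __name__ == "__main__":
    nblock = 200_000_000  # hypothetical expected block count
    nhash = derive_nhash(nblock, MAX_BITS)
    rate = false_positive_rate(nhash, nblock, MAX_BITS)
    print(nhash, rate, 2.0 ** -nhash)
```

With these inputs the estimate lands close to 2 to the power of minus nhash, since 0.7 is approximately ln 2, the fill level at which about half the bits are set.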
| 150 .SS Memory | |
| 151 Venti can make effective use of large amounts of memory | |
| 152 for various caches. | |
| 153 .PP | |
| 154 The | |
| 155 .I "lump cache | |
| 156 holds recently-accessed venti data blocks, which the server refers to as | |
| 157 .IR lumps . | |
| 158 The lump cache should be at least 1MB but can profitably be much larger. | |
| 159 The lump cache can be thought of as the level-1 cache: | |
| 160 read requests handled by the lump cache can | |
| 161 be served instantly. | |
| 162 .PP | |
| 163 The | |
| 164 .I "block cache | |
| 165 holds recently-accessed | |
| 166 .I disk | |
| 167 blocks from the arena partitions. | |
| 168 The block cache needs to be able to simultaneously hold two blocks | |
| 169 from each arena plus four blocks for the currently-filling arena. | |
| 170 The block cache can be thought of as the level-2 cache: | |
| 171 read requests handled by the block cache are slower than those | |
| 172 handled by the lump cache, since the lump data must be extracted | |
| 173 from the raw disk blocks and possibly decompressed, but no | |
| 174 disk accesses are necessary. | |
| 175 .PP | |
| 176 The | |
| 177 .I "index cache | |
| 178 holds recently-accessed or prefetched | |
| 179 index entries. | |
| 180 The index cache needs to be able to hold index entries | |
| 181 for three or four arenas, at least, in order for prefetching | |
| 182 to work properly. Each index entry is 50 bytes. | |
| 183 Assuming 500MB arenas of | |
| 184 128,000 blocks that are 4096 bytes each after compression, | |
| 185 the minimum index cache size is about 6MB. | |
| 186 The index cache can be thought of as the level-3 cache: | |
| 187 read requests handled by the index cache must still go | |
| 188 to disk to fetch the arena blocks, but the costly random | |
| 189 access to the index is avoided. | |
| 190 .PP | |
| 191 The size of the index cache determines how long venti | |
| 192 can sustain its `burst' write throughput, during which time | |
| 193 the only disk accesses on the critical path | |
| 194 are sequential writes to the arena partitions. | |
| 195 For example, if you want to be able to sustain 10MB/s | |
| 196 for an hour, you need enough index cache to hold entries | |
| 197 for 36GB of blocks. Assuming 8192-byte blocks, | |
| 198 you need room for almost five million index entries. | |
| 199 Since index entries are 50 bytes each, you need 250MB | |
| 200 of index cache. | |
| 201 If the background index update process can make a single | |
| 202 pass through the index in an hour, which is possible, | |
| 203 then you can sustain the 10MB/s indefinitely (at least until | |
| 204 the arenas are all filled). | |
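The burst-throughput arithmetic above can be checked with a few lines. This sketch assumes 1MB means 10^6 bytes and uses the 50-byte index cache entry size quoted in the text; the function names are hypothetical.

```python
INDEX_CACHE_ENTRY_BYTES = 50  # per-entry size quoted in the text


def entries_needed(rate_bytes_per_s: int, seconds: int, block_bytes: int) -> int:
    """Index entries written during a burst, one per block (rounded up)."""
    total = rate_bytes_per_s * seconds
    return -(-total // block_bytes)  # ceiling division


def index_cache_bytes(rate_bytes_per_s: int, seconds: int, block_bytes: int) -> int:
    """Index cache needed to absorb the whole burst."""
    return entries_needed(rate_bytes_per_s, seconds, block_bytes) * INDEX_CACHE_ENTRY_BYTES


if __name__ == "__main__":
    # 10MB/s for one hour with 8192-byte blocks, as in the text.
    n = entries_needed(10_000_000, 3600, 8192)
    print(n, index_cache_bytes(10_000_000, 3600, 8192))
```

The exact result is about 4.4 million entries and 220MB, which the text rounds up to almost five million entries and 250MB.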
| 205 .PP | |
| 206 The | |
| 207 .I "bloom filter | |
| 208 requires memory equal to its size on disk, | |
| 209 as discussed above. | |
| 210 .PP | |
| 211 A reasonable starting allocation is to | |
| 212 divide memory equally (in thirds) between | |
| 213 the bloom filter, the index cache, and the lump and block caches; | |
| 214 the third of memory allocated to the lump and block caches | |
| 215 should be split unevenly, with more (say, two thirds) | |
| 216 going to the block cache. | |
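The starting allocation described above can be written as a small helper. This is a sketch of the rule of thumb only; the function name is hypothetical, and the dictionary keys borrow the configuration parameter names mem, bcmem, and icmem.

```python
def split_memory(total_bytes: int) -> dict:
    """Divide memory in thirds: bloom filter, index cache, lump+block caches."""
    third = total_bytes // 3
    bcmem = third * 2 // 3   # about two thirds of the last third: block cache
    mem = third - bcmem      # remainder: lump cache
    return {"bloom": third, "icmem": third, "bcmem": bcmem, "mem": mem}


if __name__ == "__main__":
    mb = 2**20
    for name, size in split_memory(90 * mb).items():
        print(f"{name:6s} {size // mb}M")
```

For a 90MB budget this yields mem 10M, bcmem 20M, and icmem 30M, with 30M left for the bloom filter.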
| 217 .SS Network | |
| 218 The venti server announces two network services, one | |
| 219 (conventionally TCP port | |
| 220 .BR venti , | |
| 221 17034) serving | |
| 222 the venti protocol as described in | |
| 223 .MR venti (7) , | |
| 224 and one serving HTTP | |
| 225 (conventionally TCP port | |
| 226 .BR http , | |
| 227 80). | |
| 228 .PP | |
| 229 The venti web server provides the following | |
| 230 URLs for accessing status information: | |
| 231 .TF "\fL/storage" | |
| 232 .PD | |
| 233 .TP | |
| 234 .B /index | |
| 235 A summary of the usage of the arenas and index sections. | |
| 236 .TP | |
| 237 .B /xindex | |
| 238 An XML version of | |
| 239 .BR /index . | |
| 240 .TP | |
| 241 .B /storage | |
| 242 Brief storage totals. | |
| 243 .TP | |
| 244 .B /set | |
| 245 Display the values of all variables. | |
| 246 Variables are: | |
| 247 .BR compress , | |
| 248 whether or not to compress blocks | |
| 249 (for debugging); | |
| 250 .BR logging , | |
| 251 whether to write entries to the debugging logs; | |
| 252 .BR stats , | |
| 253 whether to collect run-time statistics; | |
| 254 .BR icachesleeptime , | |
| 255 the time in milliseconds between successive updates | |
| 256 of megabytes of the index cache; | |
| 257 .BR arenasumsleeptime , | |
| 258 the time in milliseconds between reads while | |
| 259 checksumming an arena in the background. | |
| 260 The two sleep times should be (but are not) managed by venti; | |
| 261 they exist to provide more experience with their effects. | |
| 262 The other variables exist only for debugging and | |
| 263 performance measurement. | |
| 264 .TP | |
| 265 .BI /set?name= variable | |
| 266 Show the current setting of | |
| 267 .IR variable . | |
| 268 .TP | |
| 269 .BI /set?name= variable &value= value | |
| 270 Set | |
| 271 .I variable | |
| 272 to | |
| 273 .IR value . | |
| 274 .TP | |
| 275 .BI /graph?arg= name [&arg2= name] &graph= type &param= value \fR... | |
| 276 A PNG image graphing the | |
| 277 .I name | |
| 278 run-time statistic over time. | |
| 279 The details of names and parameters are mostly undocumented; | |
| 280 see the | |
| 281 .BR graphname | |
| 282 array in | |
| 283 .B httpd.c | |
| 284 in the venti code for a list of possible statistics. The | |
| 285 .IR type | |
| 286 of graph defaults to raw; see the | |
| 287 .BR xgraph | |
| 288 function for a list of types. Possible | |
| 289 .I param | |
| 290 names include the time frame | |
| 291 .RB ( t0 | |
| 292 and | |
| 293 .BR t1 ), | |
| 294 the y limits | |
| 295 .RB ( min | |
| 296 and | |
| 297 .BR max ), | |
| 298 etc. | |
| 299 .TP | |
| 300 .B /log | |
| 301 A list of all debugging logs present in the server's memory. | |
| 302 .TP | |
| 303 .BI /log/ name | |
| 304 The contents of the debugging log with the given | |
| 305 .IR name . | |
| 306 .TP | |
| 307 .B /flushicache | |
| 308 Force venti to begin flushing the index cache to disk. | |
| 309 The request response will not be sent until the flush | |
| 310 has completed. | |
| 311 .TP | |
| 312 .B /flushdcache | |
| 313 Force venti to begin flushing the arena block cache to disk. | |
| 314 The request response will not be sent until the flush | |
| 315 has completed. | |
| 316 .PD | |
| 317 .PP | |
| 318 Requests for other files are served by consulting a | |
| 319 directory named in the configuration file | |
| 320 (see | |
| 321 .B webroot | |
| 322 below). | |
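The /set URLs listed above can be built and fetched from a script. The sketch below only constructs the URLs and wraps a plain HTTP GET; the host name venti.example is a placeholder, and the function names are hypothetical. The format of the server's response is not documented here, so the body is returned as text without parsing.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def set_url(base, name=None, value=None):
    """Build a /set URL: no arguments lists variables, name shows, name+value sets."""
    if name is None:
        return f"{base}/set"
    params = {"name": name}
    if value is not None:
        params["value"] = value
    return f"{base}/set?{urlencode(params)}"


def fetch(url, timeout=10.0):
    """GET a status URL from the venti web server and return the body as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    base = "http://venti.example"  # placeholder address, not a real server
    print(set_url(base))
    print(set_url(base, "logging"))
    print(set_url(base, "logging", "1"))
```

Because /flushicache and /flushdcache do not respond until the flush completes, a caller fetching those URLs would likely want a much longer timeout.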
| 323 .SS Configuration File | |
| 324 A venti configuration file | |
| 325 enumerates the various index sections and | |
| 326 arenas that constitute a venti system. | |
| 327 The components are indicated by the name of the file, typically | |
| 328 a disk partition, in which they reside. The configuration | |
| 329 file is the only place where file names are used. Internally, | |
| 330 venti uses the names assigned when the components were formatted | |
| 331 with | |
| 332 .I fmtarenas | |
| 333 or | |
| 334 .I fmtisect | |
| 335 (see | |
| 336 .MR venti-fmt (8) ). | |
| 337 In particular, only the configuration needs to be | |
| 338 changed if a component is moved to a different file. | |
| 339 .PP | |
| 340 The configuration file consists of lines in the form described below. | |
| 341 Lines starting with | |
| 342 .B # | |
| 343 are comments. | |
| 344 .TF "\fLindex\fI name " | |
| 345 .PD | |
| 346 .TP | |
| 347 .BI index " name | |
| 348 Names the index for the system. | |
| 349 .TP | |
| 350 .BI arenas " file | |
| 351 .I File | |
| 352 is an arena partition, formatted using | |
| 353 .IR fmtarenas . | |
| 354 .TP | |
| 355 .BI isect " file | |
| 356 .I File | |
| 357 is an index section, formatted using | |
| 358 .IR fmtisect . | |
| 359 .TP | |
| 360 .BI bloom " file | |
| 361 .I File | |
| 362 is a bloom filter, formatted using | |
| 363 .IR fmtbloom . | |
| 364 .PD | |
| 365 .PP | |
| 366 After formatting a venti system using | |
| 367 .IR fmtindex , | |
| 368 the order of arenas and index sections should not be changed. | |
| 369 Additional arenas can be appended to the configuration; | |
| 370 run | |
| 371 .I fmtindex | |
| 372 with the | |
| 373 .B -a | |
| 374 flag to update the index. | |
| 375 .PP | |
| 376 The configuration file also holds configuration parameters | |
| 377 for the venti server itself. | |
| 378 These are: | |
| 379 .TF "\fLhttpaddr\fI netaddr " | |
| 380 .TP | |
| 381 .BI mem " size | |
| 382 lump cache size | |
| 383 .TP | |
| 384 .BI bcmem " size | |
| 385 block cache size | |
| 386 .TP | |
| 387 .BI icmem " size | |
| 388 index cache size | |
| 389 .TP | |
| 390 .BI addr " netaddr | |
| 391 network address to announce venti service | |
| 392 (default | |
| 393 .BR tcp!*!venti ) | |
| 394 .TP | |
| 395 .BI httpaddr " netaddr | |
| 396 network address to announce HTTP service | |
| 397 (default is not to start the service) | |
| 398 .TP | |
| 399 .B queuewrites | |
| 400 queue writes in memory | |
| 401 (default is not to queue) | |
| 402 .TP | |
| 403 .BI webroot " dir | |
| 404 directory tree containing files for | |
| 405 .IR venti 's | |
| 406 internal HTTP server to consult for unrecognized URLs | |
| 407 .PD | |
| 408 .PP | |
| 409 The units for the various cache sizes above can be specified by appending | |
| 410 .LR k , | |
| 411 .LR m , | |
| 412 or | |
| 413 .LR g | |
| 414 (case-insensitive) | |
| 415 to indicate kilobytes, megabytes, or gigabytes respectively. | |
| 416 .PP | |
| 417 The | |
| 418 .I file | |
| 419 name in the configuration lines above can be of the form | |
| 420 .IB file : lo - hi | |
| 421 to specify a range of the file. | |
| 422 .I Lo | |
| 423 and | |
| 424 .I hi | |
| 425 are specified in bytes but can have the usual | |
| 426 .BR k , | |
| 427 .BR m , | |
| 428 or | |
| 429 .B g | |
| 430 suffixes. | |
| 431 Either | |
| 432 .I lo | |
| 433 or | |
| 434 .I hi | |
| 435 may be omitted. | |
| 436 This notation eliminates the need to | |
| 437 partition raw disks on non-Plan 9 systems. | |
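The size suffixes and the file:lo-hi notation described above can be parsed as follows. This sketch assumes the suffixes are powers of 1024 and that a name not ending in a well-formed range refers to the whole file; venti's own parser may differ on both points, and the function names are hypothetical.

```python
import re

# Assumption: k, m, g are powers of 1024.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}
_SIZE = re.compile(r"(\d+)([kmg]?)", re.IGNORECASE)
_RANGE = re.compile(
    r"^(?P<file>.+):(?P<lo>\d+[kmg]?)?-(?P<hi>\d+[kmg]?)?$", re.IGNORECASE
)


def parse_size(text):
    """Parse a byte count with an optional case-insensitive k, m, or g suffix."""
    m = _SIZE.fullmatch(text)
    if m is None:
        raise ValueError(f"bad size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2).lower()]


def parse_file_range(spec):
    """Split 'file:lo-hi' into (file, lo, hi); omitted bounds become None."""
    m = _RANGE.match(spec)
    if m is None:
        return spec, None, None  # plain file name, no range given
    lo = parse_size(m["lo"]) if m["lo"] else None
    hi = parse_size(m["hi"]) if m["hi"] else None
    return m["file"], lo, hi


if __name__ == "__main__":
    print(parse_size("10M"))
    print(parse_file_range("/dev/sda:1g-2g"))
    print(parse_file_range("/dev/sda:-512M"))
    print(parse_file_range("/tmp/disks/arenas"))
```

Returning None for an omitted bound leaves it to the caller to substitute the start or end of the underlying file.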
| 438 .SS Command Line | |
| 439 Many of the options to Venti duplicate parameters that | |
| 440 can be specified in the configuration file. | |
| 441 The command line options override those found in a | |
| 442 configuration file. | |
| 443 Additional options are: | |
| 444 .TF "\fL-c\fI config" | |
| 445 .PD | |
| 446 .TP | |
| 447 .BI -c " config | |
| 448 The server configuration file | |
| 449 (default | |
| 450 .BR venti.conf ) | |
| 451 .TP | |
| 452 .B -d | |
| 453 Produce various debugging information on standard error. | |
| 454 Implies | |
| 455 .BR -s . | |
| 456 .TP | |
| 457 .B -L | |
| 458 Enable logging. By default all logging is disabled. | |
| 459 Logging slows server operation considerably. | |
| 460 .TP | |
| 461 .B -r | |
| 462 Allow only read access to the venti data. | |
| 463 .TP | |
| 464 .B -s | |
| 465 Do not run in the background. | |
| 466 Normally, | |
| 467 the foreground process will exit once the Venti server | |
| 468 is initialized and ready for connections. | |
| 469 .PD | |
| 470 .SH EXAMPLE | |
| 471 A simple configuration: | |
| 472 .IP | |
| 473 .EX | |
| 474 % cat venti.conf | |
| 475 index main | |
| 476 isect /tmp/disks/isect0 | |
| 477 isect /tmp/disks/isect1 | |
| 478 arenas /tmp/disks/arenas | |
| 479 bloom /tmp/disks/bloom | |
| 480 mem 10M | |
| 481 bcmem 20M | |
| 482 icmem 30M | |
| 483 % | |
| 484 .EE | |
| 485 .PP | |
| 486 Format the index sections, the arena partition, | |
| 487 the bloom filter, and | |
| 488 finally the main index: | |
| 489 .IP | |
| 490 .EX | |
| 491 % venti/fmtisect isect0. /tmp/disks/isect0 | |
| 492 % venti/fmtisect isect1. /tmp/disks/isect1 | |
| 493 % venti/fmtarenas arenas0. /tmp/disks/arenas & | |
| 494 % venti/fmtbloom /tmp/disks/bloom & | |
| 495 % wait | |
| 496 % venti/fmtindex venti.conf | |
| 497 % | |
| 498 .EE | |
| 499 .PP | |
| 500 Start the server and check the storage statistics: | |
| 501 .IP | |
| 502 .EX | |
| 503 % venti/venti | |
| 504 % hget http://$sysname/storage | |
| 505 .EE | |
| 506 .SH SOURCE | |
| 507 .B \*9/src/cmd/venti/srv | |
| 508 .SH "SEE ALSO" | |
| 509 .MR venti (1) , | |
| 510 .MR venti (3) , | |
| 511 .MR venti (7) , | |
| 512 .MR venti-backup (8) , | |
| 513 .MR venti-fmt (8) | |
| 514 .br | |
| 515 Sean Quinlan and Sean Dorward, | |
| 516 ``Venti: a new approach to archival storage'', | |
| 517 .I "Usenix Conference on File and Storage Technologies" , | |
| 518 2002. | |
| 519 .SH BUGS | |
| 520 Setting up a venti server is too complicated. | |
| 521 .PP | |
| 522 Venti should not require the user to decide how to | |
| 523 partition its memory usage. | |
| 524 .PP | |
| 525 Users of shells other than | |
| 526 .MR rc (1) | |
| 527 will not be able to use the program names shown. | |
| 528 One solution is to define | |
| 529 .B "V=$PLAN9/bin/venti" | |
| 530 and then substitute | |
| 531 .B $V/ | |
| 532 for | |
| 533 .B venti/ | |
| 534 in the paths above. |