| tventi.7 - plan9port - [fork] Plan 9 from user space | |
| --- | |
| tventi.7 (11062B) | |
| --- | |
| 1 .TH VENTI 7 | |
| 2 .SH NAME | |
| 3 venti \- archival storage server | |
| 4 .SH DESCRIPTION | |
| 5 Venti is a block storage server intended for archival data. | |
| 6 In a Venti server, the SHA1 hash of a block's contents acts | |
| 7 as the block identifier for read and write operations. | |
| 8 This approach enforces a write-once policy, preventing | |
| 9 accidental or malicious destruction of data. In addition, | |
| 10 duplicate copies of a block are coalesced, reducing the | |
| 11 consumption of storage and simplifying the implementation | |
| 12 of clients. | |
| 13 .PP | |
| 14 This manual page documents the basic concepts of | |
| 15 block storage using Venti as well as the Venti network protocol. | |
| 16 .PP | |
| 17 .MR Venti (1) | |
| 18 documents some simple clients. | |
| 19 .MR Vac (1) , | |
| 20 .MR vacfs (4) , | |
| 21 and | |
| 22 .MR vbackup (8) | |
| 23 are more complex clients. | |
| 24 .PP | |
| 25 .MR Venti (3) | |
| 26 describes a C library interface for accessing | |
| 27 Venti servers and manipulating Venti data structures. | |
| 28 .PP | |
| 29 .MR Venti (8) | |
| 30 describes the programs used to run a Venti server. | |
| 31 .PP | |
| 32 .SS "Scores | |
| 33 The SHA1 hash that identifies a block is called its | |
| 34 .IR score . | |
| 35 The score of the zero-length block is called the | |
| 36 .IR "zero score" . | |
| 37 .PP | |
| 38 Scores may have an optional | |
| 39 .IB label : | |
| 40 prefix, typically used to | |
| 41 describe the format of the data. | |
| 42 For example, | |
| 43 .MR vac (1) | |
| 44 uses a | |
| 45 .B vac: | |
| 46 prefix, while | |
| 47 .MR vbackup (8) | |
| 48 uses prefixes corresponding to the file system | |
| 49 types: | |
| 50 .BR ext2: , | |
| 51 .BR ffs: , | |
| 52 and so on. | |
| 53 .SS "Files and Directories | |
| 54 Venti accepts blocks up to 56 kilobytes in size. | |
| 55 By convention, Venti clients use hash trees of blocks to | |
| 56 represent arbitrary-size data | |
| 57 .IR files . | |
| 58 The data to be stored is split into fixed-size | |
| 59 blocks and written to the server, producing a list | |
| 60 of scores. | |
| 61 The resulting list of scores is split into fixed-size pointer | |
| 62 blocks (using only an integral number of scores per block) | |
| 63 and written to the server, producing a smaller list | |
| 64 of scores. | |
| 65 The process continues, eventually ending with the | |
| 66 score for the hash tree's top-most block. | |
| 67 Each file stored this way is summarized by | |
| 68 a | |
| 69 .B VtEntry | |
| 70 structure recording the top-most score, the depth | |
| 71 of the tree, the data block size, and the pointer block size. | |
| 72 One or more | |
| 73 .B VtEntry | |
| 74 structures can be concatenated | |
| 75 and stored as a special file called a | |
| 76 .IR directory . | |
| 77 In this | |
| 78 manner, arbitrary trees of files can be constructed | |
| 79 and stored. | |
| 80 .PP | |
| 81 Scores passed between programs conventionally refer | |
| 82 to | |
| 83 .B VtRoot | |
| 84 blocks, which contain descriptive information | |
| 85 as well as the score of a directory block containing a small number | |
| 86 of directory entries. | |
| 87 .PP | |
| 88 Conventionally, programs do not mix data and directory entries | |
| 89 in the same file. Instead, they keep two separate files, one with | |
| 90 directory entries and one with metadata referencing those | |
| 91 entries by position. | |
| 92 Keeping this parallel representation is a minor annoyance | |
| 93 but makes it possible for general programs like | |
| 94 .I venti/copy | |
| 95 (see | |
| 96 .MR venti (1) ) | |
| 97 to traverse the block tree without knowing the specific details | |
| 98 of any particular program's data. | |
| 99 .SS "Block Types | |
| 100 To allow programs to traverse these structures without | |
| 101 needing to understand their higher-level meanings, | |
| 102 Venti tags each block with a type. The types are: | |
| 103 .PP | |
| 104 .nf | |
| 105 .ft L | |
| 106 VtDataType 000 \f1data\fL | |
| 107 VtDataType+1 001 \fRscores of \fPVtDataType\fR blocks\fL | |
| 108 VtDataType+2 002 \fRscores of \fPVtDataType+1\fR blocks\fL | |
| 109 \fR\&...\fL | |
| 110 VtDirType 010 VtEntry\fR structures\fL | |
| 111 VtDirType+1 011 \fRscores of \fLVtDirType\fR blocks\fL | |
| 112 VtDirType+2 012 \fRscores of \fLVtDirType+1\fR blocks\fL | |
| 113 \fR\&...\fL | |
| 114 VtRootType 020 VtRoot\fR structure\fL | |
| 115 .fi | |
| 116 .PP | |
| 117 The octal numbers listed are the type numbers used | |
| 118 by the commands below. | |
| 119 (For historical reasons, the type numbers used on | |
| 120 disk and on the wire are different from the above. | |
| 121 They do not distinguish | |
| 122 .BI VtDataType+ n | |
| 123 blocks from | |
| 124 .BI VtDirType+ n | |
| 125 blocks.) | |
| 126 .SS "Zero Truncation | |
| 127 To avoid storing the same short data blocks padded with | |
| 128 differing numbers of zeros, Venti clients working with fixed-size | |
| 129 blocks conventionally | |
| 130 `zero truncate' the blocks before writing them to the server. | |
| 131 For example, if a 1024-byte data block contains the | |
| 132 11-byte string | |
| 133 .RB ` hello " " world ' | |
| 134 followed by 1013 zero bytes, | |
| 135 a client would store only the 11-byte block. | |
| 136 When the client later read the block from the server, | |
| 137 it would append zero bytes to the end as necessary to | |
| 138 reach the expected size. | |
| 139 .PP | |
| 140 When truncating pointer blocks | |
| 141 .RB ( VtDataType+ \fIn | |
| 142 and | |
| 143 .BI VtDirType+ n | |
| 144 blocks), | |
| 145 trailing zero scores are removed | |
| 146 instead of trailing zero bytes. | |
| 147 .PP | |
| 148 Because of the truncation convention, | |
| 149 any file consisting entirely of zero bytes, | |
| 150 no matter what its length, will be represented by the zero score: | |
| 151 the data blocks contain all zeros and are thus truncated | |
| 152 to the empty block, and the pointer blocks contain all zero scores | |
| 153 and are thus also truncated to the empty block, | |
| 154 and so on up the hash tree. | |
| 155 .SS "Network Protocol | |
| 156 A Venti session begins when a | |
| 157 .I client | |
| 158 connects to the network address served by a Venti | |
| 159 .IR server ; | |
| 160 the conventional address is | |
| 161 .BI tcp! server !venti | |
| 162 (the | |
| 163 .B venti | |
| 164 port is 17034). | |
| 165 Both client and server begin by sending a version | |
| 166 string of the form | |
| 167 .BI venti- versions - comment \en \fR. | |
| 168 The | |
| 169 .I versions | |
| 170 field is a list of acceptable versions separated by | |
| 171 colons. | |
| 172 The protocol described here is version | |
| 173 .BR 02 . | |
| 174 The client is responsible for choosing a common | |
| 175 version and sending it in the | |
| 176 .B VtThello | |
| 177 message, described below. | |
| 178 .PP | |
| 179 After the initial version exchange, the client transmits | |
| 180 .I requests | |
| 181 .RI ( T-messages ) | |
| 182 to the server, which subsequently returns | |
| 183 .I replies | |
| 184 .RI ( R-messages ) | |
| 185 to the client. | |
| 186 The combined act of transmitting (receiving) a request | |
| 187 of a particular type, and receiving (transmitting) its reply | |
| 188 is called a | |
| 189 .I transaction | |
| 190 of that type. | |
| 191 .PP | |
| 192 Each message consists of a sequence of bytes. | |
| 193 Two-byte fields hold unsigned integers represented | |
| 194 in big-endian order (most significant byte first). | |
| 195 Data items of variable lengths are represented by | |
| 196 a one-byte field specifying a count, | |
| 197 .IR n , | |
| 198 followed by | |
| 199 .I n | |
| 200 bytes of data. | |
| 201 Text strings are represented similarly, | |
| 202 using a two-byte count with | |
| 203 the text itself stored as a UTF-encoded sequence | |
| 204 of Unicode characters (see | |
| 205 .MR utf (7) ). | |
| 206 Text strings are not | |
| 207 .SM NUL\c | |
| 208 -terminated: | |
| 209 .I n | |
| 210 counts the bytes of UTF data, which include no final | |
| 211 zero byte. | |
| 212 The | |
| 213 .SM NUL | |
| 214 character is illegal in text strings in the Venti protocol. | |
| 215 The maximum string length in Venti is 1024 bytes. | |
| 216 .PP | |
| 217 Each Venti message begins with a two-byte size field | |
| 218 specifying the length in bytes of the message, | |
| 219 not including the length field itself. | |
| 220 The next byte is the message type, one of the constants | |
| 221 in the enumeration in the include file | |
| 222 .BR <venti.h> . | |
| 223 The next byte is an identifying | |
| 224 .IR tag , | |
| 225 used to match responses to requests. | |
| 226 The remaining bytes are parameters of different sizes. | |
| 227 In the message descriptions, the number of bytes in a field | |
| 228 is given in brackets after the field name. | |
| 229 The notation | |
| 230 .IR parameter [ n ] | |
| 231 where | |
| 232 .I n | |
| 233 is not a constant represents a variable-length parameter: | |
| 234 .IR n [1] | |
| 235 followed by | |
| 236 .I n | |
| 237 bytes of data forming the | |
| 238 .IR parameter . | |
| 239 The notation | |
| 240 .IR string [ s ] | |
| 241 (using a literal | |
| 242 .I s | |
| 243 character) | |
| 244 is shorthand for | |
| 245 .IR s [2] | |
| 246 followed by | |
| 247 .I s | |
| 248 bytes of UTF-8 text. | |
| 249 The notation | |
| 250 .IR parameter [] | |
| 251 where | |
| 252 .I parameter | |
| 253 is the last field in the message represents a | |
| 254 variable-length field that comprises all remaining | |
| 255 bytes in the message. | |
| 256 .PP | |
| 257 All Venti RPC messages are prefixed with a field | |
| 258 .IR size [2] | |
| 259 giving the length of the message that follows | |
| 260 (not including the | |
| 261 .I size | |
| 262 field itself). | |
| 263 The message bodies are: | |
| 264 .ta \w'\fLVtTgoodbye 'u | |
| 265 .IP | |
| 266 .ne 2v | |
| 267 .B VtThello | |
| 268 .IR tag [1] | |
| 269 .IR version [ s ] | |
| 270 .IR uid [ s ] | |
| 271 .IR strength [1] | |
| 272 .IR crypto [ n ] | |
| 273 .IR codec [ n ] | |
| 274 .br | |
| 275 .B VtRhello | |
| 276 .IR tag [1] | |
| 277 .IR sid [ s ] | |
| 278 .IR rcrypto [1] | |
| 279 .IR rcodec [1] | |
| 280 .IP | |
| 281 .ne 2v | |
| 282 .B VtTping | |
| 283 .IR tag [1] | |
| 284 .br | |
| 285 .B VtRping | |
| 286 .IR tag [1] | |
| 287 .IP | |
| 288 .ne 2v | |
| 289 .B VtTread | |
| 290 .IR tag [1] | |
| 291 .IR score [20] | |
| 292 .IR type [1] | |
| 293 .IR pad [1] | |
| 294 .IR count [2] | |
| 295 .br | |
| 296 .B VtRread | |
| 297 .IR tag [1] | |
| 298 .IR data [] | |
| 299 .IP | |
| 300 .ne 2v | |
| 301 .B VtTwrite | |
| 302 .IR tag [1] | |
| 303 .IR type [1] | |
| 304 .IR pad [3] | |
| 305 .IR data [] | |
| 306 .br | |
| 307 .B VtRwrite | |
| 308 .IR tag [1] | |
| 309 .IR score [20] | |
| 310 .IP | |
| 311 .ne 2v | |
| 312 .B VtTsync | |
| 313 .IR tag [1] | |
| 314 .br | |
| 315 .B VtRsync | |
| 316 .IR tag [1] | |
| 317 .IP | |
| 318 .ne 2v | |
| 319 .B VtRerror | |
| 320 .IR tag [1] | |
| 321 .IR error [ s ] | |
| 322 .IP | |
| 323 .ne 2v | |
| 324 .B VtTgoodbye | |
| 325 .IR tag [1] | |
| 326 .PP | |
| 327 Each T-message has a one-byte | |
| 328 .I tag | |
| 329 field, chosen and used by the client to identify the message. | |
| 330 The server will echo the request's | |
| 331 .I tag | |
| 332 field in the reply. | |
| 333 Clients should arrange that no two outstanding | |
| 334 messages have the same tag field so that responses | |
| 335 can be distinguished. | |
| 336 .PP | |
| 337 The type of an R-message will either be one greater than | |
| 338 the type of the corresponding T-message or | |
| 339 .BR Rerror , | |
| 340 indicating that the request failed. | |
| 341 In the latter case, the | |
| 342 .I error | |
| 343 field contains a string describing the reason for failure. | |
| 344 .PP | |
| 345 Venti connections must begin with a | |
| 346 .B hello | |
| 347 transaction. | |
| 348 The | |
| 349 .B VtThello | |
| 350 message contains the protocol | |
| 351 .I version | |
| 352 that the client has chosen to use. | |
| 353 The fields | |
| 354 .IR strength , | |
| 355 .IR crypto , | |
| 356 and | |
| 357 .IR codec | |
| 358 could be used to add authentication, encryption, | |
| 359 and compression to the Venti session | |
| 360 but are currently ignored. | |
| 361 The | |
| 362 .IR rcrypto , | |
| 363 and | |
| 364 .I rcodec | |
| 365 fields in the | |
| 366 .B VtRhello | |
| 367 response are similarly ignored. | |
| 368 The | |
| 369 .IR uid | |
| 370 and | |
| 371 .IR sid | |
| 372 fields are intended to be the identity | |
| 373 of the client and server but, given the lack of | |
| 374 authentication, should be treated only as advisory. | |
| 375 The initial | |
| 376 .B hello | |
| 377 should be the only | |
| 378 .B hello | |
| 379 transaction during the session. | |
| 380 .PP | |
| 381 The | |
| 382 .B ping | |
| 383 message has no effect and | |
| 384 is used mainly for debugging. | |
| 385 Servers should respond immediately to pings. | |
| 386 .PP | |
| 387 The | |
| 388 .B read | |
| 389 message requests a block with the given | |
| 390 .I score | |
| 391 and | |
| 392 .IR type . | |
| 393 Use | |
| 394 .I vttodisktype | |
| 395 and | |
| 396 .I vtfromdisktype | |
| 397 (see | |
| 398 .MR venti (3) ) | |
| 399 to convert a block type enumeration value | |
| 400 .RB ( VtDataType , | |
| 401 etc.) | |
| 402 to the | |
| 403 .I type | |
| 404 used on disk and in the protocol. | |
| 405 The | |
| 406 .I count | |
| 407 field specifies the maximum expected size | |
| 408 of the block. | |
| 409 The | |
| 410 .I data | |
| 411 in the reply is the block's contents. | |
| 412 .PP | |
| 413 The | |
| 414 .B write | |
| 415 message writes a new block of the given | |
| 416 .I type | |
| 417 with contents | |
| 418 .I data | |
| 419 to the server. | |
| 420 The response includes the | |
| 421 .I score | |
| 422 to use to read the block, | |
| 423 which should be the SHA1 hash of | |
| 424 .IR data . | |
| 425 .PP | |
| 426 The Venti server may buffer written blocks in memory, | |
| 427 waiting until after responding to the | |
| 428 .B write | |
| 429 message before writing them to | |
| 430 permanent storage. | |
| 431 The server will delay the response to a | |
| 432 .B sync | |
| 433 message until after all blocks in earlier | |
| 434 .B write | |
| 435 messages have been written to permanent storage. | |
| 436 .PP | |
| 437 The | |
| 438 .B goodbye | |
| 439 message ends a session. There is no | |
| 440 .BR VtRgoodbye : | |
| 441 upon receiving the | |
| 442 .BR VtTgoodbye | |
| 443 message, the server terminates up the connection. | |
| 444 .PP | |
| 445 Version | |
| 446 .B 04 | |
| 447 of the Venti protocol is similar to version | |
| 448 .B 02 | |
| 449 (described above) | |
| 450 but has two changes to accomodates larger payloads. | |
| 451 First, it replaces the leading 2-byte packet size with | |
| 452 a 4-byte size. | |
| 453 Second, the | |
| 454 .I count | |
| 455 in the | |
| 456 .B VtTread | |
| 457 packet may be either 2 or 4 bytes; | |
| 458 the total packet length distinguishes the two cases. | |
| 459 .SH SEE ALSO | |
| 460 .MR venti (1) , | |
| 461 .MR venti (3) , | |
| 462 .MR venti (8) | |
| 463 .br | |
| 464 Sean Quinlan and Sean Dorward, | |
| 465 ``Venti: a new approach to archival storage'', | |
| 466 .I "Usenix Conference on File and Storage Technologies" , | |
| 467 2002. |
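
The score, zero-truncation, and version 02 framing rules described in the manual page can be illustrated with a short Python sketch. It is not taken from libventi. Every function name here (`score`, `zero_truncate_data`, `zero_truncate_pointers`, `pack_string`, `frame`) is hypothetical and chosen only for illustration. The sketch assumes that a "zero score" in a pointer block is the 20-byte score of the empty block, as the Scores section defines it, and that any trailing partial score in a pointer block is ignored.

```python
import hashlib
import struct

SCORE_SIZE = 20          # a SHA1 score is 20 bytes
MAX_STRING = 1024        # maximum string length stated in the manual page


def score(block: bytes) -> bytes:
    """Return the score (SHA1 hash) that identifies a block."""
    return hashlib.sha1(block).digest()


# The score of the zero-length block.
ZERO_SCORE = score(b"")


def zero_truncate_data(block: bytes) -> bytes:
    """Drop trailing zero bytes from a data block before writing it."""
    return block.rstrip(b"\x00")


def zero_extend_data(block: bytes, size: int) -> bytes:
    """Pad a block read from the server back to its expected size."""
    return block.ljust(size, b"\x00")


def zero_truncate_pointers(block: bytes) -> bytes:
    """Drop trailing zero scores (not zero bytes) from a pointer block."""
    n = len(block) - len(block) % SCORE_SIZE   # consider whole scores only
    while n > 0 and block[n - SCORE_SIZE:n] == ZERO_SCORE:
        n -= SCORE_SIZE
    return block[:n]


def pack_string(text: str) -> bytes:
    """Encode string[s]: a two-byte big-endian count, then UTF-8 bytes."""
    data = text.encode("utf-8")
    if b"\x00" in data:
        raise ValueError("NUL is illegal in Venti strings")
    if len(data) > MAX_STRING:
        raise ValueError("string longer than 1024 bytes")
    return struct.pack(">H", len(data)) + data


def frame(msg_type: int, tag: int, params: bytes = b"") -> bytes:
    """Build a version 02 message: size[2] type[1] tag[1] parameters.

    msg_type must be one of the constants from <venti.h>; this sketch
    does not define them. size excludes the size field itself.
    """
    body = bytes([msg_type, tag]) + params
    return struct.pack(">H", len(body)) + body
```

For instance, `zero_truncate_data(b"hello world" + bytes(1013))` yields the 11-byte block from the Zero Truncation example, and `score(zero_truncate_data(bytes(1024)))` equals `ZERO_SCORE`, which is why an all-zero file of any length collapses to the zero score. A ping request with tag 7 could be built as `frame(2, 7)`, where the type value 2 for `VtTping` is an assumption that should be checked against `<venti.h>`.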