Title: How to split a file into small parts
Author: Solène
Date: 21 March 2021
Tags: openbsd unix
Description:
# Introduction
Today I will present the userland program "split", which is used to
split a single file into smaller files.
OpenBSD split(1) manual page
# Use case
Split creates several smaller files from a single file. The original
file can be recreated by running the command cat on all the small
files (in the correct order).
There are several use cases for this:
- store a single file (like a backup) on multiple media (floppies,
700MB CDs, DVDs, etc.)
- parallelize the processing of a file, for example: split a huge log
file into small parts to run an analysis on each part
- distribute a file across a few people (I have no idea what the use
would be, but I like the idea)
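The log analysis use case can be sketched as follows. This is only a minimal illustration under assumptions: the file name app.log, its content, the "part." prefix and the ERROR pattern are all made up for the example, and the "analysis" is a plain grep count.

```shell
# Work in a scratch directory so the generated parts do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a hypothetical 2500-line log in which every 5th line is an error.
i=1
while [ "$i" -le 2500 ]; do
    if [ $((i % 5)) -eq 0 ]; then
        echo "line $i ERROR something failed"
    else
        echo "line $i INFO all good"
    fi
    i=$((i + 1))
done > app.log

# Split into parts of at most 1000 lines, named part.aa, part.ab, part.ac.
split -l 1000 app.log part.

# Count the errors of each part in the background, one job per part.
for f in part.??; do
    grep -c ERROR "$f" > "$f.count" &
done
wait

# Sum the per-part results and compare with a count on the whole file.
total=$(cat part.??.count | awk '{ s += $1 } END { print s }')
expected=$(grep -c ERROR app.log)
echo "parts: $total, whole file: $expected"
```

Splitting by lines with -l keeps every log line intact inside a single part, which a split by size with -b would not guarantee.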
# Usage
Its usage is very simple: run split on a file or feed it data on its
standard input, and it will create files of 1000 lines each by default.
Use -b to give the new files a size in kB or MB instead (with a k or m
suffix), or -l to change the default of 1000 lines. Split can also
start a new file each time a line matches a regular expression given
with -p.
Here is a simple example that splits a file into 1300 kB parts
(1331200 bytes each, except the last one) and then reassembles the
file from the parts, using sha256 to compare the checksums of the
original and reconstructed files.
```split and reassemble example
solene@kongroo ~/V/pmenu> split -b 1300k pmenu.mp4
solene@kongroo ~/V/pmenu> ls
pmenu.mp4 xab xad xaf xah xaj xal xan
xaa xac xae xag xai xak xam
solene@kongroo ~/V/pmenu> cat x* > concat.mp4
solene@kongroo ~/V/pmenu> sha256 pmenu.mp4 concat.mp4
SHA256 (pmenu.mp4) = e284da1bf8e98226dc78836dd71e7dfe4c3eb9c4172861bafcb1e2afb…
SHA256 (concat.mp4) = e284da1bf8e98226dc78836dd71e7dfe4c3eb9c4172861bafcb1e2afb…
solene@kongroo ~/V/pmenu> ls -l x*
-rw-r--r-- 1 solene wheel 1331200 Mar 21 16:50 xaa
-rw-r--r-- 1 solene wheel 1331200 Mar 21 16:50 xab
-rw-r--r-- 1 solene wheel 1331200 Mar 21 16:50 xac
-rw-r--r-- 1 solene wheel 1331200 Mar 21 16:50 xad
-rw-r--r-- 1 solene wheel 1331200 Mar 21 16:50 xae
-rw-r--r-- 1 solene wheel 1331200 Mar 21 16:50 xaf
-rw-r--r-- 1 solene wheel 1331200 Mar 21 16:50 xag
-rw-r--r-- 1 solene wheel 1331200 Mar 21 16:50 xah
-rw-r--r-- 1 solene wheel 1331200 Mar 21 16:50 xai
-rw-r--r-- 1 solene wheel 1331200 Mar 21 16:50 xaj
-rw-r--r-- 1 solene wheel 1331200 Mar 21 16:50 xak
-rw-r--r-- 1 solene wheel 1331200 Mar 21 16:50 xal
-rw-r--r-- 1 solene wheel 1331200 Mar 21 16:50 xam
-rw-r--r-- 1 solene wheel 810887 Mar 21 16:50 xan
```
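The same round trip also works when splitting by lines and reading from standard input. Here is a small sketch under assumptions: data.txt, its content and the "chunk." prefix are hypothetical names chosen for the example, and cmp is used instead of sha256 so the snippet also runs on systems without that command.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 2500 numbered lines.
awk 'BEGIN { for (i = 1; i <= 2500; i++) print "entry", i }' > data.txt

# Feed split through its standard input ("-") and cut every 1000 lines.
# "chunk." is an arbitrary prefix replacing the default "x".
cat data.txt | split -l 1000 - chunk.

# The shell expands chunk.* in alphabetical order, which is the split order.
cat chunk.* > rebuilt.txt

# cmp prints nothing and returns 0 when both files are identical.
cmp data.txt rebuilt.txt && echo "identical"

# Show how many lines each part received.
wc -l chunk.*
```

Giving an explicit prefix keeps the parts easy to select with a glob and avoids picking up unrelated files whose names start with x.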
# Conclusion
If you ever need to split files into small parts, think about the
command split.
For more advanced splitting requirements, the program csplit can be
used. I won't cover it here, but I recommend reading its manual page.
csplit manual page