File Compression
Archiving & Compression
The two processes go hand in hand but are not the same.
Archives:
- store rarely used information and preserve it
- files are stored as a single file
- archiving makes the data more portable and can serve as a backup in case of loss or corruption
Compression:
- reduces file size by reducing redundancy
- preserves storage space
- speeds up data transfer
- reduces bandwidth load
Let’s look at a file directory of notes you’ve been taking:
tar - create archive
tar can be used for many operations, including archiving and extracting files. We’ll start with creating an archive.
A popular term for a tar archive is “tarball”. To archive the notes directory, use -c -f or cf:
- -c to create a new tar archive file
- -f tells tar which archive file to write to (or read from), rather than its built-in default, which is typically standard input/output or a tape device
tar -cf <newfile.tar> <directory to archive>
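As a minimal sketch of the command above, the script below builds a small, hypothetical notes directory in a scratch location and archives it. The directory and file names are assumptions made for illustration only.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so nothing real is touched.
# It is left in place afterwards so the result can be inspected.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical notes directory with two small files.
mkdir notes
printf 'tar basics\n' > notes/day1.txt
printf 'gzip basics\n' > notes/day2.txt

# -c creates the archive, -f names the archive file to write.
tar -cf notes.tar notes

# The archive is a new file; the original directory is left untouched.
ls -l notes.tar
```

Note that tar copies files into the archive; it does not remove the originals.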
If you would also like your archive to be compressed, enter the same command with the “-z” option added. It filters the archive through a GNU (pr. “geh-noo”) compression program called gzip.
- Adding the suffix “.gz” to the output name ensures that other programs, Windows-based ones for example, will correctly recognize the file type.
- See the create gz archive section below
tar -czf <newfile.tar.gz> <directory to archive & compress>
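To see what the -z option buys you, the sketch below archives the same hypothetical directory twice, once plain and once through gzip, and prints both sizes. The repetitive sample text is an assumption chosen because redundant data compresses well; real savings depend on the content.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"
mkdir notes

# Write the same line 200 times: highly redundant, so it compresses well.
i=0
while [ "$i" -lt 200 ]; do
  echo "the same line of notes, repeated" >> notes/repeat.txt
  i=$((i + 1))
done

# Same directory, archived without and with gzip compression.
tar -cf notes.tar notes
tar -czf notes.tar.gz notes

# Compare the sizes in bytes.
plain_size=$(wc -c < notes.tar)
gz_size=$(wc -c < notes.tar.gz)
echo "notes.tar:    $plain_size bytes"
echo "notes.tar.gz: $gz_size bytes"
```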
File type | Description |
---|---|
.tar | uncompressed archive file |
.zip | (usually) compressed archive file |
.gz | file (archive or not) compressed using gzip |
tar supports a wide range of compression programs, such as gzip, bzip2, lzip, lzma, lzop, xz and compress.
- When creating compressed tar archives, it is an accepted convention to append the compressor suffix to the archive file name. For example, if an archive has been compressed with gzip, it should be named archive.tar.gz.
- -c -f or cf
- -c to create a new tar archive file
- -f tells tar which archive file to write to (or read from), rather than its built-in default
- -v will display the files being processed
# ___ Create an archive named archive.tar from file1 file2
tar cf archive.tar file1 file2
# ___ Create a file backup.tar of the /home/user directory
tar -cf backup.tar /home/user
tar - create gz archive
Gzip is one of the most widely used algorithms for compressing tar files. When compressing tar archives with gzip, the archive name should end with either tar.gz or tgz.
- -z tells tar to compress using the gzip algorithm
- -j tells tar to use the bzip2 algorithm; by convention the resulting archive is named with a .tar.bz2 suffix
# ___ Create a tar.gz file
tar cfz archive.tar.gz file1 file2 file3...
# ___ Create a tar.bz2 file
tar cfj archive.tar.bz2 file1 file2 file3...
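One way to picture what -z does is to run the two steps by hand. The sketch below, using hypothetical file names, creates a gzip-compressed archive in one step and again by piping tar's output through gzip, then checks that both list the same members. It assumes gzip is installed, which tar -z also requires.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"
printf 'one\n' > file1
printf 'two\n' > file2

# One step: tar runs gzip for us.
tar -czf one_step.tar.gz file1 file2

# Two steps: "-f -" writes the archive to standard output,
# and gzip compresses that stream into a file.
tar -cf - file1 file2 | gzip > two_step.tar.gz

# gzip -t tests the compressed data without writing anything.
gzip -t one_step.tar.gz
gzip -t two_step.tar.gz

# Both archives should list the same members.
tar -tzf one_step.tar.gz > one.list
tar -tzf two_step.tar.gz > two.list
cmp one.list two.list && echo "same contents"
```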
list tarball files
- From the notes example above, if we want to make sure all the files were compressed and archived we can use -tf, which will list all the files and directories in your tarball
tar -tf <newfile.tar>
tar to list
When used with the list option -t, the tar command will list the content of a tar archive without extracting it
- -t list content of archive file
- -v to list file owner, size, timestamp
- DOES NOT EXTRACT
# ___ List the content and information of an archived file without extracting
tar tvf archive.tar
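The sketch below illustrates the difference between -tf and -tvf and confirms that listing leaves the disk untouched. The notes directory is hypothetical; the originals are deleted before listing so that any accidental extraction would be visible.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"
mkdir notes
printf 'a\n' > notes/a.txt
printf 'b\n' > notes/b.txt
tar -cf notes.tar notes

# Remove the originals so an extraction would be noticeable.
rm -r notes

# Names only.
tar -tf notes.tar

# Long listing: permissions, owner, size, timestamp, name.
tar -tvf notes.tar

# Count the members: the directory itself plus its two files.
member_count=$(tar -tf notes.tar | wc -l)
echo "members: $member_count"

if [ ! -e notes ]; then
  echo "listing did not extract anything"
fi
```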
tar to extract
To extract a tar or tar.gz format file use tar with the options below. By default it will extract the contents to the current working directory.
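- -xf extracts an archive file. The same option works for tar.gz or tar.bz2 files, because GNU tar detects the compression when reading an archive from a file. Any names after the archive are treated as members to extract, not as a destination, so a destination folder is given with -C:
tar -xf <file to extract> -C <directory to extract to>
- -xvf to print names of files that are being extracted
- -C extract to a specified directory, which must already exist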
# ___ List and EXTRACT files from an archived file
tar xvf archive.tar
# ___ Extract the archive file contents to the /opt/files directory
tar xf archive.tar -C /opt/files
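A small sketch of extracting into a chosen directory follows. The directory name restored is an assumption for illustration; note that it has to exist before tar is asked to extract into it.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"
mkdir notes
printf 'hello\n' > notes/a.txt
tar -cf archive.tar notes

# -C changes into the target directory before extracting,
# so the directory must be created first.
mkdir restored
tar -xvf archive.tar -C restored

# The archived paths are recreated underneath the target directory.
cat restored/notes/a.txt
```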
tar Decompress & Extract
# ___ Decompress and Extract into the existing directory newfilename
tar -xzf notes.tar.gz -C newfilename
ls -R
tar extract specific files
To extract specific files from a tar archive, append a space-separated list of the file names to be extracted after the archive name. The names must match the paths shown by tar -tf
# ___ Extract specific files from archive file
tar xf archive.tar file1 file2
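The following sketch, with hypothetical file names, pulls two of three members out of an archive and leaves the third behind.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"
printf '1\n' > file1
printf '2\n' > file2
printf '3\n' > file3
tar -cf archive.tar file1 file2 file3

# Extract only file1 and file3 into a separate directory.
# Names after the archive are member names, exactly as stored.
mkdir out
tar -xf archive.tar -C out file1 file3

ls out
```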
tar extract with wildcard
To extract files from an archive based on a wildcard pattern, use the --wildcards switch and quote the pattern to prevent the shell from interpreting it.
# ___ Extract files ending in .js
tar xf archive.tar --wildcards '*.js'
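Here is a hedged sketch of pattern-based extraction. It assumes GNU tar, since --wildcards is a GNU option and other tar implementations may handle patterns differently. The file names are made up for the example.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"
printf 'console.log(1)\n' > app.js
printf 'console.log(2)\n' > util.js
printf 'read me\n' > readme.txt
tar -cf archive.tar app.js util.js readme.txt

# The quotes keep the shell from expanding *.js itself,
# so tar receives the pattern and matches it against member names.
mkdir out
tar -xf archive.tar -C out --wildcards '*.js'

ls out
```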
tar add files
To add files to an existing, uncompressed tar archive use -r
# ___ Add new file to archive -v to display the name of files
tar rvf archive.tar newfile
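The sketch below appends a hypothetical file to an existing archive and counts the members before and after. It uses a plain .tar file because appending does not work on compressed archives.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"
printf 'old\n' > file1
printf 'new\n' > newfile

tar -cf archive.tar file1
before=$(tar -tf archive.tar | wc -l)

# -r appends to the end of the archive; -v prints each name as it is added.
tar -rvf archive.tar newfile
after=$(tar -tf archive.tar | wc -l)

echo "members before: $before, after: $after"
tar -tf archive.tar
```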
tar remove files
Use --delete to remove files from an archive
# ___ Remove file1 from archive
tar --delete -f archive.tar file1
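As a sketch of member removal, the script below deletes one of two hypothetical files from an uncompressed archive and lists what is left. It assumes GNU tar; --delete is not available in every tar implementation and does not operate on compressed archives.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"
printf '1\n' > file1
printf '2\n' > file2
tar -cf archive.tar file1 file2

# Remove file1 from the archive in place; files on disk are not affected.
tar --delete -f archive.tar file1

# Only file2 should remain in the listing.
remaining=$(tar -tf archive.tar)
echo "$remaining"
```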
Command | Description |
---|---|
tar cf my_dir.tar my_dir | Create an uncompressed tar archive |
tar cfz my_dir.tar.gz my_dir | Create a tar archive with gzip compression |
gzip file | Compress a file with gzip compression |
tar xf file | Extract the contents of any type of tar archive |
gunzip file.gz | Decompress a file that has gzip compression |
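The gzip and gunzip rows can be tried on a single file without tar. The sketch below, using a hypothetical file, shows that gzip replaces the file with a .gz version and that gunzip restores it byte for byte.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"
printf 'some notes\n' > file
cp file original_copy

# gzip replaces "file" with "file.gz".
gzip file
ls

# gunzip reverses it: "file.gz" becomes "file" again.
gunzip file.gz

# The round trip should reproduce the original contents exactly.
cmp file original_copy && echo "round trip OK"
```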
Zip
Zip compresses files prior to bundling them, while tar with -z bundles then compresses
To compress a directory and package it into a zip file use:
zip -r <new filename.zip> <directory to compress>
Unzip
Extracts and decompresses zipped archive files
Syntax: unzip <zipped filename.zip>
Unzip to Directory
If we want to specify where to unzip the zipped file we need to use -d, which specifies the target directory. This will create a new directory named employeesdb and extract all files to it
~$ unzip $dir_path/employeesdb.zip -d $dir_path
# RESPONSE
Archive: /mnt/d/data/MySQL/employeesdb.zip
creating: /mnt/d/data/MySQL/employeesdb/
creating: /mnt/d/data/MySQL/employeesdb/sakila/
inflating: /mnt/d/data/MySQL/employeesdb/load_salaries2.dump
inflating: /mnt/d/data/MySQL/employeesdb/test_versions.sh
inflating: /mnt/d/data/MySQL/employeesdb/objects.sql
inflating: /mnt/d/data/MySQL/employeesdb/load_salaries3.dump
inflating: /mnt/d/data/MySQL/employeesdb/load_dept_emp.dump
inflating: /mnt/d/data/MySQL/employeesdb/test_employees_sha.sql
inflating: /mnt/d/data/MySQL/employeesdb/Changelog
creating: /mnt/d/data/MySQL/employeesdb/images/
inflating: /mnt/d/data/MySQL/employeesdb/employees_partitioned_5.1.sql
inflating: /mnt/d/data/MySQL/employeesdb/test_employees_md5.sql
inflating: /mnt/d/data/MySQL/employeesdb/README.md
inflating: /mnt/d/data/MySQL/employeesdb/employees.sql
inflating: /mnt/d/data/MySQL/employeesdb/load_titles.dump
inflating: /mnt/d/data/MySQL/employeesdb/employees_partitioned.sql
inflating: /mnt/d/data/MySQL/employeesdb/load_dept_manager.dump
inflating: /mnt/d/data/MySQL/employeesdb/sql_test.sh
inflating: /mnt/d/data/MySQL/employeesdb/load_departments.dump
inflating: /mnt/d/data/MySQL/employeesdb/load_salaries1.dump
inflating: /mnt/d/data/MySQL/employeesdb/show_elapsed.sql
inflating: /mnt/d/data/MySQL/employeesdb/load_employees.dump
inflating: /mnt/d/data/MySQL/employeesdb/sakila/README.md
inflating: /mnt/d/data/MySQL/employeesdb/sakila/sakila-mv-data.sql
inflating: /mnt/d/data/MySQL/employeesdb/sakila/sakila-mv-schema.sql
inflating: /mnt/d/data/MySQL/employeesdb/images/employees.jpg
inflating: /mnt/d/data/MySQL/employeesdb/images/employees.png
inflating: /mnt/d/data/MySQL/employeesdb/images/employees.gif