File Compression

Archiving & Compression


The two processes go hand in hand but are not the same.

Archives:

  • store rarely used information and preserve it
  • bundle many files into a single file
  • make the data more portable and can serve as a backup in case of loss or corruption

Compression:

  • reduces file size by reducing redundancy
  • saves storage space
  • speeds up data transfer
  • reduces bandwidth load

Let’s look at a file directory of notes you’ve been taking:

tar - create archive

tar can perform many operations on archives, including creating them and extracting files from them. We’ll start with creating an archive.

A tar archive is often called a “tarball”. To archive the notes directory:

  • -c -f, or combined as -cf
    • -c creates a new tar archive file
    • -f names the archive file to write to (or read from); without it, tar falls back to its built-in default, typically standard output/input or a tape device
  • tar -cf <newfile.tar> <directory to archive>
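
As a minimal sketch of the command above, the following builds a small notes directory and archives it. The directory layout and file names are made up for illustration, and everything happens in a scratch directory from mktemp.

```shell
# Hypothetical sample data: a small notes directory in a scratch location.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p notes/linux
printf 'tar bundles files\n' > notes/linux/tar.txt
printf 'buy milk\n'          > notes/todo.txt

# -c creates the archive, -f names the file to write it to.
tar -cf notes.tar notes

# The notes directory is left in place; the archive is one new file next to it.
ls -l notes.tar
```

Note that tar does not remove or change the original files; it only reads them.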

If you would also like your archive to be compressed:

  • enter the same command with the “-z” option added, which filters the archive through the GNU compression program gzip
  • adding the suffix “.gz” to the output name ensures that other programs, Windows-based ones for example, correctly recognize the file type
  • see the section on creating a gz archive below
  • tar -czf <newfile.tar.gz> <directory to archive and compress>
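
To see what -z buys you, here is a rough sketch that archives the same directory with and without gzip and compares the sizes. The sample file is invented and deliberately repetitive so that it compresses well; real savings depend on your data.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir notes
# Repetitive text compresses well, which makes the size difference easy to see.
for i in $(seq 1 2000); do
    echo "line $i: the same words again and again"
done > notes/log.txt

tar -cf  notes.tar    notes    # archive only
tar -czf notes.tar.gz notes    # archive, then compress with gzip

plain_size=$(wc -c < notes.tar)
gzip_size=$(wc -c < notes.tar.gz)
echo "notes.tar:    $plain_size bytes"
echo "notes.tar.gz: $gzip_size bytes"
```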

File type   Description
.tar        uncompressed archive file
.zip        (usually) compressed archive file
.gz         file (archive or not) compressed using gzip

tar supports a wide range of compression programs, such as:

  • gzip
  • bzip2
  • lzip
  • lzma
  • lzop
  • xz
  • compress

When creating compressed tar archives, it is an accepted convention to append the compressor suffix to the archive file name. For example, an archive compressed with gzip should be named archive.tar.gz.

  • -c -f or cf
    • -c creates a new tar archive file
    • -f names the archive file to write to (or read from); without it, tar falls back to its built-in default, typically standard output/input or a tape device
  • -v displays the files being processed
# ___  Create an archive named archive.tar from file1 file2
tar cf archive.tar file1 file2


# ___  Create a file backup.tar of the /home/user directory
tar -cf backup.tar /home/user

tar - create gz archive

gzip is the most popular compressor for tar files. When compressing tar archives with gzip, the archive name should end with either .tar.gz or .tgz.

  • -z tells tar to compress using gzip
  • -j tells tar to compress using bzip2 and creates a .tar.bz2 archive file
# ___  Create a tar.gz file
tar cfz archive.tar.gz file1 file2 file3...

# ___  Create a tar.bz2 file
tar cfj archive.tar.bz2 file1 file2 file3...
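
The two commands above can be tried side by side as sketched below. The file names and contents are placeholders. tar hands the compression to separate programs, so this sketch assumes gzip is installed and only runs the bzip2 step if that program is found.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'alpha\n' > file1
printf 'beta\n'  > file2

# gzip-compressed archive, then list it with the matching -z option.
tar -czf archive.tar.gz file1 file2
tar -tzf archive.tar.gz

# bzip2 is a separate program that tar calls; skip it if it is not installed.
if command -v bzip2 >/dev/null 2>&1; then
    tar -cjf archive.tar.bz2 file1 file2
    tar -tjf archive.tar.bz2
fi
```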

list tarball files

  • From the notes example above, to make sure all the files were archived, we can list the archive’s contents
  • -tf lists all the files and directories in your tarball
  • tar -tf <newfile.tar>

tar to list

When used with the -t (list) option, the tar command lists the content of a tar archive without extracting it.

  • -t lists the content of the archive file
  • -v also lists file owner, size, and timestamp
  • DOES NOT EXTRACT
# ___  List the content and information of an archived file without extracting
tar tvf archive.tar
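
Because -t prints one member name per line, the listing can be fed to ordinary text tools. The sketch below, with invented file names, checks whether a particular file made it into the archive and confirms that listing leaves the disk untouched.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir notes
printf 'one\n' > notes/a.txt
printf 'two\n' > notes/b.txt
tar -cf notes.tar notes

# Plain listing: one member name per line.
tar -tf notes.tar

# grep -x matches the whole line; -q only sets the exit status.
if tar -tf notes.tar | grep -qx 'notes/b.txt'; then
    echo "notes/b.txt is in the archive"
fi

# Remove the originals, list again, and check that nothing was extracted.
rm -r notes
tar -tf notes.tar > /dev/null
[ -d notes ] || echo "listing did not extract notes/"
```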

tar to extract

To extract a tar or tar.gz format file use tar with the options below. By default it will extract the contents to the current working directory.

  • -xf extracts an archive file; it also works for tar.gz and tar.bz2 files, because tar detects the compression when reading: tar -xf <file to extract>
  • -xvf prints the names of files as they are extracted
  • -C extracts to a specified directory, which must already exist: tar -xf <file to extract> -C <directory to extract to>
# ___  List and EXTRACT files from an archived file
tar xvf archive.tar

# ___  Extract the archive file contents to the /opt/files directory
tar xf archive.tar -C /opt/files
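
Here is a small sketch of extracting into another directory with uppercase -C. It uses a scratch directory named restore (a made-up name) instead of /opt/files so it can be run without root access.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'alpha\n' > file1
tar -cf archive.tar file1

# tar does not create the -C target, so make it first.
mkdir -p restore
tar -xf archive.tar -C restore

cat restore/file1
```

Lowercase -c means "create", so mixing it up with -C in an extract command makes tar complain about conflicting operations.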

tar Decompress & Extract

# ___ Decompress and extract into the current directory
tar -xzf notes.tar.gz
ls -R

tar extract specific files

To extract specific files from a tar archive, append a space-separated list of the file names to extract after the archive name. The names must match the paths shown when listing the archive.

# ___  Extract specific files from archive file
tar xf archive.tar file1 file2
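
When the archive contains directories, the member name includes the directory part. The following sketch, using invented names, lists an archive first and then pulls out a single file into a separate out directory.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir notes out
printf 'one\n' > notes/a.txt
printf 'two\n' > notes/b.txt
tar -cf notes.tar notes

# Member names include the leading directory, exactly as listed by -t.
tar -tf notes.tar

# Extract only notes/b.txt, placing it under out/.
tar -xf notes.tar -C out notes/b.txt
ls -R out
```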

tar extract with wildcard

To extract files from an archive based on a wildcard pattern, use the --wildcards switch and quote the pattern to prevent the shell from interpreting it.

# ___  Extract files ending in .js
tar xf archive.tar --wildcards '*.js'
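
A quick way to try the pattern is sketched below, assuming GNU tar (other tar implementations may not accept --wildcards). The file names are made up, and the files sit at the top level of the archive to keep the example simple.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir out
printf 'console.log(1)\n' > app.js
printf 'console.log(2)\n' > util.js
printf 'body {}\n'        > site.css
tar -cf archive.tar app.js util.js site.css

# The quotes keep the shell from expanding *.js against the current directory,
# so tar receives the pattern itself and matches it against member names.
tar -xf archive.tar -C out --wildcards '*.js'
ls out
```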

tar add files

To add files to an existing tar archive, use -r

# ___  Add a new file to the archive; -v displays the names of the files added
tar rvf archive.tar newfile
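
As a small sketch with placeholder file names, this appends a file to an existing archive and lists the result to confirm it was added.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'alpha\n' > file1
printf 'new\n'   > newfile
tar -cf archive.tar file1

tar -rvf archive.tar newfile   # append; -v prints the name being added
tar -tf archive.tar            # the listing now includes newfile
```

Appending only works on uncompressed archives; GNU tar refuses to update a .tar.gz in place.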

tar remove files

Use --delete to remove files from an archive

# ___  Remove file1 from archive
tar --delete -f archive.tar file1
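
The sketch below, assuming GNU tar (--delete is not available in every tar implementation), removes one member and lists the archive afterwards. File names are placeholders.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'alpha\n' > file1
printf 'beta\n'  > file2
tar -cf archive.tar file1 file2

tar --delete -f archive.tar file1
tar -tf archive.tar            # file1 no longer appears in the listing
```

Only the copy inside the archive is removed; file1 on disk is untouched.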

Command                          Description
tar cf my_dir.tar my_dir         Create an uncompressed tar archive
tar cfz my_dir.tar.gz my_dir     Create a tar archive with gzip compression
gzip file                        Compress a file with gzip compression
tar xf file                      Extract the contents of any type of tar archive
gunzip file.gz                   Decompress a file that has gzip compression

Zip


Zip compresses each file before bundling them, while tar with -z bundles the files first and then compresses the whole archive.

To compress a directory and package it into a zip file, use: zip -r <new filename.zip> <directory to compress>

Unzip

Extracts and decompresses zipped archive files

Syntax: unzip <zipped filename.zip>

Unzip to Directory

To specify where the zipped file is unpacked, use

  • -d, which specifies the target directory

  • In the example below, the archive’s top-level employeesdb directory is created inside the target directory and all files are extracted into it

    ~$ unzip $dir_path/employeesdb.zip -d $dir_path
    
    # RESPONSE
    Archive:  /mnt/d/data/MySQL/employeesdb.zip
       creating: /mnt/d/data/MySQL/employeesdb/
       creating: /mnt/d/data/MySQL/employeesdb/sakila/
      inflating: /mnt/d/data/MySQL/employeesdb/load_salaries2.dump
      inflating: /mnt/d/data/MySQL/employeesdb/test_versions.sh
      inflating: /mnt/d/data/MySQL/employeesdb/objects.sql
      inflating: /mnt/d/data/MySQL/employeesdb/load_salaries3.dump
      inflating: /mnt/d/data/MySQL/employeesdb/load_dept_emp.dump
      inflating: /mnt/d/data/MySQL/employeesdb/test_employees_sha.sql
      inflating: /mnt/d/data/MySQL/employeesdb/Changelog
       creating: /mnt/d/data/MySQL/employeesdb/images/
      inflating: /mnt/d/data/MySQL/employeesdb/employees_partitioned_5.1.sql
      inflating: /mnt/d/data/MySQL/employeesdb/test_employees_md5.sql
      inflating: /mnt/d/data/MySQL/employeesdb/README.md
      inflating: /mnt/d/data/MySQL/employeesdb/employees.sql
      inflating: /mnt/d/data/MySQL/employeesdb/load_titles.dump
      inflating: /mnt/d/data/MySQL/employeesdb/employees_partitioned.sql
      inflating: /mnt/d/data/MySQL/employeesdb/load_dept_manager.dump
      inflating: /mnt/d/data/MySQL/employeesdb/sql_test.sh
      inflating: /mnt/d/data/MySQL/employeesdb/load_departments.dump
      inflating: /mnt/d/data/MySQL/employeesdb/load_salaries1.dump
      inflating: /mnt/d/data/MySQL/employeesdb/show_elapsed.sql
      inflating: /mnt/d/data/MySQL/employeesdb/load_employees.dump
      inflating: /mnt/d/data/MySQL/employeesdb/sakila/README.md
      inflating: /mnt/d/data/MySQL/employeesdb/sakila/sakila-mv-data.sql
      inflating: /mnt/d/data/MySQL/employeesdb/sakila/sakila-mv-schema.sql
      inflating: /mnt/d/data/MySQL/employeesdb/images/employees.jpg
      inflating: /mnt/d/data/MySQL/employeesdb/images/employees.png
      inflating: /mnt/d/data/MySQL/employeesdb/images/employees.gif