Malta Linux User Group

Self-extracting Archive HOWTO

GNU/Linux includes a beautifully simple way of creating self-extracting archives, and "self-installing" archives. This article explains how.
Author: Ramon Casha
Date: 15 March 2002


  1. Introduction
  2. A simple self-extracting archive
  3. Automatically counting the header size
  4. Self-installing archive
  5. Some Notes
  6. Credits


Introduction

GNU/Linux provides all the commands necessary to create archives, compress them and re-extract them, as well as run complex installation scripts. So, why not automate things a bit? This article explains how to create an "executable archive" that, when executed, extracts itself into a directory and, optionally, starts an installation script.

The trick is actually very simple - the file in question is a combined shell script plus tar/gzip archive. The script file identifies the beginning of the archive portion and extracts from there.

Of course the end user will still have to make the file executable before running it (for example with "chmod +x filename.ext"), or else execute it using the syntax "sh filename.ext".

A simple self-extracting archive

To produce this archive you need two files - a header file, which is described below, and the actual tar/gzip archive. You can, of course, use any other archive format instead of tar+gzip, but this is the most widespread, and the most likely to be present on all end users' computers.

The self-extracting archive consists of the header and the archive (in that order) concatenated together using the cat command. For instance, if the header is called sfx-header and the archive is called mypackage-1.2.3.tgz, you can use the following command:

cat sfx-header mypackage-1.2.3.tgz >

Here is the content of the header portion:

#!/bin/sh
echo ""
echo "MyPackage v99.99 - extracting archive... please wait"
echo ""

# take the archive portion of this file and pipe it to tar
# the NUMERIC parameter in this command should be one more
# than the number of lines in this header file
tail -n +12 "$0" | tar xz

exit 0

The above section contains a very simple shell script. After a couple of echo statements to display the version and so on, we have the command that does all the work. The tail command extracts the contents of the self-extracting file starting at the 12th line (the one just after the exit statement) and pipes them to the tar command, which extracts the files into the current directory.
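The whole cycle can be tried out in a scratch directory. The following is a minimal sketch, not a definitive build script: the payload, the out directory and the output name mypackage-1.2.3.sh are hypothetical, and the header uses the "tail -n +12" spelling that current tail implementations accept.

```shell
#!/bin/sh
# Sketch: build a self-extracting file and run it in a scratch directory.
# The payload and the output name mypackage-1.2.3.sh are hypothetical.
set -e
WORK=`mktemp -d`
cd "$WORK"

# a tiny payload to package
mkdir mypackage
echo "hello" > mypackage/README
tar czf mypackage-1.2.3.tgz mypackage

# the 11-line header: the archive therefore starts on line 12
cat > sfx-header <<'EOF'
#!/bin/sh
echo ""
echo "MyPackage v99.99 - extracting archive... please wait"
echo ""

# take the archive portion of this file and pipe it to tar
# the NUMERIC parameter in this command should be one more
# than the number of lines in this header file
tail -n +12 "$0" | tar xz

exit 0
EOF

# header first, archive second
cat sfx-header mypackage-1.2.3.tgz > mypackage-1.2.3.sh

# run it from an empty directory and look at the result
mkdir out
cd out
sh ../mypackage-1.2.3.sh
cat mypackage/README
```

Running the file through sh avoids the need to set the executable bit during the experiment.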

Obviously this is a very simple script. For example, it always extracts into the current directory. One could extend it to prompt for a destination directory, for instance.
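One possible shape for such an extension is sketched here. It is only an illustration: the DEST variable, the prompt text and the file names are assumptions, and the header takes the destination from its first argument and asks only when none was given.

```shell
#!/bin/sh
# Sketch: a header that extracts into a directory chosen by the user.
# DEST, the prompt text and all file names are hypothetical.
set -e
WORK=`mktemp -d`
cd "$WORK"

# the header has 11 lines, so the archive starts on line 12
cat > sfx-header <<'EOF'
#!/bin/sh
DEST=$1
if [ -z "$DEST" ]; then
    printf 'Destination directory [.]: '
    read DEST
fi
DEST=${DEST:-.}
mkdir -p "$DEST" || exit 1
# extract into the chosen directory instead of the current one
tail -n +12 "$0" | tar xz -C "$DEST"
exit 0
EOF

# a small payload and the combined file
mkdir payload
echo "data" > payload/file.txt
tar czf payload.tgz payload
cat sfx-header payload.tgz > payload.sh

# pass the destination as an argument, so no prompt appears
sh payload.sh "$WORK/target"
ls "$WORK/target/payload"
```

Accepting the directory as an argument keeps the file usable from other scripts, where nobody is present to answer a prompt.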

The important thing is that if you change the size of the header file, the numeric parameter to the 'tail' command must be changed accordingly - it must remain one more than the number of lines in the header file. Fortunately we can automate this too.
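While the number is still hardwired, it can at least be computed rather than counted by hand. The sketch below assumes a hypothetical helper called archive_start_line and a throwaway three-line header; it simply adds one to the line count reported by wc.

```shell
#!/bin/sh
# Sketch: compute the value that tail needs from the header file itself.
# archive_start_line is a hypothetical helper name.
archive_start_line() {
    # number of lines in the header, plus one
    header_lines=`wc -l < "$1"`
    echo $((header_lines + 1))
}

# demonstration with a throwaway three-line header
DEMO=`mktemp`
printf '%s\n' '#!/bin/sh' 'echo "extracting"' 'exit 0' > "$DEMO"
START=`archive_start_line "$DEMO"`
echo "archive would start at line $START"
rm -f "$DEMO"
```

The count is only right if the header ends with a newline and has no stray blank lines after its last command.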

Automatically counting the header size

It is possible to derive the number of lines in the header portion at runtime, using the awk command to search for a string which identifies the end of the script portion.

#!/bin/sh
echo ""
echo "MyPackage v99.99 - extracting archive... please wait"
echo ""

SKIP=`awk '/^__ARCHIVE_FOLLOWS__/ { print NR + 1; exit 0; }' "$0"`
[ -n "$SKIP" ] || { echo "Archive marker not found in $0" >&2; exit 1; }

# take the archive portion of this file and pipe it to tar
tail -n +$SKIP "$0" | tar xz

exit 0

__ARCHIVE_FOLLOWS__


In the modified script, the SKIP= line uses the awk command to search for the string "__ARCHIVE_FOLLOWS__", which is present on the very last line of the script, and returns that line number plus one. The line following it is a simple check to ensure the marker line has been found. The tail command now receives the shell variable $SKIP instead of the hardwired line number. Finally, the last line of the script is the identifying string. It is important for this to be the very last line of the header (no blank lines following it), and for it to start in the first column of the line.
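The marker search can be checked in isolation. In this sketch a temporary text file stands in for the combined script, and two plain text lines stand in for the archive; both are assumptions made purely so that the result is readable.

```shell
#!/bin/sh
# Sketch: locate the marker line with awk and print what follows it.
# The demo file and its "payload" lines are stand-ins for a real archive.
DEMO=`mktemp`
printf '%s\n' '#!/bin/sh' 'echo "header"' 'exit 0' '' \
    '__ARCHIVE_FOLLOWS__' 'payload line 1' 'payload line 2' > "$DEMO"

# the marker is on line 5, so the payload starts on line 6
SKIP=`awk '/^__ARCHIVE_FOLLOWS__/ { print NR + 1; exit 0; }' "$DEMO"`
echo "payload starts at line $SKIP"

# everything from that line onwards is the payload
PAYLOAD=`tail -n +$SKIP "$DEMO"`
echo "$PAYLOAD"
rm -f "$DEMO"
```

Because the pattern is anchored with ^, the awk line itself, which also contains the marker text, is not matched.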

Self-installing archive

A variation on the above is the self-installing archive. In this case, the main differences are that the files are extracted into a temporary directory, a file within that directory is executed to perform the setup, and finally the temporary directory is removed. Again, we have a header and an archive stuck together using the cat command. Here's the header file for the self-installing archive:

#!/bin/sh
echo ""
echo "MyPackage v99.99 - starting installation... please wait"
echo ""

# create a temp directory to extract to.
export WRKDIR=`mktemp -d /tmp/selfextract.XXXXXX`

SKIP=`awk '/^__ARCHIVE_FOLLOWS__/ { print NR + 1; exit 0; }' "$0"`

# Take the TGZ portion of this file and pipe it to tar.
tail -n +$SKIP "$0" | tar xz -C "$WRKDIR"

# go into the temp directory and execute the installation script
PREV=`pwd`
cd "$WRKDIR"
# (run the installation script contained in the archive here)

# delete the temp files
cd "$PREV"
rm -rf "$WRKDIR"

exit 0

__ARCHIVE_FOLLOWS__


In this case, after the display portion, we create a temporary directory and store its name in $WRKDIR. The 'tar' command is instructed to extract all files into that directory, then we go into that directory and execute the installation script, which the archive should contain and which should perform the rest of the installation. When the install script returns, we delete the temporary working directory.
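The following sketch walks through a self-installing archive from build to run. It rests on several assumptions that are not part of this article: the installation script is called install.sh, it copies a single file into the directory named by a DEMO_TARGET variable, and the combined file is called demo-installer.sh.

```shell
#!/bin/sh
# Sketch: build and run a self-installing archive end to end.
# install.sh, hello.txt, DEMO_TARGET and demo-installer.sh are hypothetical.
set -e
WORK=`mktemp -d`
cd "$WORK"

# payload: one data file plus the installation script
mkdir payload
echo "hello" > payload/hello.txt
cat > payload/install.sh <<'EOF'
#!/bin/sh
# copy the payload to the directory named in DEMO_TARGET
mkdir -p "$DEMO_TARGET"
cp hello.txt "$DEMO_TARGET/"
EOF
tar czf payload.tgz -C payload .

# header, ending with the marker line
cat > sfx-header <<'EOF'
#!/bin/sh
WRKDIR=`mktemp -d /tmp/selfextract.XXXXXX`
SKIP=`awk '/^__ARCHIVE_FOLLOWS__/ { print NR + 1; exit 0; }' "$0"`
tail -n +$SKIP "$0" | tar xz -C "$WRKDIR"
PREV=`pwd`
cd "$WRKDIR"
sh ./install.sh
cd "$PREV"
rm -rf "$WRKDIR"
exit 0

__ARCHIVE_FOLLOWS__
EOF

cat sfx-header payload.tgz > demo-installer.sh

# run the installer; the variable is passed on to install.sh
DEMO_TARGET="$WORK/installed" sh demo-installer.sh
cat "$WORK/installed/hello.txt"
```

The script is started with "sh ./install.sh" rather than "./install.sh" so that it still runs when the temporary directory sits on a filesystem mounted without execute permission.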

Some Notes

Some *nix systems might not have GNU tar, whose z option takes care of the gzip step, so they may require the tar and gzip commands to be run separately. In this case, the archive can be decompressed and extracted as follows:

tail -n +$SKIP "$0" | gzip -dc | tar xf -
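The separated form can be tried on an ordinary archive first. This is a small sketch with throwaway file names; it creates and extracts the archive without using tar's z option at all.

```shell
#!/bin/sh
# Sketch: create and extract a tar/gzip archive with gzip and tar run separately.
# All file names here are throwaway examples.
set -e
WORK=`mktemp -d`
cd "$WORK"

mkdir src
echo "portable" > src/a.txt

# create the archive without relying on tar's z option
tar cf - src | gzip -c > src.tar.gz

# decompress with gzip, then let tar read the plain archive from stdin
mkdir out
cd out
gzip -dc ../src.tar.gz | tar xf -
cat src/a.txt
```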

The self-installing archive removes the contents of the temporary directory after the installation script returns. Thus, the installation must ensure that no processes are still using the temporary files when it returns. If, for instance, the installation starts a detached process in that directory and returns, the self-extracting script will try to delete the directory while the process is still running.
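When the extra work is started as a background job of the installation script itself, one way to avoid this race is to wait for the job before returning. The sketch below assumes a hypothetical one-second job that writes result.txt; it does not cover processes that have been fully detached from the shell, which wait cannot track.

```shell
#!/bin/sh
# Sketch: start a background job in the working directory and wait for it
# before the directory is removed. The job itself is a hypothetical stand-in.
WRKDIR=`mktemp -d`
cd "$WRKDIR"

# hypothetical long-running step, run in the background
( sleep 1; echo "done" > result.txt ) &
JOB=$!

# block until the background job has finished
wait $JOB
STATUS=$?

RESULT=`cat result.txt`
echo "background job exited with status $STATUS and wrote: $RESULT"

# only now is it safe to remove the directory
cd /
rm -rf "$WRKDIR"
```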


Credits

Thanks to Tomi Ollila for his contributions to this article.