ClamAV Code Audit Report
========================
By Kevin Amorin & Timothy Morgan
Work done as part a of course in network security at 
Northeastern University in March, 2005.

This is being published for two reasons:
 - for hope that it will be useful to others in determining the relative
   security of the ClamAV package
 - to make public the recommendations to ClamAV developers on coding
   practices


TABLE OF CONTENTS
=================

I.   Introduction

II.  Scope of Audit
     A. Common libc Bugs
     B. File Race Conditions
     C. Resource Exaustion & Denial of Service

III. Tools
     A. Source Code Auditing
     B. Input Fuzzing

IV.  Findings
     A. Weak RNG Seeding
     B. Local Privilege Escalation on Mac OS
     C. Sigtool Password Overflow
     D. Low-Memory Null Pointer Dereferences

V.   Conclusion

VI.  References


I. INTRODUCTION
===============
This report describes the focus, methods, and findings of a source code
audit perfomed by Timothy Morgan and Kevin Amorin on the open source
anti-virus project ClamAV.  A description of the ClamAV package, from
the project website[1]: 
 "Clam AntiVirus is a GPL anti-virus toolkit for UNIX. The main purpose
  of this software is the integration with mail servers (attachment
  scanning). The package provides a flexible and scalable multi-threaded
  daemon, a command line scanner, and a tool for automatic updating via
  Internet."
The release version reviewed in this audit was 0.83, and can be obtained
from [2].


II. SCOPE OF AUDIT
==================
The authors of this report are not experts at code review.  That is the
real purpose of this project: to practice auditing on a real-world
application in order to learn more.  In addition, the duration of the
audit was limited to a few of weeks of part-time work. Therefore this
audit can not be considered in any way exhaustive.  For this reason, we
describe the specific types of issues we focused our search on.


II.A. COMMON LIBC BUGS
----------------------
The audit focused mainly on common buffer mis-management bugs, and other
dangerous uses of libc calls.  See [3] for more information on the
following system calls.


strcpy & strcat
sprintf

The obvious overflow potential associated with the use of these
functions were addressed in this audit.


printf
fprintf
et.al.

In our search, we looked for improper uses of string formats in this
family of functions.  Allowing unchecked user-controlled data to be
included in a string format field can allow an attacker to read and
overwrite arbitrary portions of process memory.


strncpy
strncat
snprintf

While these functions are much safer to use than their non-checking
counterparts, there are a number of problems that can arise.  For
instance, if any calculation of n in the input could possibly be
negative, the cast to an unsigned size_t can result in a length larger
than any buffer can realistically get, which makes this checking
useless.  In addition, off-by-one errors can occur with strncpy and
strncat, which could lead to denial of service attacks, or even code
execution in the worst case.


system

The use of system is very dangerous if attackers can control strings
being sent to it.  Since system does the equivalent of "/bin/sh -c" on
input, it is easy for attackers to inject arbitrary commands if it isn't
used very carefully.


II.B. FILE RACE CONDITIONS
--------------------------
Because ClamAV must be able to extract large files and large numbers of
files from various compressed formats, it often needs to store temporary
files on disk.  For this reason, we spent some time trying to understand
how it creates these temporary files, and if the methods are secure
against symlink attacks.  Besides looking for common misuses of tmpnam,
we tried to understand exactly where any race conditions existed, and if
they were exploitable by lower-privileged users based on the permissions
of files and directories.


II.C. RESOURCE EXAUSTION & DENIAL OF SERVICE
--------------------------------------------
While we didn't focus as much on this area, there was some effort put
into finding situations where an attacker could send files to clamd that
might cause a denial of service.  Virus scanners, due to the type of
work they need to do, often have problems with compressed files and
other combined file formats.

Some good tests to throw at a virus scanner include:
 - Sparse compressed files
   If a very large file, with all 0's is compressed, it becomes much
   smaller than its original size (often 1/10th of the original).  If a
   virus scanner attempts to uncompress the entire thing into memory, it
   will quickly run out of that resource.  (The term "spare file" refers
   to a Unix mechansim to store large, data-less files.  This can be
   used to quickly create very large virtual files containing all 0's.)

 - Heavily nested files
   Today's virus scanners all handle compressed files files pretty
   well, by extracting them and scanning the files inside for viruses.
   However, what happens when a .zip file is sent compressed inside of a
   .rar archive?  How about a file compressed 10 times with different
   file compression formats?  Sending a heavily nested compressed file,
   where each level has two seperate compressed files could result in an
   exponential number of files which need decompression.  How would a
   virus scanner handle this potentially massive number of files?

Once again, we executed some variations of these tests, but didn't focus
on this area.


III. TOOLS
==========

III.A. SOURCE CODE AUDITING
---------------------------
R.A.T.S.

Rough Auditing Tool for Security[4] is a program which looks for common
programming mistakes and risky code which could lead to
vulnerabilities.  We found this tool to be a good starting point, as it
found every possible dangerous libc call, but it also found a large
number of false positives which required weeding through.


Flawfinder

We also used Flawfinder[5], which does analysis very similar to R.A.T.S.
It is better at some things, and worse at others.  Neither does terribly
deep code analysis, and thus outputs many false positives.


III.B. INPUT FUZZING
--------------------
mangleme/htmler.py

Mangleme[6] is an HTML input fuzzer originally written to test popular
browsers against malformed web pages.  This was used by the author to
uncover a number of holes in multiple browsers.  htmler.py[7] is simply
a Python port of mangleme.  Because ClamAV does contain code to parse
HTML documents for link extraction, we decided to test it against this
tool.

In multiple tests we created anywhere between 10,000-100,000 files on
the local system and used the local scan option of ClamAV to test the
html parsing functions.  The files contained missing HTML tags,
malformed tags, bad URL locations, and binary data.  In our tests we
found the HTML parser worked flawlessly.  We were hoping to find a
potential DoS attack in the URL fetching code.  We did find that ClamAV
only fetched the first 5 URLs in the file, as a DoS protection
mechanism.


IV. FINDINGS
============

IV.A. WEAK RNG SEEDING
----------------------
In our search for file creation race conditions, we came across a
function which is relied upon to generate filenames for temporary files
and directories.  The full function, from the file libclamav/others.c,
is included below:


unsigned int cli_rndnum(unsigned int max)
{
    struct timeval tv;

  gettimeofday(&tv, (struct timezone *) 0);
  srand(tv.tv_usec+clock());

  return rand() % max;
}


This function re-seeds the random number generator *every* time it is
called, based on the current time of day's microseconds, and the amount
of CPU time the process has used.  Before finishing, the function
generates a random number, takes its modulus, and returns some number
between 0 and max-1.

First of all, re-seeding a random number generator for every number
needed is slow, and probably isn't very secure.  Secondly, the
parameters sent to srand() are *not* so unpredictable.  The clock() call
can be estimated by a local attacker via command line utilities.  The
tv.tv_usec value is hard to predict (around 20 bits of entropy on a
tested Linux system), but may not have a lot of entropy on other
platforms.  Finally, the man page on rand() [3] indicates that simply
taking the modulus of it's output isn't secure on all Unix systems.  
(It references [8] for this information.)  Apparently some rand()
implementations do not have very random low-order bits.  The high-order
bits should be used instead.

To ClamAV's (and the developers') credit, they do make an effort to mask
this deficiency in their generation of pseudo-random filenames.  Each
filename is based on the MD5 on half of the previous output of the
generation routine plus more random data (using the above function). 
Half of the output is then used as the filename, and the other half is
stored for use in future calls.  In order to guess future filenames, one
would need to be able to predict the output of the random function
above, and know at least one hidden MD5 output.

For a long-running process, such as clamd, a local user would have a
very hard time guessing future temporary filenames.  Perhaps in the
situation where a clamscan was run via a cron job, this code would be
vulnerable to attack.  Since the random number generation is done in a
tight-loop, and is re-seeded every time, the very first call in any
given file name generation is where most of the entropy would lie.  The
time it takes for clamscan to start up can't be considered secret, and
this is what the clock() call would be based on.

However, ClamAV seems to be very careful about checking for the
existance of files and the success or failure of temporary directory 
creation.  This makes the window of opportunity for an attacker
incredibly slim, if it exists at all.

It is important to note, however, that the underlying random number
generation function doesn't seem to be very safe, and is used in at
least one other place in the code base.  It is recommended that the
seeding of the RNG be done at startup only, and be based on entropy from
the base OS (for instance /dev/random on Linux).  If a good source of
entropy is not available on specific platforms, then something similar
to what is currently implemented could be used, but perhaps with some
additional sources entropy added in.


IV.B. LOCAL PRIVILEGE ESCALATION ON MAC OS
------------------------------------------
Under the Mac OS file system (HFS) files are saved as to parts data and
resource fork.  In ClamAV versions 0.80rc4 and later, support was added
to copy both the data and the resource fork when moving a virus infected
file.  The mechanism they used was the Mac local system utility ditto.
While there isn't a security issue with using the "ditto" command
itself, the system call they use to execute it is insecure:


#ifdef C_DARWIN
    /* On Mac OS X use ditto and copy resource fork, too. */
    char *ditto = (char *) mcalloc(strlen(src) + strlen(dest) + 30, sizeof(char));
    sprintf(ditto, "/usr/bin/ditto --rsrc %s %s", src, dest);

    if(system(ditto)) {
        free(ditto);
        return -1;
    }


There memory allocation for the call is also secure.  They do not
however check the filename for shell special characters.  If a file name
contains an embedded shell command the system will execute it as the
ClamAV current UID.  An example attack is as follows: 

Download a test virus
  http://www.eicar.org/download/eicar.com

And rename it like so:
$ mv eicar.com \;echo\ \"test\"\;


If the clam user does not have permissions to remove the file it will
try and copy the file and the resource fork via the ditto system call.
The command it will execute is:

system("/usr/bin/ditto -rsrc ;echo "test"; /tmp/;echo "test" ");

The shell will interpret the ';echo "test"; 's a separate command and
execute it.  The following is output of the test. 

./;echo "test";: Eicar-Test-Signature FOUND
usage:  ditto [ <options> ] src [ ... src ] dst
    <options> are any of:
    -v              print a line of status for each src copied
    -V              print a line of status for every file copied
    -X              do not descend into directories with a different
device ID

    -c              create a CPIO archive at dst
    -x              unpack the CPIO archives at src...
    -z              CPIO archives are compressed
    -k              archives are PKZip format
    --keepParent    parent directory of src is embedded in dst

    --arch archVal  fat files will be thinned to specified archVal
                    multiple -arch options can be specified
                    archVal should be one of "ppc", "i386", etc
    --bom bomFile   only files present in the specified bom are copied
    --rsrc          copy preserving resource data
    --sequesterRsrc copy resources via polite directory (PKZip only)
test


We have informed the ClamAV development team about this issue, and they
have confirmed the problem.  We are working with them now on a fix, and
a public release of the vulnerability.
UPDATE: ClamAV has fixed this issue.  
        SCN advisory available at:
          http://www.sentinelchicken.com/advisories/clamav/


IV.D. SIGTOOL PASSWORD STACK BUFFER OVERFLOW
--------------------------------------------
The only significant buffer mis-management issue we discovered was an
overflow in the utility "sigtool".  According to the man page:

  "sigtool  can  be  used  to  generate MD5 checksums, convert data into
   hexadecimal format, and build/unpack CVD databases.  It's  also to
   verify digital signatures  of  databases  and  list virus signature
   names."

We found an overflow in the password prompting code.  This code only
seems to be used when a virus database developer is trying to sign their
own virus databases.  For this reason, the bug should affect *very* few
users of this software.  Even if it affected the broad user-base, it
isn't clear that this would even lead to privilege escalation.  We will
describe it here, for completeness anyway.

In the file sigtool/sigtool.c, there is a function called "getdsig".
The following code snippet illustrates the problem:


char *getdsig(const char *host, const char *user, const char *data)
{
        char buff[300], cmd[100], *pass, *pt;

[...snip...]

    memset(cmd, 0, sizeof(cmd));
    pass = getpass("Password:");
    sprintf(cmd, "ClamSign:%s:%s:", user, pass);


It isn't difficult to determine that if one were to type in a password
longer than 100-11-1-strlen(user) characters, an overflow of the cmd
buffer would occur.

Once again, this isn't going to affect users, unless they sign their own
virus databases *and* sigtool is somehow run with elevated privileges.


IV.E. LOW-MEMORY NULL POINTER DEREFERENCES
------------------------------------------
While the code in ClamAV is very good about allocating enough memory for
its strings, it isn't always as good at making sure it's memory
allocation calls return successfully.  Under memory-constrained
conditions, the OS will return a null pointer if the allocation failed.
This could lead to ungraceful shutdowns of clamd, possibly causing a
denial of service situation.

It is important to note however, that ClamAV does employ a number of
techniques to resist memory exaustion.  We did not have time to audit
these mechanisms to determine if any of them could be circumvented.

Some examples of code where memory allocation return values are not
checked are show below:

libclamav/others.c::cl_settempdir(); lines 337-338:

 var = (char *) cli_malloc(8 + strlen(dir));
 sprintf(var, "TMPDIR=%s", dir);


libclamav/others.c::cli_gentemp(); lines 429-430:

 fname = cli_calloc(strlen(dirname) + strlen(dent->d_name) + 2,\
                    sizeof(char));
 sprintf(fname, "%s/%s", dirname, dent->d_name);


clamd/thrmgr.c::work_queue_add(); lines 50-52:

 work_item = (work_item_t *) mmalloc(sizeof(work_item_t));
 work_item->next = NULL;
 work_item->data = data;


V. CONCLUSION
=============
Overall, we are impressed with the quality of code in ClamAV.  Many
times when we thought we found the improper use of some lib call, we
quickly found out that there were other explicit checks in place to
prevent an exploit.  The code is also very consistent in its use of
calls for memory allocation, string concatenation, and print
formatting.

We do have two recommendations for the development team, however:
  1. Formalize the release process better.
     Any new code added close to release time deserves more scrutiny
     than normal.  Perhaps a buddy-system for review would be wise.

  2. Avoid calls to system() whenever possible.  
     If there is another way to get the job done with some form of
     fork()/exec(), use that instead.  Even if the data currently being
     sent to system() isn't tainted, that could change in a future
     revision of the codebase.

Finally, we would like to thank the ClamAV team for providing the
community with a very useful tool, and such great support.


VI. REFERENCES
==============

1. ClamAV Team. "ClamAV: Abstract".  Accessed: 2005-04-12
   http://www.clamav.net/abstract.html

2. ClamAV Team. "clamav-0.83.tar.gz". 
   Released: 2005-02-13.  Hosted by SourceForge.net.
   http://prdownloads.sourceforge.net/clamav/clamav-0.83.tar.gz?download

3. Many Authors. "UNIX ON-LINE Man Pages".  Accessed: 2005-04-12
   http://www.mcsr.olemiss.edu/cgi-bin/man-cgi

4. Secure Software Inc.  "Rough Auditing Tool for Security".  
   Accessed: 2005-04-12
   http://freshmeat.net/projects/rats/

5. David A. Wheeler. "Flawfinder". June 2004.  Accessed: 2005-04-12
   http://www.dwheeler.com/flawfinder/

6. Michal Zalewski.  "mangleme".  2004.  Accessed: 2005-04-12
   http://freshmeat.net/projects/mangleme/

7. "ned" <nd@felinemenace.org>. "htmler.py".  2004.  
   Accessed: 2005-04-12
   http://felinemenace.org/~nd/htmler.py

8. William H. Press, Brian P.  Flannery,  Saul  A.  Teukolsky, William
   T. Vetterling; New York: Cambridge University Press, 1992 
   (2nd ed.,p. 277)