ClamAV Code Audit Report ======================== By Kevin Amorin & Timothy Morgan Work done as part a of course in network security at Northeastern University in March, 2005. This is being published for two reasons: - for hope that it will be useful to others in determining the relative security of the ClamAV package - to make public the recommendations to ClamAV developers on coding practices TABLE OF CONTENTS ================= I. Introduction II. Scope of Audit A. Common libc Bugs B. File Race Conditions C. Resource Exaustion & Denial of Service III. Tools A. Source Code Auditing B. Input Fuzzing IV. Findings A. Weak RNG Seeding B. Local Privilege Escalation on Mac OS C. Sigtool Password Overflow D. Low-Memory Null Pointer Dereferences V. Conclusion VI. References I. INTRODUCTION =============== This report describes the focus, methods, and findings of a source code audit perfomed by Timothy Morgan and Kevin Amorin on the open source anti-virus project ClamAV. A description of the ClamAV package, from the project website[1]: "Clam AntiVirus is a GPL anti-virus toolkit for UNIX. The main purpose of this software is the integration with mail servers (attachment scanning). The package provides a flexible and scalable multi-threaded daemon, a command line scanner, and a tool for automatic updating via Internet." The release version reviewed in this audit was 0.83, and can be obtained from [2]. II. SCOPE OF AUDIT ================== The authors of this report are not experts at code review. That is the real purpose of this project: to practice auditing on a real-world application in order to learn more. In addition, the duration of the audit was limited to a few of weeks of part-time work. Therefore this audit can not be considered in any way exhaustive. For this reason, we describe the specific types of issues we focused our search on. II.A. COMMON LIBC BUGS ---------------------- The audit focused mainly on common buffer mis-management bugs, and other dangerous uses of libc calls. See [3] for more information on the following system calls. strcpy & strcat sprintf The obvious overflow potential associated with the use of these functions were addressed in this audit. printf fprintf et.al. In our search, we looked for improper uses of string formats in this family of functions. Allowing unchecked user-controlled data to be included in a string format field can allow an attacker to read and overwrite arbitrary portions of process memory. strncpy strncat snprintf While these functions are much safer to use than their non-checking counterparts, there are a number of problems that can arise. For instance, if any calculation of n in the input could possibly be negative, the cast to an unsigned size_t can result in a length larger than any buffer can realistically get, which makes this checking useless. In addition, off-by-one errors can occur with strncpy and strncat, which could lead to denial of service attacks, or even code execution in the worst case. system The use of system is very dangerous if attackers can control strings being sent to it. Since system does the equivalent of "/bin/sh -c" on input, it is easy for attackers to inject arbitrary commands if it isn't used very carefully. II.B. FILE RACE CONDITIONS -------------------------- Because ClamAV must be able to extract large files and large numbers of files from various compressed formats, it often needs to store temporary files on disk. For this reason, we spent some time trying to understand how it creates these temporary files, and if the methods are secure against symlink attacks. Besides looking for common misuses of tmpnam, we tried to understand exactly where any race conditions existed, and if they were exploitable by lower-privileged users based on the permissions of files and directories. II.C. RESOURCE EXAUSTION & DENIAL OF SERVICE -------------------------------------------- While we didn't focus as much on this area, there was some effort put into finding situations where an attacker could send files to clamd that might cause a denial of service. Virus scanners, due to the type of work they need to do, often have problems with compressed files and other combined file formats. Some good tests to throw at a virus scanner include: - Sparse compressed files If a very large file, with all 0's is compressed, it becomes much smaller than its original size (often 1/10th of the original). If a virus scanner attempts to uncompress the entire thing into memory, it will quickly run out of that resource. (The term "spare file" refers to a Unix mechansim to store large, data-less files. This can be used to quickly create very large virtual files containing all 0's.) - Heavily nested files Today's virus scanners all handle compressed files files pretty well, by extracting them and scanning the files inside for viruses. However, what happens when a .zip file is sent compressed inside of a .rar archive? How about a file compressed 10 times with different file compression formats? Sending a heavily nested compressed file, where each level has two seperate compressed files could result in an exponential number of files which need decompression. How would a virus scanner handle this potentially massive number of files? Once again, we executed some variations of these tests, but didn't focus on this area. III. TOOLS ========== III.A. SOURCE CODE AUDITING --------------------------- R.A.T.S. Rough Auditing Tool for Security[4] is a program which looks for common programming mistakes and risky code which could lead to vulnerabilities. We found this tool to be a good starting point, as it found every possible dangerous libc call, but it also found a large number of false positives which required weeding through. Flawfinder We also used Flawfinder[5], which does analysis very similar to R.A.T.S. It is better at some things, and worse at others. Neither does terribly deep code analysis, and thus outputs many false positives. III.B. INPUT FUZZING -------------------- mangleme/htmler.py Mangleme[6] is an HTML input fuzzer originally written to test popular browsers against malformed web pages. This was used by the author to uncover a number of holes in multiple browsers. htmler.py[7] is simply a Python port of mangleme. Because ClamAV does contain code to parse HTML documents for link extraction, we decided to test it against this tool. In multiple tests we created anywhere between 10,000-100,000 files on the local system and used the local scan option of ClamAV to test the html parsing functions. The files contained missing HTML tags, malformed tags, bad URL locations, and binary data. In our tests we found the HTML parser worked flawlessly. We were hoping to find a potential DoS attack in the URL fetching code. We did find that ClamAV only fetched the first 5 URLs in the file, as a DoS protection mechanism. IV. FINDINGS ============ IV.A. WEAK RNG SEEDING ---------------------- In our search for file creation race conditions, we came across a function which is relied upon to generate filenames for temporary files and directories. The full function, from the file libclamav/others.c, is included below: unsigned int cli_rndnum(unsigned int max) { struct timeval tv; gettimeofday(&tv, (struct timezone *) 0); srand(tv.tv_usec+clock()); return rand() % max; } This function re-seeds the random number generator *every* time it is called, based on the current time of day's microseconds, and the amount of CPU time the process has used. Before finishing, the function generates a random number, takes its modulus, and returns some number between 0 and max-1. First of all, re-seeding a random number generator for every number needed is slow, and probably isn't very secure. Secondly, the parameters sent to srand() are *not* so unpredictable. The clock() call can be estimated by a local attacker via command line utilities. The tv.tv_usec value is hard to predict (around 20 bits of entropy on a tested Linux system), but may not have a lot of entropy on other platforms. Finally, the man page on rand() [3] indicates that simply taking the modulus of it's output isn't secure on all Unix systems. (It references [8] for this information.) Apparently some rand() implementations do not have very random low-order bits. The high-order bits should be used instead. To ClamAV's (and the developers') credit, they do make an effort to mask this deficiency in their generation of pseudo-random filenames. Each filename is based on the MD5 on half of the previous output of the generation routine plus more random data (using the above function). Half of the output is then used as the filename, and the other half is stored for use in future calls. In order to guess future filenames, one would need to be able to predict the output of the random function above, and know at least one hidden MD5 output. For a long-running process, such as clamd, a local user would have a very hard time guessing future temporary filenames. Perhaps in the situation where a clamscan was run via a cron job, this code would be vulnerable to attack. Since the random number generation is done in a tight-loop, and is re-seeded every time, the very first call in any given file name generation is where most of the entropy would lie. The time it takes for clamscan to start up can't be considered secret, and this is what the clock() call would be based on. However, ClamAV seems to be very careful about checking for the existance of files and the success or failure of temporary directory creation. This makes the window of opportunity for an attacker incredibly slim, if it exists at all. It is important to note, however, that the underlying random number generation function doesn't seem to be very safe, and is used in at least one other place in the code base. It is recommended that the seeding of the RNG be done at startup only, and be based on entropy from the base OS (for instance /dev/random on Linux). If a good source of entropy is not available on specific platforms, then something similar to what is currently implemented could be used, but perhaps with some additional sources entropy added in. IV.B. LOCAL PRIVILEGE ESCALATION ON MAC OS ------------------------------------------ Under the Mac OS file system (HFS) files are saved as to parts data and resource fork. In ClamAV versions 0.80rc4 and later, support was added to copy both the data and the resource fork when moving a virus infected file. The mechanism they used was the Mac local system utility ditto. While there isn't a security issue with using the "ditto" command itself, the system call they use to execute it is insecure: #ifdef C_DARWIN /* On Mac OS X use ditto and copy resource fork, too. */ char *ditto = (char *) mcalloc(strlen(src) + strlen(dest) + 30, sizeof(char)); sprintf(ditto, "/usr/bin/ditto --rsrc %s %s", src, dest); if(system(ditto)) { free(ditto); return -1; } There memory allocation for the call is also secure. They do not however check the filename for shell special characters. If a file name contains an embedded shell command the system will execute it as the ClamAV current UID. An example attack is as follows: Download a test virus http://www.eicar.org/download/eicar.com And rename it like so: $ mv eicar.com \;echo\ \"test\"\; If the clam user does not have permissions to remove the file it will try and copy the file and the resource fork via the ditto system call. The command it will execute is: system("/usr/bin/ditto -rsrc ;echo "test"; /tmp/;echo "test" "); The shell will interpret the ';echo "test"; 's a separate command and execute it. The following is output of the test. ./;echo "test";: Eicar-Test-Signature FOUND usage: ditto [ ] src [ ... src ] dst are any of: -v print a line of status for each src copied -V print a line of status for every file copied -X do not descend into directories with a different device ID -c create a CPIO archive at dst -x unpack the CPIO archives at src... -z CPIO archives are compressed -k archives are PKZip format --keepParent parent directory of src is embedded in dst --arch archVal fat files will be thinned to specified archVal multiple -arch options can be specified archVal should be one of "ppc", "i386", etc --bom bomFile only files present in the specified bom are copied --rsrc copy preserving resource data --sequesterRsrc copy resources via polite directory (PKZip only) test We have informed the ClamAV development team about this issue, and they have confirmed the problem. We are working with them now on a fix, and a public release of the vulnerability. UPDATE: ClamAV has fixed this issue. SCN advisory available at: http://www.sentinelchicken.com/advisories/clamav/ IV.D. SIGTOOL PASSWORD STACK BUFFER OVERFLOW -------------------------------------------- The only significant buffer mis-management issue we discovered was an overflow in the utility "sigtool". According to the man page: "sigtool can be used to generate MD5 checksums, convert data into hexadecimal format, and build/unpack CVD databases. It's also to verify digital signatures of databases and list virus signature names." We found an overflow in the password prompting code. This code only seems to be used when a virus database developer is trying to sign their own virus databases. For this reason, the bug should affect *very* few users of this software. Even if it affected the broad user-base, it isn't clear that this would even lead to privilege escalation. We will describe it here, for completeness anyway. In the file sigtool/sigtool.c, there is a function called "getdsig". The following code snippet illustrates the problem: char *getdsig(const char *host, const char *user, const char *data) { char buff[300], cmd[100], *pass, *pt; [...snip...] memset(cmd, 0, sizeof(cmd)); pass = getpass("Password:"); sprintf(cmd, "ClamSign:%s:%s:", user, pass); It isn't difficult to determine that if one were to type in a password longer than 100-11-1-strlen(user) characters, an overflow of the cmd buffer would occur. Once again, this isn't going to affect users, unless they sign their own virus databases *and* sigtool is somehow run with elevated privileges. IV.E. LOW-MEMORY NULL POINTER DEREFERENCES ------------------------------------------ While the code in ClamAV is very good about allocating enough memory for its strings, it isn't always as good at making sure it's memory allocation calls return successfully. Under memory-constrained conditions, the OS will return a null pointer if the allocation failed. This could lead to ungraceful shutdowns of clamd, possibly causing a denial of service situation. It is important to note however, that ClamAV does employ a number of techniques to resist memory exaustion. We did not have time to audit these mechanisms to determine if any of them could be circumvented. Some examples of code where memory allocation return values are not checked are show below: libclamav/others.c::cl_settempdir(); lines 337-338: var = (char *) cli_malloc(8 + strlen(dir)); sprintf(var, "TMPDIR=%s", dir); libclamav/others.c::cli_gentemp(); lines 429-430: fname = cli_calloc(strlen(dirname) + strlen(dent->d_name) + 2,\ sizeof(char)); sprintf(fname, "%s/%s", dirname, dent->d_name); clamd/thrmgr.c::work_queue_add(); lines 50-52: work_item = (work_item_t *) mmalloc(sizeof(work_item_t)); work_item->next = NULL; work_item->data = data; V. CONCLUSION ============= Overall, we are impressed with the quality of code in ClamAV. Many times when we thought we found the improper use of some lib call, we quickly found out that there were other explicit checks in place to prevent an exploit. The code is also very consistent in its use of calls for memory allocation, string concatenation, and print formatting. We do have two recommendations for the development team, however: 1. Formalize the release process better. Any new code added close to release time deserves more scrutiny than normal. Perhaps a buddy-system for review would be wise. 2. Avoid calls to system() whenever possible. If there is another way to get the job done with some form of fork()/exec(), use that instead. Even if the data currently being sent to system() isn't tainted, that could change in a future revision of the codebase. Finally, we would like to thank the ClamAV team for providing the community with a very useful tool, and such great support. VI. REFERENCES ============== 1. ClamAV Team. "ClamAV: Abstract". Accessed: 2005-04-12 http://www.clamav.net/abstract.html 2. ClamAV Team. "clamav-0.83.tar.gz". Released: 2005-02-13. Hosted by SourceForge.net. http://prdownloads.sourceforge.net/clamav/clamav-0.83.tar.gz?download 3. Many Authors. "UNIX ON-LINE Man Pages". Accessed: 2005-04-12 http://www.mcsr.olemiss.edu/cgi-bin/man-cgi 4. Secure Software Inc. "Rough Auditing Tool for Security". Accessed: 2005-04-12 http://freshmeat.net/projects/rats/ 5. David A. Wheeler. "Flawfinder". June 2004. Accessed: 2005-04-12 http://www.dwheeler.com/flawfinder/ 6. Michal Zalewski. "mangleme". 2004. Accessed: 2005-04-12 http://freshmeat.net/projects/mangleme/ 7. "ned" . "htmler.py". 2004. Accessed: 2005-04-12 http://felinemenace.org/~nd/htmler.py 8. William H. Press, Brian P. Flannery, Saul A. Teukolsky, William T. Vetterling; New York: Cambridge University Press, 1992 (2nd ed.,p. 277)