Secure Computing: Buffer Overruns and More (Explanations, Examples, and Resources)


From: "Ted Pavlic" <ted@tedpavlic.com>
To: <math-thinking@cs.geneseo.edu>; "Prof. K Viswanath" <kvsm@uohyd.ernet.in>
Subject: Re: Secure Computing
Date: Thursday, October 16, 2003 10:09 AM

Professor Viswanath --

    The most common "security vulnerability" seen in software today involves
a lack of simple checking for buffer overruns. Before talking about exactly
what this means to textbooks and what other errors are common, I want to
focus on the buffer overrun. I invite you to search for "security buffer
overrun" on Google:

http://www.google.com/search?q=security+buffer+overrun

You will find that a tremendous number of documented security
vulnerabilities come from this simple flaw: an unchecked buffer being filled
with data well beyond its boundaries, such that other, more vital information
can be overwritten and perhaps later executed. Buffer overruns form the bulk
of the security problem on the Internet. In fact, a "content strategist for
MSDN with focus on Security" manages a personal website called
www.bufferoverrun.net that publishes news of a variety of MS security
problems.

    This has created a whole new market of analysis packages that look
specifically for "buffer overruns" or "buffer overflows." These packages
also check for other errors like the ones I'll bring up later, but they make
a special note to check for buffer overruns. One example is Coverity:

http://www.coverity.com/

    If you search CERT's list of advisories:

http://www.cert.org/advisories/

You will find a catalog of security vulnerabilities, of which a large part
is simply buffer overruns. Most buffer overrun notifications have a title
that is something like:

"Buffer Overrun in ______________ Could Cause Arbitrary Code to be Executed"

    For more information about the dangers of buffer overruns/overflows, see
this simple definition:

http://searchsecurity.techtarget.com/sDefinition/0,,sid14_gci549024,00.html

    Searching for "buffer overrun" on www.zdnet.com produces a number of
very nice links. One such link is a nice whitepaper on this topic that can
be found at:

http://www.nextgenss.com/papers/bufferoverflowpaper.rtf

There are plenty of code examples there that are examined in depth. At the
end of the paper, the complete code that exploits an old problem in Oracle
is given. The other, much simpler code example leads us to what could be
found in a common computer programming textbook.

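#include <stdio.h>
int main(void)
{
    char garbage[100];   /* room for 99 characters plus the terminating '\0' */
    printf("Enter some characters: ");
    gets(garbage);       /* unsafe: gets() is never told the buffer size */
    printf("You typed %s\n", garbage);
    return 0;
}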


    I assume that you might expect to see this in a textbook. It is a simple
demonstration of the wonderful gets procedure, which takes a character
pointer and fills the buffer it points to with characters from the standard
input. This is often one of the first topics covered in C textbooks even
though the man page for gets includes the phrase, "Never use gets()":

BUGS
       Never use gets().  Because it is impossible to tell  with-
       out knowing the data in advance how many characters gets()
       will read, and because gets() will continue to store char-
       acters past the end of the buffer, it is extremely danger-
       ous to use.  It has been used to break computer  security.
       Use fgets() instead.


    As the whitepaper explains, using gets here can allow a malicious user
program-level access to memory. This allows the user to place whatever she
would like into memory to be executed later. This is particularly bad when
you consider how memory in subroutines is allocated on a stack. In these
cases, a buffer overrun may not cause a segmentation fault at all, because
the bytes written past the end of the buffer land on other stack data (such
as the saved return address), and it may not cause any visible problem until
AFTER the procedure returns. fgets is so much better because it takes the
size of the buffer as a parameter. It then makes sure that the buffer is
never overrun and leaves any additional characters on the input stream.
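
    As a minimal sketch of the fgets approach described above (the helper
name read_line and its return values are my own illustration, not something
taken from the whitepaper or the man page):

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper; the name and return convention are illustrative.
 * Reads at most size-1 characters of one line from `in` into `buf`.
 * Returns  0 if a newline was read (it is stripped from buf),
 *          1 if no newline was seen (line longer than the buffer, or a
 *            final line that ends without one),
 *         -1 on bad arguments, end-of-file, or a read error. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (in == NULL || buf == NULL || size == 0 || size > INT_MAX)
        return -1;

    /* fgets is told the buffer size, so it cannot write past the end. */
    if (fgets(buf, (int)size, in) == NULL)
        return -1;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 0;
    }
    return 1;   /* the rest of the line, if any, is still on the stream */
}
```

With a 100-byte buffer, the textbook program could call
read_line(stdin, garbage, sizeof garbage) in place of gets(garbage) and
decide what to do when the return value is not 0.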

    However, tools like fgets typically aren't covered in the bulk of
textbooks. A textbook takes a very reductionist view of programming --
teach the components and hope that the student has the prescience to put
everything together in the right place the first time without any speed
bumps along the way. After all, if a student is just becoming familiar with
the idea of null-terminated character strings, perhaps bringing in this
discussion of memory is not quite appropriate. This may prove to be
irresponsible, as that student may feel ready to produce software before she
has matured completely.

    Now that you have that in mind, I invite you to look through your own
textbooks. I'm sure you'll find plenty of examples of the use of procedures
like gets. Another one to be careful about is strcpy. strcpy will copy one
string, regardless of its length, onto another. strncpy could be used
instead, but it is less efficient than strcpy and adds complexity when
teaching it to students. Plus, if strncpy is used incorrectly, it can still
be a major threat. There are, of course, times when strcpy can be used
without any harm, but valid preconditions have to be met.
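
    One way strncpy is commonly misused: when the source string does not
fit, strncpy writes no terminating '\0', so the destination is not a valid
string. A hedged sketch of a wrapper that avoids this (copy_string is a
hypothetical name of my own, not a standard function):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper; name and return convention are illustrative.
 * Copies src into dst, which has room for dst_size bytes, and always
 * leaves dst null-terminated.
 * Returns 0 if src fit, 1 if it was truncated, -1 on bad arguments. */
int copy_string(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;

    /* Copy at most dst_size-1 bytes; strncpy adds no terminator when
     * src is that long or longer, so we always write one ourselves. */
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';

    return strlen(src) >= dst_size ? 1 : 0;
}
```

The caller still has to look at the return value; a silently truncated
string can be its own kind of bug.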

    And while thinking about buffer overruns, it's easy to transition to
considering things like null dereferences and memory leaks, but it may be
difficult to see how these affect security. Often systems that require tight
security are made up of multiple components that help to make the system as
a whole secure. Null pointer dereferences, dangling pointers, and memory
leaks can cause those programs to crash and possibly be restarted back into
an improper state. A software program that monitors incoming connections
into a computer and blocks those that are unwanted, or one that simply keeps
a system safe from the entrance of viruses, creates a single point of
failure for the security of the whole system. If a set of actions causes a
firewall to fall without the user knowing of it, a machine might be left
naked on the Internet for days, thus welcoming other attacks.

    Of course, these low level problems are not the only problems with
computer security. Already in the links I've given above I'm sure you've
seen major security problems not linked to buffer overruns, null
dereferences, memory leaks, or other such ugly implementation beasts. Many
of these problems are attributed to simply not understanding a software API
properly. For example, I often find students trying to mix different I/O
libraries expecting them to share their underlying information. Again,
borrowing from the gets man page, I see:

       It is not advisable to mix calls to input functions from
       the stdio library with low-level calls to  read()  for the
       file  descriptor  associated  with  the  input stream; the
       results will be undefined and very probably not  what  you
       want.


I mentioned earlier that strncpy could help prevent buffer overruns, but if
it is not understood it could be just as poor a choice as strcpy.

    Another common problem I see, especially when students start threading,
is the insistence on using strtok, which I'm sure was something they found
in high school to be a useful tool even though they disregarded its man page
warnings:

BUGS
       Never use these functions. If you do, note that:

              These functions modify their first argument.

              These functions cannot be used on constant strings.

              The identity of the delimiting character is lost.

              The  strtok()  function  uses a static buffer while
              parsing, so it's not thread safe. Use strtok_r() if
              this matters to you.
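
    As a rough sketch of the strtok_r alternative the man page points to
(split_fields is a hypothetical helper of my own; strtok_r itself is a
POSIX function, so this assumes a POSIX system):

```c
#define _POSIX_C_SOURCE 200809L   /* ask strict compilers for strtok_r */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper; name and interface are illustrative.
 * Splits `line` in place on any character in `delims` and stores up to
 * max_fields token pointers in fields[]. Returns the number stored.
 * Like strtok, this overwrites delimiters in `line`, so `line` must be
 * a writable array, never a string constant. */
size_t split_fields(char *line, const char *delims,
                    char *fields[], size_t max_fields)
{
    size_t count = 0;
    char *state = NULL;   /* parser state kept by the caller, not in a
                             hidden static buffer as with strtok */

    for (char *tok = strtok_r(line, delims, &state);
         tok != NULL && count < max_fields;
         tok = strtok_r(NULL, delims, &state)) {
        fields[count++] = tok;
    }
    return count;
}
```

Because each call carries its own state variable, two threads (or two
nested loops) can tokenize different strings without interfering with each
other. The other man page warnings still apply: the input is modified and
empty fields are skipped.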


    In reality, all of the problems above show a misunderstanding of an API
of some sort. If restrictions were put on when certain pieces of data could
be used and why and how, it could be proven that a piece of software was
fairly bulletproof.

    In the end, it all comes down to:

      1. Checking the size of your buffers
      2. Using the return codes of your functions
      3. Understanding the function of each of your individual components

where the third topic really is a supertopic encompassing the first two.
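
    A small sketch of how these points can show up together in one routine
(duplicate_bounded is a hypothetical helper invented for this illustration):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper; name and contract are illustrative.
 * Returns a heap-allocated copy of src, or NULL if src is NULL, is
 * longer than max_len characters, or memory cannot be allocated.
 * The caller owns the result and must free() it. */
char *duplicate_bounded(const char *src, size_t max_len)
{
    if (src == NULL)
        return NULL;

    /* Check the size before copying anything. */
    size_t len = strlen(src);
    if (len > max_len)
        return NULL;   /* refuse rather than silently truncate */

    /* Use the return code: malloc reports failure by returning NULL. */
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, src, len + 1);   /* len characters plus the '\0' */
    return copy;
}
```

The comment above the function is the third point: it states what the
component promises, so that callers know they must check for NULL and free
the result.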

    Changes need to occur in the education of software designers that allow
them to see the implications of their actions. Too many computer programmers
have learned by example and simply are not able to practice good design
because they do not know what that means. Developing component-based
software and understanding the point of using pre-conditions and
post-conditions to design software in a systems thinking environment is the
first step to ensuring that the software produced is not a security threat.
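
    To make the idea of pre-conditions and post-conditions concrete, here is
a hedged sketch that writes them as assert() calls (append_checked is a
hypothetical component of my own; real designs may state contracts in
documentation or a specification language instead):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical component; name and contract are illustrative.
 * Appends src to the string already held in dst (capacity cap bytes).
 * Pre-condition:  dst and src are non-NULL, and dst holds a
 *                 null-terminated string within its first cap bytes.
 * Post-condition: dst is still null-terminated and shorter than cap.
 * Returns 0 on success, -1 if src does not fit (dst is unchanged). */
int append_checked(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL);                  /* pre */
    assert(cap > 0 && memchr(dst, '\0', cap) != NULL);   /* pre */

    size_t used = strlen(dst);
    size_t need = strlen(src);
    if (need >= cap - used)       /* no room for src plus the '\0' */
        return -1;

    memcpy(dst + used, src, need + 1);

    assert(strlen(dst) == used + need);                  /* post */
    assert(used + need < cap);                           /* post */
    return 0;
}
```

Compare this with strcat, which has a similar pre-condition on the
destination size but leaves it entirely to the caller to know and honor it.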

    Examples in textbooks need to be enhanced to diagram exactly which set
of inputs is allowed into the system, which of those inputs are expected,
and which outputs are associated with each expected input. A secure system
would make the first two sets equivalent. A non-secure system leaves plenty
of unknown states unaccounted for. A buffer overrun would be one of these
unknown states.
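
    As an illustration of making the allowed set equal to the expected set,
consider a routine that reads a TCP port number from text. The example and
the name parse_port are assumptions of mine, chosen only to show the idea:
every input outside the expected set is rejected with a return code rather
than left to produce an unknown state.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper; name and accepted format are illustrative.
 * Expected input: one to five decimal digits, value 1 through 65535.
 * Anything else (NULL, empty, signs, spaces, letters, out of range) is
 * rejected. Returns 0 and stores the value on success; returns -1 and
 * leaves *port_out untouched otherwise. */
int parse_port(const char *text, int *port_out)
{
    if (text == NULL || port_out == NULL)
        return -1;

    size_t len = strlen(text);
    if (len == 0 || len > 5)
        return -1;

    for (size_t i = 0; i < len; i++) {
        if (!isdigit((unsigned char)text[i]))
            return -1;
    }

    /* At most five digits, so the value cannot overflow a long. */
    long value = strtol(text, NULL, 10);
    if (value < 1 || value > 65535)
        return -1;

    *port_out = (int)value;
    return 0;
}
```

Calling strtol alone would also accept leading spaces and a sign, which is
exactly the kind of allowed-but-unexpected input the paragraph above warns
about.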

    I hope that's plenty of information for you. This is not meant to be
fully comprehensive; it just gives a few examples of the biggest problems in
computer security today.

    All the best --
    Ted Pavlic
    pavlic.3@osu.edu



> ``The textbook examples are riddled with vulnerabilities," Mr. Hernan
> noted.
> ``Computer science culture is based on --- build it, get it working and
> fix it
> later. We need a culture change away from the cowboy and toward the
> engineer."
>
> Can some one tell me what is a "security vulnerability" in a program?
> What are the programs and the text books referred to here?
>
> For any one interested the full article is at:
> http://www.nytimes.com/2003/09/29/technology/29SOFT.html?th
Common Security Problems :: Buffer Overruns and More

BACKGROUND