CSE 643: Computer Security (Fall 2019)
25 Aug 2019
Basic Information
- Instructor: Prof. W. Du
- Classroom:
  - 020 Dineen Hall (08/26/2019 - 09/11/2019)
  - Lyman Hall 132 (from 09/16/2019 on)
- Time: Monday, Wednesday 14:15-15:35
- Class website:
- Textbook:
  - Computer & Internet Security: A Hands-on Approach (2nd Edition) [amazon]
  - Computer Security: A Hands-on Approach (2nd Edition) [amazon]
  The second book is a subset of the first one. However, this class will only cover the content of the second book.
- Office hours:
  - Prof. W. Du: Tuesday 14:00-15:00 @ CST 4-285
- Misc:
  - The lab environment can be obtained from this page. It runs as a VirtualBox VM. Only VirtualBox 6.0.4 is recommended; other versions may cause compatibility issues.
Lab 1: Environment Variables and Set-UID Programs
User id & privilege
In Linux, each user is associated with a unique user id. The root user (who has the greatest privilege) has user id 0. The permission-granting mechanism checks a user's id, not the user name, when granting permissions. Therefore, if one can somehow change his/her user id to 0, he/she can exercise all the rights of root.
In Linux, each running process has three types of user ids, which are described in detail here. We are only interested in the real user id and the effective user id. The real user id is the id of the user who owns (started) the process; the effective user id is the id of the user whose rights are given to the process, i.e. the one used in permission checks. Usually, the real user id is equal to the effective user id.
One can find out one's own user id by typing id in the terminal.
[09/02/19]seed@VM:~$ id
uid=1000(seed) gid=1000(seed) groups=1000(seed),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),113(lpadmin),128(sambashare)
Linux file permission
Linux describes the type and permissions of a file with a 10-character string. If we type the command ls -l in the terminal, we get the permission information of the files in the current folder.
[09/02/19]seed@VM:~$ ls -l
total 1688
drwxrwxr-x 4 seed seed 4096 May 1 2018 android
drwxrwxr-x 2 seed seed 4096 Jan 14 2018 bin
drwxrwxr-x 2 seed seed 4096 Jan 14 2018 Customization
drwxr-xr-x 3 seed seed 4096 Aug 31 10:37 Desktop
drwxr-xr-x 2 seed seed 4096 Jul 25 2017 Documents
drwxr-xr-x 2 seed seed 4096 May 9 2018 Downloads
-rw-r--r-- 1 seed seed 8980 Jul 25 2017 examples.desktop
-rw-rw-r-- 1 seed seed 1661676 Jan 2 2019 get-pip.py
drwxrwxr-x 3 seed seed 4096 May 9 2018 lib
drwxr-xr-x 2 seed seed 4096 Jul 25 2017 Music
drwxr-xr-x 3 seed seed 4096 Jan 14 2018 Pictures
drwxr-xr-x 2 seed seed 4096 Jul 25 2017 Public
drwxrwxr-x 4 seed seed 4096 May 9 2018 source
drwxr-xr-x 2 seed seed 4096 Jul 25 2017 Templates
drwxr-xr-x 2 seed seed 4096 Jul 25 2017 Videos
In the output:
- the 1st column gives the file type and permissions,
- the 2nd column gives the number of (hard) links to the file,
- the 3rd and 4th columns give the owner and the group of the file,
- the 5th column gives the size of the file in bytes,
- the 6th column gives the date and time at which the file was last modified, and
- the 7th (last) column is the actual file/directory name.
The first column shows the file type and permissions. We can split it into four segments as follows.
d | rwx | rwx | rwx |
---|---|---|---|
file type ("d" for a directory, "-" for a regular file) | permissions of owner | permissions of owner's group | permissions of others |
Notice that the meaning of each bit is different for files and directories. For regular files, "r" is the right to read, "w" the right to write, and "x" the right to execute the file. For directories, "r" is the right to list its contents, "w" the right to create, delete or rename entries inside it, and "x" the right to enter it (e.g. with cd). We can compress the last nine permission bits into three octal digits by treating each "rwx" group as a three-bit binary number (r = 4, w = 2, x = 1). For example, "rwxrwxr-x" can be expressed as "775".
There are also three special permission bits (Set-UID, Set-GID and sticky) that do not have positions of their own in the 10-character string; when set, they are shown in place of an "x". This page describes all three of them in detail. Here, we are only interested in the Set-UID bit. If this bit is set on a program, then whoever runs it, the resulting process's effective user id equals the program owner's user id instead of the runner's (i.e. the effective user id is set to the owner's user id, rather than to the real user id).
To make a program a Set-UID program, we can use the following command:
[09/02/19]seed@VM:~$ chmod 4755 progname
The leading "4" sets the Set-UID bit, and the remaining three digits set the normal permissions (so in this demonstration, the command also sets the file permissions to 755). When we run ls in a folder, Set-UID programs are highlighted with a red background. For example:
[09/02/19]seed@VM:~$ cd /bin
[09/02/19]seed@VM:/bin$ ls
bash date hostname mountpoint ntfswipe sleep uname bash_shellshock dd ip mt open ss uncompress bunzip2 df journalctl mt-gnu openvt static-sh unicode_start busybox dir kbd_mode mv pidof stty vdir bzcat dmesg kill nano ping su wdctl bzcmp dnsdomainname kmod nc ping6 sync which bzdiff domainname less nc.openbsd plymouth systemctl whiptail bzegrep dumpkeys lessecho netcat ps systemd ypdomainname bzexe echo lessfile netstat pwd systemd-ask-password zcat bzfgrep ed lesskey networkctl rbash systemd-escape zcmp bzgrep efibootmgr lesspipe nisdomainname readlink systemd-hwdb zdiff bzip2 egrep ln ntfs-3g red systemd-inhibit zegrep bzip2recover false loadkeys ntfs-3g.probe rm systemd-machine-id-setup zfgrep bzless fgconsole login ntfs-3g.secaudit rmdir systemd-notify zforce bzmore fgrep loginctl ntfs-3g.usermap rnano systemd-tmpfiles zgrep cat findmnt lowntfs-3g ntfscat run-parts systemd-tty-ask-password-agent zless chacl fuser ls ntfscluster rzsh tailf zmore chgrp fusermount lsblk ntfscmp sed tar znew chmod getfacl lsmod ntfsfallocate setfacl tempfile zsh chown grep mkdir ntfsfix setfont touch zsh5 chvt gunzip mknod ntfsinfo setupcon true cp gzexe mktemp ntfsls sh udevadm cpio gzip more ntfsmove sh_backup ulockmgr_server dash hciconfig mount ntfstruncate sh.distrib umount
When the file permission details are displayed, an "s" appears in place of the owner's usual "x" bit.
[09/02/19]seed@VM:/bin$ ls -l mount
-rwsr-xr-x 1 root root 34812 Dec 16 2016 mount
Password dilemma & Set-UID programs
In Linux, the password entries of all users are stored in /etc/shadow, which is owned by the root user with permission 640. That is to say, normal users can neither read nor modify this file. However, normal users should be able to change their own passwords freely. But if we allow them to access /etc/shadow, they may break the security of the system by changing other users' password entries. This is the password dilemma: if we let users access the password file, they may compromise system security; if no access is allowed, they are unable to change their own passwords.
Linux resolves this dilemma with the Set-UID mechanism, which was originally invented by Dennis Ritchie at Bell Labs [wiki]. As mentioned above, Set-UID programs run with their owner's privilege. In Linux, the password utility passwd is a Set-UID program owned by root. That is, when a normal user runs the program, the program itself has root privilege, which means it can modify the password file (and also any other file on the system). But (ideally) this does not create any security hazard, because the passwd program is written to handle the password file carefully, and only the password file. That is, it only updates the password file after verifying that the user has provided the correct old password, and it only changes that user's entry, leaving the others untouched. It never reads or modifies any other unrelated files.
Potential attacks on Set-UID programs
Because Set-UID programs allow a normal user to gain (limited) root privilege, attacks on them are very appealing. We can first analyze the attack surface of Set-UID programs. The attack surface of a software environment is the sum of the different points (the "attack vectors") where an unauthorized user (the "attacker") can try to enter data into or extract data from the environment. For Set-UID programs, the attack surface is the set of places from which the program gets its inputs.
User inputs
If a Set-UID program fails to sanitize inputs from a user, it may create a security loophole. In Linux, user information is stored in /etc/passwd, where each line represents one user. Two sample lines are shown below. Each line has multiple fields separated by colons; a detailed explanation of each field can be found here. For now we are only interested in the last field, which specifies the user's default shell, i.e. the first program that is run after logging in.
[09/02/19]seed@VM:~$ sudo cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
seed:x:1000:1000:seed,,,:/home/seed:/bin/bash
Linux provides a Set-UID utility called chsh to modify the default shell. If it fails to sanitize user input properly, a user can supply a new shell string containing a line break character, effectively creating a new entry (a new user) in the file. By setting the user id field of that entry to 0, the normal user can even plant a new root account in the system.
System inputs
Sometimes even system inputs can be the point of attack on Set-UID programs. Suppose a privileged Set-UID program needs to write to the file /tmp/abc. Because the /tmp folder is writable by everyone, an attacker can create a symbolic link named abc in /tmp that actually points to /etc/shadow. When the Set-UID program writes its output, it actually destroys the password file.
Environment variables
If a Set-UID program reads environment variables directly, it is very important to sanitize their values. However, threats from environment variables can arise even when the Set-UID program does not read them directly:
- When invoking other programs.
  Problems can occur when a Set-UID program calls external programs with system(). For example, a program can call system("date") to show the date. This tells the shell (/bin/sh) to search for a program named date in the directories listed in the PATH environment variable. However, if the user changes PATH so that it starts with a folder containing his/her own version of date, that version gets invoked instead. If the calling program happens to be a privileged Set-UID program, the wrong date program is executed with root privilege.
- During dynamic linking.
  In Linux, the dynamic linker searches for shared libraries in the directories listed in LD_LIBRARY_PATH, and loads the libraries listed in LD_PRELOAD before any others. If one points these two environment variables to his/her own malicious versions of the libraries, one can change the behavior of the program. (For Set-UID programs, the dynamic linker runs in secure-execution mode, ignoring LD_LIBRARY_PATH and restricting LD_PRELOAD, precisely to block this attack.)
Capability leaking
It is very common for a privileged program to relinquish its privileges after completing certain operations. A common mistake programmers make here is capability leaking: the programmer forgets to release a resource obtained while the program still had the privileges, even though he/she explicitly downgrades the program's rights afterwards. For example, consider the pseudocode below.
At first glance, everything looks fine. However, that is not the case. When fork is called, the child process gets a copy of all of the parent process's file descriptors, including the one that has access to the important file. Although the programmer explicitly relinquished the privileges before forking, indicating that he/she does not want the child process to access the file, this attempt fails. Therefore, programmers should release privileged resources as soon as possible; in this case, that means closing the file before forking.
Invoking other programs
We discussed using system() to call an external program above; now we look at it more closely. Calling system(command) actually runs /bin/sh -c command. For a privileged program, this is very risky. Consider a program that scans any given file or directory for viruses. Because it must be able to open any file on the system, it is a privileged Set-UID program. Suppose this program takes path as input from users and calls system("ls " + path) to list the contents of the folder. A normal user can then easily get a root shell by passing "/abc;/bin/sh" as the path. Here, /abc is just an irrelevant path name. What matters is the semicolon, which separates two commands in shell syntax, and /bin/sh, which starts the shell. Together with the environment variable vulnerability introduced above, this shows that system() is a very dangerous way of invoking other programs.
A safer approach is to use the execve() function. It takes three arguments: the path of the executable, the argument vector passed to the executable, and the environment variables for the new program. Instead of going through /bin/sh -c command, execve() is itself a system call that directly replaces the current process image with the specified program, so no shell ever parses the input. In this case, the path of ls (e.g. /bin/ls, since execve() does not search PATH) would be passed as the first argument, and an argument vector containing "ls" and path would be passed as the second. If one passes "/abc;/bin/sh" to the virus scanner, the entire string is treated as a single path argument, which stops the attack. The ability to specify the environment variables explicitly also provides stronger security, since the new program does not inherit attacker-controlled values such as PATH.
The principle of isolation
The difference between system() and execve() reflects the principle of isolation in computer security, which states that data should be clearly isolated from code. In the virus scanner example above, "ls" is code, which specifies which program we want to run and should not be changed; path is data, which determines which folder to scan. The system() approach violates this principle by blending data and code into one string, which introduces many security problems.
The principle of least privilege
The principle of least privilege was introduced by J.H. Saltzer and M.D. Schroeder in 1975. It states that every program and every privileged user of the system should operate using the least amount of privilege necessary to complete the job. In Linux, privileged Set-UID programs violate this principle because they have the power of the root user, which has every possible privilege. Modern operating systems like Android provide fine-grained permissions: in the application settings page on an Android phone, we can grant or deny an app access to location, camera, microphone, etc.
Another implication of this principle is that if a privileged program does not need some privileges for part of its execution, it should disable them either temporarily or permanently, depending on whether they are needed again later. For example, if we close the privileged file handle as early as possible in the capability leaking case, the problem does not occur.
Lab 2: Buffer-Overflow Vulnerability
Process memory layout
Thanks to virtual memory, each Linux process has its own address space. The layout of the address space is shown below.
There are mainly five segments:
- Text segment: stores the executable code of the program. This segment is usually read-only.
- Data segment: stores (initialized) global and static variables.
- BSS segment: stores uninitialized global and static variables.
- Heap: provides space for dynamic memory allocation. The heap grows from low address to high address.
- Stack: stores local variables, function arguments and return addresses, i.e. the state of each active function call. The stack grows from high address to low address.
x86 stack layout
Each function has its corresponding stack frame, which stores its local variables and other important status. When a new function is called, a new stack frame is created, and the stack grows (from high address to low address).
void bar(){
    // <--here
}
void foo(){
    bar();
}
int main(){
    foo();
    return 0;
}
For the code above, when execution reaches bar(), the stack frames can be illustrated as below.
Inside the stack frame are arguments, local variables and other values that keep the program running correctly.
int func(int a, int b){
    int x, y;   // local variables
    return 0;
}
For func(int, int) above, its stack frame is shown below. The arguments of the function sit at the highest addresses of the frame (the top of the figure); the caller pushes them in reverse order, so a lies below b. Below them is the return address, which records where execution should continue after func() returns. The %ebp register (aka "stack base pointer" or frame pointer) points to the base of the stack frame (the slot holding the saved %ebp), so the program can reach the saved %ebp, the return address and the arguments at fixed offsets from it (on 32-bit x86 with the usual function prologue, %ebp+4 holds the return address, and %ebp+8 and %ebp+12 hold a and b). The saved %ebp slot stores the %ebp value of func()'s caller; when func() returns, %ebp is restored from it. Below the saved %ebp are the local variables of func(). Their layout is entirely determined by the compiler, so we do not know exactly where x and y are; we only know that they are roughly in that region.
There is also a "stack pointer", stored in register %esp, which points to the top of the stack (the lowest address currently in use).
Buffer overflow attack
Copying data to buffer
Usually, we use the strcpy(char *dest, const char *src) function to copy strings from source to destination. We do not need to specify how long src is, since the copying stops automatically when a '\0' is encountered in src (the '\0' itself is copied as well).
Buffer overflow
Since strcpy does not compare the length of src with the size of dest, src may be longer than the space available in dest. Writing past the end of a buffer this way is called a buffer overflow. It is dangerous because, depending on the stack layout, other local variables can be overwritten; in the worst case, even the saved %ebp and the return address are modified. When the return address is overwritten with arbitrary data, the program will almost always malfunction when the function returns, since its execution flow is broken. There are four possible consequences:
- Jumping to an invalid address: the target address is not mapped, and the OS's memory protection stops the program from accessing it.
- Jumping to a protected address: the target memory is allocated but protected (e.g. it is reserved for the kernel). Like the previous case, such memory access violations usually cause segmentation faults.
- Invalid instruction: the target memory is allocated and accessible, but the data there is not a valid machine instruction, so the program crashes.
- Normal execution: the target memory is allocated and accessible, and the data there happens to be valid machine instructions, which are then executed.
By making use of buffer overflow carefully, an attacker can deliberately change the return address so that it points to some malicious code in another memory location, thus compromising the logic of the program.
The vulnerable program
Since analyzing complicated programs for buffer overflow vulnerabilities is difficult, we will try our attacks on the following program. It has a buffer overflow problem, since it copies str, a string of up to 517 bytes, into buffer, which can hold only 24 bytes.
/* stack.c */
/* This program has a buffer overflow vulnerability. */
/* Our task is to exploit this vulnerability */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int bof(char *str)
{
    char buffer[24];

    /* The following statement has a buffer overflow problem */
    strcpy(buffer, str);

    return 1;
}

int main(int argc, char **argv)
{
    char str[517];
    FILE *badfile;

    badfile = fopen("badfile", "r");
    fread(str, sizeof(char), 517, badfile);
    bof(str);

    printf("Returned Properly\n");
    return 1;
}
Now we analyze how to exploit the buffer overflow vulnerability in this program. The bof function's stack frame is shown in the figure below. When strcpy is called, it copies str[0] to buffer[0], str[1] to buffer[1], and so on. If str is longer than 24 bytes, the copy continues past the end of buffer and overwrites what lies above it, namely the saved ebp, the return address and the arguments. It may even modify the main function's stack frame.
Our goal is to overwrite the return address slot so that when bof finishes, the program is tricked into "returning" to our own malicious code. Note that the malicious code itself is delivered to the victim through the input (in this case, a file named badfile; it could also be input read from standard input). This raises the two major challenges of a buffer overflow attack:
- How to determine the offset of the return address? Since we need to overwrite the old return address with our own desired one, we need to guess where to place the return address value in our input buffer. If there is a misalignment, the attack will not succeed.
- How to determine which address to return to? The return address will point to the absolute location of the code that will be executed next. In order to trick the program into running our own code, we have to know the absolute location of where our input will be put into the memory.
It is very difficult to meet these two requirements without access to the vulnerable program's source code. However, as we will see, there are ways to make these two conditions "fuzzy", i.e. sometimes we can launch the attack successfully without knowing the exact addresses.
Conducting a buffer overflow attack
To conduct a (toy-level) buffer overflow attack, we need to follow these steps:
- Turn off countermeasures
  Linux already implements a number of countermeasures against buffer overflow attacks. We need to disable all of them to make our life easier.
  - Turn off address space randomization
    Address space layout randomization randomizes where the stack, heap and libraries are placed in memory, which makes it much harder to guess the correct return address. It can be turned off with the following command:
    [09/13/19]seed@VM:~$ sudo sysctl -w kernel.randomize_va_space=0
    kernel.randomize_va_space = 0
  - Turn off compiler's protections
    We need to compile stack.c with the following command:
    [09/13/19]seed@VM:~/.../lab2$ gcc -z execstack -fno-stack-protector -o stack stack.c
    The -z execstack flag makes the stack executable, disabling the non-executable stack protection. Because our malicious code is stored on the stack, that protection would stop it from executing.
    The -fno-stack-protector flag disables the stack protector, which is extra code the compiler adds to functions with local buffers to detect a stack buffer overflow before the function returns. Again, if it is not disabled, our attack will not succeed.
-