Python dictionary algorithm and hash table vulnerability

Hash function is taken from www.tutorialspoint.com Picture of a hash function is taken from www.tutorialspoint.com

Structure commonly used inside of most languages is a hash table or in the python dictionary (dict). It exists in Java, PHP, Ruby, and so on. We going to review dict structure under python and then to explain how this attack is used to DDOS.

Python dict

Dict or hash table (under different programming language) is an algorithm that goes in best case O(1) and worst-case O(N). N is several key-element inside of dict. This is example how dict under python works.

our_dict = {} 

So we create dict object that have 2 parts: value and key (this is just to show structue as example)

our_dict = { None: None,
             None: None,
             None: None,
             None: None
            } 

With the proper key, we can get value connected to that key. Example to add "apple" as key and value 3. Dict has 4 places.

>>> our_dict = {}
>>> our_dict["apple"] = 3
>>> hash("apple") % 4
2
>>> 

In our case place 2 (it start from 0 and it goes to 2, so third place)

our_dict = { None: None,
             None: None,
             "apple" : 3,
             None: None
            } 

So we do up to 50% places.

our_dict = { None: None,
             "kiwi": 4,
             "apple" : 3,
             None: None
            } 

And next one we add:

>>> hash("pear") % 4
1

but we have on place 1. Now we extend place 1 with list-like object.

our_dict = { None: None,
             [["kiwi": 4], ["pear": 7]],
             "apple" : 3,
             None: None
            } 

In short, each time we add value and it makes collision place is extend to list. When it goes full it makes double the current size and re-hashed key/values. This happens automatically and in that case all keys are re-hashed.

This is the shortest description of this kind of structure.

How dict or hash table are used for DDOS

This is used for every web server and script language. So you send some request, web server it cached with the hash table (because of each request is cached if does not exist - if request exist, it just pull from memory and serve browser with no PHP /Go/Python/Java execution)

But what if the attacker knows how to make collision inside of web server and full 1 million or maximum possible queue on that web server?

So he sends a request, if not cached web server cached. Each time when it checks it grows N+1. So it needs to make N checks. But if a collision happens it goes to check trough list. Each time key/value is added to place into the list. So this makes O(N^2) to insert key/value in the hash table. This starts slowly to slow down fast O(1) and use CPU more and more.

The first time is discovered in a 2003 paper Denial of Service via Algorithmic Complexity Attacks

After adding some random bytes to keep an attacker away from this (each time hash table has unique hashing value on each server) in 2011 they find a way to bypass. Youtube link from 28th Chaos Communication Congress 28c3: Effective Denial of Service attacks against web application platforms

In response to this - they add "secret randomized seed value" to keep attack away.

But in 2012 ( congress 29c3) on they use differential cryptanalysis to crack this random seed and to use for the attack.

Later they suggest using SipHash as a good combination of speed and security. So it resists to a DDOS attack on hash table / dict. All programming languages and web services using SipHash to prevent the attack. Url info on the attack: Hash table attack