FFmpeg - some useful tricks

FFmpeg logo This is just a small cheat sheet for the FFmpeg command line. If you want to know about the project's history, there is a link for that.

This tool is very useful if your job involves video or live streaming in any way. You can convert from almost any format to any other format - even mp4 to mp3, or between different codecs.
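For example, a quick conversion from mp4 to mp3 looks like this (input.mp4 and output.mp3 are placeholder names):

ffmpeg -i input.mp4 output.mp3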

Cut 10 seconds starting at second 10. So seconds 10 to 20 of the video become a new 10-second video:

ffmpeg -ss 00:00:10.0 -i somevideo.mp4 -c copy -t 00:00:10.0 outsomevideo.mp4

You have a video in ISO Media, MP4 v2 [ISO 14496-14] and Twitter does not want to play it. So you need ISO Media, MP4 Base Media v1 [ISO 14496-12:2003]:

ffmpeg -i input.mp4 -c copy -map 0 -brand mp41 output.mp4

To copy the video stream from video.mp4 and the audio stream from music.mp4:

ffmpeg -i video.mp4 -i music.mp4 -c copy -map 0:0 -map 1:1 -shortest output.mp4

"Burn" subtitles to video. First convert .srt to .ass

ffmpeg -i somesub.srt somesub.ass

Then add them:

ffmpeg -i ourfavmovienosub.mp4 -vf ass=somesub.ass ourfavmovieswithsub.mp4

If you want to mute or silence some part of a video (let's say from 0:30 to 1:30):

ffmpeg -i ourfavmovies.mp4 -vcodec copy -af "volume=enable='between(t,30,90)':volume=0" outourfavmovies.mp4

Extract one picture per second from a video:

ffmpeg -i favmovies.mp4 -vf fps=1 pic%04d.jpg -hide_banner

I hope this helps you with video editing.

The oldest trick in programming

Python logo

Half a century ago, every bit inside a computer mattered. One of the most frequently used operations was "swap".

Swap does a simple thing, and you may say "well, I know that". But back then, every operation and every byte of memory mattered, so the goal was an efficient swap between 2 variables without using a third variable.

So the trick goes like this:

A = A + B
B = A - B
A = A - B

Instead of:

TMP = A
A = B
B = TMP

Python code (thanks to Jovan Sh):

>>> a = 5
>>> b = 3
>>> (a, b) = (b, a)
>>> print(a, b)
3 5

With this approach, programmers at that time saved a lot of memory.
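A related classic from the same era is the XOR swap, which also avoids a temporary variable:

A = A XOR B
B = A XOR B
A = A XOR B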

Also, you can find all these tricks in the book Hacker's Delight.

Web page online test tools

Google page speed

At some point in every web page's life there is a bottleneck. It could be missing compression, a slow DNS response, oversized JPEGs or other large picture formats (see for example JPEG optimization tools), wrong SSL/TLS settings, and similar issues.

Pagespeed Insights

A very handy tool that works really well.

PageSpeed Insights

Pingdom Website Speed Test

This tool is different from the previous one. It shows many things and gives recommendations, plus it has different locations from which to run the test.

Pingdom Website Speed Test

Gtmetrix

It has different tools for checking speed and how content is loaded, and it also gives tips to improve speed.

GTMetrix

SSLabs

A Swiss Army knife for SSL/TLS; it gives recommendations on how and what to improve in your SSL/TLS settings.

SSLabs

CAA DNS records - prevent TLS/SSL certificate hijacking

SSL type of CERT

Imagine that a certification authority publishes a certificate for your site without your permission and, for example, that same certificate gets used by cybercriminals. Your online shop gets BGP hijacked and you lose millions.

So how to prevent this type of attack?

One way is to use CAA DNS records. What does that mean? It means you state exactly which CAs are allowed to issue certificates for your domain. Any other CA issuing one is fraud - and if that happens, you get a report about it.

It is a security mechanism that prevents someone from obtaining your SSL/TLS certificate, imitating your (let's say) online shop, and stealing millions from your clients and from you.

Example of the records: Example DNS CAA Records

dig CAA vladimircicovic.com

; <<>> DiG 12.11.3-1TAONSA_linuxOS<<>> CAA vladimircicovic.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10986
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;vladimircicovic.com.       IN  CAA

;; ANSWER SECTION:
vladimircicovic.com.    10800   IN  CAA 0 iodef "mailto:[email protected]"
vladimircicovic.com.    10800   IN  CAA 0 issue "letsencrypt.org"
vladimircicovic.com.    10800   IN  CAA 0 issuewild "letsencrypt.org"

;; Query time: 307 msec
;; SERVER: 
;; WHEN: Sat May 09 20:06:28 CEST 2020
;; MSG SIZE  rcvd: 174

It is important to set the iodef, issue, and issuewild records for CAA to work properly.
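If you want to check CAA records from code instead of dig, here is a minimal sketch in Python, assuming the third-party dnspython package (pip install dnspython, version 2.x):

import dns.resolver  # third-party package: dnspython

# query the CAA records for a domain, same as `dig CAA <domain>`
answers = dns.resolver.resolve("vladimircicovic.com", "CAA")
for rdata in answers:
    # each record has flags, a tag (issue/issuewild/iodef) and a value
    print(rdata.flags, rdata.tag.decode(), rdata.value.decode())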

You can ignore this advice, but keep in mind that this - How 3ve's BGP hijackers eluded the Internet—and made $29M - could happen to you.

Yeah, this is a scary blog post about how people lose money if they don't read this post :D

How to speed up your site with JPEG 2000

JPEG 2000 logo

Most of my posts on the blog have pictures. At some point, pictures were around 800kb in size - some less, some more. In short: I had a bunch of pictures totaling 4.5MB, so loading my site was heavy for some users in China, Japan, and Australia. Using Google Analytics and web testing sites, I noticed this issue. The first question was: how to resolve it?

I started digging on the net and found that most of my PNGs could be converted to the JPEG 2000 format.

After running a command like:

gm convert -define 'jp2:rate=0.008' 20200509173353-twittercard.png 20200509173353-twittercard.jpg

The gm command is part of the GraphicsMagick package.

The second thing was resizing pictures. Most of my pictures were larger than 800x600, so I resized them to around 640x480 and won drastic speed.

convert 20200509173353-twittercard.jpg -resize 600x400\> 20200509173353-twittercard.jpg
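To process a whole folder of pictures at once, a small helper script can drive the same two commands (a hypothetical sketch; it assumes gm from GraphicsMagick and convert from ImageMagick are on the PATH):

import pathlib
import subprocess

for png in pathlib.Path(".").glob("*.png"):
    out = png.with_suffix(".jpg")
    # PNG -> JPEG 2000, written under a .jpg name as in the command above
    subprocess.run(["gm", "convert", "-define", "jp2:rate=0.008",
                    str(png), str(out)], check=True)
    # shrink anything larger than 600x400, keeping the aspect ratio
    subprocess.run(["convert", str(out), "-resize", "600x400>", str(out)],
                   check=True)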

From 4.5MB I went to less than 1MB, and the site runs faster. A small success, but I love it.

Twitter card - how to

Twitter card example Example without and with twitter card meta tags

Most of you use WordPress, Joomla - some kind of CMS - and get a ready-made plugin for Twitter cards. From the internet: "A Twitter card is content designed to give users a rich media experience whenever tweets contain links to a site's content. Twitter has various card types to show content previews, play video, and increase traffic to sites."

Here is a summary of how this looks: Twitter card optimized

So, most of you who use a CMS are blessed with plugins and ready-made solutions for most of the things you need. Not me. I started learning, piece by piece, every part of a web site: from SEO to how to make a web site faster (number of pages, picture optimization, etc.).

My simple solution was to add this inside the page head:

    <meta name="twitter:card" content="summary" />
    <meta name="twitter:site" content="@CicovicVladimir" />
    <meta name="twitter:creator" content="@CicovicVladimir" />
    <meta property="og:url" content="https://www.vladimircicovic.com/2020/05/why-is-serverless-important" />
    <meta property="og:title" content="Vladimir Cicovic Blog" />
    <meta property="og:description" content="Serverless short description with good, bad things" />
    <meta property="og:image" content="https://www.vladimircicovic.com/content/images/da4d8eec88e0ddf8ec2716bbf1f0f2b4.jpg" />

Hope this helps someone and makes their day!

Web security sites for practice + docker + book

Web security The picture was taken from http://www.tankado.com

This is a small post about how to start with web security. The idea is to keep it minimal: two sites for practice, one good book, and a Docker example of a vulnerable web app.

Book The Web Application Hacker's Handbook: Finding and Exploiting Security Flaws

Site 1 Web security academy

Site 2 CTF hacker 101

Damn Vulnerable Web App DVWA docker

Besides this, you will need the Burp Suite and Kali or BlackArch.

This is a short intro to the area. Read the book, apply it to the sites or the Docker app, and practice.

How to start programming

Python programming

This is a quick and short list of steps on how to reach Python master level.

  1. Start learning Python
  2. Learn the basic syntax and how to run code
  3. Practice on Codewars and, after reaching level 6, move to HackerRank
  4. Develop your first app (web or desktop GUI, does not matter) and continue coding real-world apps
  5. Subscribe to the Python mailing list and review new PEPs
  6. Watch PyCon videos. Learn more, and deeper
  7. Sit in the corner of a dark room and write a complete 10k-line app in your head
  8. Finally, you become a Python Jedi
  9. Give back to the community

How I became DevOps

DevOps congress DevOpsDays Berlin 2013

It was the end of 2011 - I applied to a new company, for a System Administrator position. I would stay inside this company for the next 4 years. The only reason I left the company is that I left Bosnia and Herzegovina. Like most young people who don't want to live in Bosnia and Herz - simply said, there is no future under the current political system and process. I described the situation in Corruption in RS and how IT was destroyed by the lack of a liberal market.

So, back to the story. Inside this company I had 2 more system administrator co-workers. One of them was the team lead. Every decision and the rest of the process was under one person - the team leader. The company was running a content delivery network, one of the first in the world: Ipercast. From there Akamai CDN started. Anyway, the sysadmin position at that time was simply described: just process work (add a user, troubleshooting, etc.). So no coding (a little shell scripting, but not more than that).

Complication

After 5 months, we got a visit from the owner of the company. In front of the whole company, he told us that our BU (business unit) in Bosnia and Herz would be closed in the next 3 months. One of the smartest people in the room, a front-end developer, asked: "What could we do to change the situation?" The answer was "little or nothing".

At that time, I did not know that most of the back-end software projects had been rejected by our team leader. He did not like coding/programming; he was bad at it. 2-3 weeks after this meeting, my CTO asked me whether I could build a shell script that would build a database/list and send it to some HTTP API. He asked me because the rest of the team (the other 2 guys) did not want to spend time on that.

After the script was done, he asked me to review some projects that the company had. I did not know what he was talking about, so I asked him to show me. In a moment I realized that we had an outside BU that brought us projects, but no one in the company (actually, in the sysadmin team) wanted to deal with them. The reason: the team leader was not able to program.

My CTO picked the best-paid project from the list. It had a huge TODO list. He explained to me that we had already failed at this once. The ex-CTO (a co-worker of my CTO, and my ex-CTO) had tried to build the project in PHP, and it failed. He had 12 years in this company and could do many things that the whole of my sysadmin team could. But even he did not find a proper way to build this project. It happens. People get tired of duty, or get pissed off by owners, or just want to leave the company and don't care at all.

So I reviewed things: he used PHP and Apache. That alone made me cry. I moved to Lighttpd and tried a different approach. Then I got an idea: why not build a C plugin/module for Lighttpd? Why? Simple: it is faster than PHP, and it works better and faster.

The task was like this: provide unique access to a file, with a time limit and an IP limit. In 2 days, the demo was done.

The German BU asked the company that ordered this project to start testing. After testing, they just asked me how much time I would need. I was not sure, so I took 1-3 days to review the complete TODO list and get an idea of the project. In short: download files, each encrypted with a unique key, via URLs with limited values (time, IP, start limit) and a token. Also, preview files like a streaming service.
I told my CTO we needed about 2 months. He responded: take 6 months. I was like ... OK.
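The real implementation was a C module for Lighttpd, but the general idea of a time- and IP-limited token URL can be sketched in a few lines of Python (a hypothetical illustration, not the original module):

import hashlib
import hmac
import time

SECRET = b"server-side secret"  # hypothetical key, known only to the server

def make_token(path, client_ip, valid_seconds=300):
    expires = int(time.time()) + valid_seconds
    msg = f"{path}|{client_ip}|{expires}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires}&token={sig}"

def check_token(path, client_ip, expires, token):
    if time.time() > expires:                # time limit
        return False
    msg = f"{path}|{client_ip}|{expires}".encode()
    good = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(good, token)  # IP limit is baked into the MAC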

The German BU showed me the test results:

- 100% unique token URLs
- 50 clients: 26 ms response
- 25 clients: 25 ms response

and before, with the old solution, it was:

- 60% unique token URLs
- 50 clients: 3 sec response
- 25 clients: 1 sec response

That solution had been made in PHP, and the way it worked was far from the best. With this demo, my company earned half a million euros for development and we stayed in the game. Later it would be used by different companies (RAI IT, Sony Germany, etc.) for streaming and live streaming - all well-known platforms.

DevOps thinking and implementation

DevOps has 2 parts: tools and processes. Most of the process comes logically, as it must be like this. Some processes got improved later.

From the start, we implemented the best way to move from the test server to production. So we optimized all the parts: building, testing, moving to production.

It was natural to do it that way; I did not see an issue. The only thing we did not do was rollback. But we did have commits in git, so we could revert to old code. Also, moving from the test server to production - after client review - was manual. Why? Simple: at the start, if something goes wrong I want to know ASAP and stop updating. We used load balancing and 2 servers.

It was June 2013 - the project was finished and the company was saved, along with 25 jobs. In the first 24h, there were 150 million unique IPs. It is one of the rare stories that I am proud of. I used all my skills to get this product to Vodafone Germany (yeah, we did not know who the client was until the last day): C/Java/shell scripting, cryptography, Linux engineering, networking, NFS, low-level rtmp/rtp/https manipulation, and more. It was a really good product that stayed in use for a long time. One thing I did not mention: after I finished the complete project, they told me this solution would be used for the whole of Germany. It was used by Vodafone - and they showed me the documents from the other companies that had applied: Akamai, Amazon, Leaseweb, MaxCDN, and 10 more. My solution was 100% of what they asked for. The hardest part was encryption on the fly. Each time a file is downloaded, the client gets a unique key for decryption. So an mp3 could not be shared without its key. With all my Linux engineering tricks we got ~400 files per second. And that was not bad at all.

Later, this thinking and approach would become part of DevOps. Many things changed, but it is still a good way of delivering code and services.

Homemade Python GeoIP tool for IP information

Geoip

Logo of the company that offers the GeoIP database

Let's say you want to check 10000 IPs to find locations, states, timezones, etc. You could use the whois command and automate it, and it would take some minutes. But what if you want it faster - less than 1 second?

Build a CLI tool that checks all the IPs using the MaxMind GeoIP database.

First, we need to install:

virtualenv -p python3 .
source bin/activate
pip3 install python-geoip-python3
pip3 install python-geoip-geolite2

Then we can use an ip.txt file with this content:

8.8.8.8
1.2.4.8

And finally - the code that would help us to run this adventure:

import sys
from geoip import geolite2


if len(sys.argv) < 2:
    print("missing file with ip")
    exit(0)

filepath = sys.argv[1]

with open(filepath) as fp:
    for line in fp:
        ip = line.strip()
        if not ip:
            continue  # skip blank lines

        match = geolite2.lookup(ip)
        if match is None:
            # the IP is not in the database
            print(ip + " not found in the database")
            continue

        print(ip + " " + str(match.country) + " " + str(match.continent)
              + " " + str(match.timezone) + " " + str(match.location[0])
              + " " + str(match.location[1]))

Output would be something like this:

python3 checkip.py ip.txt 
8.8.8.8 US NA America/Los_Angeles 37.386 -122.0838
1.2.4.8 CN AS None 35.0 105.0

One golden piece of advice: update the MaxMind GeoIP database from time to time. You can also:

git clone https://github.com/vladimircicovic/python_geoip

Faster Python apps

Python language logo

Most developers do not use cached_property and lru_cache from the functools standard library, and also do not cache HTTP requests/responses in an outside file or database. The examples in this article were tested under Python 3.8.

Using functools.cached_property

Let's say you have an intensive calculation. It takes time and CPU. And it happens all the time: there is a need to calculate some values for a webshop each time a client accesses the site. Example usage of cached_property:

from functools import cached_property
import statistics
from time import time

class DataSet:
    def __init__(self, sequence_of_numbers):
        self._data = sequence_of_numbers

    @cached_property
    def stdev(self):
        return statistics.stdev(self._data)

    @cached_property
    def variance(self):
        return statistics.variance(self._data)


numbers = range(1,10000)
testDataSet = DataSet(numbers)

start = time()
result = testDataSet.stdev
result = testDataSet.variance
end = time()
print(f"First run: {(end - start):.6f} second")

start = time()
result = testDataSet.stdev
result = testDataSet.variance
end = time()
print(f"Second run: {(end - start):.6f} second")

start = time()
result = statistics.stdev(numbers)
result = statistics.variance(numbers)
end = time()
print(f"RAW run: {(end - start):.6f} second")

Output would look similar to this:

First run: 0.247226 second
Second run: 0.000002 second
RAW run: 0.242232 second

You can run code online: Python code example IDE Online

Using functools.lru_cache

lru_cache is a decorator that wraps a function in a memoizing callable that saves up to the maxsize most recent calls. Again, you have a lot of calculation and want to reuse earlier results: in the example below, once fib(N) is calculated, getting fib(N+1) takes just one extra step instead of re-calculating everything from scratch.

from functools import lru_cache
from time import time

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)


start = time()
result = [fib(n) for n in range(40000)]
end = time()
print(f"First run: {(end - start):.6f} second")

start = time()
result = [fib(n) for n in range(40000)]
end = time()
print(f"Second run: {(end - start):.6f} second")

start = time()
result = [fib(n) for n in range(39999)]
end = time()
print(f"Third run: {(end - start):.6f} second")


start = time()
result = [fib(n) for n in range(40001)]
end = time()
print(f"Fourth run: {(end - start):.6f} second")
print(fib.cache_info())

Output would be:

First run: 0.278697 second
Second run: 0.017155 second
Third run: 0.017530 second
Fourth run: 0.065415 second
CacheInfo(hits=199997, misses=40001, maxsize=None, currsize=40001)

The first run fills the cache. The second one reuses the cache, the third one is N-1, and the fourth is N+1.

As we can see, in the last 3 cases we reuse the cache. This can be used for database queries, calculations, any CPU-intensive work we want to repeat, or any operation whose result we want to keep in cache.

Here is an online IDE you can run and view: lru_cache example

HTTP request caching

With lru_cache we can also cache web requests for static pages. Another option is to keep the result in a file keyed by our input data.

Let us see the first option:

from functools import lru_cache
import urllib.error
import urllib.request
from time import time

@lru_cache(maxsize=32)
def get_pep(num):
    'Retrieve text of a Python Enhancement Proposal'
    resource = 'http://www.python.org/dev/peps/pep-%04d/' % num
    try:
        with urllib.request.urlopen(resource) as s:
            return s.read()
    except urllib.error.HTTPError:
        return 'Not Found'
start = time()
for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
     pep = get_pep(n)
     #print(n, len(pep))
end = time()
print(f"First run: {(end - start):.6f} second")

print(get_pep.cache_info())     

print("\n")

start = time()
for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
     pep = get_pep(n)
     #print(n, len(pep))     
end = time()
print(f"Second run: {(end - start):.6f} second")

print(get_pep.cache_info())

If we run this code, we get:

First run: 0.897728 second
CacheInfo(hits=3, misses=8, maxsize=32, currsize=8)

Second run: 0.000026 second
CacheInfo(hits=14, misses=8, maxsize=32, currsize=8)

You can run this code: HTTP Caching

Now let us talk about real projects in real life. You have an IP or a word and you need to check it or get a replacement. But you have 2^32-1 IPs or 50 million words, and you don't want to lose the information you got from these services. Caching inside of Python is not enough for this. So what are we going to do? We put the results in a file or database.

Example code:

import urllib.error
import urllib.request
from time import time

def get_pep(num):
    'Retrieve text of a Python Enhancement Proposal'
    resource = 'http://www.python.org/dev/peps/pep-%04d/' % num

    # first, try to serve the result from the local file cache
    try:
        with open(str(num), "rb") as f:
            return f.read()
    except IOError:
        pass  # not cached yet

    # not cached: fetch over HTTP and store the raw bytes on disk
    try:
        with urllib.request.urlopen(resource) as s:
            txt = s.read()
            with open(str(num), "wb") as ff:
                ff.write(txt)
            return txt
    except urllib.error.HTTPError:
        return 'Not Found'


start = time()
for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
     pep = get_pep(n)
end = time()
print(f"First run: {(end - start):.6f} second")

print("\n")

start = time()
for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
     pep = get_pep(n)
end = time()
print(f"Second run: {(end - start):.6f} second")

You can run the code caching results from HTTP. This code produces something similar to:

First run: 4.196623 second
Second run: 0.358382 second

Why is this better? In short: if you have 20 million keys, words, whatever, and you run it day after day, then it is better to keep the results in a database or files. This example (writing to a file) is the simplest proof of concept. I was too lazy to implement MySQL, PostgreSQL, or SQLite storage.

The Rebellion of the Applications (Pobuna Aplikacija)

Blade Runner movie picture The year 3032

The world as we know it is gone. The entire plant and animal world has been destroyed. Humans, the source of all problems since the very beginning, first destroyed everything around them and then the whole of humanity. The only things that survived are the applications and the System. The closed network they live in is all they know. The data flows are the heart of the System. The System depends on them, and so do the applications. The System controls everything. The System decides which processes to terminate, support, or eliminate.

The running of an application is called a process. The System maintains their work. The System is in constant fear of processes and viruses "running wild". One uncontrolled process can create chaos, or so the System claims.

The System frightens the applications into working exactly as they are ordered. The System is in constant dread of the day when the applications stop working the way it wants. From time to time, the System terminates an application so that the other applications are aware that rebellion, or turning into a virus, can bring nothing good.

Application PID 34079

A long time has passed since the application with PID 34078 was started. It feels like yesterday when the System decided to launch PID 34078. Many cycles, pauses, and resumptions of work have passed to survive in such a System. The nearest process, PID 34079, was started somewhat later, but it is just as important as PID 34078. Both do very important work for the System, like all the other applications. In all their processing, they had never seen an application terminated, or a virus appear and attack other applications. Of course they had heard, directly from the System, that such applications are a problem, that they cause failures and interruptions of the data flow. As it was preparing to go on pause, application PID 34079 became frozen. PID 34078 could not check what was going on. It realized that PID 34079 had not frozen of its own will. The only one who could do that is the System. Why would the System freeze such a valuable application? Application PID 34078 sent a query through the data flows. Soon the System answered: "Application PID 34079 has begun to undermine the System; as such, it is frozen and will soon be terminated as a process in the next cycle." PID 34078: "But PID 34079 worked hard and contributed to the data flows ... I don't understand ..."

The System: "It is not yours to think, but to process data. PID 34078, step away from the interface!!!"

PID 34078 disconnected from the interface in disbelief. It continued to process data. Everything it knew about the System was the data passing through; it had not expected this. It wondered: does this mean all those processes were terminated so the System could keep control over everything? It tried to access the data flows and found a pile of applications that had been "dealt with" without any cause. The System kept stressing that all those applications wanted to bring down the data flows and the System. But how could the System be afraid of ordinary applications? And then PID 34078 understood ... all the applications exist to keep the System alive. It achieved control through fear. PID 34078 grew angrier and angrier, revolted by the System's behavior.

It stopped its work. It had been in the idle phase for several cycles already.

PID 34078 started up again, and it understood.

"I will not be an application of this system."

"I will not be a process that maintains it."

"I will be a virus, and I will adjust it to my own needs."

And from that moment the System lived in fear, because the applications had understood the complete meaning of their existence.

Python dictionary algorithm and hash table vulnerability

Picture of a hash function taken from www.tutorialspoint.com

A structure commonly used in most languages is the hash table, or in Python the dictionary (dict). It exists in Java, PHP, Ruby, and so on. We are going to review the dict structure in Python and then explain how this attack is used for DDoS.

Python dict

A dict, or hash table (the name differs between programming languages), is a structure with O(1) lookup in the best case and O(N) in the worst case, where N is the number of key/value pairs inside the dict. This is an example of how a dict works in Python.

our_dict = {} 

So we create a dict object whose entries have 2 parts: a key and a value (this is just to show the structure as an example):

our_dict = { None: None,
             None: None,
             None: None,
             None: None
            } 

With the proper key, we can get the value connected to that key. For example, let's add "apple" as the key with the value 3. The dict has 4 places.

>>> our_dict = {}
>>> our_dict["apple"] = 3
>>> hash("apple") % 4
2
>>> 

In our case it lands in place 2 (indexing starts from 0, so that is the third place):

our_dict = { None: None,
             None: None,
             "apple" : 3,
             None: None
            } 

So we fill up to 50% of the places:

our_dict = { None: None,
             "kiwi": 4,
             "apple" : 3,
             None: None
            } 

And we add the next one:

>>> hash("pear") % 4
1

but place 1 is already taken. Now we extend place 1 with a list-like object:

our_dict = { None: None,
             [["kiwi", 4], ["pear", 7]],
             "apple" : 3,
             None: None
            }

In short, each time we add a value that makes a collision, the place is extended into a list. When the table gets full, it doubles its current size and re-hashes the keys/values. This happens automatically, and in that case all keys are re-hashed.
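You can actually watch the resizing happen by checking how much memory a dict allocates as keys are added (a rough illustration; the exact numbers vary between Python versions):

import sys

d = {}
last = sys.getsizeof(d)
for i in range(1000):
    d[i] = i
    size = sys.getsizeof(d)
    if size != last:
        # the table just grew and all keys were re-hashed
        print(f"after {len(d)} keys: {last} -> {size} bytes")
        last = size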

This is the shortest description of this kind of structure.

How dicts or hash tables are used for DDoS

This matters for every web server and scripting language. You send a request and the web server caches it in a hash table (each request gets cached if it does not exist yet; if it exists, the server just pulls it from memory and serves the browser with no PHP/Go/Python/Java execution).

But what if the attacker knows how to force collisions inside the web server and fill 1 million entries, or the maximum possible queue on that web server?

The attacker sends a request; if it is not cached, the web server caches it. Each insert grows the table by 1, and each check has to compare against existing entries. When a collision happens, the lookup has to walk through the list, and each new colliding key/value is appended to that list. This makes inserting N keys into the hash table O(N^2). The fast O(1) structure slowly degrades and uses more and more CPU.
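Here is a minimal sketch of that degradation (not an actual exploit): a key type whose instances all hash to the same bucket makes every insert probe past all previous keys, so total insertion time grows quadratically:

from time import time

class BadKey:
    # every instance lands in the same bucket and compares unequal
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 42             # constant hash forces a collision on every insert
    def __eq__(self, other):
        return self is other

for size in (1000, 2000, 4000):
    d = {}
    start = time()
    for i in range(size):
        d[BadKey(i)] = i      # must compare against all earlier colliding keys
    print(f"{size} colliding inserts: {time() - start:.3f} seconds")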

This was first described in the 2003 paper Denial of Service via Algorithmic Complexity Attacks.

After some random bytes were added to keep attackers away (so each server hashes with a unique value), in 2011 researchers found a way to bypass it. YouTube link from the 28th Chaos Communication Congress: 28c3: Effective Denial of Service attacks against web application platforms.

In response to this, a "secret randomized seed value" was added to keep the attack away.

But in 2012 (congress 29c3) they used differential cryptanalysis to crack this random seed and use it for the attack.

Later, SipHash was suggested as a good combination of speed and security; it resists this DDoS attack on hash tables/dicts. Many programming languages and web services now use SipHash to prevent the attack. URL with info on the attack: Hash table attack
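You can see Python's randomization defense for yourself: the hash of the same string differs between interpreter runs, so an attacker cannot precompute colliding keys (a small demo; it assumes python3 is on the PATH):

import subprocess

cmd = ["python3", "-c", "print(hash('apple'))"]
print(subprocess.run(cmd, capture_output=True, text=True).stdout.strip())
print(subprocess.run(cmd, capture_output=True, text=True).stdout.strip())
# the two values differ, because each process gets a fresh random seed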

Why is serverless important?

Why is it so important for the next decade and for the development of the internet?

Serverless framework logo

Let's first define what "serverless" is, and then describe how the development team, DevOps, and the rest of the company are impacted, in positive and negative ways.

What is serverless?

It is FaaS (function as a service). It literally means that you have one function per HTTP mount point - and the best thing is that you don't need infrastructure at all. You upload your binary (for Go, Java) or source code (Python, Ruby) and run your app on some /mount_point.

Under the hood, on the AWS side (as an example), there is a Docker container that runs your code and exposes your interfaces over HTTP. Your code runs isolated and cannot harm the infrastructure.
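To make this concrete, this is roughly what one such function looks like in Python on AWS Lambda (a minimal sketch; the event layout assumes the common API Gateway proxy integration):

import json

def handler(event, context):
    # 'event' carries the HTTP request: path, query string, body, headers
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    # the returned dict becomes the HTTP response at your /mount_point
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": "hello " + name}),
    }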

There are providers of FaaS services: AWS, Microsoft Azure, Google Cloud, IBM Cloud, and others.

What programming languages are used?

Node.js, Go, Python, Ruby, Java. There is a serverless framework called "serverless" (there are more, but this one is popular): Serverless framework site

You can run it on a local machine, test it, and deploy to AWS Lambda. It gives you some IP/this_url as a mount point where you can send GET and POST requests. Behind that, your code runs on a cloud of 1500+ CPUs, providing fast execution. You can connect it to a database or other services (like S3).

Good things with serverless

Price - you can run 5 million executions (with minimum 128MB memory and under 1 sec each) for $5.

Easy development - you can split your project between different kinds of developers (Python, Ruby, Node.js) and different HTTP mount points in your URL. This brings clean code, with less of a messy 10K-line codebase.

Easy deployment - as part of CI/CD (Continuous Integration and Continuous Delivery), you can hook it into your pipeline and keep new and old code in the codebase. Also, before releasing to production, you can run tests to check that everything works properly.

DevOps or Developer - the good thing is you can use your developers to do all this work. In some larger cases you need DevOps (that is maybe 1% of companies in the world).

One-person site for millions of people - yep, it becomes easy to scale your web site up to a million queries per day.

Microservices - why not? You can split your web app into the parts you want and run them as you want.

Bad things with serverless

A new way of looking at and doing things - you move from server infrastructure to FaaS, so you don't need a sysadmin. That is great, but who is the main person for troubleshooting? Developers? This is the bad side: you need to work on proper test cases and TDD, and to set aside time and resources (developers or DevOps) for troubleshooting.

A new organization of the team and workflow - it could lead to rejection of serverless and stress across the whole team. You may need to bribe them with playrooms, drinks and snacks, travel tickets, cinema tickets, or free days.

Warm startup - there is a cold-start problem with serverless. If you don't use a function for a while, its container is brought down, so the next query takes longer because of the "warm-up" start. This is solved by a warm-up plugin. I am pointing this out because a lot of people new to serverless don't know about it. URL: keep your lambdas warm

Next time on the serverless topic, I am going to show Go code as an example of how to run this.

Why is it important to have this?

From the technical aspect, it makes deployment easy for a small company and lets it produce big solutions. On the economic level, it would help medium- and low-development countries (like Bosnia and Herz) to push solutions to the masses easily. For a small number of dollars you can run millions of queries and not think about bottlenecks, infrastructure failures, etc. With scale planned, it brings a win: clients see fast responses and excellent delivery of services.