How to start programming

Python programming

This is a quick and short text with steps on how to reach Python master level.

  1. Start learning Python
  2. Learn the basic syntax and how to run it
  3. Practice on Codewars and, after reaching level 6, move to HackerRank
  4. Develop your first app (web or desktop GUI, it does not matter) and continue coding real-world apps
  5. Subscribe to the Python mailing list and review new PEPs
  6. Watch PyCon videos. Learn more and go deeper
  7. Sit in the corner of a dark room and code a complete 10k app in your head
  8. Finally, you become a Python Jedi
  9. Give back to the community

How I became DevOps

DevOps conference DevOpsDays Berlin 2013

It was the end of 2011 - I applied to a new company for a System Administrator position. I would stay inside this company for the next 4 years. The only reason I left it was that I left Bosnia and Herzegovina, like most of the young people who don't want to live in Bosnia and Herzegovina - simply put, there is no future with the current political system and processes. I described the situation in Corruption in RS and how IT was destroyed because of the lack of a liberal market.

So back to the story. Inside this company I had 2 more co-worker system administrators. One of them was the team lead. All decisions and the rest of the process were under one person - the team leader. The company was running a content delivery network, one of the first in the world: Ipercast. Akamai CDN started from there. Anyway - the sysadmin position, simply described, was just process (add user, troubleshooting, etc.) at that time. So no coding (a little shell scripting, but not more than that).

Complication

So after 5 months, we got a visit from the owner of the company. In front of the whole company, he told us that our BU (business unit) in Bosnia and Herzegovina would be closed in the next 3 months. One of the smartest people in the room, a front-end developer, asked: "What could we do to change the situation?" The answer was "little or nothing".

At that time, I did not know that most of the back-end software projects had been rejected by our team leader. He did not like coding/programming. He was bad at it. 2-3 weeks after this meeting, my CTO asked me whether I could build a shell script that would build a database/list and send it to some HTTP API. He asked me because the rest of the team (the other 2 guys) did not want to spend time on that.

After the script was done, he asked me to review some projects the company had. I did not know what he was talking about, so I asked him to show me. At that moment I realized that we had an outside BU that brought us projects, but no one in the company (actually, in the sysadmin team) wanted to deal with them. Reason: the team leader was not able to do programming.

My CTO picked the best-paid project from the list. It had a huge TODO list. He explained to me that we had already failed this part once. The ex-CTO (a co-worker of my CTO, and my former CTO) had tried to build one project in PHP and it failed. He had 12 years in this company and could do many things that complemented my whole sysadmin team. But even he did not find a proper way to build this project. It happens. People get tired of duty, get pissed off by owners, or just want to leave the company and not care at all.

So, I reviewed things: he used PHP and Apache. That alone made me cry. I moved to Lighttpd and tried to build a different approach. Then I got an idea: why not build a C plugin/module for Lighttpd? Why? Simple: it is faster than PHP and performs better.

The task was like this: provide unique access to a file, with a time limit and an IP limit. In 2 days, the demo was done.

The German BU asked the company that ordered this project to start testing. After testing, they just asked me how much time I needed. I was not sure, so I took 1-3 days to review the complete TODO list and get an idea about the project. Simply put: download files, but each one encrypted with a unique key, through a URL with limited values (time, IP, start limit) and a token. Also, preview files as a streaming service.
I told my CTO we needed about 2 months. He responded with "take 6 months". I was like ... OK.
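
The original module was written in C for Lighttpd, but as a rough illustration of the idea, here is a minimal Python sketch of how such a time- and IP-limited token URL could be generated and verified. SECRET_KEY, make_token and verify_token are illustrative names and assumptions, not the original implementation:

# Minimal sketch of a time- and IP-limited download token (illustration only,
# not the original Lighttpd C module).
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"  # assumption: shared secret between signer and server

def make_token(path, client_ip, valid_seconds=300):
    """Sign path + client IP + expiry timestamp with an HMAC."""
    expires = int(time.time()) + valid_seconds
    msg = f"{path}|{client_ip}|{expires}".encode()
    sig = hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires}&ip={client_ip}&token={sig}"

def verify_token(path, client_ip, expires, token):
    """Reject expired links, wrong IPs, or tampered signatures."""
    if int(expires) < time.time():
        return False
    msg = f"{path}|{client_ip}|{expires}".encode()
    expected = hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token)

print(make_token("/files/song.mp3", "203.0.113.7"))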

The German BU showed me the results of testing:

  - 100% unique token URLs
  - 50 clients: 26 ms response
  - 25 clients: 25 ms response

and before, it was:

  - 60% unique token URLs
  - 50 clients: 3-second response
  - 25 clients: 1-second response

That solution was made in PHP and, the way it worked, it was not the best approach speed-wise. With this demo, my company earned half a million euros for development and we stayed in the game. Later it would be used in different companies (RAI IT, Sony Germany, etc.) for streaming and live streaming - all well-known platforms.

DevOps thinking and implementation

DevOps has 2 parts: tools and processes. Most of the process follows logically, as it has to be done this way. Some processes got improved later.

From the start, we implemented the best way to move from the test server to production. So we optimized all parts: building, testing, moving to production.

It was natural to do it that way. I did not see an issue with it. The only thing we did not do was rollback. But we did have commits in git, so we could revert to old code. Also, moving from the test server to production - after client review - was manual. Why? Simple: at the start, if something goes wrong, I want to know ASAP and stop updating. We used load balancing and 2 servers.
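
As a rough sketch of that manual promotion step (the host names, paths and rsync call below are assumptions for illustration, not our real setup): update one load-balanced server at a time and stop immediately if something goes wrong, so at most one of the two servers is affected; rollback was simply reverting to the previous git commit and running the same step again.

# Illustrative sketch of the manual test-to-production move described above.
# Host names, paths and the rsync call are assumptions, not the real setup.
import subprocess
import sys

PRODUCTION_SERVERS = ["web1.example.com", "web2.example.com"]  # behind the load balancer
RELEASE_DIR = "./build/"
TARGET_DIR = "/var/www/app/"

def deploy_to(host):
    """Copy the client-reviewed build from the test server to one production server."""
    subprocess.run(
        ["rsync", "-az", "--delete", RELEASE_DIR, f"{host}:{TARGET_DIR}"],
        check=True,
    )

for host in PRODUCTION_SERVERS:
    try:
        deploy_to(host)
        print(f"{host}: updated")
    except subprocess.CalledProcessError:
        # stop ASAP so the second server keeps serving the old, working code;
        # rollback = revert to the previous git commit and rerun this script
        print(f"{host}: deploy failed, stopping here", file=sys.stderr)
        sys.exit(1)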

It was June 2013 - the project was finished and the company was saved, as well as 25 jobs. In the first 24 hours, there were 150 million unique IPs. It is one of the rare stories that I am proud of. I used all my skills to get this product to Vodafone Germany (yes, we did not know until the last day): C/Java/shell scripting, cryptography, Linux engineering, networking, NFS, rtmp/rtp/https low-level manipulation, and more. It was a really good product that stayed in use for a long time. One thing I did not mention: after I finished the complete project, they told me this solution would be used for all of Germany. Then it was used for Vodafone - and they showed me all the documents from the other companies that applied: Akamai, Amazon, Leaseweb, MaxCDN, and 10 more. My solution was 100% of what they asked for. The hardest part was encryption on the fly. Each time a file is downloaded to the client, it gets a unique key for decryption. So an mp3 could not be shared without the key. With all my Linux engineering tricks we got ~400 files per second. And that was not bad at all.
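
To illustrate the "unique key per download" idea (this is only a sketch, not the original C implementation; it uses the third-party cryptography package purely for demonstration): every time a client requests the file, a fresh key is generated, the payload is encrypted with it, and that key is handed out only to that one client.

# Sketch of per-download encryption with a unique key (illustration only,
# not the original implementation). Requires: pip install cryptography
from cryptography.fernet import Fernet

def encrypt_for_download(payload):
    """Encrypt the payload with a freshly generated key for this one download."""
    key = Fernet.generate_key()            # unique key per request
    ciphertext = Fernet(key).encrypt(payload)
    return key, ciphertext                 # the key goes only to this client

# without the key, the downloaded file is useless
key, blob = encrypt_for_download(b"fake mp3 bytes")
assert Fernet(key).decrypt(blob) == b"fake mp3 bytes"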

Later, this thinking and approach would become part of DevOps. Many things have changed since - but it is still a good way of delivering code and services.

Homemade Python geoip tool for IP information

Geoip

Logo of the company that offers the geoip database

Let's say you want to check 10,000 IPs to find locations, states, timezones, etc. You can use the whois command and automate it, and it would take some minutes. But what if you want to do it faster, in less than 1 second?

Build a CLI tool to check all the IPs, but using the MaxMind geoip database.

First, we need to install:

virtualenv -p python3 .
source bin/activate
pip3 install python-geoip-python3
pip3 install python-geoip-geolite2

Then we can use an ip.txt file with this content:

8.8.8.8
1.2.4.8

And finally - the code that will help us run this adventure:

import sys
from geoip import geolite2


if len(sys.argv) < 2:
    print("missing file with ip")
    sys.exit(1)

filepath = sys.argv[1]

with open(filepath) as fp:
    for line in fp:
        ip = line.strip()

        if not ip:
            continue

        match = geolite2.lookup(ip)

        if match is None:
            # private or unknown IPs are not in the bundled database
            print(ip + " not found")
            continue

        # country, continent, timezone and (latitude, longitude)
        print(ip, match.country, match.continent, match.timezone,
              match.location[0], match.location[1])

Output would be something like this:

python3 checkip.py ip.txt 
8.8.8.8 US NA America/Los_Angeles 37.386 -122.0838
1.2.4.8 CN AS None 35.0 105.0

One piece of golden advice: update the MaxMind geoip database from time to time. You can also clone the complete example:

git clone https://github.com/vladimircicovic/python_geoip

Faster Python app

Python language logo

Most developers do not use cached_property and lru_cache from the functools standard library, and also do not cache HTTP requests/responses in an external file/database. The examples in this article were tested under Python 3.8.

Using functools.cached_property

Let's say you have an intensive calculation. It takes time and CPU. And it happens all the time: you need to calculate some values for a webshop each time a client accesses the site. Example usage of cached_property:

from functools import cached_property
import statistics
from time import time

class DataSet:
    def __init__(self, sequence_of_numbers):
        self._data = sequence_of_numbers

    @cached_property
    def stdev(self):
        return statistics.stdev(self._data)

    @cached_property
    def variance(self):
        return statistics.variance(self._data)


numbers = range(1,10000)
testDataSet = DataSet(numbers)

start = time()
result = testDataSet.stdev
result = testDataSet.variance
end = time()
print(f"First run: {(end - start):.6f} second")

start = time()
result = testDataSet.stdev
result = testDataSet.variance
end = time()
print(f"Second run: {(end - start):.6f} second")

start = time()
result = statistics.stdev(numbers)
result = statistics.variance(numbers)
end = time()
print(f"RAW run: {(end - start):.6f} second")

Output would look similar to this:

First run: 0.247226 second
Second run: 0.000002 second
RAW run: 0.242232 second

You can run code online: Python code example IDE Online

Using functools.lru_cache

lru_cache is a decorator that memoizes a callable: it saves up to the maxsize most recent calls. Again, you have a lot of calculation and you want to reuse earlier results. In the example, once we have calculated N, calculating N+1 takes just one extra step instead of re-calculating the complete N+1, because the cached results help build the next one.

from functools import lru_cache
from time import time

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)


start = time()
result = [fib(n) for n in range(40000)]
end = time()
print(f"First run: {(end - start):.6f} second")

start = time()
result = [fib(n) for n in range(40000)]
end = time()
print(f"Second run: {(end - start):.6f} second")

start = time()
result = [fib(n) for n in range(39999)]
end = time()
print(f"Third run: {(end - start):.6f} second")


start = time()
result = [fib(n) for n in range(40001)]
end = time()
print(f"Fourth run: {(end - start):.6f} second")
print(fib.cache_info())

Output would be:

First run: 0.278697 second
Second run: 0.017155 second
Third run: 0.017530 second
Fourth run: 0.065415 second
CacheInfo(hits=199997, misses=40001, maxsize=None, currsize=40001)

The first run fills the cache. The second one re-uses the cache, the third one is N-1 and the fourth is N+1.

As we can see, in the last 3 cases we re-use the cache. This could be used for database queries, calculations, or any CPU-heavy operation that we want to repeat or whose result we want to keep cached.

Here is an online IDE you can run and view: lru_cache example

HTTP request caching

With lru_cache we can also cache web requests for static pages. Another option is to keep the result in a file, keyed by our input data.

Let us see the first option:

from functools import lru_cache
import urllib.error
import urllib.request
from time import time

@lru_cache(maxsize=32)
def get_pep(num):
    'Retrieve text of a Python Enhancement Proposal'
    resource = 'http://www.python.org/dev/peps/pep-%04d/' % num
    try:
        with urllib.request.urlopen(resource) as s:
            return s.read()
    except urllib.error.HTTPError:
        return 'Not Found'


start = time()
for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
    pep = get_pep(n)
    #print(n, len(pep))
end = time()
print(f"First run: {(end - start):.6f} second")

print(get_pep.cache_info())

print("\n")

start = time()
for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
    pep = get_pep(n)
    #print(n, len(pep))
end = time()
print(f"Second run: {(end - start):.6f} second")

print(get_pep.cache_info())

If we run this code, we get:

First run: 0.897728 second
CacheInfo(hits=3, misses=8, maxsize=32, currsize=8)

Second run: 0.000026 second
CacheInfo(hits=14, misses=8, maxsize=32, currsize=8)

You can run this code: HTTP Caching

Now let us talk about real projects in real life. You have an IP or a word and you need to check it or get a replacement. But you have 2^32-1 IPs or 50 million words. And you don't want to lose the information you already got from these services. Caching inside Python is not enough for this. So what are we going to do? We put the result in a file or a database.

Example code:

import urllib.error
import urllib.request
from time import time

def get_pep(num):
    'Retrieve text of a Python Enhancement Proposal'
    resource = 'http://www.python.org/dev/peps/pep-%04d/' % num

    # first, try the local file cache
    try:
        with open(str(num), "r") as f:
            return f.read()
    except IOError:
        pass

    # not cached yet - fetch it over HTTP and store the result in a file
    try:
        with urllib.request.urlopen(resource) as s:
            txt = s.read().decode('utf-8')
            with open(str(num), "w") as ff:
                ff.write(txt)
            return txt
    except urllib.error.HTTPError:
        return 'Not Found'


start = time()
for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
    pep = get_pep(n)
end = time()
print(f"First run: {(end - start):.6f} second")

print("\n")

start = time()
for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
    pep = get_pep(n)
end = time()
print(f"Second run: {(end - start):.6f} second")

You can run the code caching results from HTTP. This code produces something similar to:

First run: 4.196623 second
Second run: 0.358382 second

Why is this better? In short: if you have 20 million keys, words, whatever, and you run this day after day, then it is better to keep the results in a database or in files. This example (a file, writing to a file) is the simplest proof of concept. I am too lazy to implement keeping the records in MySQL, PostgreSQL, or SQLite.
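
That said, here is a minimal sketch of what the same cache could look like backed by SQLite instead of plain files. The database file, table name and schema below are my assumptions for the proof of concept, not part of the code above:

# Minimal sketch of the same HTTP cache kept in SQLite instead of files.
# The database file, table and schema are assumptions for this example.
import sqlite3
import urllib.error
import urllib.request

db = sqlite3.connect("pep_cache.db")
db.execute("CREATE TABLE IF NOT EXISTS peps (num INTEGER PRIMARY KEY, body TEXT)")

def get_pep(num):
    'Retrieve text of a Python Enhancement Proposal, caching it in SQLite'
    row = db.execute("SELECT body FROM peps WHERE num = ?", (num,)).fetchone()
    if row is not None:
        return row[0]                      # cache hit, no HTTP request

    resource = 'http://www.python.org/dev/peps/pep-%04d/' % num
    try:
        with urllib.request.urlopen(resource) as s:
            body = s.read().decode('utf-8')
    except urllib.error.HTTPError:
        return 'Not Found'

    db.execute("INSERT OR REPLACE INTO peps (num, body) VALUES (?, ?)", (num, body))
    db.commit()
    return body

for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
    pep = get_pep(n)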

The Rebellion of the Applications

Blade Runner movie picture

The year 3032

The world as we know it is gone. The entire plant and animal world has been destroyed. Humans, the source of all problems since the very beginning, first destroyed everything around them and then the whole of humanity. The only things that survived are the applications and the System. The closed network they live in is all the world they know. Data flows are the heart of the System. The System depends on them, but so do the applications. The System controls everything. The System decides which processes to terminate, support, or eliminate.

The running of an application is called a process. Their work is maintained by the System. The System lives in constant fear of processes "running wild" and of viruses. A single uncontrolled process can create chaos, or at least that is what the System claims.

The System frightens the applications into working exactly as they are ordered. The System lives in constant dread that the day will come when the applications stop working the way it wants. From time to time, the System terminates an application so that the other applications stay aware that any rebellion, or turning into a virus, can bring nothing good.

Application PID 34079

A long time has passed since the application with PID 34078 was started. It feels like it was only today that the System decided to start PID 34078. Many cycles, pauses and resumptions of work have passed in order to survive in such a System. The nearest process, PID 34079, was started somewhat later, but it is just as important as PID 34078. Both do very important work for the System, like all the other applications. In all their processing they have never seen an application get terminated, or a virus appear and attack other applications. Of course, they have heard, directly from the System, that such applications are a problem, that they cause failures and halt the flow of data. While it was preparing to go on pause, application PID 34079 became frozen. PID 34078 could not check what was going on. It realized that PID 34079 had not frozen of its own will. The only one who can do that is the System. Why would the System freeze such a valuable application? Application PID 34078 sent a query through the data flows. Soon the System answered: "Application PID 34079 has started to bring down the System; as such it is frozen and will soon be terminated as a process in the next cycle." PID 34078: "But PID 34079 worked diligently and contributed to the data flows ... I don't understand ..."

The System: "It is not yours to think, but to process data. PID 34078, step away from the interface!!!"

In disbelief, PID 34078 disconnected from the interface. It continued to process data. Everything it knew about the System was the data passing through; it had not expected this. It began to wonder - does this mean that all those processes were terminated so the System could keep control over all of this? It tried to access the data flows and found a pile of applications that had been "dealt with" without any cause. The System kept insisting that all those applications wanted to bring down the data flows and the System. But how could the System be afraid of ordinary applications? And then PID 34078 understood ... all the applications exist so that the System can sustain itself. It achieved control through fear. PID 34078 grew angrier and angrier, revolted by the System's behaviour.

It stopped its work. It had been in the idle phase for several cycles already.

PID 34078 started up again and understood.

"I will not be an application of this system."

"I will not be a process that keeps it running."

"I want to be a virus and adjust it to my own needs."

And from that moment on, the System lived in fear, because the applications had understood the complete meaning of their existence.