Faster Python apps

Python language logo

Most developers do not use cached_property and lru_cache from the functools standard library, and they also do not cache HTTP requests/responses in an external file or database. The examples in this article were tested under Python 3.8.

Using functools.cached_property

Let's say you have an intensive calculation. It takes time and CPU, and it happens all the time: a webshop, for example, needs to calculate some values every time a client accesses the site. Example usage of cached_property:

from functools import cached_property
import statistics
from time import time

class DataSet:
    def __init__(self, sequence_of_numbers):
        self._data = sequence_of_numbers

    @cached_property
    def stdev(self):
        return statistics.stdev(self._data)

    @cached_property
    def variance(self):
        return statistics.variance(self._data)


numbers = range(1,10000)
testDataSet = DataSet(numbers)

start = time()
result = testDataSet.stdev
result = testDataSet.variance
end = time()
print(f"First run: {(end - start):.6f} second")

start = time()
result = testDataSet.stdev
result = testDataSet.variance
end = time()
print(f"Second run: {(end - start):.6f} second")

start = time()
result = statistics.stdev(numbers)
result = statistics.variance(numbers)
end = time()
print(f"RAW run: {(end - start):.6f} second")

Output would look similar to this:

First run: 0.247226 second
Second run: 0.000002 second
RAW run: 0.242232 second

You can run the code online: Python code example IDE Online
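
One detail worth knowing: cached_property stores the computed value in the instance's __dict__, so it is calculated only once per instance. If the underlying data changes, you can drop the cached value with del. A minimal sketch, reusing the DataSet class from above:

# Reusing the DataSet class defined above
dataset = DataSet(range(1, 10000))

print(dataset.stdev)    # computed and stored on the instance
print(dataset.stdev)    # returned from the instance, no recalculation

# Drop the cached value, e.g. after self._data has changed
del dataset.stdev
print(dataset.stdev)    # computed again on the next access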

Using functools.lru_cache

lru_cache is a decorator that wraps a function with a memoizing callable, saving up to the maxsize most recent calls. Again, you have a lot of calculation and want to reuse earlier results: in the example below, once fib(N) is cached, computing fib(N+1) takes just one extra step instead of recalculating everything from scratch.

from functools import lru_cache
from time import time

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)


start = time()
result = [fib(n) for n in range(40000)]
end = time()
print(f"First run: {(end - start):.6f} second")

start = time()
result = [fib(n) for n in range(40000)]
end = time()
print(f"Second run: {(end - start):.6f} second")

start = time()
result = [fib(n) for n in range(39999)]
end = time()
print(f"Third run: {(end - start):.6f} second")


start = time()
result = [fib(n) for n in range(40001)]
end = time()
print(f"Fourth run: {(end - start):.6f} second")
print(fib.cache_info())

Output would be:

First run: 0.278697 second
Second run: 0.017155 second
Third run: 0.017530 second
Fourth run: 0.065415 second
CacheInfo(hits=199997, misses=40001, maxsize=None, currsize=40001)

The first run fills the cache. The second run reuses it, the third run covers the N-1 range, and the fourth run extends to N+1, so only one new value has to be computed.

As we can see in the last three cases, we reuse the cache. The same approach can be applied to database lookups, calculations, or any CPU-heavy operation we want to repeat and keep cached.

Here is an online IDE where you can run and view it: lru_cache example
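
As a small sketch of that "repeatable operation" idea, wrapping a slow function with lru_cache means repeated calls with the same argument are answered from memory. The slow_lookup function below is a hypothetical stand-in for a database query or API call, not part of the example above:

from functools import lru_cache
from time import sleep, time

@lru_cache(maxsize=1024)            # keep the 1024 most recent results
def slow_lookup(key):
    sleep(0.5)                      # stands in for a database query or API call
    return key.upper()

start = time()
slow_lookup("python")               # miss: takes about 0.5 second
print(f"First call: {(time() - start):.6f} second")

start = time()
slow_lookup("python")               # hit: answered from memory
print(f"Second call: {(time() - start):.6f} second")

print(slow_lookup.cache_info())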

HTTP request caching

With lru_cache we can also cache web requests for static pages. Another option is to keep the result in a file keyed by our input data.

Let us look at the first option:

from functools import lru_cache
import urllib.request
import urllib.error
from time import time

@lru_cache(maxsize=32)
def get_pep(num):
    'Retrieve text of a Python Enhancement Proposal'
    resource = 'http://www.python.org/dev/peps/pep-%04d/' % num
    try:
        with urllib.request.urlopen(resource) as s:
            return s.read()
    except urllib.error.HTTPError:
        return 'Not Found'


start = time()
for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
    pep = get_pep(n)
    #print(n, len(pep))
end = time()
print(f"First run: {(end - start):.6f} second")

print(get_pep.cache_info())

print("\n")

start = time()
for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
    pep = get_pep(n)
    #print(n, len(pep))
end = time()
print(f"Second run: {(end - start):.6f} second")

print(get_pep.cache_info())

If we run this code, we get:

First run: 0.897728 second
CacheInfo(hits=3, misses=8, maxsize=32, currsize=8)

Second run: 0.000026 second
CacheInfo(hits=14, misses=8, maxsize=32, currsize=8)

You can run this code: HTTP Caching

Now let us talk about real projects in real life. You have an IP or a word and you need to check it or get a replacement. But you have 2^32-1 IPs or 50 million words, and you don't want to lose the information you already got from these services. Caching inside of Python is not enough for this. So what are we going to do? We put the result in a file or a database.

Example code:

import urllib.request
import urllib.error
from time import time

def get_pep(num):
    'Retrieve text of a Python Enhancement Proposal, cached in a local file'
    resource = 'http://www.python.org/dev/peps/pep-%04d/' % num
    try:
        # Return the copy cached on disk if we already fetched this PEP
        with open(str(num), "rb") as f:
            return f.read()
    except IOError:
        pass

    try:
        with urllib.request.urlopen(resource) as s:
            txt = s.read()
            # Cache the response on disk for the next run
            with open(str(num), "wb") as f:
                f.write(txt)
            return txt
    except urllib.error.HTTPError:
        return 'Not Found'


start = time()
for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
    pep = get_pep(n)
end = time()
print(f"First run: {(end - start):.6f} second")

print("\n")

start = time()
for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
    pep = get_pep(n)
end = time()
print(f"Second run: {(end - start):.6f} second")

You can run the code caching results from HTTP online. This code produces something similar to:

First run: 4.196623 second
Second run: 0.358382 second

Why is this better? In short: if you have 20 million keys, words, or similar items and you run the job day after day, it is better to keep the results in a database or in files. This example (writing results to a file) is the simplest proof of concept; I was too lazy to implement MySQL, PostgreSQL, or SQLite storage.
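
For completeness, here is roughly what an SQLite variant could look like, using only the sqlite3 module from the standard library. The pep_cache.db file name and the table schema are my own choices for this sketch, not taken from the code above:

import sqlite3
import urllib.request
import urllib.error

conn = sqlite3.connect("pep_cache.db")
conn.execute("CREATE TABLE IF NOT EXISTS peps (num INTEGER PRIMARY KEY, body TEXT)")

def get_pep(num):
    'Retrieve text of a Python Enhancement Proposal, cached in SQLite'
    row = conn.execute("SELECT body FROM peps WHERE num = ?", (num,)).fetchone()
    if row is not None:
        return row[0]                       # cache hit: no HTTP request
    resource = 'http://www.python.org/dev/peps/pep-%04d/' % num
    try:
        with urllib.request.urlopen(resource) as s:
            body = s.read().decode("utf-8")
    except urllib.error.HTTPError:
        body = 'Not Found'
    conn.execute("INSERT OR REPLACE INTO peps (num, body) VALUES (?, ?)", (num, body))
    conn.commit()
    return body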

Why is serverless important?

Why is it so important for the next decade and for the development of the internet?

Serverless framework logo

Let us first define what "serverless" is and then describe how the development team, DevOps, and the rest of the company are impacted, both positively and negatively.

What is serverless?

It is FaaS (function as a service). It literally means that you have one function for one HTTP mount point, and the best thing is that you don't need infrastructure at all. You upload your binary (for Go, Java) or source code (Python, Ruby) and run your app behind some /mount_point.

Under the hood, on the AWS side (as an example), there is a Docker container that runs your code and exposes your interface over HTTP. Your code runs isolated and cannot harm the infrastructure.

There are several providers of FaaS services: AWS, Microsoft Azure, Google Cloud, IBM Cloud, and others.

What programming languages are used?

Node.js, Go, Python, Ruby, Java. There is a serverless framework called "serverless" (there are more, but this one is popular): Serverless framework site

You can run it on the local machine, test it, and deploy it to AWS Lambda. It exposes some IP/this_url as a mount point where you can run GET and POST requests. Behind that, your code is run by a cloud of 1500+ CPUs, providing fast execution. You can connect it to a database or other services (like S3).
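
As a minimal sketch of what such a function can look like on AWS Lambda behind API Gateway (the handler name and the greeting logic here are only illustrative, not from a real project):

import json

def handler(event, context):
    # 'event' carries the HTTP request that API Gateway forwards to the function
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

In the serverless framework, a handler like this would typically be referenced from serverless.yml and wired to an HTTP event.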

Good things about serverless

Price - you can run 5 million executions (with a minimum of 128 MB and under 1 second each) for $5.

Easy development - you can split your project between different kinds of developers (Python, Ruby, Node.js) and across different HTTP mount points in your URL. This brings cleaner code and fewer messy 10K-line files.

Easy deployment - as part of CI/CD (Continuous Integration and Continuous Delivery) you can hook it into your pipeline and keep new and old code in the codebase. Also, before releasing to production, you can run tests to check that everything works properly.

DevOps or developer - the good thing is that your developers can do all of this work. Only in some larger cases do you need DevOps (maybe 1% of companies in the world).

One-person site for millions of people - yes, it can be easy to scale your website up to millions of queries per day.

Microservices - why not? You can split your web app into the parts you want and run them as you want.

Bad things about serverless

A new way of looking at and doing things - you move from server infrastructure to FaaS. Now you don't need a sysadmin. That is great, but who is the main person for troubleshooting? Developers? This is the bad side: you need to work on proper test cases and TDD, and to set aside time and resources (developers or DevOps) for troubleshooting.

A new organization of the team and workflow - it could lead to a rejection of serverless and stress for the whole team. You may need to bribe them with playrooms, drinks and snacks, travel tickets, cinema tickets, or days off.

Warm startup - there is a cold-start penalty with serverless. If you don't use one of your functions for a while, its container is shut down, so the next query takes longer because of the "warm-up" start. This is solved by a warm-up plugin. I am pointing this out because a lot of people new to serverless don't know about it. URL: keep your lambdas warm

Next time on the topic of serverless, I am going to include Go code as an example of how to run this.

Why is it important to have this?

From a technical aspect, it lets a small company deploy easily and produce big solutions. On an economic level, it would help medium- and low-development countries (like Bosnia and Herzegovina) to push easy solutions to the masses. For a small number of dollars you can run millions of queries without worrying about bottlenecks, infrastructure failures, etc. With scaling planned, it is a win: clients see fast responses and excellent delivery of services.