Processes and threads are used to allocate computer resources in an optimal manner.
Process
- A process is a unit for the OS to allocate computing resources
- Each application has at least one process
- A process is the basic unit for applications to compete for computing resources
If a computer has only one core, only one process can run at any given moment.
However, the CPU can switch between different application processes, although each switch is costly for the system.
A good OS has to balance switching between enough applications against the resources consumed by the switching itself.
Thread
- A thread is part of a process
- Each process may have one or more threads
- As CPUs get faster and faster, using only processes to manage computing resources is not fine-grained enough.
- A thread is an even smaller unit competing for computing resources
- Switching between threads costs much less than switching between processes
- A thread uses the CPU to execute code
Multi-threading
With a single thread, code executes sequentially, e.g.
import threading
import time

def worker():
    t = threading.current_thread()
    time.sleep(5)
    print(t.getName())

worker()
t = threading.current_thread()
print(t.getName())
It executes worker first: the program sleeps for 5 s, prints “MainThread”, then continues to the last two lines and prints “MainThread” again.
With multiple threads, code can run concurrently, e.g.
import threading
import time

def worker():
    t1 = threading.current_thread()
    time.sleep(5)
    print(t1.getName())

new_t = threading.Thread(target=worker, name="NewThread")
new_t.start()                      # runs worker in a separate thread
t = threading.current_thread()
print(t.getName())
It prints “MainThread” first, then after about 5 s the new thread prints “NewThread”
- The key benefit of multi-threading is to fully use CPU’s capacity
- It’s a type of asynchronous programming
Benefits of Multi-Threading
- For multi-core computers, each core can run a thread
- Multi-threading is able to run programs in parallel
- However, Python (more precisely, CPython) cannot fully exploit multi-core CPUs because of the GIL
Global Interpreter Lock
- Because of the GIL, only one thread can execute Python bytecode at any moment, so a single Python process effectively uses only one core at a time
So why is the GIL needed?
- It's for thread safety
- A single process can have multiple threads that access and share the same resources of that process, which can cause thread-safety problems
An example:
In thread A
- a = 3
- a += 1
- print a
In thread B
- a += 1
- print a
Since both threads can access a and change its value, it is quite possible that thread A's a += 1 has not finished before the scheduler switches to thread B, so the printed results are not deterministic, as the sketch below demonstrates.
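A minimal sketch of the same race in Python (the variable name counter and the iteration count are only illustrative): two threads each increment a shared variable many times, and because += is a read-modify-write that is not atomic, some increments can be lost; whether the loss actually shows up depends on the interpreter version and timing.
import threading

counter = 0                      # shared variable, like a in the example above
N = 1_000_000

def increment():
    global counter
    for _ in range(N):
        counter += 1             # read, add, write back: another thread can interleave here

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start(); t2.start()
t1.join(); t2.join()
print(counter)                   # expected 2_000_000, but it may be smaller if updates were lost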
As a result, a lock is introduced: for a given variable's value and its operations, only one thread is allowed to work on it at a time
There are two types of locks:
- “Fine-Grained” lock: a lock customized and added by the coder (see the sketch after this list)
- “Coarse-Grained” lock: the GIL, which guarantees thread safety to some extent
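A minimal sketch of such a “fine-grained” lock, reusing the counter idea from above: wrapping the increment in a threading.Lock guarantees that only one thread performs the read-modify-write at a time.
import threading

counter = 0
lock = threading.Lock()          # fine-grained lock added by the coder

def increment():
    global counter
    for _ in range(1_000_000):
        with lock:               # only one thread may hold the lock at a time
            counter += 1

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start(); t2.start()
t1.join(); t2.join()
print(counter)                   # always 2_000_000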
Also a note:
- The GIL exists in CPython, but not in Jython
However, Python can use multiple processes together with inter-process communication to benefit from multi-core CPUs (see the sketch below),
although switching between processes costs far more than switching between threads
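A minimal sketch of that multi-process approach using the standard multiprocessing module (the cpu_bound function and its inputs are only illustrative): each worker runs in its own process with its own interpreter and GIL, so the work can be spread across cores.
from multiprocessing import Pool

def cpu_bound(n):
    # CPU-heavy work, e.g. summing squares
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:              # roughly one worker process per core
        results = pool.map(cpu_bound, [10**6] * 4)
    print(results)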
IO Intensive vs. CPU Intensive
- For CPU-intensive programs, e.g. ffmpeg, image processing, etc., Python's multi-threading has clear limitations
- For IO-intensive programs, e.g. CRUD, web requests, file I/O, etc., Python can use the waiting time for multi-threading (see the sketch after this list)
- Most common programs are IO-intensive
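A minimal sketch of using that waiting time in an IO-intensive program (the URL is only a placeholder): while one thread blocks on a network request, the GIL is released and the other threads keep running.
import threading
import urllib.request

def fetch(url):
    with urllib.request.urlopen(url) as resp:    # the thread blocks here; the GIL is released during IO
        print(url, resp.status)

urls = ["https://example.com"] * 3               # placeholder URLs
threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()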
Issues with multi-threading in Flask
- The relationship between sending and processing requests
- Flask’s own webserver is single-process single-thread
- When requests are handled by multiple threads, it is unclear which Request object is in use at any given moment. If a change is meant for a particular request, there is no guarantee that the targeted request instance is the one changed, which results in contaminated data.
Thread Local
- To solve the multi-threading issues in Flask, we can use a dictionary keyed by thread id, e.g.
request = {thread_id1: Request1, thread_id2: Request2, ...}
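A minimal sketch of that dictionary idea (Request here is just a stand-in class, not Flask's actual object): each thread writes and reads its own entry keyed by its thread id, so threads never touch each other's request.
import threading

requests = {}                        # {thread_id: Request}

class Request:                       # stand-in for Flask's Request object
    def __init__(self, name):
        self.name = name

def handle(name):
    tid = threading.get_ident()      # id of the current thread
    requests[tid] = Request(name)
    print(tid, requests[tid].name)   # each thread only reads its own entry

for n in ("req-A", "req-B"):
    threading.Thread(target=handle, args=(n,)).start()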
- Flask uses werkzeug's Local class for this. First, an example without Local:
import threading
import time

class A:
    b = 1

my_obj = A()

def worker():
    my_obj.b = 2

new_t = threading.Thread(target=worker, name="NewThread")
new_t.start()
time.sleep(1)
print(my_obj.b)
In the example above, my_obj is an ordinary object shared by both threads, so the change made in NewThread is visible to the main thread and the code prints 2.
However, if we use Local:
import threading
import time
from werkzeug.local import Local

my_obj = Local()
my_obj.b = 1

def worker():
    my_obj.b = 2
    print("in new thread b is: " + str(my_obj.b))

new_t = threading.Thread(target=worker, name="NewThread")
new_t.start()
time.sleep(1)
print("in main thread b is: " + str(my_obj.b))
The printed result is:
in new thread b is: 2
in main thread b is: 1
LocalStack
There are two local stacks in Flask
_app_ctx_stack
_request_ctx_stack
They are both LocalStack instances; LocalStack is built on top of the Local class
The relationship between LocalStack, Local and Dictionary
Local is a thread-isolation mechanism implemented with a dict
LocalStack is a thread-isolated stack implemented with Local
This shows the idea of encapsulation in OOP: if one layer of encapsulation is not enough, add another
- LocalStack <== Local <== Dict
The point of using LocalStack is that the current thread always refers to the object created by this thread,
never to an object created by another thread, as the sketch below shows.
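A minimal sketch of that behaviour with werkzeug's LocalStack (the exact semantics may differ slightly between Werkzeug versions): each thread pushes its own object, and top only ever returns what the current thread pushed.
import threading
import time
from werkzeug.local import LocalStack

stack = LocalStack()

def worker():
    stack.push("pushed by NewThread")
    print("in new thread top is: " + str(stack.top))

stack.push("pushed by MainThread")
threading.Thread(target=worker, name="NewThread").start()
time.sleep(1)
print("in main thread top is: " + str(stack.top))   # still the main thread's own object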