Discover more from Blinking Cursor ▌ by daniele margutti
Task & Threads in Swift Concurrency
The new Swift Concurrency model introduced the concept of Task to better manage asynchronous code execution. But what does a Task represent, and how is it related to threads?
To understand the difference between tasks and threads, we need to take a look at what threads are and how they are used in a modern general-purpose computer.
Since this is not a post about CPU architectures I’ll try to simplify some concepts.
At the very lowest level every processor has things called registers.
We can think of registers as a type of computer memory built directly into the processor, used to store and manipulate data while a program's instructions execute.
A register is essentially a container for a variable; modern processors use 64-bit registers, which means a single register can store 64 bits (8 bytes) of data.
When a CPU executes a program it loads data from RAM into registers, performs operations on it, and stores the results back to memory.
A modern CPU has multiple cores, each with its own set of registers, including a program counter. Some CPUs have two sets of registers per core to allow faster thread switching (this is called hyper-threading).
For the sake of simplicity, you can think of each core as an entire CPU that just happens to be attached to the others for faster communication.
There are different types of registers1: unfortunately, while our programs can use a large number of local variables, a CPU has only a limited number of registers.
To handle this limitation there is a region of memory (RAM) used to store local variables that aren't currently in registers.
This region is called the stack: each time you call a function, its variables get pushed onto the top; once the function body finishes, they are popped off again (hence the name stack).
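A minimal Swift sketch of the idea: each call to a function pushes a new frame holding its locals, and returning pops that frame off again (the compiler decides which locals live in registers and which spill to the stack).

```swift
// Each call to `sum` pushes a stack frame holding `total` and `v`;
// returning pops the frame, and those locals cease to exist.
func sum(_ values: [Int]) -> Int {
    var total = 0            // lives in this call's stack frame (or a register)
    for v in values {
        total += v
    }
    return total             // frame is popped when the function returns
}

let result = sum([1, 2, 3])  // a new frame is pushed for this call, then popped
print(result)                // 6
```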
A thread is a sequence of code execution that can run independently of other threads. You can consider it the smallest unit of work that can be scheduled by an operating system.
At the very low level a thread is represented as a snapshot of the CPU: the state of all the registers, the contents of the stack, and a bit of extra bookkeeping the kernel uses to keep track of the thread.
Modern CPUs run multitasking OSs, which switch the running thread at frequent intervals; this is what makes concurrency possible.
Each thread is executed for a limited amount of time, then a context switch happens, and so on (in a multi-core environment concurrency can also be achieved via parallelism, in which multiple independent tasks execute simultaneously).
Context switching can be:
Voluntary: the thread has completed its work and reports it to the kernel (cooperative multitasking).
Non-voluntary: the scheduled time slice for the thread has ended, and execution is temporarily suspended by the kernel (preemptive multitasking).
Preemptive multitasking allows for better resource management and prevents poorly-behaved programs from taking full control of the system (more on differences2).
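To make kernel threads concrete, here's a hedged sketch using Foundation's Thread API: the OS schedules the worker thread independently of the main thread, so the interleaving of the printed lines is non-deterministic and decided entirely by the kernel's scheduler.

```swift
import Foundation

// The kernel creates and schedules this thread independently of ours;
// the interleaving of "worker" and "main" lines is non-deterministic.
let worker = Thread {
    for i in 1...3 {
        print("worker:", i)
    }
}
worker.start()          // asks the kernel to create and run a new thread

for i in 1...3 {
    print("main:", i)   // meanwhile the main thread keeps running
}

// Give the worker a moment to finish before the process exits.
Thread.sleep(forTimeInterval: 0.1)
```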
The hidden cost of threads
Context switching doesn't come for free; in fact, it has a cost.
When the OS suspends a thread it must save all the registers to RAM, then load the state of the incoming thread that was previously saved there.
The load/unload operations of threads take time and memory. Moreover, the kernel needs to keep track of these threads to resume or suspend them, which means additional memory usage.
Keep in mind that significant progress has been made in processor and operating system architectures. Today, a thread can share various state information with other threads, reducing overhead. Most processors, especially multi-core processors and GPUs (graphics processing units), now incorporate hardware that makes running multiple threads particularly efficient.
Nevertheless, it's important to consider that thread models may not be well-suited for all computer architectures, and this remains an optimization problem.
The new Swift Concurrency model uses a hybrid approach: it has a lightweight thread-like object called a Task (other languages call these coroutines, fibers or, simply, user threads - as opposed to kernel threads).
The OS kernel knows nothing about these objects: they are fully managed by Swift's concurrency runtime. You can consider it a new abstraction layer over the OS.
The concurrency runtime does its own cooperative multitasking to decide which tasks get mapped to real threads. Every time we use the await keyword, we're telling the runtime it's someone else's turn, giving up the current thread.
This is a model which maps M user threads onto N kernel threads.
It enables a large number (M) of user threads to be created, thanks to their light weight, while still allowing (N-way) parallelism.
Many other languages use this model to create lightweight threads4.
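The M:N idea can be seen directly with a task group: below we create M = 1,000 lightweight child tasks, but the cooperative pool runs them on roughly N kernel threads, where N is about the number of CPU cores. Creating a Task is cheap compared to spawning a real thread.

```swift
// M = 1,000 lightweight tasks, multiplexed by the runtime onto roughly
// one kernel thread per CPU core. Each child task is just a small
// heap allocation, not a kernel object with its own stack.
let total = await withTaskGroup(of: Int.self) { group in
    for i in 1...1_000 {
        group.addTask { i }          // a child task, not a kernel thread
    }
    return await group.reduce(0, +)  // collect results as they finish
}
print(total) // 500500 — all 1,000 tasks ran on only a handful of threads
```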
The advantage over using one thread for each task (kernel-only threading) is that you consume fewer resources, like memory (both virtual and physical) and kernel objects. You also get fewer context switches, which increases performance (in the ideal case, where you have as many running threads as you have processors, you may have almost no context switches).
The advantage over user only threading is that you can take advantage of multiple CPUs or multiple CPU cores. And if one task blocks, you can create another kernel thread to use the available CPU more efficiently.
A disadvantage over kernel-only scheduling is possibly higher latency: if all the threads in the pool are busy and you add a new short task, it may wait a long time before it starts executing.
In the best case, what previously would have required many expensive real threads, each running for a short period of time, now requires only a tiny number of real threads.
Moreover, they all run for as long as the kernel will let them, minimizing memory overhead and therefore the switching cost.
Since this is somewhat off-topic, I will only briefly mention the concept of thread explosion, leaving links to two articles that delve into the issue at the end.
As we said, a context switch has a performance cost. When it happens a lot (you create many more threads than the number of available CPU cores) the cost of context switching grows high: this event is called thread explosion.
This leads to issues like:
Memory overhead: each blocked thread is holding onto valuable memory and resources while waiting to run again.
Scheduling overhead: as new threads are brought up, the CPU needs to perform a full context switch to move from the old thread to the new one. With limited cores and a lot of threads, the scheduling latency outweighs the amount of useful work the threads do, so the CPU ends up running less efficiently as well.
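A hedged sketch of how the two models differ here, using GCD versus Swift Concurrency. Each GCD block below *blocks* its thread in sleep(), so the global pool keeps spawning new kernel threads to keep the queue draining; the Task equivalent *suspends* instead, freeing the thread immediately, so the cooperative pool stays at roughly one thread per core no matter how many tasks we create.

```swift
import Foundation

// GCD: each block parks a real kernel thread in sleep(), so the global
// pool may spawn dozens of threads to keep the queue moving — the
// classic recipe for thread explosion.
for _ in 1...100 {
    DispatchQueue.global().async {
        Thread.sleep(forTimeInterval: 1)   // blocks a kernel thread
    }
}

// Swift Concurrency: Task.sleep suspends the task and frees its thread,
// so 100 sleeping tasks still only occupy the small cooperative pool.
for _ in 1...100 {
    Task {
        try? await Task.sleep(nanoseconds: 1_000_000_000) // no thread held
    }
}
```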
While Swift Concurrency does a pretty good job of preventing thread explosion, we cannot deny that it can still cause a very significant bottleneck in some cases.
If you want to know more about this topic you can watch the WWDC21 session “Swift concurrency: Behind the scenes” and read this interesting article by Senpai, “Swift Concurrency vs Thread Explosion”.
The vast majority of computer processors have the following registers:
PC (Program Counter) which keeps track of the memory address of the next instruction to be fetched and executed.
IR (Instruction Register) which holds the currently fetched instruction being executed.
ACC (Accumulator) a general purpose register used to store intermediate results during arithmetic/logical operations.
General Purpose Registers (R0, R1…) used by the programmer to store data during calculations.
AR (Address Registers) store memory addresses for data access or for transferring data between different memory locations.
SP (Stack Pointer) contains a pointer to the top of the stack, a region of memory used for temporary storage during function calls and operations.
DR (Data Registers) store data fetched from memory or from I/O.
SR (Status Register) contains individual bits that indicate the outcome of an operation (carry, overflow, zero result…).
CR (Control Registers) manage various control settings and parameters related to the CPU's operation.