Concurrency: Processes & Threads

PreviousI/O Continued, Sockets, Networking NextCooperating Threads, Synchronization

Last updated 6 years ago

Concurrency: Processes & Threads

以下是 CS162 课程的整体结构：

前面几节课完成了 intro 和 OS Concepts 两部分，本节将进入 Concurrency 的第一个话题：Processes & Threads

Traditional UNIX Process

传统的 UNIX Process 通常称为 "HeavyWeight Process"，在 Process 内部不存在并发。它主要由两个部分构成：

顺序执行的程序
被保护的资源
- Main memory state (contents of address space)
- I/O state (i.e. file descriptors)

每个 Process 的运行状态被保存在 Process Control Block（PCB）中，PCB 中始终保存着 Process 的一个快照，包括它的执行环境和被保护的资源。在 OS 调度的过程中，始终只有 1 个 PCB 处于活跃状态，OS 的职责就是在分配 CPU 和 Non-CPU 资源的过程中尽量实现公平与效率。

OS 切换 Processes 的过程，被称为 context switch，下图所示的是 OS 在两个 Processes 之间切换的过程：

切换的过程需要 save P0 state、reload P1 state，会带来开销。

Lifecycle of a Traditional Process

Process 的生命周期包括以下几种状态：

New：新创建的 Process
Ready：Process 准备完毕，等待 CPU 资源，随时可以执行
Running：执行中
Waiting：等待其它事件的发生，如 I/O
Terminated：Process 完成执行，保存执行返回值，等待 parent process 回收

Traditional Process Scheduling

在 kernel 中维护者若干个 queues，如下图所示：

PCBs 状态的改变就是从一个 queue 移动到另一个 queue 的过程，如：

ready queue => CPU：ready => running
CPU => I/O queue：running => waiting
I/O => ready queue：waiting => ready

实际上 I/O queue 还可以继续细分为 USB Unit 0、Disk Unit 0、Disk Unit 2、Ethernet 0 等等各种 queues。

Modern Process With Threads

将 Process 的执行抽象成一个新的概念，即 thread，且 Process 中能有多个 threads 共存：

Thread：在 Process 中的单个顺序执行的命令流，不可再分
- Thread 使用 Process 的 Address Space
- Threads 之间不存在隔离保护，因为它们共用 Address Space，且它们可以修改 Process 中的共享信息
Multithreading：一个 Process 中有多个 Threads 并发执行，完成各自的工作

如下图所示：

Thread 是并发的最小单元
Address spaces 是保护的最小单元

Thread State

Thread state 主要包括 2 部分：

State shared by all threads in process/address space
- Content of memory
- I/O State (file descriptors, network connections, etc)
State "private" to each thread
- TCB
- CPU registers
- Execution stack（具体可以翻阅 CS107 相关课程，介绍地很透彻）
  - Parameters, temporary variables
  - Return PCs are kept while called procedures are executing

Motivational Example for Threads

想象如下程序：

main() {
  ComputePI("pi.txt");
  PrintClassList("clist.text");
}

由于计算 pi 的过程不会结束，PrintClassList 永远不会被执行，利用 Threads 修改程序：

main() {
  ThreadFork(ComputePI("pi.txt"));
  ThreadFork(PrintClassList("clist.text"));
}

这样两个 Threads 就可以并发执行，而这时候 Process 的 memory footprint 如下图所示：

此时在 Process 中可以看到两套 CPU registers、Execution stacks，但这也会引入新的问题：

如何放置两个 stacks 的位置
stacks 的最大限制怎么定
如何检查 stacks 之间发生重叠

这些都是 thread package 需要解决的问题。

Actual Thread Operations

threads 标准支持的基本操作如下所示：

thread_fork(func, args)：创建一个新的 thread，执行 func(args)
thread_yield()：主动放弃 CPU 使用权
thread_join(thread)：在 parent 中，等待 forked 的 thread 执行结束
thread_exit()：关闭 thread 并清理，唤醒等待 join 的 thread

pThreads 是符合 POSIX 标准的 thread package。

Dispatch Loop

OS 的调度程序：

Loop {
  RunThread();
  ChooseNextThread();
  SaveStateOfCPU(curTCB);
  LoadStateOfCPU(newTCB);
}

当 OS 启动时，这个 loop 就会被启动，直到系统关机。

Running a Thread

Dispatcher 如何运行一个 thread？

加载 thread 的状态到 CPU，包括 registers、PC 和 SP
加载运行环境 --- virtual memory space
跳到 PC 处启动 thread 运行

Dispatcher 如何重新获取控制权？

Internal events：thread 主动返回控制权
External events：thread 被剥夺控制权

Internal Events

Blocking on I/O
Waiting on a "signal" from other thread
Thread executes a yield()

yield

computePI() {
    while (true) {
        ComputeNextDigit();
        yield();
    }
}

程序的运行栈如下图所示：

假设有两个 threads，S 和 T，S yield 之后，dispatcher 将 cpu 控制权交给 T，整个过程如下所示：

有趣的是，当 dispatcher 在 S 的 stack 中执行 switch 之后，返回到了 T 的 stack 中的 switch。

blocking on I/O

thread 请求一块数据的过程：

user code invokes a system call
read operation is initiated
run new thread/switch

与 switch 有关的统计值

每个 10~100ms 之间就会有一次 switch
thread switching 操作的时间大约为 3-4 μs，比 process switching 快约 100 ns
跨核 switching 大约花费核内 switching 两倍的时间
context-switching 的时间随着 working set 的大小增大而增大，甚至可以花费原来 100x 以上的时间（这里 working set 指的是 process 一段时间中使用的 memory 大小）

External Events

thread 不可能永久占用 CPU 资源，OS 通过外部事件（interrupts）来剥夺 thread 的 CPU 资源。

Network Interrupt

Timer Interrupt

一些 interrupts 的工作逻辑可以概括为：

TimerInterrupt() {
    DoPeriodHouseKeeping();
    run_new_thread();
}

IOInterrupt() {
    ServiceIO();
    run_new_thread();
}

// ...

How does Thread get started?

run_new_thread 后 dispatcher switch 到 ThreadRoot 中，后者是 thread 执行的起点：

ThreadRoot() {
    DoStartupHousekeeping();
    UserModeSwitch();
    Call fcnPtr(fcnArgPtr);
    ThreadFinish();
}

注意：本节插图中的 stack，蓝色表示 user stack，红色表示 kernel stack，二者逻辑上是两个不同的被保护的区域，请勿认为二者是连续的。

Thread Abstraction

在 thread 看来：

计算机中有无限个 CPU

threads 的执行速度不能保证，因此程序的设计不应该依赖于 scheduler 调度的方式

Thread Lifecycle

与 Process Lifecycle 非常类似：

Shared vs. Per-Thread State

如下图所示：

OS 为每个 thread 建立一个 TCB，里面包含：

Execution State：CPU registers, program counter (PC), pointer to stack (SP)
Scheduling Info：state, priority, CPU time
Various Pointers for implementing scheduling queues
Pointer to enclosing process (PCB)
etc

Multithreaded Processes

一个 Process 总有一个或多个 threads，对应到 PCB 与 TCB，就是一个 PCB 指向多个 TCBs：

在同一个 process 中的 threads 之间做 context switch 开销小
在不同 processes 中的 threads 之间做 context switch 则同时需要改变 memory 和 I/O address tables

典型的 multithreaded 程序举例如下：

Embedded systems
- Elevators, Planes, Medical systems, Wristwatches
- Single Program, concurrent operations
Most modern OS kernels
- Internally concurrent because have to deal with concurrent requests by multiple users
- But no protection needed within kernel
Database Servers
- Access to shared data by many concurrent users
- Also background utility processing must be done
Network Servers
- Concurrent requests from network
- Again, single program, multiple concurrent operations
- File server, Web server, and airline reservation systems
Parallel Programming/Multiprocessing
- Split program into multiple threads for parallelism

以 Client/Server 为例：

client/browser：

可以同时用多个线程在多个 tabs 中访问同一个网站
可以同时利用多个线程下载同一个页面的多个文件，最后渲染结果

web server：

为每个 connection 创建 (fork) 一个 process
每个 process 利用一个 thread 获取 request，处理并返回 response
如果需要其他 I/O 操作，则 fork threads 完成相关工作，join 结果后返回

现实中，许多 processes 都是 multi-threaded 的，因此 thread context switches 即可能发生在 process 内部，也可能发生在 processes 之间：

如图中 Google Chrome process 内部就有 13 个线程。

Putting it together: Process

演变过程

传统的 Unix Process 如下图所示：

多个 Processes 并发如下图所示：

每个 Process 可以有一个到多个 Threads，如下图所示：

multi-cores

hyper-threading

Kernel/User Mode Threads

上文中介绍的都是 kernel threads：

kernel 原生支持的 threads
每个 thread 之间相互独立地运行或者暂停
一个 process 可能同时包含多个等待不同资源的 threads

kernel threads 的缺点在于，一切与它相关的操作都需要进入到 kernel mode 中，等待 kernel 调度。为了克服这个缺点，又诞生一种更轻量级的 thread，即 user threads：

由用户程序提供 threads 的 scheduler 和 package
可能存在多个 user threads 对应单个 kernel thread 的情况
user threads 可以只在 yield 的时候 switch，控制权自定义
开销更小

但世上没有免费的午餐，由于多个 user threads 对应单个 kernel thread，一旦 kernel thread 被剥夺 CPU 资源，所有 user threads 就都无法继续运行。当然，对于这一问题也有折衷方案，即 scheduler activations，让 kernel 在 thread blocks 的时候提醒 user level threads 事情即将发生，用户进程可以使用其它 kernel threads 继续运行，此时多个 user threads 对应多个 kernel threads，如下图所示：

user-level library, single-threaded process：early Java
- library does thread context switch
- kernel time slices between processes
green threads：SunOS, Unix variants
- user-level library does thread multiplexing
scheduler activations：Windows
- kernel allocates processors to user-level library
- thread library implements context switch
- system call I/O that blocks triggers up-call
Linux, MacOS, Windows：use kernel threads
- system calls for thread fork, join, exit (and lock, unlock ...)
- kernel does context switching
- simple, but a lot of transitions between user and kernel mode

High-level Example：Web Server

没有 thread 的版本：

serverLoop() {
    con = AcceptCon();
    ProcessFork(ServiceWebPage(), con);
}

有 thread 的版本：

serverLoop() {
    con = AcceptCon();
    ThreadFork(ServiceWebPage(), connection);
}

虽然看起来一样，但有本质的区别：

使用 ThreadFork 可以在不同连接之间共享文件、CGI scripts、以及其它信息
Threads 很便宜，因此降低了 overhead/request

Thread Pools

上一个 thread-based web server 仍然存在问题，如果连接太多，threads 的增加没有被限制，容易由于频繁地为短时任务创建、删除 threads 出现性能下降、延迟增加的问题，导致 throughput 下降，因此使用一个有界的 thread pool，来表示最大并发的限制，如下图所示：

master.c

master() {
    allocThreads(worker, queue);
    while(True) {
        con = AcceptCon();
        Enqueue(queue, con);
        wakeUp(queue);
    }
}

worker.c

worker(queue) {
    while(TRUE) {
        con = Dequeue(queue);
        if (con == null)
            sleepOn(queue);
        else
            ServiceWebPage(con);
    }
}

小结

Processes have two parts
- Threads (Concurrency)
- Address Spaces (Protection)
Concurrency accomplished by multiplexing CPU Time:
- Unloading current thread (PC, registers)
- Loading new thread (PC, registers)
- Such context switching may be voluntary through yield(), I/O operations, or involuntary from timer and other interrupts
Protection accomplished restricting access
- Memory mapping isolates processes from each other
- dual-mode for isolating I/O, other resources

参考

PreviousI/O Continued, Sockets, Networking NextCooperating Threads, Synchronization

Last updated 6 years ago