# Distributed Systems, Networking, TCP/IP, RPC (21,22)

## Distributed Systems

### Centralized vs. Distributed Systems

![centralized](/files/-Lr6l_LWEDAeR3eaXmTA)

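In a centralized system, the vast majority of functionality is carried out on a single physical machine. Originally even the client ran on that same machine; over time this evolved into the client/server (C/S) model.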

![Distributed](/files/-Lr6lwkASTWB5BMFiK5h)

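In a distributed system, multiple computers cooperate to complete a task. Early systems were mostly clusters of machines in the same machine room; these later evolved into peer-to-peer and wide-spread collaboration.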

### Motivation/Issue

#### Why do we want distributed systems?

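* Building many simple computers is cheaper and easier than building one complex computer
* Capacity is easy to scale up and down incrementally
* Users can have complete control over some of the machines
* It is easier for more users to collaborate over the network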

#### The promise of distributed systems

* Higher availability: if one machine goes down, use another
* Better durability: replicate the data
* More security: security is enforced at a finer granularity

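The promise is attractive; the reality is harsher:

* Worse availability: availability depends on every machine that is running, so the failure of any one component can make the whole service unavailable. As Lamport put it: "a distributed system is one where I can't do work because some machine I've never heard of isn't working!"
* Worse reliability: a machine crash can cause data loss
* Worse security: anyone in the world can break into the system

Beyond this, as the number of machines grows, coordinating all of them correctly and sensibly to complete a task together becomes extremely difficult.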

> What would be easy in a centralized system becomes a lot more difficult
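The gap between promised and actual availability can be made concrete with a little arithmetic. The sketch below is not from the lecture; it assumes each machine is up independently with the same probability `p`, and the function names are hypothetical. If a request needs all `n` machines, availability is `p ** n`; if any one of `n` replicas is enough, it is `1 - (1 - p) ** n`.

```python
def all_required(p: float, n: int) -> float:
    """Availability when a request needs all n machines up.

    Assumes independent failures, each machine up with probability p.
    """
    return p ** n


def any_sufficient(p: float, n: int) -> float:
    """Availability when any one of n replicas can serve the request."""
    return 1 - (1 - p) ** n


if __name__ == "__main__":
    p = 0.99
    # Ten machines that must all be up: noticeably worse than one machine.
    print(f"all 10 required: {all_required(p, 10):.4f}")
    # Two interchangeable replicas: better than one machine.
    print(f"any of 2 enough: {any_sufficient(p, 2):.4f}")
```

With `p = 0.99`, ten mutually dependent machines give roughly 0.904 availability, while two interchangeable replicas give 0.9999. Which regime a system lands in depends on how it is designed, not on the number of machines alone.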

#### Goals & Requirements

If the goal of a distributed system had to be summed up in one word, it would be **Transparency**.

> The ability of the system to mask its complexity behind a simple interface

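Transparency here includes:

* Location: users do not need to know where a resource is located
* Migration: resources may be moved as appropriate without users noticing
* Replication: users do not need to know how many replicas of a resource exist
* Concurrency: users do not need to care how many other users are using a resource
* Parallelism: the system can split a large task into smaller ones and run them in parallel to speed it up
* Fault Tolerance: users do not notice a small number of failures in the system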
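As a rough illustration of "masking complexity behind a simple interface", the toy sketch below hides location, replication, and a single failure behind `get`/`put`. Everything here is hypothetical: `ReplicatedStore` is not a real library, and each replica is modeled as an in-memory dict rather than a remote machine.

```python
class ReplicatedStore:
    """Hypothetical facade: callers only ever see put() and get()."""

    def __init__(self, replica_count: int):
        # Each replica is a plain dict; None marks a replica that is down.
        self._replicas = [{} for _ in range(replica_count)]

    def put(self, key, value):
        # Replication transparency: the caller does not know how many
        # copies are written.
        for replica in self._replicas:
            if replica is not None:
                replica[key] = value

    def get(self, key):
        # Location and fault-tolerance transparency: try replicas in turn
        # until one that is up holds the key.
        for replica in self._replicas:
            if replica is not None and key in replica:
                return replica[key]
        raise KeyError(key)

    def crash(self, index: int):
        # Test hook that simulates one machine failing.
        self._replicas[index] = None


if __name__ == "__main__":
    store = ReplicatedStore(3)
    store.put("x", 1)
    store.crash(0)
    print(store.get("x"))  # the caller never learns that replica 0 is gone
```

A real system would also have to deal with replicas that miss writes while down, which is where the coordination difficulty mentioned earlier comes in; this sketch ignores that on purpose.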

## Networking

### Protocol

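In a distributed system, processes running on different machines need some way to communicate, much as people communicate using a shared language. In computing, these agreed-upon ways of communicating are called protocols. A protocol has two parts:

* Syntax: how messages are recognized and structured, including their format, the order in which information appears, and so on
* Semantics: what the messages mean

In the language analogy, syntax is grammar and semantics is meaning. A protocol can usually be described as a state machine.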
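To make the syntax/semantics split and the state-machine view concrete, here is a minimal sketch of a made-up protocol. The message types `HELLO`, `DATA`, and `BYE`, the states, and all function names are assumptions invented for illustration; they do not correspond to any real protocol or to the lecture's examples.

```python
from enum import Enum, auto


class State(Enum):
    CLOSED = auto()
    GREETED = auto()
    DONE = auto()


# The state machine: (current state, message type) -> next state.
TRANSITIONS = {
    (State.CLOSED, "HELLO"): State.GREETED,
    (State.GREETED, "DATA"): State.GREETED,
    (State.GREETED, "BYE"): State.DONE,
}


def parse(line: str):
    """Syntax: a message is '<TYPE> <payload>'; the payload may be empty."""
    msg_type, _, payload = line.strip().partition(" ")
    return msg_type, payload


def step(state: State, line: str) -> State:
    """Semantics: whether a message is meaningful depends on the state."""
    msg_type, _ = parse(line)
    try:
        return TRANSITIONS[(state, msg_type)]
    except KeyError:
        raise ValueError(
            f"{msg_type!r} is not valid in state {state.name}"
        ) from None


def run(lines) -> State:
    """Feed a whole conversation through the state machine."""
    state = State.CLOSED
    for line in lines:
        state = step(state, line)
    return state


if __name__ == "__main__":
    print(run(["HELLO alice", "DATA 42", "BYE"]).name)
```

A well-formed message can still be rejected: `DATA 42` parses fine, but sending it before `HELLO` raises `ValueError` because it has no meaning in the `CLOSED` state.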

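### *Summary of Remaining Topics*

*The rest of the networking discussion is omitted here; it overlaps with courses such as computer networking and distributed systems. Topics include:*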

* Client/Server vs. Peer-to-Peer
* Network Protocols
  * Broadcast (Aloha network)
  * Carrier Sense, Multiple Access/Collision Detection (CSMA/CD)
  * Point-to-point
  * The Internet Protocol (IP)
    * Address Subnets
    * Address Ranges
    * Hierarchical Networking
    * Routing/Routing Tables
    * DNS
* Network Layering
* TCP/IP
  * ordering
  * reliable delivery (exactly once)
  * congestion avoidance
  * sequence number
* Sockets
* Distributed Decision Making
  * General's Paradox
  * [2PC](/open-courses/mit-6.824/2pc-and-3pc.md)
  * Byzantine General's Problem
* RPC & Microkernel operating systems

## References

Lecture notes [21](https://people.eecs.berkeley.edu/~kubitron/courses/cs162-S15/sp15/static/lectures/21.pdf), [22](https://people.eecs.berkeley.edu/~kubitron/courses/cs162-S15/sp15/static/lectures/22.pdf)

