Distributed Systems, Networking, TCP/IP, RPC (21,22)

This section can be read as a condensed summary of a computer networking course.

Distributed Systems

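In a centralized system, almost all functionality runs on a single physical machine. At first even the client ran on that same machine; later this evolved into the client/server (C/S) model.

In a distributed system, multiple computers cooperate to complete a task. Early systems were mostly clusters of machines in the same machine room; later they evolved into peer-to-peer/wide-spread collaboration.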

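The ideal is appealing, but reality is harsher. In practice:

Worse availability: availability depends on every machine that is running, so the failure of any one machine can make the whole service unavailable. As Lamport put it: “a distributed system is one where I can't do work because some machine I've never heard of isn't working!”
Worse reliability: a machine crash can cause data loss.
Worse security: anyone in the world can potentially break into the system.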
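
The availability point can be made concrete with a little arithmetic. The sketch below assumes that every machine is required for the service to work and that machines fail independently; both are simplifying assumptions, and the 0.99 per-machine figure is purely illustrative.

```python
def system_availability(per_machine: float, machines: int) -> float:
    """Availability of a service that needs ALL machines up.

    Assumes independent failures (an idealization): the service is up
    only when every machine is up, so the probabilities multiply.
    """
    return per_machine ** machines


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(system_availability(0.99, n), 3))
```

Under these assumptions, 100 machines that are each up 99% of the time give a service that is up only about 37% of the time.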

On top of that, as the number of machines grows, coordinating all of them correctly and sensibly to complete a task together becomes extremely difficult.

What would be easy in a centralized system becomes a lot more difficult

If the goal of a distributed system had to be summed up in one word, it would be transparency.

The ability of the system to mask its complexity behind a simple interface

Transparency here in turn covers:

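In a distributed system, processes running on different machines need some way to communicate, much as people communicate using different languages. In computing, these ways of communicating are called protocols. A protocol has two parts: syntax and semantics.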

As in a natural language, syntax is the grammar and semantics is the meaning. A protocol can usually be described as a state machine.
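
To illustrate how a protocol splits into syntax and semantics and how it maps onto a state machine, here is a minimal sketch. The protocol itself (the `HELLO`/`DATA`/`BYE` verbs and the three states) is invented for this example and does not correspond to any real protocol.

```python
from __future__ import annotations

from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    CONNECTED = auto()
    CLOSED = auto()


# Syntax: a message is "<VERB>" or "<VERB> <payload>" (hypothetical format).
def parse(message: str) -> tuple[str, str]:
    verb, _, payload = message.strip().partition(" ")
    if verb not in {"HELLO", "DATA", "BYE"}:
        raise ValueError(f"malformed message: {message!r}")
    return verb, payload


# Semantics: which verb is legal in which state, and the state it leads to.
TRANSITIONS = {
    (State.IDLE, "HELLO"): State.CONNECTED,
    (State.CONNECTED, "DATA"): State.CONNECTED,
    (State.CONNECTED, "BYE"): State.CLOSED,
}


def step(state: State, message: str) -> State:
    """Apply one message to the state machine and return the new state."""
    verb, _payload = parse(message)
    try:
        return TRANSITIONS[(state, verb)]
    except KeyError:
        raise RuntimeError(f"{verb} is not allowed in state {state.name}") from None


def run(messages: list[str]) -> State:
    """Feed a whole conversation through the machine, starting from IDLE."""
    state = State.IDLE
    for message in messages:
        state = step(state, message)
    return state
```

A message that fails `parse` is a syntax error, while a well-formed message that arrives in the wrong state (for example `DATA` before `HELLO`) is a semantic error.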

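The rest of the networking discussion is skipped here, because it overlaps with computer networking and distributed systems courses. The topics include: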

Client/Server vs. Peer-to-Peer
Network Protocols
- Broadcast (Aloha network)
- Carrier Sense, Multiple Access/Collision Detection (CSMA/CD)
- Point-to-point
- The Internet Protocol (IP)
  - Address Subnets
  - Address Ranges
  - Hierarchical Networking
  - Routing/Routing Tables
  - DNS
Network Layering
TCP/IP
- ordering
- reliable delivery (exactly once)
- congestion avoidance
- sequence number
Sockets
Distributed Decision Making
- Generals' Paradox
- 2PC
- Byzantine Generals Problem
RPC & Microkernel operating systems
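
As a rough illustration of how sequence numbers support the ordering and reliable-delivery items listed under TCP/IP, here is a toy receiver. It is a simplification, not real TCP: it numbers whole segments rather than bytes, and the `Receiver` class and its methods are hypothetical names made up for this sketch.

```python
class Receiver:
    """Toy receiver: hands segments to the application in order, each once."""

    def __init__(self) -> None:
        self.next_seq = 0                   # next sequence number expected
        self.buffer: dict[int, str] = {}    # out-of-order segments held back
        self.delivered: list[str] = []      # what the application has seen

    def receive(self, seq: int, data: str) -> int:
        """Accept one segment and return the cumulative ACK (next expected seq)."""
        if seq >= self.next_seq:
            # Keep the first copy; a duplicate of a buffered segment is ignored.
            self.buffer.setdefault(seq, data)
        # Segments below next_seq were already delivered, so they are dropped.
        while self.next_seq in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1
        return self.next_seq


if __name__ == "__main__":
    r = Receiver()
    # Segment 1 arrives early, then 0, then a duplicate of 1, then 2.
    acks = [r.receive(s, d) for s, d in [(1, "b"), (0, "a"), (1, "b"), (2, "c")]]
    print(acks, r.delivered)
```

The early segment is buffered until the gap is filled, and the retransmitted duplicate is discarded, so the application sees each segment once and in order.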

lecture note 21, 22
