PTPmesh 2017

DC Network Latency Measurements Using PTP

Posted by Yiran on May 12, 2019

核心思想

使用已有的PTP(IEEE 1588 Precision Time Protocol)进行数据中心网络latency和丢包率测量

PTP

one-way delay: \(1/2 * (master\_to\_slave\_delay + slave\_to\_master\_delay)\)

master-to-slave-delay: $ T1-T1^{’}$

slave-to-master-delay: $ T2-T2^{’}$

The time difference between the master and slave clocks represents the clock offset from master and is computed as a difference between the master-to-slave delay and the one-way delay.

PTPmesh

尽管 PTP-enabled NICs 更准确,本文聚焦于开源的软件实现,PTPd.

  • Network latency

    1. Once the two clocks synchronize (an initial starting period of 15 minutes is excluded from the plotted data for PTPd), the one-way delay reported is stable.

    2. If there are several congestion episodes before the slave clock manages to resynchronize to the master clock, the one-way delay reported by the slave will not be indicative of the actual delay, although it will indicate that there is an event (congestion, failure) on that network path.

    1. The clock offset, master-to-slave and slave-to master delays oscillate between larger values when the Sync and Delay Request interval is larger.

  • Packet loss

    PTPd records the number of messages sent and received (Announce, Sync, Followup, Delay Request, Delay Response), and it is possible to export them periodically. On the slave side, a difference between the number of Delay Request and Delay Response messages would indicate packet loss:

    \[1- \frac{\#Delay\_Response\_messages}{\#Delay\_Request\_messages}\]

两个测量都只是coarse的estimate

Deployment

  • Cloud Provider: Hypervisor

  • Tenants: Inside VM

总结对比

参考文献

PTPmesh: Data Center Network Latency Measurements Using PTP