FAST 2020 Abstract Overview
(18th USENIX Conference on File and Storage Technologies)
https://www.usenix.org/conference/fast20/technical-sessions
Performance
[2020/03/02*] An Empirical Guide to the Behavior and Use of Scalable Persistent Memory
[2020/03/02] DC-Store: Eliminating Noisy Neighbor Containers using Deterministic I/O Performance and Resource Isolation
[2020/03/02] GoSeed: Generating an Optimal Seeding Plan for Deduplicated Storage
Key Value Storage
[2020/02/29] Characterizing, Modeling, and Benchmarking RocksDB Key-Value Workloads at Facebook
[2020/03/01*] FPGA-Accelerated Compactions for LSM-based Key-Value Store
[2020/03/01*] HotRing: A Hotspot-Aware In-Memory Key-Value Store
Caching
[2020/03/03*] BCW: Buffer-Controlled Writes to HDDs for SSD-HDD Hybrid Storage Server
[2020/03/05] InfiniCache: Exploiting Ephemeral Serverless Functions to Build a Cost-Effective Memory Cache
[2020/03/07] Quiver: An Informed Storage Cache for Deep Learning
Consistency and Reliability
[2020/02/29] CRaft: An Erasure-coding-supported Version of Raft for Reducing Storage Cost and Network Cost
[2020/02/29*] Hybrid Data Reliability for Emerging Key-Value Storage Devices
[2020/02/27] Strong and Efficient Consistency with Consistency-Aware Durability
Consistency and Reliability
[2020/02/29] CRaft: An Erasure-coding-supported Version of Raft for Reducing Storage Cost and Network Cost
Authors: Zizhong Wang, Tongliang Li, Haixia Wang, Airan Shao, Yunren Bai, Shangming Cai, Zihan Xu, and Dongsheng Wang, Tsinghua University
Abstract: Consensus protocols can provide highly reliable and available distributed services. In these protocols, log entries are completely replicated
to all servers. This complete-entry replication causes high storage and network costs, which harms performance.
Erasure coding is a common technique to reduce storage and network costs while keeping the same fault tolerance ability. If the complete-entry replication in consensus protocols can be replaced with an erasure coding replication, storage and network costs can be greatly reduced. RS-Paxos is the first consensus protocol to support erasure-coded data, but it has much poorer availability compared to commonly used consensus protocols, like Paxos and Raft. We point out RS-Paxos's liveness problem and try to solve it. Based on Raft, we present a new protocol, CRaft. Providing two different replication methods, CRaft can use erasure coding to save storage and network costs like RS-Paxos, while it also keeps the same liveness as Raft.
To demonstrate the benefits of our protocols, we built a key-value store based on CRaft, and evaluated it. In our experiments, CRaft could save 66% of storage, reach a 250% improvement on write throughput and reduce 60.8% of write latency compared to original Raft.
Memo
Consensus protocols provide highly reliable and available distributed services. In these protocols, log entries are fully replicated to all servers. This complete replication increases storage and network costs and hurts performance.
Erasure coding is a common technique for reducing storage and network costs while preserving the same fault tolerance. If the complete replication of log entries is replaced with erasure-coded replication, storage and network costs can be greatly reduced.
RS-Paxos was the first consensus protocol to support erasure-coded data, but its availability is much worse than that of commonly used consensus protocols such as Paxos and Raft. The authors identify RS-Paxos's liveness problem and set out to solve it. Building on Raft, they propose a new protocol, CRaft. By providing two different replication methods, CRaft keeps the same liveness as Raft while using erasure coding, like RS-Paxos, to reduce storage and network costs.
To demonstrate the protocol's benefits, the authors built and evaluated a key-value store based on CRaft. In their experiments, CRaft saved 66% of storage, improved write throughput by 250%, and reduced write latency by 60.8% compared with the original Raft.
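The reported ~66% storage saving follows directly from the arithmetic of erasure-coded replication. A minimal sketch (not the paper's implementation; the group size and code parameters are illustrative assumptions): in an N-server group, full-entry replication stores N copies, while with an RS(k, m) code each server holds one fragment of size 1/k of the entry.

```python
# Compare cluster-wide storage cost per log entry:
# full-entry replication vs. erasure-coded replication.

def replication_cost(n_servers: int, entry_size: float) -> float:
    """Total bytes stored cluster-wide with full-entry replication."""
    return n_servers * entry_size

def erasure_cost(n_servers: int, entry_size: float, k: int) -> float:
    """Total bytes stored cluster-wide when each server holds a 1/k fragment."""
    return n_servers * (entry_size / k)

# A 5-server group with an RS(3, 2) code: k = 3 data fragments, m = 2 parity.
full = replication_cost(5, 1.0)   # 5 entry-sizes stored in total
coded = erasure_cost(5, 1.0, 3)   # 5/3 entry-sizes stored in total
savings = 1 - coded / full        # ~0.667, consistent with the ~66% reported
```

The same fraction applies to network cost, since each follower receives only a 1/k fragment instead of the whole entry.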
[2020/02/29*] Hybrid Data Reliability for Emerging Key-Value Storage Devices
Authors: Rekha Pitchumani and Yang-suk Kee, Memory Solutions Lab, Samsung Semiconductor Inc.
Abstract: Rapid growth in data storage technologies created the modern data-driven world. Modern workloads and applications have influenced the
evolution of storage devices from simple block devices to more intelligent object devices. Emerging, next-generation Key-Value (KV) storage devices allow
storage and retrieval of variable-length user data directly onto the devices and can be addressed by user-desired variable-length keys. Traditional reliability schemes for multiple block storage devices, such as Redundant Array of Independent Disks (RAID), have been around for a long time and used by most systems with multiple devices.
Now, the question arises as to what an equivalent for such emerging object devices would look like, and how it would compare against the traditional
mechanism. In this paper, we present Key-Value Multi-Device (KVMD), a hybrid data reliability manager that employs a variety of reliability techniques with different trade-offs, for key-value devices. We present three stateless reliability techniques suitable for variable length values, and evaluate the
hybrid data reliability mechanism employing these techniques using KV SSDs from Samsung. Our evaluation shows that, compared to Linux mdadm-based RAID throughput degradation for block devices, data reliability for KV devices can be achieved at a comparable or lower throughput degradation. In addition, the KV API enables much quicker rebuild and recovery of failed devices, and also allows for both hybrid reliability configuration set automatically based on, say, value sizes, and custom per-object reliability configuration for user data.
Memo
Rapid growth in data storage technologies created the modern data-driven world. Modern workloads and applications have driven the evolution of storage devices from simple block devices into more intelligent object devices.
Next-generation key-value storage devices allow variable-length user data to be stored on and retrieved from the device directly, addressed by user-chosen variable-length keys.
Traditional reliability schemes for multiple block storage devices, such as RAID, have been in use for many years. What would an equivalent look like for these emerging object devices, and how would it differ from the traditional mechanism?
In this paper, the authors present Key-Value Multi-Device (KVMD), a hybrid data reliability manager that applies a variety of reliability techniques with different trade-offs to key-value devices. The paper describes three stateless reliability techniques suitable for variable-length values and evaluates the hybrid data reliability mechanism built on them using Samsung KV SSDs.
The evaluation shows that, compared with the throughput degradation of Linux mdadm-based RAID for block devices, data reliability for KV devices can be achieved with comparable or lower throughput degradation. In addition, the KV API enables much quicker rebuild and recovery of failed devices, and supports both hybrid reliability configurations set automatically based on, e.g., value size, and custom per-object reliability configuration for user data.
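The idea of a hybrid reliability manager that picks a technique per object can be sketched as follows. This is an illustrative assumption, not KVMD's actual policy: the threshold, technique names, and selection rule are all hypothetical.

```python
# Hypothetical per-object reliability selection based on value size:
# replicate small values (parity overhead would dominate), and
# erasure-code large values to amortize the parity cost.

def choose_technique(value_size: int, small_threshold: int = 4096) -> str:
    """Return the reliability technique to apply to one KV object."""
    if value_size <= small_threshold:
        return "replication"     # full copies on multiple KV devices
    return "erasure_coding"      # split value into fragments plus parity

print(choose_technique(512))       # small value -> replication
print(choose_technique(1 << 20))   # 1 MiB value -> erasure_coding
```

Because each decision depends only on the object itself, the scheme stays stateless, which is what makes automatic per-object (rather than per-array) configuration practical.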
Notes
[2020/02/27] Strong and Efficient Consistency with Consistency-Aware Durability
Authors: Aishwarya Ganesan, Ramnatthan Alagappan, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau, University of Wisconsin–Madison
Abstract: We introduce consistency-aware durability or CAD, a new approach to durability in distributed storage that enables strong consistency while
delivering high performance. We demonstrate the efficacy of this approach by designing cross-client monotonic reads, a novel and strong consistency property that provides monotonic reads across failures and sessions in leader-based systems. We build ORCA, a modified version of ZooKeeper that implements CAD and cross-client monotonic reads. We experimentally show that ORCA provides strong consistency while closely matching the performance of weakly consistent ZooKeeper. Compared to strongly consistent ZooKeeper, ORCA provides significantly higher throughput (1.8 – 3.3×), and notably reduces latency, sometimes by an order of magnitude in geo-distributed settings.
Memo
- Proposes consistency-aware durability (CAD), a new approach to durability in distributed storage that enables strong consistency while delivering high performance.
- The authors demonstrate the approach's efficacy by designing cross-client monotonic reads, a novel and strong consistency property that provides monotonic reads across failures and sessions in leader-based systems.
- They modified ZooKeeper to build ORCA, which implements CAD and cross-client monotonic reads. Experiments show that ORCA provides strong consistency while closely matching the performance of weakly consistent ZooKeeper. Compared with strongly consistent ZooKeeper, ORCA delivers significantly higher throughput (1.8-3.3×) and lower latency.
- "sometimes by an order of magnitude in geo-distributed settings": the latency reduction can reach roughly 10× when replicas are geographically distributed, since strongly consistent reads would otherwise pay wide-area round trips.
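The cross-client monotonic reads property can be made concrete with a small checker. This is a sketch of the property itself, not ORCA's implementation; the version numbers and client IDs are illustrative.

```python
# Cross-client monotonic reads: once ANY client has observed state at
# version v, no later read (by any client, even across failures and
# sessions) may return a version older than v.

def is_cross_client_monotonic(reads) -> bool:
    """reads: list of (client_id, version) in global real-time order."""
    high_water = 0
    for _client, version in reads:
        if version < high_water:
            return False         # a read went "backwards" in time
        high_water = max(high_water, version)
    return True

# OK: versions never regress across clients A and B.
ok = is_cross_client_monotonic([("A", 1), ("B", 1), ("A", 3), ("B", 3)])
# Violation: B reads version 2 after A already observed version 3.
bad = is_cross_client_monotonic([("A", 1), ("A", 3), ("B", 2)])
```

The "cross-client" part is what distinguishes this from the classic per-session monotonic-reads guarantee: the high-water mark is global, not per client.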
Notes
- Apache ZooKeeper
- "strongly consistent ZooKeeper" vs. "weakly consistent ZooKeeper": does this mean ZooKeeper offers a choice of consistency models? In effect, yes: by default ZooKeeper serves reads locally from whichever server the client is connected to, so reads may be stale (weaker consistency); a client can issue sync() before reading to ensure it sees the latest committed state (stronger, but slower, reads).