The Attala Architecture
The Attala high-performance composable storage infrastructure is based on a scale-out fabric that uses standard Ethernet networks to interconnect all servers and storage nodes in the data center. With full automation, the fabric dynamically, arbitrarily and securely attaches NVMe volumes from Ethernet-connected, server-less storage nodes directly to where the application lives, whether in a bare-metal server, a virtual machine or a container. With a focus on scale-out cloud storage, the Attala team took a clean-sheet approach, eliminated legacy layers and drew on its background in building high-performance hardware to create a solution with up to eleven million IOPS per scale-out node and latencies as low as 15 microseconds in a cost-efficient, server-less design. The team also created a new, holistic approach to performance assurance at scale. Hardware-based data paths and queues, which behave predictably, are combined with hardware-based QoS controls and granular monitoring, which add zero performance perturbation, to create an end-to-end solution that enables workloads to realize predictable low latency, IOPS and throughput while maximizing the efficient use of a shared infrastructure.
This architecture is built upon the following innovations:
100% Automation - Security, Performance & Resource Automation
Users/tenants and operators access the system via REST APIs or an Attala GUI. The APIs enable users to instantly and securely create, attach and re-attach persistent volumes for their workloads, with support for seamless integration into common orchestration frameworks (Cinder, Kubernetes, etc.). Admin portals allow operators to create policies, forecast capacity and receive alerts. The system handles resource allocation, error recovery and similar tasks autonomously.
100% Fully-integrated QoS and Latency Monitoring
For performance assurance, the Attala solution provides a closed-loop approach with per-volume QoS controls and per-volume performance monitoring (min latency, max latency, IOPS, throughput), i.e. it both enforces and verifies. Both users and operators can create performance alerts and use visual tools to look back in time to diagnose issues, which eliminates or minimizes the painful time spent diagnosing performance anomalies, outages or brown-outs.
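The "enforce and verify" loop can be illustrated with a minimal sketch of the verify half: comparing one monitored sample for a volume against its QoS policy. The class names, field names and the reading of the policy as a latency ceiling plus IOPS and throughput floors are assumptions made for illustration; they are not taken from the Attala APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QosPolicy:
    # Hypothetical per-volume policy: a latency ceiling and IOPS/throughput floors.
    max_latency_us: float
    min_iops: int
    min_throughput_mbps: float


@dataclass
class VolumeSample:
    # Hypothetical monitored values for one volume over one interval.
    max_latency_us: float
    iops: int
    throughput_mbps: float


def verify(policy: QosPolicy, sample: VolumeSample) -> List[str]:
    """Return the names of the metrics that violate the policy (empty if none)."""
    violations = []
    if sample.max_latency_us > policy.max_latency_us:
        violations.append("latency")
    if sample.iops < policy.min_iops:
        violations.append("iops")
    if sample.throughput_mbps < policy.min_throughput_mbps:
        violations.append("throughput")
    return violations
```

A non-empty result is the kind of condition that could raise a performance alert for the user or operator.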
100% Host Transparency via Attala Host Adapter
The Attala fabric includes an FPGA-based host PCIe adapter that performs full hardware emulation of an NVMe SSD (or SSDs), so that the host OS, hypervisor or driver treats it as an actual SSD using standard, in-box NVMe drivers. This enables operation with any OS and any hypervisor. The solution also includes a mechanism to create these virtual SSDs, and to allocate them to physical SSDs from the pool, using an out-of-band technique that enables both pre-boot and run-time provisioning. The approach requires zero host agents or software, which dramatically simplifies operator deployment and maintenance across multiple images.
For customers that wish to use standard NICs or RDMA NICs (instead of the Attala host adapter), the Attala solution also provides complete solutions.
100% Hardware-Based Data Transport and 100% Server-less Storage Nodes
The Attala solution combines the efficiency and predictability of end-to-end FPGA/hardware data paths and queues with RDMA (remote direct memory access) technology and standard Ethernet networks. The result is the ability to transport data from the physical storage media across the network directly to the application with superior latency and predictable performance. The data nodes that house the physical SSDs also include the FPGA-based interface to the Attala fabric. These storage nodes are 100% server-free (no high-wattage x86 processor and no bulky DRAM), which lowers cost, power and real-estate requirements.
100% Direct Access
Workloads running on bare metal, in virtual machines (VMs) or in containers can directly access their attached volumes with low latency and predictable performance using the virtual SSDs described above. For virtual machines and virtualized containers, SR-IOV technology is used to export the virtual SSDs directly into the VM, which bypasses the hypervisor’s storage layers and eliminates another layer of added latency, IOPS dilution and unpredictable performance.
100% Redundancy in Data and Data Paths
Pooling storage resources introduces the “all your eggs in one basket” concern. To address this, the Attala solution enables data protection either across data nodes (i.e. writing redundant data into different failure domains) or within a data node (redundant data across multiple SSDs). In either case, the Attala solution provides network link redundancy and, in the latter case, fully redundant IO paths including dual-port SSDs.
100% Standards Based
To avoid lock-in and provide the greatest flexibility and choice, Attala’s philosophy is to avoid any proprietary interfaces between the components of the system. Across the network (between hosts and data nodes), the solution is fully interoperable using the NVMe-over-fabrics, RoCEv2, IP and Ethernet standards, which allows the use of other host adapters. The interface between the automation software and managed components is fully compatible with HTTP RESTful management APIs following the DMTF Redfish and SNIA Swordfish APIs. Our data nodes use industry-standard NVMe SSDs; unlike other storage vendors, the Attala solution does not lock customers into a vendor’s choice of SSDs.
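As an illustration of what Redfish-style management looks like to a client, the sketch below extracts member resource URIs from a resource collection. The "Members" array and "@odata.id" property follow the Redfish convention; the sample payload and its paths are hypothetical and do not describe an actual Attala endpoint.

```python
import json
from typing import List

# Hypothetical Redfish-style collection payload, for illustration only.
SAMPLE_COLLECTION = """{
  "@odata.id": "/redfish/v1/Storage",
  "Members@odata.count": 2,
  "Members": [
    {"@odata.id": "/redfish/v1/Storage/1"},
    {"@odata.id": "/redfish/v1/Storage/2"}
  ]
}"""


def member_uris(collection_json: str) -> List[str]:
    """Return the URI of each member resource listed in a collection."""
    doc = json.loads(collection_json)
    return [member["@odata.id"] for member in doc.get("Members", [])]
```

Because the format is standard, any Redfish-aware tool could walk such a collection without vendor-specific code.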
Attala gives customers flexibility in their choice of network interface for their compute nodes. A full end-to-end FPGA-based solution uses Attala host NVMf PCIe adapters. However, because Attala adheres strictly to industry standards (RoCEv2, NVMe-over-fabrics), it also provides a solution that uses standard RDMA NICs (e.g. Mellanox) for the host interface. This is the ideal solution for customers who are focused on Linux or KVM (which has already implemented the NVMe-over-fabrics host software), do not require bare-metal composability or pre-boot provisioning, and can tolerate both the installation of additional host agent software and some additional latency.
Reliability via Redundant Data and Data Paths
With pooled storage, maintaining reliable access to valuable data is paramount. The Attala solution supports two different configurations to address the concern as illustrated below.
With cross-node data protection, performed by the Attala host adapter or by host software, data is dispersed across SSDs in different data nodes using either mirroring/RAID1 or more sophisticated schemes such as erasure coding. In this configuration, entire data nodes can fail without loss of data, and data nodes can use more commonly available single-port NVMe SSDs. But with increasing SSD densities, the rebuild time of an entire data node becomes problematic, since hundreds of terabytes of data must be transferred across the network and into a spare node. To address this, Attala’s solution provides full link redundancy and redundant power supplies and fans, which limits the single points of failure within the node to the PCIe fanout switch and the NVMe SSDs themselves.
With intra-node data protection, full redundancy is provided within a data node. This solution is used for customers with smaller data node clusters that don’t disperse data across nodes, or for customers that do but want the full “belt and suspenders” approach to redundancy. In addition to redundant fans, power supplies and links, the Attala data node also provides redundant IO paths and dual-port NVMe SSDs to completely avoid any single point of failure within the data node.
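To see why rebuilding an entire data node is a concern, a back-of-the-envelope estimate helps. The sketch below computes the lower bound on transfer time for a given capacity and network bandwidth; the 200 TB capacity and 100 Gb/s link in the usage note are illustrative assumptions, not Attala specifications.

```python
def rebuild_time_hours(capacity_tb: float, link_gbps: float,
                       efficiency: float = 1.0) -> float:
    """Lower bound on the time to copy capacity_tb terabytes over the network.

    capacity_tb: data to move, in decimal terabytes (1 TB = 1e12 bytes).
    link_gbps:   aggregate network bandwidth, in gigabits per second.
    efficiency:  fraction of the link usable for rebuild traffic (0 < e <= 1).
    """
    bits_to_move = capacity_tb * 1e12 * 8
    seconds = bits_to_move / (link_gbps * 1e9 * efficiency)
    return seconds / 3600
```

For example, rebuild_time_hours(200, 100) gives about 4.4 hours even with the link fully dedicated to the rebuild; at 50% efficiency the figure doubles.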
Security, Performance and Resource Automation
The Attala architecture rethinks cloud storage automation. Consistent with the company’s namesake, the solution doesn’t just blindly provide storage resources for applications. Once resources have been provisioned, the Attala solution continuously monitors and records every workload’s use of every allocated storage resource across the scale-out infrastructure. This enables a closed-loop approach to automated infrastructure that improves performance assurance, increases resource utilization and efficiency, and eliminates or minimizes the painful time spent diagnosing performance anomalies, outages or brown-outs.
- Users/tenants access the system via REST APIs or an Attala GUI. The APIs enable users to instantly and securely create, attach, re-attach or share persistent volumes for their workloads, with support for seamless integration into common orchestration frameworks (Cinder, Kubernetes, etc.). Users’ preferences for volume size, QoS, data protection, security, etc. can be codified in templates or profiles. To keep users’ lives simple in an increasingly dynamic DevOps environment, the user does not need any knowledge of the underlying infrastructure.
- The Attala fabric includes hardware-based, per-volume QoS controls and per-volume performance monitoring (min latency, max latency, IOPS, throughput). Monitored information is gathered from across the fabric using IoT telemetry methods and stored in a central, replicated database with support for real-time visualization and analytics. Unlike stand-alone monitoring approaches, Attala monitoring is hardware-based, fully integrated and, more importantly, has zero impact on the resource being monitored, i.e. it avoids the “Schrödinger’s cat” effect of disrupting the performance of the very resource that might already have performance anomalies.
- At the core of the architecture is what Attala calls the Security, Performance and Resource Automation (SPARA) engine. The engine is infrastructure-aware and maintains complete knowledge of the physical infrastructure, failure domains, network topology, SSD resources, etc. When a user/tenant requests a volume, the engine intelligently determines a location that meets the requested size/QoS/protection/security parameters while simultaneously optimizing resource utilization. The engine supports fully automated, painless discovery of resources added or removed from the fabric. Admin portals allow operators to create policies, forecast capacity and receive alerts.
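The placement decision described in the last bullet can be sketched as a small constraint-filtering routine. This is a minimal illustration, not the SPARA engine's actual algorithm: the data structures, field names and the best-fit heuristic (prefer the node with the least free capacity that still fits, to limit fragmentation) are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataNode:
    # Hypothetical view of one data node as tracked by an infrastructure-aware engine.
    name: str
    failure_domain: str
    free_gb: int
    free_iops: int


@dataclass
class VolumeProfile:
    # Hypothetical user profile: size, QoS and protection preferences.
    size_gb: int
    iops: int
    copies: int  # 1 = single copy, 2 = mirrored across two failure domains, ...


def place(profile: VolumeProfile, nodes: List[DataNode]) -> Optional[List[DataNode]]:
    """Pick one node per copy, each in a distinct failure domain, or None if impossible."""
    # Keep only nodes that can satisfy the size and IOPS request.
    candidates = [n for n in nodes
                  if n.free_gb >= profile.size_gb and n.free_iops >= profile.iops]
    # Best fit: try the tightest-fitting nodes first.
    candidates.sort(key=lambda n: n.free_gb)
    chosen: List[DataNode] = []
    used_domains = set()
    for node in candidates:
        if node.failure_domain in used_domains:
            continue  # a copy already lives in this failure domain
        chosen.append(node)
        used_domains.add(node.failure_domain)
        if len(chosen) == profile.copies:
            return chosen
    return None
```

The caller only supplies a profile; knowledge of nodes and failure domains stays inside the engine, which matches the goal that users need no knowledge of the underlying infrastructure.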