Fully Asynchronous Architecture: Asynchronous messaging, methods, and HTTP calls.
Asynchronous messaging: Services communicate via a message bus. When calling a service, the source service sends a message to the destination service, registers a callback function, and returns immediately. Upon task completion, the destination service triggers the callback with results. Asynchronous messages can be processed in parallel.
Asynchronous methods: Services communicate through asynchronous messages. Components or plugins inside each service use asynchronous method calls, following the same pattern as asynchronous messaging.
Asynchronous HTTP calls: Each plugin has an agent that puts a callback URL in the HTTP header of every request. After task completion, the agent sends the response to the caller's URL.
Based on these three asynchronous approaches, ZStack ZSphere builds a layered architecture to ensure all components operate asynchronously.
With this fully asynchronous architecture, a single management node can process tens of thousands of concurrent API requests per second, while managing tens of thousands of hosts and hundreds of thousands of virtual machines.
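The callback-based messaging pattern described above can be sketched as follows. This is a minimal illustration, not ZStack's actual API: the `MessageBus` class, service names, and message fields are all hypothetical.

```python
import asyncio

# Minimal sketch of callback-based asynchronous messaging; the bus
# implementation and service names are illustrative, not ZStack internals.

class MessageBus:
    def __init__(self):
        self.handlers = {}                  # destination service -> async handler

    def register(self, service, handler):
        self.handlers[service] = handler

    def send(self, service, message, callback):
        # The caller returns immediately; the destination service triggers
        # the callback with the result when the task completes.
        async def deliver():
            callback(await self.handlers[service](message))
        return asyncio.create_task(deliver())

async def start_vm(message):
    await asyncio.sleep(0)                  # stand-in for real asynchronous work
    return f"vm {message['uuid']} started"

async def main():
    bus = MessageBus()
    bus.register("vm-service", start_vm)
    results = []
    # Two requests dispatched back-to-back; both proceed concurrently.
    tasks = [bus.send("vm-service", {"uuid": u}, results.append)
             for u in ("a1", "a2")]
    await asyncio.gather(*tasks)
    return results

out = asyncio.run(main())
```

Because `send` returns before the task completes, the caller is never blocked, and independent messages are processed in parallel.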
Stateless Service: Each request is independent.
Compute node agents, storage agents, network services, console proxy services, and configuration services require no inter-request dependencies. Every request carries complete context and no node needs to maintain or store any information.
Resources such as management nodes and compute nodes are identified by UUID, and messages are routed to service instances via a UUID-based consistent hashing ring. Message senders do not need to know specific service instances, and services simply process messages without maintaining or exchanging resource information.
Little information is shared among management nodes. Therefore, a minimum of two management nodes can meet the requirements of high availability and scalability.
The stateless service mechanism makes the system more robust. Restarting servers will not lose any state information. This also simplifies the scaling out and scaling in of a data center.
The consistent hashing algorithm ensures all messages of the same resource are always processed by the same service instance. This message aggregation to specific nodes reduces synchronization and parallel processing complexity.
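A consistent hashing ring of this kind can be sketched as below. The node names and virtual-node count are assumptions for illustration; the point is only that the same resource UUID always maps to the same service instance.

```python
import bisect
import hashlib

class ConsistentHashRing:
    # Sketch of UUID-based message routing; node names and the virtual-node
    # count are illustrative, not ZStack internals.

    def __init__(self, nodes, vnodes=64):
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, resource_uuid):
        # First ring position clockwise from the resource UUID's hash.
        idx = bisect.bisect(self.ring, (self._hash(resource_uuid),)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["mn-1", "mn-2", "mn-3"])
owner = ring.node_for("vm-uuid-123")
```

Adding or removing a node only remaps the resources adjacent to that node on the ring, which is why nodes need not exchange routing tables.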
Work queues replace lock contention. Tasks that must execute serially are held in in-memory work queues, one per resource, so that operations on different resources can be processed in parallel, improving system parallelism.
The queue-based lock-free architecture enables tasks to run in parallel, thereby improving the system performance.
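The per-resource work-queue pattern can be sketched as follows. The class and resource names are illustrative assumptions: each resource gets a FIFO queue that serializes its own tasks, while tasks for different resources run on separate threads with no shared lock on the task path.

```python
import threading
import queue
from collections import defaultdict

class ResourceWorkQueues:
    # Sketch of the queue-based pattern: one in-memory FIFO queue per
    # resource serializes that resource's tasks, while different resources
    # proceed in parallel. Names are illustrative.

    def __init__(self):
        self.queues = defaultdict(queue.Ueue if False else queue.Queue)
        self.workers = {}
        self._guard = threading.Lock()      # guards worker creation only

    def submit(self, resource_uuid, task):
        with self._guard:
            q = self.queues[resource_uuid]
            if resource_uuid not in self.workers:
                w = threading.Thread(target=self._drain, args=(q,), daemon=True)
                self.workers[resource_uuid] = w
                w.start()
        q.put(task)

    def wait(self):
        for q in self.queues.values():
            q.join()

    @staticmethod
    def _drain(q):
        while True:
            task = q.get()
            task()                          # strictly in submission order
            q.task_done()

wq = ResourceWorkQueues()
log = []
for i in range(3):
    wq.submit("vm-1", lambda i=i: log.append(("vm-1", i)))
wq.submit("vm-2", lambda: log.append(("vm-2", 0)))
wq.wait()
```

Tasks for `vm-1` always run in submission order, but never block tasks for `vm-2`.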
A message bus isolates and controls services, such as virtual machine services, identity authentication services, snapshot services, volume services, network services, and storage services. All microservices are enclosed in the management node process. These services communicate with each other through the message bus. All messages are first sent to the message bus, then forwarded to the destination service via the consistent hashing ring.
In-process microservices adopt a star-like architecture where each service runs independently. This architecture decouples the highly centralized control business, achieving high autonomy and isolation. Service failures will not affect other components, ensuring system reliability and stability.
Every plugin provides services independently. Any newly added plugin has no impact on other existing plugins.
Plugins support both Strategy Pattern and Observer Pattern designs. Strategy plugins inherit parent-class interfaces to implement specific functions. Observer plugins register Listeners to monitor event changes of the internal business logic in an application. When events occur, the observer plugins will automatically respond and trigger corresponding business flow.
This horizontal plugin scalability enables ZStack ZSphere to rapidly upgrade while maintaining robust system architecture.
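The two plugin styles described above can be sketched as follows. The class, method, and event names here are illustrative assumptions, not ZStack's actual extension points: a Strategy plugin inherits a parent-class interface, while an Observer plugin registers a listener for internal events.

```python
from collections import defaultdict

class AllocatorStrategy:
    """Strategy plugin: inherits a parent-class interface (illustrative)."""
    def allocate(self, vm):
        raise NotImplementedError

class LeastVmAllocator(AllocatorStrategy):
    def allocate(self, vm):
        return "host-with-fewest-vms"       # placeholder placement decision

class EventBus:
    """Observer plugins register listeners for internal business events."""
    def __init__(self):
        self.listeners = defaultdict(list)

    def on(self, event, listener):
        self.listeners[event].append(listener)

    def fire(self, event, payload):
        for listener in self.listeners[event]:
            listener(payload)               # each observer reacts independently

bus = EventBus()
seen = []
bus.on("vm.created", seen.append)           # an observer plugin subscribing
bus.fire("vm.created", {"uuid": "vm-1"})
```

Adding a new strategy subclass or registering another listener has no effect on existing plugins, which is the isolation property described above.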
Workflow Engine: Sequence-based management and error rollback.
Every workflow is clearly defined in XML. If an error occurs at any step, the workflow rolls back along the originally executed path and cleans up the garbage resources.
Each workflow can contain sub-workflows for extended business logic implementation.
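The rollback behavior can be sketched as below. This is a simplified illustration in Python rather than the XML definitions mentioned above, and the step names are hypothetical: each step pairs a forward action with a compensating rollback, and on failure the executed steps are undone in reverse order.

```python
# Sketch of sequence-based execution with rollback; the workflow here is
# defined in Python rather than XML, and step names are illustrative.

class Workflow:
    def __init__(self, steps):
        self.steps = steps                  # list of (run, rollback) pairs

    def execute(self):
        done = []
        try:
            for run, rollback in self.steps:
                run()
                done.append(rollback)
        except Exception:
            # Roll back along the executed path, newest step first.
            for rollback in reversed(done):
                rollback()
            raise

log = []

def boot_vm():
    raise RuntimeError("boot failed")       # simulated failure at step 3

flow = Workflow([
    (lambda: log.append("alloc-volume"), lambda: log.append("free-volume")),
    (lambda: log.append("attach-nic"),   lambda: log.append("detach-nic")),
    (boot_vm,                            lambda: log.append("noop")),
])

try:
    flow.execute()
except RuntimeError:
    pass
# log is now ["alloc-volume", "attach-nic", "detach-nic", "free-volume"]
```

A sub-workflow fits the same shape: its own `execute` becomes the `run` half of a step in the parent workflow.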
Tag System: Enables dynamic business logic changes and extended resource attributes.
Extend or modify business logic dynamically using system tags and plugin mechanisms.
Group and categorize resources with tags, enabling resource searches by specific tags.
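Tag-based grouping and search can be sketched with a simple inverted index. The tag names and `TagIndex` class are assumptions for illustration only.

```python
from collections import defaultdict

class TagIndex:
    # Sketch of tag-based grouping and search; tag names are illustrative.

    def __init__(self):
        self.by_tag = defaultdict(set)

    def tag(self, resource_uuid, label):
        self.by_tag[label].add(resource_uuid)

    def find(self, *labels):
        # Resources carrying all of the given tags.
        groups = [self.by_tag[label] for label in labels]
        return set.intersection(*groups) if groups else set()

idx = TagIndex()
idx.tag("vm-1", "env:prod")
idx.tag("vm-1", "team:db")
idx.tag("vm-2", "env:prod")
matches = idx.find("env:prod", "team:db")   # only vm-1 carries both tags
```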
The Cascade framework manages resources through waterfall-like chained operations. For example, uninstalling or deleting a resource triggers corresponding cascading operations on related resources.
Resources can be added to the Cascade framework as plugins. Their addition or removal does not affect other resources.
The cascading mechanism enables flexible and lightweight resource configuration, allowing rapid adaptation to customer configuration changes.
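The waterfall-like chained operations can be sketched as follows. The resource graph and handler registrations are illustrative assumptions: each resource type registers a cleanup handler as a plugin, and deleting a resource cascades to its dependents first.

```python
class CascadeFramework:
    # Sketch of waterfall-like cascading operations; the resource graph and
    # handler names are illustrative, not ZStack internals.

    def __init__(self):
        self.children = {}                  # resource type -> dependent types
        self.handlers = {}                  # resource type -> cleanup "plugin"

    def register(self, rtype, handler, children=()):
        self.handlers[rtype] = handler
        self.children[rtype] = list(children)

    def delete(self, rtype):
        # Cascade to dependent resources first, then clean up this one.
        for child in self.children.get(rtype, []):
            self.delete(child)
        self.handlers[rtype]()

log = []
cascade = CascadeFramework()
cascade.register("volume", lambda: log.append("volume"))
cascade.register("vm",     lambda: log.append("vm"), children=["volume"])
cascade.register("host",   lambda: log.append("host"), children=["vm"])
cascade.delete("host")                      # cascades volume -> vm -> host
```

Because each handler is registered independently, adding or removing a resource type from the cascade does not affect the others.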
Automated Deployment: Agentless auto-deployment with Ansible.
Leverages agentless Ansible to automate dependency installation, physical resource configuration, and agent deployment. The whole process is transparent to users and requires no additional intervention. Agents can be upgraded simply by reconnecting them.
Comprehensive Query APIs: Access any attribute of any resource.
Provides comprehensive query APIs that support resource queries with millions of conditions and arbitrary combinations of those conditions.
Functional Architecture
Figure 1 shows the functional architecture of ZStack ZSphere.
Figure 1. Functional Architecture
ZStack ZSphere provides enterprise-grade data center infrastructure services for compute, storage, and network resource management. Built on KVM virtualization technology at its foundation, it supports DAS, NAS, SAN, and DSS storage types, including local storage, NFS storage, SAN storage, and distributed storage, as well as network models such as distributed switches and distributed port groups.
With ZStack ZSphere as its core engine, the platform uses a message bus to communicate with MariaDB database and service modules. Key functionalities include virtual machine management, host management, storage scheduling, network services, system administration, monitoring and auditing, and more.
ZStack ZSphere provides Java and Python SDKs and RESTful APIs for resource scheduling and management.
Resource Model
Figure 1 shows the resource structure of ZStack ZSphere.
Figure 1. Resource Model
ZStack ZSphere has the following main resources:
Data Center: A data center is the largest resource namespace within a virtualization platform, including resources such as clusters, hosts, data storage, distributed switches, and distributed port groups.
Cluster: A logical collection of a group of hosts (compute nodes).
Host: A host is an x86 or ARM physical server running a KVM virtualization hypervisor, providing resources such as computing, networking, and storage to virtual machines.
Virtual Machine: A virtual machine is a virtualized host running on a physical host, capable of running an operating system and applications just like a physical host.
Data Storage: A data storage is a virtualized resource that provides storage space for virtual machines and their application data. A data storage can be categorized into local storage and network shared storage.
Distributed Switch: A virtual switching device that provides unified virtual network management and monitoring for virtual machines within a cluster.
Distributed Port Group: A logical grouping of ports on a distributed switch, used for port configuration.
Image Storage: An image storage is a virtualized resource that provides storage space for image template files used by virtual machines or disks. An image storage can be categorized into standalone image storage and distributed image storage.
Image: An image is a template file used by virtual machines or disks. Images are categorized into system images and disk images.
ZStack ZSphere resources maintain the following two types of relationships:
Hierarchical relationships: Similar to interpersonal relationships in human society, including parent-child, sibling, grandparent-grandchild, and peer relationships.
Definitions:
Parent-Child: Resource A is the parent or child of Resource B. For example, clusters and hosts, or hosts and virtual machines. In both cases, the latter resource runs within the former.
Sibling: Resource A and Resource B share the same parent. For example, clusters and distributed switches, or clusters and data storage. Both pairs share the data center as their parent.
Grandparent-Grandchild: Resource A is the direct grandparent or grandchild of Resource B. For example, a data center is the parent of a cluster, a cluster is the parent of a host, and a host is the parent of a virtual machine. Therefore, the data center is the grandparent of the host, and the cluster is the grandparent of the virtual machine.
Peer: Resource A and Resource B do not have any of the above relationships, but they need to collaborate in certain scenarios. For example, data storage and image storage work together to provide services for clusters.
Cardinality relationships: Similar to quantitative constraints in human society, including 1:n (one-to-many), n:1 (many-to-one), and n:n (many-to-many).
Definitions:
1:n: Indicates that Resource A can create, add, or attach to multiple Resource B. For example, one cluster can contain multiple hosts, and one distributed switch can attach to multiple clusters.
n:1: Indicates that multiple Resource A can be created, added, or attached to Resource B. For example, multiple hosts can be added to the same cluster, and multiple clusters can be attached to the same distributed switch.
n:n: Indicates that Resource A can create, add, or attach to multiple Resource B, while Resource B can also be created, added, or attached to multiple Resource A. For example, one image storage can be attached to multiple data centers, and one data center can attach multiple image storages.