Gnoppix AI local cluster (MLX distributed AI)
In development now
What is it?
If you have heard of Kubernetes, this will feel familiar: exo works like a pre-installed Kubernetes cluster designed for local AI. It automatically detects all your devices on a local network, whether you are running a Mac, a Linux laptop, or a Raspberry Pi, and combines their resources into a single, powerful local endpoint.
Not only does exo enable running models larger than would fit on a single device; with day-0 support for RDMA over Thunderbolt, models run even faster as you add more devices.
- Automatic Device Discovery: Devices automatically discover each other - no manual configuration.
- RDMA over Thunderbolt: Day-0 support for RDMA over Thunderbolt 5 enables up to a 99% reduction in latency between devices.
- Topology-Aware Auto Parallel: The cluster figures out the best way to split your model across all available devices based on a real-time view of your device topology, taking into account device resources and the network latency/bandwidth of each link.
- Tensor Parallelism: Models can be sharded across devices, yielding up to a 1.8x speedup on 2 devices and a 3.2x speedup on 4 devices.
- MLX Support: Under the hood, MLX serves as the inference backend and MLX distributed handles communication between devices.
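Once the cluster is running, you talk to the combined endpoint like any OpenAI-compatible service. Here is a minimal sketch in Python; the URL, port, and model id are assumptions, so adjust them to whatever your cluster actually exposes.

```python
# Minimal sketch: querying the cluster's OpenAI-compatible chat endpoint.
# The URL, port, and model id below are assumptions; adjust to your setup.
import requests

resp = requests.post(
    "http://localhost:52415/v1/chat/completions",  # assumed local endpoint
    json={
        "model": "llama-3.2-3b",  # placeholder model id; use one available on your cluster
        "messages": [{"role": "user", "content": "Hello from my local cluster!"}],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Because everything sits behind a single endpoint, existing OpenAI client libraries should work by simply pointing their base URL at the cluster.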
What is MLX?
MLX is an array framework for machine learning on Apple silicon, brought to you by Apple machine learning research.
Some key features of MLX include:
Familiar APIs: MLX has a Python API that closely follows NumPy. MLX also has fully featured C++, C, and Swift APIs, which closely mirror the Python API. MLX has higher-level packages like mlx.nn and mlx.optimizers with APIs that closely follow PyTorch to simplify building more complex models.
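As a minimal sketch of the Python side, mlx.core covers the NumPy-style array operations and mlx.nn the PyTorch-style layers:

```python
import mlx.core as mx
import mlx.nn as nn

# NumPy-style array creation and operations
a = mx.arange(6, dtype=mx.float32).reshape(3, 2)
b = mx.ones((3, 2))
print((a + b).sum(axis=0))  # column sums, shape (2,)

# PyTorch-style layer from mlx.nn
layer = nn.Linear(2, 4)
print(layer(b).shape)  # (3, 4)
```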
Composable function transformations: MLX supports composable function transformations for automatic differentiation, automatic vectorization, and computation graph optimization.
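For instance, mx.grad and mx.vmap each take a function and return a transformed function, and the results can be nested. A small illustrative sketch:

```python
import mlx.core as mx

def loss(w, x):
    # a scalar-valued function of the parameter w
    return mx.mean(mx.square(w * x))

grad_fn = mx.grad(loss)            # differentiates w.r.t. the first argument
double = mx.vmap(lambda v: v * 2)  # vectorizes over the leading axis

w = mx.array(3.0)
x = mx.array([1.0, 2.0, 3.0])
print(grad_fn(w, x))  # d(loss)/dw
print(double(x))      # transformations compose: mx.vmap(mx.grad(...)) also works
```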
Lazy computation: Computations in MLX are lazy. Arrays are only materialized when needed.
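A brief sketch of how that laziness behaves in practice:

```python
import mlx.core as mx

a = mx.random.normal((1024, 1024))
b = mx.random.normal((1024, 1024))
c = a @ b        # nothing is computed yet; c is a node in the graph
d = mx.sum(c)    # still lazy
mx.eval(d)       # the whole graph is materialized here
print(d.item())  # accessing a value also forces evaluation
```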
Dynamic graph construction: Computation graphs in MLX are constructed dynamically. Changing the shapes of function arguments does not trigger slow compilations, and debugging is simple and intuitive.
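To illustrate, the same function can be called with differently shaped inputs back to back:

```python
import mlx.core as mx

def f(x):
    return mx.sum(mx.exp(x))

# The graph is rebuilt on each call, so new input shapes just work,
# with no slow shape-specialized recompilation step.
print(f(mx.zeros((2, 2))).item())
print(f(mx.zeros((16, 16))).item())
```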
Multi-device: Operations can run on any of the supported devices (currently the CPU and the GPU).
Unified memory: A notable difference between MLX and other frameworks is the unified memory model. Arrays in MLX live in shared memory, so operations on MLX arrays can be performed on any of the supported device types without transferring data.
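A short sketch combining the last two points, using the stream argument that MLX operations accept:

```python
import mlx.core as mx

a = mx.random.normal((100,))
b = mx.random.normal((100,))

# The same arrays are visible to both devices; no copies are needed.
c = mx.add(a, b, stream=mx.cpu)  # runs on the CPU
d = mx.add(a, b, stream=mx.gpu)  # runs on the GPU, reading the same buffers
mx.eval(c, d)
```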