Ollama has added support for MLX, Apple's open source machine learning framework, enabling faster execution of large language models on Macs with Apple Silicon chips (M1 and later). The update, part of the Ollama 0.19 preview, also brings enhanced caching and support for Nvidia's NVFP4 format, which improves memory efficiency for certain models.

The performance boost arrives as interest in running AI models locally surges beyond niche developer and research circles. The rise of OpenClaw, which gained over 300,000 stars on GitHub and sparked widespread experimentation in China, has accelerated demand for on-device models. Frustrated by rate limits and high subscription costs, developers increasingly seek alternatives to cloud-based coding tools like Claude Code and ChatGPT Codex, and Ollama's improved integration with Visual Studio Code further streamlines the local AI development workflow.

For now, however, the new MLX support is limited to a single model, the 35-billion-parameter version of Alibaba's Qwen3.5, and it demands serious hardware: an Apple Silicon Mac with at least 32GB of RAM, which places it out of reach for many casual users.
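To make this concrete, here is a minimal sketch of what prompting a locally served model looks like. Ollama's standard REST endpoint at `http://localhost:11434/api/generate` is real and documented; the model tag `qwen3.5:35b` is our hypothetical guess at how the new model would be named, and the sketch assumes the 0.19 preview selects the MLX backend automatically on Apple Silicon, since no MLX-specific flag is confirmed.

```python
# Minimal sketch: prompting a locally served model via Ollama's REST API.
# Assumes `ollama serve` is running on the default port 11434 and that the
# hypothetical model tag "qwen3.5:35b" has already been pulled
# (e.g. `ollama pull qwen3.5:35b`). On an Apple Silicon Mac running the
# 0.19 preview, the MLX backend is assumed to engage automatically;
# no extra flag is shown because none is confirmed for the preview.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's standard local endpoint
MODEL_TAG = "qwen3.5:35b"  # hypothetical tag for the 35B Qwen3.5 build

payload = json.dumps({
    "model": MODEL_TAG,
    "prompt": "Explain what Apple's MLX framework does in two sentences.",
    "stream": False,  # ask for one JSON object instead of a token stream
}).encode("utf-8")

req = urllib.request.Request(
    OLLAMA_URL,
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# With streaming disabled, the full completion arrives under "response".
print(body["response"])
```

The command-line equivalent would simply be `ollama run qwen3.5:35b` in a terminal, and editor plugins typically talk to this same local endpoint, which is what makes the tighter Visual Studio Code workflow possible.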

💡 NaijaBuzz Take

When Ollama says the new MLX support boosts performance on Apple Silicon Macs, that means developers can now run heavier AI models like Qwen3.5 locally without relying on costly cloud APIs, something that was impractical just a year ago. This shift could empower Nigerian developers at startups like Andela, or freelance builders who need tight control over latency, cost, and data privacy. While the 32GB RAM requirement excludes most consumer machines, it signals a growing trend: high-end local AI is no longer a hobbyist fantasy but a viable development path. For African tech teams building AI tools with limited API budgets, this is a quiet game-changer.