Posts tagged with: Machine Learning
Content related to Machine Learning
Train a 26M GPT Model in 2 Hours for Just $0.40
Discover MiniMind, an open-source project that lets anyone train a compact 26M-parameter GPT model from scratch in about two hours for roughly $0.40. The project democratizes large language model (LLM) development by walking through the entire pipeline, from pre-training and fine-tuning to advanced techniques like DPO and LoRA. Ideal for AI enthusiasts and developers who want to understand LLM internals without massive computational resources, MiniMind offers a comprehensive, hands-on learning experience. Learn how to set up your environment, prepare datasets, and deploy your own conversational AI model with minimal investment.
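To make the 26M figure concrete, here is a rough, back-of-the-envelope parameter count for a GPT at that scale. The hyperparameters below are illustrative assumptions, not MiniMind's actual configuration.

```python
# Illustrative only: a back-of-the-envelope parameter count for a GPT at the
# ~26M scale. The hyperparameters are assumptions, not MiniMind's real config,
# and the formula ignores layer norms, biases, and an untied output head.

def gpt_param_count(vocab_size: int, d_model: int, n_layers: int, d_ff: int) -> int:
    embedding = vocab_size * d_model              # token embedding table
    per_layer = (
        4 * d_model * d_model                     # attention Q/K/V/output projections
        + 2 * d_model * d_ff                      # feed-forward up/down projections
    )
    return embedding + n_layers * per_layer

# A plausible small-model shape: tiny vocabulary, narrow width, few layers.
print(gpt_param_count(vocab_size=6400, d_model=512, n_layers=8, d_ff=1792))  # ~26.3M
```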
Muvera-Py: Fast Multi-Vector Retrieval with FDE
Discover Muvera-Py, a new Python implementation of Google's MUVERA (Multi-Vector Retrieval via Fixed Dimensional Encodings) algorithm. The library transforms the hundreds of vectors that represent a document into a single, fixed-size vector, significantly speeding up retrieval while maintaining accuracy. Learn how Fixed Dimensional Encodings (FDEs) address the scalability challenges of multi-vector search systems like ColBERT. Muvera-Py offers full fidelity to the original C++ implementation, ensuring identical behavior for high-performance applications. Explore its features, including configuration classes, internal helpers for Gray Code and random matrix generation, and the core FDE-generation algorithm. Practical examples help developers integrate the library into their projects, making large-scale vector search faster and more memory-efficient.
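The core FDE idea can be illustrated with a short, self-contained toy: bucket each vector by the sign pattern of a few random projections, aggregate per bucket, and concatenate the buckets into one fixed-size vector. This is a simplified sketch of the concept, not Muvera-Py's actual API; every name below is invented for illustration.

```python
# Toy illustration of the fixed-dimensional-encoding (FDE) idea behind MUVERA.
# Not Muvera-Py's API: all names are invented for this sketch, and real FDEs
# add refinements (e.g. different query/document aggregation, final projections).
import numpy as np

rng = np.random.default_rng(0)

def toy_fde(vectors, projections):
    """Map a (num_vectors, dim) set into one flat vector of size 2**n_planes * dim."""
    n_planes = projections.shape[1]
    bits = (vectors @ projections) > 0                           # sign pattern per vector
    bucket_ids = bits.astype(int) @ (1 << np.arange(n_planes))   # integer bucket index
    buckets = np.zeros((2 ** n_planes, vectors.shape[1]))
    for b, v in zip(bucket_ids, vectors):
        buckets[b] += v                                          # sum the vectors in each bucket
    return buckets.ravel()                                       # one fixed-size encoding

dim, n_planes = 128, 4
planes = rng.standard_normal((dim, n_planes))                    # shared random partitioning
doc_fde = toy_fde(rng.standard_normal((300, dim)), planes)       # e.g. 300 document token vectors
query_fde = toy_fde(rng.standard_normal((32, dim)), planes)      # 32 query token vectors
score = float(doc_fde @ query_fde)                               # one dot product instead of 300 x 32
```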
LLaMA-Factory: Unified Fine-Tuning for 100+ LLMs & VLMs
Fine-tuning large language models can be a complex and resource-intensive task. LLaMA-Factory emerges as a game-changer, offering a unified, highly efficient platform for fine-tuning over 100 large language models (LLMs) and vision-language models (VLMs). This open-source project, recognized at ACL 2024, simplifies complex AI development workflows with its zero-code command-line interface and intuitive Web UI. Trusted by industry giants like Amazon and NVIDIA, LLaMA-Factory empowers developers and researchers to enhance model performance across diverse tasks, from multi-turn dialogue to multimodal understanding, using advanced techniques like QLoRA and FlashAttention-2. Explore how this powerful tool can accelerate your AI projects.
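For context on what the zero-code interface automates, here is a minimal QLoRA setup written directly against Hugging Face transformers and peft. This is not LLaMA-Factory's own API, and the model id and hyperparameters are placeholders.

```python
# Not LLaMA-Factory's interface: a minimal sketch of the QLoRA setup that such
# tools automate, using transformers + peft. Model id and values are placeholders.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                          # 4-bit base weights: the "Q" in QLoRA
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",      # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],        # adapt only the attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)             # only the small LoRA adapters are trained
model.print_trainable_parameters()
```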
Unsloth: Dramatically Speed Up LLM Fine-tuning & Save VRAM
Discover Unsloth, the open-source library revolutionizing Large Language Model (LLM) fine-tuning. Achieve up to 2x faster training and reduce GPU VRAM consumption by up to 80% compared to standard methods. Unsloth supports a wide range of models, including Llama, Qwen, Gemma, and Mistral, along with text-to-speech and vision models. Free, beginner-friendly notebooks make it easy to get started, enabling efficient training even on limited hardware. Dive into efficient LLM development with Unsloth's powerful features and robust performance.
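The sketch below follows the shape of Unsloth's documented quickstart; the checkpoint name and parameter values are examples and may differ between releases.

```python
# A sketch following Unsloth's documented quickstart; the checkpoint name and
# parameter values here are examples and may differ between releases.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",   # a pre-quantized 4-bit checkpoint
    max_seq_length=2048,
    load_in_4bit=True,                          # keep the base weights in 4-bit to save VRAM
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                       # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# The resulting model plugs into a standard trainer such as trl's SFTTrainer.
```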
Best of ML Python: Top Open-Source Libraries Revealed
Dive into 'Best-of-ML-Python,' a meticulously ranked collection of over 900 awesome open-source machine learning Python libraries. Updated weekly, this list is an invaluable resource for developers, researchers, and data scientists looking for high-quality tools across various ML domains, including frameworks, data visualization, NLP, image processing, and more. Discover top-tier projects like TensorFlow, PyTorch, scikit-learn, and Hugging Face's Transformers, each evaluated by a unique project-quality score. Whether you're building, learning, or optimizing, this curated resource helps you pinpoint the most impactful libraries for your machine learning endeavors. Contributions are also welcome to keep the list current and comprehensive.
Master Prompt Engineering: The Ultimate Open-Source Guide
Dive into the definitive open-source Prompt Engineering Guide by DAIR.AI, offering a wealth of resources from introductory concepts to advanced techniques for optimizing large language models (LLMs). This guide provides papers, lectures, notebooks, and practical examples for anyone from researchers to developers looking to deeply understand and effectively utilize LLMs. Discover methods like Chain-of-Thought, RAG, and more to enhance your AI applications. Join millions of learners and elevate your LLM proficiency with this continuously updated, community-driven resource.
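As a taste of the techniques covered, here are two chain-of-thought prompt patterns written as plain strings; the wording is an illustrative example, not text from the guide.

```python
# Illustrative chain-of-thought prompt patterns; the wording is an example of
# the techniques the guide covers, not text taken from the guide itself.
question = "A cafe sells coffee for $3 and muffins for $2. I buy 4 coffees and 3 muffins. What do I pay?"

# Zero-shot CoT: simply invite the model to produce intermediate reasoning.
zero_shot_cot = f"{question}\nLet's think step by step."

# Few-shot CoT: show a worked example before asking the real question.
few_shot_cot = f"""Q: I buy 2 pens at $1.50 each. What do I pay?
A: Each pen costs $1.50, so 2 pens cost 2 x 1.50 = $3.00. The answer is $3.00.

Q: {question}
A:"""
```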
MergeKit: Combine LLMs with Ease and Efficiency
Discover MergeKit, an open-source toolkit designed for merging pre-trained large language models (LLMs). This powerful tool allows users to combine the strengths of different models without extensive training or high computational overhead. With support for various merge methods, CPU/GPU execution, and low memory usage, MergeKit is ideal for creating versatile, custom LLMs. Learn how to install, configure, and use the toolkit to enhance your AI projects, including multi-stage merging and LoRA extraction. Whether you're a researcher or a developer, MergeKit simplifies the complex process of model integration, making advanced LLM capabilities more accessible.
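Merges are driven by a small YAML config handed to MergeKit's command-line tools. The snippet below writes out a sketch of such a config; the model ids are placeholders, and the exact fields should be checked against the MergeKit README for your version.

```python
# Writes a sketch of a MergeKit-style YAML config; model ids are placeholders
# and field names should be checked against the MergeKit README for your version.
from pathlib import Path

merge_config = """\
merge_method: linear              # weighted average of the listed checkpoints
dtype: float16
models:
  - model: mistralai/Mistral-7B-v0.1        # placeholder: same base architecture required
    parameters:
      weight: 0.5
  - model: HuggingFaceH4/zephyr-7b-beta     # placeholder fine-tune of that architecture
    parameters:
      weight: 0.5
"""

Path("merge_config.yaml").write_text(merge_config)
# Then invoke MergeKit's documented CLI entry point, e.g.:
#   mergekit-yaml merge_config.yaml ./merged-model
```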