{"product_id":"building-large-language-models-with-python-a-developers-guide-to-sparse-moe-1-bit-quantization-reasoning-systems-and-multimodal-ai-9798259295339","title":"Building Large Language Models with Python: A Developer's Guide to Sparse MoE, 1-Bit Quantization, Reasoning Systems and Multimodal AI","description":"\u003cp\u003e • Author(s): Samuel Reynolds\u003cbr\u003e • Publisher: Independently Published\u003cbr\u003e • Publisher Imprint: Independently Published\u003cbr\u003e • BISAC: Languages - Python\u003c\/p\u003e\u003cp\u003e\u003cb\u003eMost LLM books teach you how to call an API. This one teaches you how to build what's behind it.\u003c\/b\u003e \u003c\/p\u003e\u003cp\u003e\u003c\/p\u003eAs frontier AI shifts toward efficiency, sparsity, and on-device deployment, the engineers who understand the architecture, not just the interface, are the ones defining what comes next. Building Large Language Models with Python gives you that understanding, from the mathematics of attention to the deployment of a quantized, reasoning-capable model on local hardware.\u003cbr\u003eWritten from hard-won production experience, each chapter pairs rigorous theory with complete Python implementations: not toy examples, but the kind of code that holds up under the demands of real training runs and live inference pipelines. 
\u003cp\u003e\u003c\/p\u003e\u003cb\u003eWhat you'll build: \u003c\/b\u003e\u003cbr\u003e- A Grouped-Query Attention module with KV cache support\u003cbr\u003e- A Top-K sparse MoE layer with load-balancing auxiliary loss\u003cbr\u003e- A BitLinear layer implementing ternary {-1, 0, 1} weights from scratch\u003cbr\u003e- A Vision Transformer encoder with a multimodal projection layer\u003cbr\u003e- A Process Reward Model for step-level reasoning verification\u003cbr\u003e- A full DPO and GRPO training loop for alignment\u003cbr\u003e- A local-first MCP server for agentic tool use\u003cbr\u003e- A speculative decoding pipeline using a draft model \u003cp\u003e\u003c\/p\u003e\u003cb\u003eTopics covered include: \u003c\/b\u003e\u003cbr\u003eRotary Positional Embeddings (RoPE) - FlashAttention-3 concepts - Quantization-Aware Training vs. Post-Training Quantization - Expert parallelism and All-to-All communication - FSDP vs. DDP distributed training - PagedAttention and KV cache optimization - On-device LoRA fine-tuning - Chain-of-thought reasoning architecture \u003cp\u003e\u003c\/p\u003e\u003cb\u003eThis book is for you if: \u003c\/b\u003e\u003cbr\u003eYou're a software engineer or ML practitioner comfortable with Python and PyTorch\u003cbr\u003eYou understand how a basic transformer works and want to go significantly deeper\u003cbr\u003eYou want to move beyond using models to building and owning them\u003cbr\u003eYou're building for edge deployment, private AI, or resource-constrained environments \u003cp\u003e\u003c\/p\u003e\u003cb\u003eThe field is moving fast. 
This book is written for engineers who intend to move faster.\u003c\/b\u003e","brand":"Independently Published","offers":[{"title":"Paperback","offer_id":47883176411287,"sku":"9798259295339","price":2802.0,"currency_code":"INR","in_stock":false}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0666\/3471\/1191\/files\/9798259295339.webp?v=1781099988","url":"https:\/\/atlanticbooks.com\/products\/building-large-language-models-with-python-a-developers-guide-to-sparse-moe-1-bit-quantization-reasoning-systems-and-multimodal-ai-9798259295339","provider":"Atlantic Books","version":"1.0","type":"link"}