Building Large Language Models with Python: A Developer's Guide to Sparse MoE, 1-Bit Quantization, Reasoning Systems and Multimodal AI

by Samuel Reynolds
Sold out
₹2,802.00

Imported Edition - Ships in 18-21 Days

Free Shipping in India on orders above Rs. 500

Book cover type: Paperback
  • ISBN13: 9798259295339
  • Binding: Paperback
  • Subject: N/A
  • Publisher: Independently Published
  • Publisher Imprint: Independently Published
  • Publication Date:
  • Pages: 142
  • Original Price: USD 25.99
  • Language: English
  • Edition: N/A
  • Item Weight: 259 grams
  • BISAC Subject(s): Languages / Python

Most LLM books teach you how to call an API. This one teaches you how to build what's behind it.

As frontier AI shifts toward efficiency, sparsity, and on-device deployment, the engineers who understand the architecture, not just the interface, are the ones defining what comes next. Building Large Language Models with Python gives you that understanding, from the mathematics of attention to the deployment of a quantized, reasoning-capable model on local hardware.
Written from hard-won production experience, each chapter pairs rigorous theory with complete Python implementations: not toy examples, but the kind of code that holds up under the demands of real training runs and live inference pipelines.

What you'll build:
- A Grouped-Query Attention module with KV cache support
- A Top-K sparse MoE layer with load-balancing auxiliary loss
- A BitLinear layer implementing ternary {-1, 0, 1} weights from scratch
- A Vision Transformer encoder with a multimodal projection layer
- A Process Reward Model for step-level reasoning verification
- A full DPO and GRPO training loop for alignment
- A local-first MCP server for agentic tool use
- A speculative decoding pipeline using a draft model

Topics covered include:
- Rotary Positional Embeddings (RoPE)
- FlashAttention-3 concepts
- Quantization-Aware Training vs. Post-Training Quantization
- Expert parallelism and All-to-All communication
- FSDP vs. DDP distributed training
- PagedAttention and KV cache optimization
- On-device LoRA fine-tuning
- Chain-of-thought reasoning architecture

This book is for you if:
- You're a software engineer or ML practitioner comfortable with Python and PyTorch
- You understand how a basic transformer works and want to go significantly deeper
- You want to move beyond using models to building and owning them
- You're building for edge deployment, private AI, or resource-constrained environments

The field is moving fast. This book is written for engineers who intend to move faster.
