{"product_id":"gpu-cloud-computing-accelerating-ai-and-ml-leverage-gpus-in-the-cloud-for-high-performance-computing-9798267514781","title":"GPU Cloud Computing Accelerating AI and ML: Leverage GPUs in the cloud for high-performance computing","description":"\u003cp\u003e • Author(s): Corwin Halesworth\u003cbr\u003e • Publisher: Independently Published\u003cbr\u003e • Publisher Imprint: Independently Published\u003cbr\u003e • BISAC: Distributed Systems - Cloud Computing\u003c\/p\u003e\u003cp\u003e\u003c\/p\u003e\u003cp\u003eMake accelerators work for your models, and your budget. \u003cb\u003eGPU Cloud Computing: Accelerating AI and ML\u003c\/b\u003e is a practical field guide to training and serving on managed GPU platforms. You'll learn how to choose the right instances, size storage and networking, orchestrate distributed jobs, and keep throughput high with stable, reproducible workflows.\u003c\/p\u003e\u003cp\u003eWe start with the essentials (device types, quotas, images, drivers) and move to production patterns: NCCL-based multi-node training, mixed precision and tensor cores, checkpointing strategies, autoscaling inference, and cost controls. Clear examples, checklists, and review rubrics help you avoid painful pitfalls like PCIe bottlenecks, topology mismatches, hot shards, and noisy neighbors, so your clusters stay fast and predictable.\u003c\/p\u003e\u003cp\u003e\u003cb\u003eWhat you'll learn\u003c\/b\u003e\u003c\/p\u003e\u003cul\u003e\n\u003cli\u003e\u003cp\u003ePick instances and accelerators for throughput vs. 
price; plan quotas and queues\u003c\/p\u003e\u003c\/li\u003e\n\u003cli\u003e\u003cp\u003eOptimize input pipelines: parallel reads, caching, and data locality\u003c\/p\u003e\u003c\/li\u003e\n\u003cli\u003e\u003cp\u003eRun distributed training with DDP\/Horovod; tune NCCL and all-reduce strategies\u003c\/p\u003e\u003c\/li\u003e\n\u003cli\u003e\u003cp\u003eUse mixed precision, gradient accumulation, and sharded optimizers safely\u003c\/p\u003e\u003c\/li\u003e\n\u003cli\u003e\u003cp\u003eLeverage topology features (NVLink, MIG) and fast fabrics (EFA\/InfiniBand)\u003c\/p\u003e\u003c\/li\u003e\n\u003cli\u003e\u003cp\u003ePackage environments with CUDA\/ROCm images; manage drivers and runtimes\u003c\/p\u003e\u003c\/li\u003e\n\u003cli\u003e\u003cp\u003eDeploy inference with TensorRT, batch\/streaming, and latency budgets\u003c\/p\u003e\u003c\/li\u003e\n\u003cli\u003e\u003cp\u003eOrchestrate jobs on Kubernetes (GPU Operator), Slurm, or Ray clusters\u003c\/p\u003e\u003c\/li\u003e\n\u003cli\u003e\u003cp\u003eSecure artifacts and endpoints; sign containers and enforce least privilege\u003c\/p\u003e\u003c\/li\u003e\n\u003cli\u003e\u003cp\u003eControl spend with spot\/preemptible pools, right-sizing, and utilization dashboards\u003c\/p\u003e\u003c\/li\u003e\n\u003c\/ul\u003e\u003cp\u003e\u003cb\u003eWho it's for\u003c\/b\u003e\u003cbr\u003eML engineers, platform teams, researchers, and architects who want reliable, cost-aware GPU workflows across major providers.\u003c\/p\u003e\u003cp\u003e\u003cb\u003eWhat's inside\u003c\/b\u003e\u003cbr\u003eReference architectures, IaC snippets, container recipes, tuning checklists, failure playbooks, and cost\/performance review rubrics.\u003c\/p\u003e","brand":"Independently 
Published","offers":[{"title":"Paperback","offer_id":47594509566103,"sku":"9798267514781","price":1776.0,"currency_code":"INR","in_stock":false}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0666\/3471\/1191\/files\/9798267514781.webp?v=1774986214","url":"https:\/\/atlanticbooks.com\/products\/gpu-cloud-computing-accelerating-ai-and-ml-leverage-gpus-in-the-cloud-for-high-performance-computing-9798267514781","provider":"Atlantic Books","version":"1.0","type":"link"}