Multimodal AI Systems: Combining Vision, Text, and Audio for Rich Predictions

by Kalen Virell
Sold out
Current price ₹1,336.00
Original price ₹1,533.00
(-13%)

Imported Edition - Ships in 18-21 Days

Free Shipping in India on orders above Rs. 500

Book cover type: Paperback
  • ISBN13: 9798298102919
  • Binding: Paperback
  • Subject: N/A
  • Publisher: Independently Published
  • Publisher Imprint: Independently Published
  • Publication Date:
  • Pages: 202
  • Original Price: GBP 12.51
  • Language: English
  • Edition: N/A
  • Item Weight: 277 grams
  • BISAC Subject(s): Artificial Intelligence / Computer Vision & Pattern Recognition

Multimodal AI is no longer a research toy. It is how modern systems see, read, and listen at once to make sharper predictions. If you work with computer vision, natural language, or audio, and especially if you need them to work together, this book shows you how to build real products that understand the world more like humans do.

Multimodal AI Systems gives you a practical path from fundamentals to deployment. You will learn how to represent images, text, and audio; fuse them with transformers and contrastive learning; and train models that can caption images, answer visual questions, parse speech, ground text in video, and more. You will also learn how to evaluate multimodal models, reduce hallucinations, and ship them with latency and cost in mind.

You will build end-to-end projects with clear code walk-throughs in Python using PyTorch, torchvision, torchaudio, OpenCV, and Hugging Face. You will fine-tune vision-language models, create cross-modal retrieval, add speech to vision pipelines, and instrument your system for quality, safety, and drift monitoring. Case studies from e-commerce, media, assistive tech, and robotics show what works in production and what to avoid.

If you want to move beyond single-modal silos and deliver smarter user experiences, this book is your roadmap. Buy it now and start building multimodal systems that see, read, and listen, then act.
