{"product_id":"mastering-large-datasets-with-python-parallelize-and-distribute-your-python-code-9781617296239","title":"Mastering Large Datasets with Python: Parallelize and Distribute Your Python Code","description":"\u003cp\u003e • Author(s): John T. Wolohan\u003cbr\u003e • Publisher: Manning Publications\u003cbr\u003e • Publisher Imprint: Manning Publications\u003cbr\u003e • BISAC: Languages - Python\u003c\/p\u003e\u003cp\u003eSummary \u003cbr\u003eModern data science solutions need to be clean, easy to read, and scalable. In \u003ci\u003eMastering Large Datasets with Python\u003c\/i\u003e, author J.T. Wolohan teaches you how to take a small project and scale it up using a functionally influenced approach to Python coding. You'll explore methods and built-in Python tools that lend themselves to clarity and scalability, like the high-performing parallelism method, as well as distributed technologies that allow for high data throughput. The abundant hands-on exercises in this practical tutorial will lock in these essential skills for any large-scale data science project. \u003c\/p\u003e\u003cp\u003e\u003c\/p\u003e \u003cp\u003e\u003c\/p\u003ePurchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications. \u003cp\u003e\u003c\/p\u003e About the technology \u003cbr\u003eProgramming techniques that work well on laptop-sized data can slow to a crawl--or fail altogether--when applied to massive files or distributed datasets. By mastering the powerful map and reduce paradigm, along with the Python-based tools that support it, you can write data-centric applications that scale efficiently without requiring codebase rewrites as your requirements change. \u003cp\u003e\u003c\/p\u003e About the book \u003cbr\u003e\u003ci\u003eMastering Large Datasets with Python\u003c\/i\u003e teaches you to write code that can handle datasets of any size. 
You'll start with laptop-sized datasets that teach you to parallelize data analysis by breaking large tasks into smaller ones that can run simultaneously. You'll then scale those same programs to industrial-sized datasets on a cluster of cloud servers. With the map and reduce paradigm firmly in place, you'll explore tools like Hadoop and PySpark to efficiently process massive distributed datasets, speed up decision-making with machine learning, and simplify your data storage with AWS S3. \u003cp\u003e\u003c\/p\u003e What's inside \u003cbr\u003e \u003cul\u003e \u003cli\u003eAn introduction to the map and reduce paradigm\u003c\/li\u003e \u003cli\u003eParallelization with the multiprocessing module and pathos framework\u003c\/li\u003e \u003cli\u003eHadoop and Spark for distributed computing\u003c\/li\u003e \u003cli\u003eRunning AWS jobs to process large datasets\u003c\/li\u003e \u003c\/ul\u003e \u003cp\u003e\u003c\/p\u003e About the reader \u003cbr\u003eFor Python programmers who need to work faster with more data. \u003cp\u003e\u003c\/p\u003e About the author \u003cbr\u003e\u003cb\u003eJ. T. Wolohan\u003c\/b\u003e is a lead data scientist at Booz Allen Hamilton, and a PhD researcher at Indiana University, Bloomington. 
\u003cp\u003e\u003c\/p\u003e \u003cp\u003e\u003c\/p\u003eTable of Contents: \u003cp\u003e\u003c\/p\u003ePART 1 \u003cp\u003e\u003c\/p\u003e1. Introduction \u003cp\u003e\u003c\/p\u003e2. Accelerating large dataset work: Map and parallel computing \u003cp\u003e\u003c\/p\u003e3. Function pipelines for mapping complex transformations \u003cp\u003e\u003c\/p\u003e4. Processing large datasets with lazy workflows \u003cp\u003e\u003c\/p\u003e5. Accumulation operations with reduce \u003cp\u003e\u003c\/p\u003e6. Speeding up map and reduce with advanced parallelization \u003cp\u003e\u003c\/p\u003ePART 2 \u003cp\u003e\u003c\/p\u003e7. Processing truly big datasets with Hadoop and Spark \u003cp\u003e\u003c\/p\u003e8. Best practices for large data with Apache Streaming and mrjob \u003cp\u003e\u003c\/p\u003e9. PageRank with map and reduce in PySpark \u003cp\u003e\u003c\/p\u003e10. Faster decision-making with machine learning and PySpark \u003cp\u003e\u003c\/p\u003ePART 3 \u003cp\u003e\u003c\/p\u003e11. Large datasets in the cloud with Amazon Web Services and S3 \u003cp\u003e\u003c\/p\u003e12. MapReduce in the cloud with Amazon's Elastic MapReduce","brand":"Manning Publications","offers":[{"title":"Paperback","offer_id":45356293849239,"sku":"9781617296239","price":4162.0,"currency_code":"INR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0666\/3471\/1191\/files\/9781617296239.webp?v=1769292080","url":"https:\/\/atlanticbooks.com\/products\/mastering-large-datasets-with-python-parallelize-and-distribute-your-python-code-9781617296239","provider":"Atlantic Books","version":"1.0","type":"link"}