Large-Scale Data Analytics with Python and Spark

19 December, 2023

A Hands-on Guide to Implementing Machine Learning Solutions

Isaac Triguero, University of Granada
Mikel Galar, Public University of Navarre

Publisher: Cambridge University Press
Year: 2023
Pages: 378
ISBN: 9781009318242

A hands-on textbook that teaches how to carry out large-scale data analytics and implement machine learning solutions for Big Data. With copious real-world examples, it offers a coherent teaching package with lab assignments, exercises, solutions for instructors, and lecture slides.

The book teaches the key concepts of large-scale data analytics and machine learning with Big Data. It is divided into three main parts. Part I covers the basic concepts needed to understand what Big Data is, along with the key principles and programming paradigms for dealing with it. Part II turns to the technology behind Big Data, introducing two of the most established Big Data frameworks, Hadoop and Spark. It covers their key technical details and how to program efficiently with distributed data structures. Finally, Part III focuses on machine learning and data science in the presence of large volumes of data: how to use existing libraries, and how to design efficient and effective solutions that adapt data science techniques (including preprocessing, learning, and model deployment) to this scenario. The book contains many examples, and each chapter includes several challenges for readers and a series of exercises. The supplementary material includes lab assignments, which are larger coding projects at various levels, and a practical tutorial for getting started with Python and Spark.