Data Science: A Practical Approach in the Age of Big Data 7th Ed.

06/30/2021 to 07/16/2021

This course is aimed at employees and entrepreneurs of companies who understand that the application of these technologies is the way forward for their companies.

Introduction

Data science is an interdisciplinary area of work that includes processes to collect, prepare, analyze, visualize and model data. It aims to generate useful knowledge to understand complex problems and assist in decision making. These data are often unstructured and heterogeneous, and in many cases they come in large volumes. Because of this complexity and diversity, extracting relevant knowledge usually requires innovative architectures and techniques: the well-known big data. Data science is an emerging field with high applicability in health sciences, marketing, business, financial markets, transport, communications, social networks, etc.

According to Gartner (the most prestigious consulting firm in information technologies), data scientists are not traditional business analysts, but professionals with the rare capacity to obtain mathematical models from data that generate clear and convincing business benefits. As the trend shows, the amount of data on the Internet is growing year by year.

Thus, professionals with skills in fields such as computing, mathematics, statistics or business are increasingly required to master new technologies and know how to manage data. Companies in all sectors are increasingly adopting data science, so the demand for experts in this sector is enormous. This is reflected in a study by the MIT Sloan Management Review (2015), and data science is considered one of the best job opportunities of the coming years (not for nothing, Glassdoor named it the best profession of 2016). The Harvard Business Review classified it as the ‘sexiest profession of the 21st century’ (2012). According to a study based on information from LinkedIn (2015), the number of data science professionals has doubled in the last four years. Another study by Burtch Works (2015) recognizes the positive impact of data science knowledge on salary.

Objectives of the programme

Regulated university curricula are slow to react to emerging job opportunities. In addition, there is a tendency to draw boundaries between disciplines, which hinders the development of hybrid specialties. This course aims to introduce students to the field of data science, thus serving as a bridge between various disciplines and helping to complete university education with an eminently practical orientation. The course consists of 30 face-to-face hours divided into 15 hours of theoretical concepts and fundamentals and another 15 hours of practice with specialized software and data from real cases.

The theory of this course covers data visualization; basic classification techniques (decision trees, neural networks…) and advanced ones (support vector machines, ensemble learning, deep learning…); data preprocessing (noise removal, imputation of missing values, data reduction…); unsupervised learning (clustering and association rules); incremental learning and data stream mining; big data and its paradigms; and, finally, real experiences of data science in industry. The practice introduces the student to software tools such as KNIME and R and big data architectures such as Spark. The Kaggle platform for competitions on real problems will also be introduced.

Attendees

Data scientists are a mix of mathematicians, statisticians, computer scientists, and creatives with: a) the skills to collect, process, and extract value from diverse and extensive databases; b) the imagination to understand, visualize, and communicate their findings to people who are not data scientists; and c) the ability to create data-driven solutions that increase profits, reduce costs, and help build a better world.

The course is intended for undergraduate and master’s students and for professionals with prior training primarily in computer science, mathematics, statistics, physics, engineering, or business who are seeking to complete their training as data scientists. The presentation of theoretical foundations and the use of specialized software will be adapted to meet the different needs of the students. Data science is a discipline that draws on diverse experiences and training, so the course will take advantage of the variety of student needs and abilities.

Teaching Team

The teaching team is made up of senior and junior university lecturers and researchers in the area of Computer Science and Artificial Intelligence at the University of Granada. They are highly specialized in data science and have excellent research records. In the area of Engineering and Computer Science, the prestigious ARWU 2017 (Shanghai) ranking places the University of Granada 33rd in the world, seventh in Europe and first in Spain.

Teaching team:

Jorge Casillas (coordinator)
Alberto Fernández, Diego J. García Gil, Salvador García, Julián Luengo and Daniel Molina

The course will also feature the participation of Francisco Maturana Cremades, Executive Director & CTO of Madiva S. L. (Madrid), a company that specialises in technological infrastructures, solving complex problems, and processing and representing large amounts of data, and that has collaborated with companies such as Iberia, Telefónica, Santander, BBVA and Banco Sabadell.

Theoretical contents (13h)

Data Science, advanced analytics and big data (1h) – Jorge Casillas
Exploratory Data Analysis: visualization (1h) – Jorge Casillas
Fundamentals of classification: decision trees, lazy learning, ANN, Bayesian, evaluation (2h) – Salvador García
Pre-processing: selection and processing of instances and features, noise treatment (2h) – Salvador García
Advanced classification: SVM, ensemble learning, imbalanced problems, deep learning (2.5h) – Alberto Fernández
Segmentation and relationships: clustering and association rules (2h) – Jorge Casillas
Incremental learning and data stream mining (1h) – Jorge Casillas
Big data: fundamentals and paradigms (1.5h) – Alberto Fernández

Practical contents (17h)

KNIME (5.5h): fundamental prediction – Julián Luengo
Python for data science (6.5h): visualization and advanced prediction – Daniel Molina
Spark and MLlib (5h): big data – Diego J. García Gil

Attendance: Attendance at 80% of the sessions is required

Evaluation: Answers to theory questions and a competition on Kaggle (https://www.kaggle.com/c/curso-ciencia-datos-ugr-6)

Venue

ETS de Ingenierías Informática y de Telecomunicación – Universidad de Granada http://etsiit.ugr.es

https://goo.gl/maps/8N1zEYajC9k

The practical sessions will take place in a computer room equipped with machines with 8 GB RAM and Intel® Core™ i5 processors

More information: Centro Mediterráneo: http://www.ugr.es/~cm/accesos/18GR16.html

Webpage: https://sci2s.ugr.es/CienciaDatosBigData