Data Pre-Processing and Data Quality

In recent years, data have grown immensely, giving rise to Big Data. Processing data at this scale requires large computing infrastructures with high-performance processing capabilities. Preparing large datasets for analysis and knowledge extraction is difficult and requires pre-processing to improve the quality of the raw data. Data representation and quality are among the most important facets of the data science process.

Data pre-processing is a preliminary step in data science in which raw data are transformed into a format suitable for analysis and for modeling algorithms. It improves data quality by cleaning, normalizing, transforming, and reducing the raw data, and by extracting relevant features from them. Good pre-processing can significantly improve the performance of machine learning algorithms, which in turn yields more accurate models. Because discovering knowledge from noisy, irrelevant, and redundant data is difficult, accurately identifying outliers, imputing missing values, and reducing data volume while retaining the useful information are challenging problems in data science.
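To make the cleaning steps just described concrete, here is a minimal sketch in Python (standard library only) for a single numeric column, assuming missing values are encoded as `None`. The helper names (`iqr_bounds`, `impute_mean`, `min_max_normalize`, `preprocess`) are hypothetical, and the techniques shown (Tukey's IQR fences for outliers, mean imputation, min-max scaling) are common textbook choices, not necessarily the methods developed in this research line.

```python
from statistics import mean, quantiles
from typing import Optional

# Hypothetical helpers for illustration; not an API from the source.


def iqr_bounds(values: list[float], k: float = 1.5) -> tuple[float, float]:
    """Tukey fences: values outside [Q1 - k*IQR, Q3 + k*IQR] count as outliers."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def impute_mean(values: list[Optional[float]]) -> list[float]:
    """Replace missing entries (None) with the mean of the observed entries."""
    fill = mean([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def min_max_normalize(values: list[float]) -> list[float]:
    """Rescale linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(raw: list[Optional[float]]) -> list[float]:
    """Clean one numeric column: outliers -> missing, impute, then normalize."""
    observed = [v for v in raw if v is not None]
    low, high = iqr_bounds(observed)
    # Treat outliers as missing so they do not distort the imputed mean.
    cleaned = [v if v is not None and low <= v <= high else None for v in raw]
    return min_max_normalize(impute_mean(cleaned))


if __name__ == "__main__":
    raw = [10.0, 12.0, None, 11.0, 13.0, 95.0, 12.0]
    print(preprocess(raw))
```

Running the script flags 95.0 as an outlier, fills both gaps with the mean of the remaining values (11.6), and rescales the column to [0, 1]. Outliers are turned into missing values before imputation on purpose: imputing first would fill the gap with 25.5, a mean dragged upward by the outlier.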

The current challenges in data pre-processing focus on automating pre-processing techniques and making accurate decisions when several of them are chained together; adapting them to complex data structures; adapting techniques to increase the reliability, fairness, and transparency of the models subsequently obtained by data science algorithms; and designing pre-processing for multi-source biomedical data pipelines and imaging methods.

Contact: Salvador García López

Related Researchers:


Name | Email | Area | Category
Benítez Sánchez, José Manuel | J.M.Benitez@decsai.ugr.es | Data Science and Big Data Area, Computational Intelligence Area | PhD
Cano de Amo, José Ramón | jrcano@ujaen.es | Data Science and Big Data Area | PhD
Díaz Rodríguez, Natalia | ndiaz@decsai.ugr.es | Computational Intelligence Area, Data Science and Big Data Area | PhD
García Gil, Diego Jesús | djgarcia@decsai.ugr.es | Data Science and Big Data Area, Computational Intelligence Area | PhD
García López, Salvador | salvagl@decsai.ugr.es | Data Science and Big Data Area | PhD
Górriz Sáez, Juan Manuel | gorriz@ugr.es | DaSCI Technology Applications Area | PhD
Herrera Triguero, Francisco | herrera@decsai.ugr.es | DaSCI Technology Applications Area, Data Science and Big Data Area, Computational Intelligence Area | PhD
Lucena Sánchez, Estrella | estrellalucena@ugr.es | Data Science and Big Data Area | PhD - Others
Luengo Martín, Julián | julianlm@decsai.ugr.es | Data Science and Big Data Area | PhD
Ortíz García, Andrés | aortiz@ic.uma.es | DaSCI Technology Applications Area | PhD
Romero Zaliz, Rocío | rocio@ugr.es | DaSCI Technology Applications Area | PhD
Triguero Velázquez, Isaac | triguero@decsai.ugr.es | Data Science and Big Data Area |
Val Muñoz, Coral del | delval@decsai.ugr.es | DaSCI Technology Applications Area | PhD