BigDapTOOLS is a package of tools created to provide and unify software developments related to data preprocessing and Big Data. The project began with funding from the BBVA Foundation. To date, it includes developments on three well-known Data Science platforms, and the package will continue to grow in the coming years. The most noteworthy developments are:

  1. Software in R. These algorithms address problems such as data reduction with autoencoders and data preprocessing for imbalanced, ordinal and noisy data sets. They also include a general-purpose data preprocessing library called ‘smartdata’, which collects state-of-the-art data preprocessing algorithms in R and acts as a container that provides a uniform interface to other packages.
  2. Software in Spark. Apache Spark is an open-source engine designed for large-scale data processing and analysis. The software is available in Spark Packages and contains a set of data preprocessing algorithms for feature selection, discretization, noise filtering and imputation of missing values.
  3. Software in Flink. Apache Flink is a recent Big Data framework that uses the MapReduce paradigm and focuses on distributed processing of both streaming and batch data. This library contains six of the most popular data preprocessing algorithms for data streams: three for discretization and three for feature selection.

Contact: Salvador García