Key attributes of Big Data are commonly described by three (or more) "V's": Volume, Velocity, and Variety. In this group, we mainly focus on the last "V", the variety of data.
To use and combine data from e-commerce, sensors, and social media services, integration and curation routines have to be employed. The heterogeneity of the data impedes the seamless integration of different sources and requires human intervention in the form of exhaustive profiling and data preparation efforts. Hence, research on Big Data calls for scalable data profiling and integration systems that enable the curation and consumption of many large and diverse data sources.
Beyond profiling and integrating large datasets, deploying sophisticated analytics on the data (big analytics) is closely tied to the problem described above. We are interested in systems that leverage data mining and machine learning techniques to derive knowledge from dirty and poorly organized data. This includes developing sketching and summarization techniques that reduce a big dataset to its relevant core information.