Session Co-Chairs: Demetris Zeinalipour (University of Cyprus, Cyprus), Panos K. Chrysanthis (University of Cyprus, Cyprus and University of Pittsburgh, USA) and Ke Yi (HKUST, Hong Kong, China)
(#27)Answer Counting Under Guarded TGDs, Cristina Feier (University of Bremen, Germany), Carsten Lutz (University of Bremen, Germany), and Marcin Przybylko (University of Bremen, Germany) (paper)(13 min)
(#55)Locality-aware Distribution Schemes, Bruhathi Sundarmurthy (University of Wisconsin-Madison, United States), Paraschos Koutris (University of Wisconsin-Madison, United States), and Jeffrey Naughton (University of Wisconsin-Madison, United States) (paper)(13 min)
Abstract: Suppose there is a database we have no direct access to, but there are views of this database available to us, defined by some queries Q_1, Q_2, ..., Q_k. We are also given another query Q. Will we be able to compute Q using only the available views? This question, which we call “the question of determinacy”, sounds almost philosophical. One can easily imagine a bearded man in a himation, chained to the wall of a cave, watching the views projected on the wall and pondering whether, from what he is able to see, reality can be faithfully reconstructed. For us, though, it is a database theory question, and a very well motivated one, with motivations ranging from query evaluation plan optimization (where we prefer a positive answer) to privacy issues (where the preferred answer is negative). Query determinacy is a broad topic, with literally hundreds of papers published since the late 1980s. This talk is not going to be a “survey” (which would be impossible within a one-hour time frame, and with this speaker), but rather the personal perspective of someone somehow involved in recent developments in the area. First, I will explain how the question of determinacy has been formalized over the last 30+ years. There are many parameters here: obviously one needs to choose the query language of the queries Q_i and the query language of Q. But, surprisingly, there is also some choice as to what the word “compute” actually means in this context. Then I will concentrate on the variants of the decision problem of determinacy (for each choice of parameters there is one such problem: Q_1, Q_2, ..., Q_k and Q constitute the instance, and the question is whether Q_1, Q_2, ..., Q_k determine Q), and I will talk about how I understand the mechanisms rendering different variants of determinacy decidable or undecidable. This will be on a slightly informal level.
No new theorems will be presented, but I think I will be able to show simplified proofs of some of the earlier results.
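The following minimal Python sketch makes the notion of determinacy concrete. It assumes a single binary relation R, with views and queries written as hypothetical Python functions over a finite database. It uses the information-theoretic reading of "compute": any two databases on which all views agree must also agree on Q. The sketch searches exhaustively for a counterexample over a small, fixed domain. Finding a pair refutes determinacy. Finding none says nothing beyond that domain, so this is not a decision procedure for any of the variants discussed in the talk.

```python
from itertools import combinations, product

# A database is modelled as a frozenset of (x, y) pairs: the contents of a
# single binary relation R over a finite domain.


def all_databases(domain):
    """Yield every possible instance of R over the given finite domain."""
    pairs = list(product(domain, repeat=2))
    for size in range(len(pairs) + 1):
        for subset in combinations(pairs, size):
            yield frozenset(subset)


def find_counterexample(views, query, domain):
    """Search for two databases on which all views agree but the query differs.

    Returns such a pair, or None if none exists over this domain. None is
    evidence only for this domain, not a proof of determinacy in general.
    """
    seen = {}  # tuple of view outputs -> (first database seen, its query answer)
    for db in all_databases(domain):
        key = tuple(view(db) for view in views)
        answer = query(db)
        if key in seen and seen[key][1] != answer:
            return seen[key][0], db
        seen.setdefault(key, (db, answer))
    return None


# Hypothetical views and queries, written as Python functions over R.
def identity(db):    # V(x, y) :- R(x, y)
    return db


def projection(db):  # V(x) :- R(x, y)
    return frozenset((x,) for (x, _) in db)


def path2(db):       # Q(x, z) :- R(x, y), R(y, z)
    return frozenset((x, z) for (x, y) in db for (y2, z) in db if y == y2)


if __name__ == "__main__":
    # The identity view exposes R itself, so it determines any query.
    print(find_counterexample([identity], path2, domain=[0, 1]))
    # The projection view forgets the second column, so it cannot determine R.
    print(find_counterexample([projection], identity, domain=[0, 1]))
```

The first call returns None. The second returns two instances that share the same projection but differ as relations, for example one containing only (0, 0) and one containing only (0, 1).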
Bio: Jerzy Marcinkowski received his PhD in 1993 from the Mathematical Department of the University of Wrocław and has spent his entire academic career (with the exception of a three-semester post-doc at the Laboratoire d'Informatique Fondamentale de Lille in France) at the University of Wrocław, where he currently serves as the head of the Department of Computer Science. His modus operandi is to attack well-abstracted, long-standing open problems in logic in computer science, including database theory. Database theory people may know him for his work on the boundedness of single-rule DATALOG programs (1996), on database repairs and consistent query answering, including prioritized repairs (with Jan Chomicki and Sławek Staworko, 2000s), and, more recently, on the Chase algorithm, where he proved (with his student Tomasz Gogacz) that all-instances chase termination is undecidable (2014). His main topic of interest in the last five years has been query determinacy.
Session Co-Chairs: Sihem Amer-Yahia (CNRS, Univ. Grenoble Alpes, France) and Panos K. Chrysanthis (University of Cyprus, Cyprus and University of Pittsburgh, USA)
Reception I: Diversity and Inclusion (D&I) Debrief, Sihem Amer-Yahia (CNRS, Univ. Grenoble Alpes, France), Panos K. Chrysanthis (University of Pittsburgh, USA), Avrilia Floratou (Microsoft, USA), Fatma Ozcan (Google, USA), and Victor Zakhary (Oracle, USA)
Abstract: This debrief will introduce the D&I initiative, whose mission is to establish best practices for Diversity and Inclusion in Database Conference Venues (https://dbdni.github.io/).
Bio: Sihem Amer-Yahia is a CNRS Research Director. Her interests are at the intersection of large-scale data management and social data exploration. Sihem has held positions at QCRI, Yahoo! Research, and AT&T Labs. She served on the SIGMOD Executive Board and the VLDB Endowment. She is currently chairing ICDE 2020 and is the VLDB 2021 Diversity and Inclusion chair.
Panos K. Chrysanthis is a Professor of Computer Science and the founder and director of the Advanced Data Management Technologies Laboratory at the University of Pittsburgh. He is also an adjunct Professor at Carnegie Mellon University and at the University of Cyprus. Within data management, his research interests include database systems, data stream systems, and interactive data exploration and visualization. His editorial service includes VLDB J, IEEE TKDE, and DAPD. He is an ACM Distinguished Scientist and a Senior Member of IEEE. In 2015, he received the University of Pittsburgh's Provost Award for Excellence in Mentoring. He is currently co-chairing EDBT/ICDT 2021 and serves as its Diversity and Inclusion Chair.
Avrilia Floratou is a Principal Scientist at Microsoft’s Gray Systems Lab (GSL). Her research broadly lies in the area of data management, with a recent focus on frameworks that simplify data science workflows. Her research interests also include large-scale stream processing, relational databases, and benchmarking and performance tuning of Big Data platforms. She is currently working on simplifying various aspects of data science workflows, such as data exploration, data preparation, and feature engineering. Previously, she worked on the Dhalion project, a framework used to define and apply SLO-based policies on long-running applications that has been deployed in various Microsoft services. Prior to her current role, she was a research scientist at IBM Almaden Research Center, working on SQL-on-Hadoop engines and natural language interfaces for databases. She received her Ph.D. and M.Sc. in Computer Science from the University of Wisconsin-Madison, working with Prof. Jignesh M. Patel, and her B.Sc. from the University of Athens in Greece. Avrilia co-leads the REACH OUT action at D&I in DB.
Fatma Ozcan has been a Principal Software Engineer in the Google Cloud Data Analytics team since September 2020. Before that, she was a distinguished RSM and a senior manager at IBM Almaden Research Center, working in the areas of information management and hybrid cloud. She has recently been working on natural language interfaces to data, ontologies, big data, and HTAP. Previously, she worked on Big SQL and DB2 pureXML. She was one of the main architects of Big SQL, as well as of the XQuery and SQL/XML compiler in DB2 pureXML. She is an ACM Distinguished Member, a trustee of the VLDB Endowment, and the treasurer of ACM SIGMOD. Fatma co-leads the REACH OUT action at D&I in DB.
Victor Zakhary is a Senior Member of Technical Staff at Oracle, Bay Area. He was a research assistant and PhD candidate at the Distributed Systems Lab (DSL) at UCSB. He is interested in blockchains, distributed systems, data privacy and security, and the privacy of social network users. His current research focuses on scaling and extending the functionality of permissionless blockchains. He currently works on a protocol to support atomic cross-chain transactions, as well as on adding transactional support to smart contracts. These protocols are steps towards building a global asset management system on permissionless blockchain infrastructures. In addition, he works on client-side caching protocols that resolve server-side load imbalance in large-scale distributed caching systems. Finally, he works on solving the fault-tolerance problem of ORAM stores and on building client-centric tools to preserve the privacy of social network users. Victor co-leads the INCLUDE action at D&I in DB.
Description: Full day workshop
Research in data warehousing and OLAP has produced important technologies for the design, management, and use of information systems for decision support. Nowadays, with the advent of Big Data, Decision Support Systems (DSS) embrace a wider range of systems, in which novel solutions combine advanced data management and data analytics to (semi-)automate the data lifecycle (from ingestion to visualization). Yet the DSS principles remain the same: these systems recognize the importance of managing data efficiently (by means of data modelling and optimized data processing) to serve innovative data analysis that brings added value to organizations.
DSS of the future will consequently be significantly different from what the current state of the practice supports. The trend is to move to more dynamic systems that allow the semi-automation of the decision-making process. This means that systems partially guide their users towards data discovery, management, and system-aided decision making via intelligent techniques and visualization. Behind the scenes, the big data era requires new methods, models, techniques, and architectures to cope with increasing demands for capacity, data type diversity, schema and data variability, and responsiveness.
DOLAP 2021 features a special theme on Data Exploration! Specifically, to promote novel solutions that tackle data management for novel DSS, DOLAP 2021 will devote a session to Data Exploration and its impact on novel Big Data Management and Analytics approaches.
Organizers: Kostas Stefanidis (Tampere University, Finland) and Patrick Marcel (University of Tours, France)
Description: Full day workshop
From spatial to spatio-temporal and then to mobility data. So, what’s next? The rise of mobility-aware integrated Big Data analytics. The Big Mobility Data Analytics (BMDA) workshop, initiated in 2018 in conjunction with the EDBT conference, aims to bring together experts in the field from academia, industry, and research labs to discuss the lessons they have learned over the years, to demonstrate what they have achieved so far, and to plan for the future of “mobility”.
In its 4th edition, the BMDA workshop will foster the exchange of new ideas on multidisciplinary real-world problems, discuss proposals for innovative solutions, and identify emerging opportunities for further research in big mobility data analytics, such as deep learning on mobility data, edge computing, and visual analytics. The workshop intends to bridge the gap between researchers and big mobility data stakeholders, including experts from critical domains such as urban, maritime, and aviation transportation, human complex networks, etc.
Organizers: Cyril Ray (Arts & Métiers ParisTech and Naval Academy Research Inst., France),
Chiara Renso (ISTI – CNR Pisa, Italy), Mahmoud Sakr (Université Libre de Bruxelles, Belgium), and
Yannis Theodoridis (University of Piraeus, Greece)
Session Co-Chairs: Nikos Giatrakos (Athena Research and Innovation Center, Greece) and Manolis Koubarakis (National and Kapodistrian University of Athens, Greece)
Description: Half day workshop
A plethora of current applications, with widely different characteristics, generate and need to process massive amounts of static or streaming data. For example, Data Lakes gather large amounts of diverse data from a multitude of data sources, with the aim of enabling data analysts to perform ad hoc, self-service analytics and to train machine learning models, reducing the time from data to insights. These operations are particularly challenging for applications that process streaming Big Data. Achieving this goal requires addressing various challenges related to data volume, velocity, dynamicity, heterogeneity, and potentially (geo-)distributed data processing.
Although a plethora of techniques, algorithms, and tools exist to manage, query, and analyze various types of data, they typically require a high degree of data management skills and expertise, as well as significant time and effort for data preparation, parameter tuning, and the design and implementation of data analytics and machine learning pipelines.
The aim of the SIMPLIFY workshop is to bring together computer scientists with interests in this field to present recent innovations, to find topics of common interest, and to stimulate further development of new approaches that greatly simplify the work of a data analyst when performing data analytics, or when employing machine learning algorithms, over Big Data.
Organizers: Antonios Deligiannakis (Technical University of Crete, Greece),
Manolis Koubarakis (National and Kapodistrian University of Athens, Greece), and
Dimitris Skoutas (Athena Research Center, Greece)
Description: Full day workshop
DARLI-AP is a workshop aimed at promoting and sharing research and innovation on data analytics solutions and strategies for real-life and cutting-edge applications. The use of Information and Communication Technologies has made available a huge amount of heterogeneous data in various real application domains (e.g., smart cities, health care systems, financial applications, banking and insurance, Industry 4.0). A data scientist must tackle the non-trivial task of selecting the best techniques to effectively and efficiently deal with issues related to the storage, search, sharing, modeling, analysis, and visualization of data, information, and knowledge. The complexity of the task increases with variable data distribution, data heterogeneity, and data volume. Furthermore, a rich spectrum of knowledge can be extracted from real data to characterize user behaviors, identify weaknesses and strengths, improve the quality of provided services, or even devise new ones, thus increasing the benefits of real-life applications.
The aim of the workshop is to allow academics and practitioners from various research areas to share their experiences in designing cutting-edge analytics solutions for real-life applications. Researchers are encouraged to submit work-in-progress research describing innovative methodologies, algorithms, and platforms that address all facets of the data analytics process and provide interesting and useful services.
Industrial implementations of data analytics applications, as well as design and deployment experience reports on various issues arising in data analytics projects, are particularly welcome. We call for research and experience papers as well as demonstration proposals covering any aspect of data analytics solutions for real-life applications.
Organizers: Tania Cerquitelli (Politecnico di Torino, Italy),
Silvia Chiusano (Politecnico di Torino, Italy), and
Genoveva Vargas-Solar (CNRS, LIG-LAFMIA, France)
Description: Full day workshop
Information Visualization is nowadays one of the cornerstones of Data Science, turning the abundance of Big Data produced by modern systems into actionable knowledge. Indeed, the Big Data era has made available voluminous datasets that are dynamic, noisy, and heterogeneous in nature. Transforming a data-curious user into someone who can access and analyze that data is now even more burdensome for the great number of users with little or no support or expertise in data processing. Thus, the area of data visualization, visual exploration, and analysis has recently gained great attention, calling for joint action from the HCI, computer graphics, and data management and mining communities.
In this respect, several traditional problems from these communities are being revisited: efficient data storage, querying, and indexing for enabling visual analytics; new ways of visually presenting massive data; and efficient interaction and personalization techniques that fit different user needs. Modern exploration and visualization systems should offer scalable techniques that efficiently handle billion-object datasets while keeping visual response times within a few milliseconds, along with mechanisms for information abstraction, sampling, and summarization to address visual information overplotting. Further, they must encourage user comprehension by offering customization capabilities for different user-defined exploration scenarios and preferences, according to the analysis needs. Overall, the challenge is to offer self-service visual analytics, i.e., to enable data scientists and business analysts to visually gain value and insights from the data as rapidly as possible, minimizing the role of IT experts in the loop.
The BigVis workshop aims to address the above challenges and issues by providing a forum for researchers and practitioners to discuss, exchange, and disseminate their work. BigVis aims to attract attention from the research areas of Data Management & Mining, Information Visualization, and Human-Computer Interaction, and to highlight novel works that bridge these communities.
Organizers: Nikos Bikakis (ATHENA Research Center, Greece),
Panos K. Chrysanthis (University of Pittsburgh, USA),
George Papastefanatos (ATHENA Research Center, Greece), and Tobias Schreck (Graz University of Technology, Austria)
Description: Half day workshop
Digital transformation comes with ethical concerns about how flexible information systems can be used and misused, posing new challenges for researchers and practitioners across the whole spectrum of Information Systems Engineering.
Similarly, ethics-related aspects are becoming prominent in the data management community, where traditional processes for searching, querying, or analyzing data hardly pay any specific attention to the social problems their outcomes could bring about. These demands are broadly reflected in codes of ethics and in legally binding regulations.
The 3rd International Workshop on “Processing Information Ethically: a plus for data Quality” (PIE+Q) will acknowledge the need for the design of responsible Information Systems from a Data Quality perspective. PIE+Q 2021 will encourage papers on conceptual and technological approaches for dealing with ethical issues in data quality and in all data management activities, including source selection, knowledge extraction, data integration, and analysis.
Organizers: Riccardo Torlone (Roma Tre University, Italy),
Letizia Tanca (Politecnico di Milano, Italy),
Donatella Firmani (Roma Tre University, Italy), and
Elena Nieddu (Roma Tre University, Italy)
Session Co-Chairs: Demetris Zeinalipour (University of Cyprus, Cyprus), Panos K. Chrysanthis (University of Cyprus, Cyprus and University of Pittsburgh, USA) and Yannis Velegrakis (University of Trento and Utrecht University, Netherlands)
Abstract: Throughout the entire history of mankind, humans have always strived to acquire new knowledge. In a way, striving for knowledge is still what unites researchers across all modern research disciplines. Yet each of them has a slightly different understanding of what knowledge is. Within Computer Science, especially in the past couple of years, we have witnessed an increasing interest in structured knowledge in the form of graphs, so-called knowledge graphs, not only in academia but also in industry. In this talk, I will sketch current advances across the knowledge life cycle (spanning from knowledge extraction via knowledge integration, management, and sharing to knowledge querying, analytics, and knowledge-enhanced applications) and discuss several examples and open challenges in more detail.
Bio: Katja Hose is a professor of Computer Science at Aalborg University. Prior to joining Aalborg University, she was a postdoc at the Max Planck Institute for Informatics in Saarbrücken, Germany, and she earned her PhD in Computer Science from Ilmenau University of Technology, Germany. Her research is rooted in databases and Semantic Web technologies and spans the theory, algorithms, and applications of Data Science and Web Science, including knowledge management, querying, analytics, publishing, and extraction. She has co-authored more than 100 peer-reviewed scientific publications and regularly serves as a reviewer for database and Semantic Web conferences and journals. She has served in many different roles for a broad range of international conferences, including VLDB, SIGMOD, ICDE, TheWebConf/WWW, and ISWC. More details are available at http://www.cs.aau.dk/~khose.
(#260)Automated Machine Learning for Entity Matching Tasks, Matteo Paganelli (University of Modena and Reggio Emilia, Italy), Francesco Del Buono (University of Modena and Reggio Emilia, Italy), Marco Pevarello (University of Modena and Reggio Emilia, Italy), Francesco Guerra (University of Modena and Reggio Emilia, Italy), and Maurizio Vincini (University of Modena and Reggio Emilia, Italy) (30sec pitch)(10min)(paper)(poster)
Abstract: Knowledge Graphs can be considered as fulfilling an early vision in Computer Science of creating intelligent systems that integrate knowledge and data at large scale. Stemming from scientific advancements in research areas such as the Semantic Web, Databases, Knowledge Representation, NLP, and Machine Learning, among others, Knowledge Graphs have rapidly gained popularity in academia and industry in the past years. The integration of such disparate disciplines and techniques gives Knowledge Graphs their richness, but also challenges practitioners and theoreticians to understand how current advances developed from early techniques, in order to take full advantage of them on the one hand and to avoid reinventing the wheel on the other. This tutorial will provide a historical context on the roots of Knowledge Graphs, grounded in the advancements of Logic, Data, and the combination thereof.
Bio: Claudio Gutierrez is a full professor at the Computer Science Department, Universidad de Chile, and a Senior Researcher at the Millennium Institute for Foundational Research on Data. His research lies at the intersection of Databases and the Semantic Web, focusing on data models and query languages for the RDF layer, particularly RDF and SPARQL.
Juan Sequeda is the Principal Scientist at data.world. He joined data.world through its acquisition of Capsenta, a company he founded as a spin-off from his research. He holds a PhD in Computer Science from The University of Texas at Austin. His research interests are at the intersection of Logic and Data for (ontology-based) data integration, semantic/graph data management, and Knowledge Graphs.
Session Chair: Themis Palpanas (University of Paris, France)
EDBT 2021 Test-of-Time Award
SeMiTri: a framework for semantic annotation of heterogeneous trajectories by Zhixian Yan, Dipanjan Chakraborty, Christine Parent, Stefano Spaccapietra, and Karl Aberer.
DOI: 10.1145/1951365.1951398.
EDBT 2021 Best Paper Award
DomainNet: Homograph Detection for Data Lake Disambiguation by Aristotelis Leventidis, Laura Di Rocco, Wolfgang Gatterbauer, Renée J. Miller, and Mirek Riedewald.
DOI: 10.5441/002/edbt.2021.03.
EDBT 2021 Best Short Paper Award
Answer Graph: Factorization Matters in Large Graphs by Zahid Abul-Basher, Nikolay Yakovets, Parke Godfrey, Stanley Clark, and Mark Chignell.
DOI: 10.5441/002/edbt.2021.56.
EDBT 2021 Best Demonstration Award
Conversational OLAP in Action by Matteo Francia, Enrico Gallinucci, and Matteo Golfarelli.
DOI: 10.5441/002/edbt.2021.74.
ICDT 2021 Best Paper Award
Answer Counting Under Guarded TGDs by Cristina Feier, Carsten Lutz, and Marcin Przybylko.
DOI: 10.4230/LIPIcs.ICDT.2021.11.
ICDT 2021 Test-of-Time Award
Knowledge compilation meets database theory: compiling queries to decision diagrams by Abhay Jha and Dan Suciu.
DOI: 10.1145/1938551.1938574.
Abstract: In this talk I will present two recent examples of my research on explainability problems over machine learning (ML) models. In rough terms, these explainability problems deal with specific queries one poses over an ML model in order to obtain meaningful justifications for its results. Both of the examples I will present deal with “local” and “post-hoc” explainability queries. Here “local” means that we intend to explain the output of the ML model for a particular input, while “post-hoc” refers to the fact that the explanation is obtained after the model is trained. In the process I will also establish connections with problems studied in data management, with the intention of suggesting new possibilities for cross-fertilization between that area and ML. The first example I will present refers to computing explanations with scores based on Shapley values, in particular with the recently proposed, and already influential, SHAP-score. This score provides a measure of how different features in the input contribute to the output of the ML model. We provide a detailed analysis of the complexity of this problem for different classes of Boolean circuits. In particular, we show that the problem of computing SHAP-scores is tractable as long as the circuit is deterministic and decomposable, but becomes computationally hard if any of these restrictions is lifted. The tractability part of this result provides a generalization of a recent result stating that, for Boolean hierarchical conjunctive queries, the Shapley value of the contribution of a tuple in the database to the final result can be computed in polynomial time. The second example I will present refers to the comparison of different ML models in terms of important families of (local and post-hoc) explainability queries. For the models, I will consider multi-layer perceptrons and binary decision diagrams. The main object of study will be the computational complexity of the aforementioned queries over such models. The obtained results will show an interesting theoretical counterpart to the conventional wisdom about interpretability. This work also suggests the need for developing query languages that support the process of retrieving explanations from ML models, and for obtaining general tractability results for such languages over specific classes of models.
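To make the SHAP-score concrete, here is a minimal brute-force sketch, assuming a Boolean model over binary features and the uniform distribution over inputs. It enumerates every feature subset and every completion, so it runs in exponential time; it only illustrates the definition and is not the polynomial-time algorithm for deterministic and decomposable circuits discussed in the talk. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial


def expected_value(model, entity, fixed):
    """Average model output over all inputs that agree with `entity` on the features in `fixed`
    (uniform distribution over the remaining binary features)."""
    free = [i for i in range(len(entity)) if i not in fixed]
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(free)):
        x = list(entity)
        for i, b in zip(free, bits):
            x[i] = b
        total += model(tuple(x))
    return total / (2 ** len(free))


def shap_score(model, entity, feature):
    """Shapley value of `feature` for `entity`, computed directly from the definition."""
    n = len(entity)
    others = [i for i in range(n) if i != feature]
    score = Fraction(0)
    for k in range(n):  # k = |S| ranges over 0 .. n-1
        weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
        for subset in combinations(others, k):
            s = set(subset)
            gain = expected_value(model, entity, s | {feature}) - expected_value(model, entity, s)
            score += weight * gain
    return score


# Example: for AND(x0, x1) at input (1, 1), both features receive the same score.
print(shap_score(lambda x: x[0] & x[1], (1, 1), 0))  # 3/8
```

Exact fractions are used so that the scores can be compared with the efficiency property of Shapley values: the scores of all features sum to the model output minus its overall expected value.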
Bio: He is a Full Professor at Pontificia Universidad Católica de Chile, where he also acts as Director of the Institute for Mathematical and Computational Engineering. He is the author of more than 80 technical papers, has chaired ICDT 2019, will be chairing ACM PODS 2022, and is currently a member of the editorial committee of Logical Methods in Computer Science. From 2011 to 2014 he was the editor of the database theory column of SIGMOD Record. His areas of interest are database theory, logic in computer science, and the emerging relationship between these areas and machine learning.
Session Co-Chairs: Panos K. Chrysanthis (University of Cyprus, Cyprus and University of Pittsburgh, USA) and Demetris Zeinalipour (University of Cyprus, Cyprus)
Philip A. Bernstein (Microsoft Research, USA)
Philip A. Bernstein is a Distinguished Scientist at Microsoft Research. Over the past 40 years, he has been a product architect at Microsoft and Digital Equipment Corp., a professor at Harvard University and Wang Institute of Graduate Studies, and a VP Software at Sequoia Systems. He has published over 150 papers and two books on the theory and implementation of database systems, especially on transaction processing and data integration, and has contributed to a variety of database products. He is a Fellow of the ACM and AAAS, a winner of ACM SIGMOD’s E.F. Codd Innovations Award, and a member of the Washington State Academy of Sciences and the National Academy of Engineering. He received a B.S. degree from Cornell and M.Sc. and Ph.D. from University of Toronto.
Laura M. Haas (University of Massachusetts - Amherst, USA)
Laura Haas is the Dean of the College of Information and Computer Sciences at the University of Massachusetts Amherst. She was formerly an IBM Fellow and the founder and director of IBM Research’s Accelerated Discovery Lab. She has held a broad range of positions at IBM in both research and development divisions. She is best known for her work on the Starburst query processor, from which DB2 LUW was developed, on Garlic, a system which allowed integration of heterogeneous data sources, and on Clio, the first semi-automatic tool for heterogeneous schema mapping. She has received the Anita Borg Institute Technical Leadership Award, the ACM SIGMOD Codd Innovation Award, the IEEE Computer Society Computer Pioneer Award and many IBM awards including a Corporate Award for information integration technology. She has served as Vice President of the VLDB Endowment Board of Trustees and as Vice Chair of the Computing Research Association board; she currently serves on the National Academies’ Computer Science and Telecommunications Board. She is an ACM Fellow, a member of the National Academy of Engineering, the IBM Academy of Technology, and a Fellow of the American Academy of Arts and Sciences.
Yannis Ioannidis (University of Athens and Athena Research Center, Greece)
Yannis Ioannidis is a Professor of Informatics and Telecommunications at the National & Kapodistrian University of Athens as well as an Associated Faculty member at the Athena Research Center, where he also served as President and General Director for 10 years (2011-2021). His research interests include Database and Information Systems, Data Science, Data and Text Analytics, Recommender Systems and Personalization, Data Infrastructures and Digital Repositories, and Computer-Human Interaction, topics on which he has published over 160 articles in leading journals and conferences; he also holds three patents. His work is often motivated by data management problems that arise in diverse industrial environments or in the context of other scientific fields (Life Sciences, Cultural Heritage, Biodiversity, Physical Sciences). He is a fellow of the ACM and IEEE, a member of Academia Europaea, and a recipient of the ACM SIGMOD Contributions Award, and his work has been recognized through the VLDB "10-Year Best Paper Award", the NSF "Presidential Young Investigator Award", and several awards for teaching excellence, including the "Xanthopoulos-Pneumatikos Award for Outstanding Academic Teaching" in Greece and the "Chancellor's Award for Excellence in Teaching" at the University of Wisconsin. He is currently a member of the ACM Europe Council, a member of the ICDE Steering Committee, a vice chair of the European Strategy Forum on Research Infrastructures (ESFRI), and a member of the strategic management board of the Greek hub of the UN Sustainable Development Solutions Network.
Jeffrey D. Ullman (Stanford, USA)
Jeff Ullman is the Stanford W. Ascherman Professor of Engineering (Emeritus) in the Department of Computer Science at Stanford and CEO of Gradiance Corp. He received the B.S. degree from Columbia University in 1963 and the PhD from Princeton in 1966. Prior to his appointment at Stanford in 1979, he was a member of the technical staff of Bell Laboratories from 1966 to 1969, and on the faculty of Princeton University between 1969 and 1979. From 1990 to 1994, he was chair of the Stanford Computer Science Department. Ullman was elected to the National Academy of Engineering in 1989 and the American Academy of Arts and Sciences in 2012, and has held Guggenheim and Einstein Fellowships. He has received the SIGMOD Contributions Award (1996), the ACM Karl V. Karlstrom Outstanding Educator Award (1998), the Knuth Prize (2000), the SIGMOD E. F. Codd Innovations Award (2006), the IEEE von Neumann Medal (2010), and the NEC C&C Foundation Prize (2017). He is the author of 16 books, including books on database systems, compilers, automata theory, and algorithms. His interests include database theory, database integration, data mining, and education using the information infrastructure.
Session Co-Chairs: Wolfram Wingerath (Baqend, Germany) and Fabian Panse (University of Hamburg, Germany)
(#68)Sequence detection in event log files, Ioannis Mavroudopoulos (Aristotle University of Thessaloniki, Greece), Theodoros Toliopoulos (Aristotle University of Thessaloniki, Greece), Christos Bellas (Aristotle University of Thessaloniki, Greece), Andreas Kosmatopoulos (Aristotle University of Thessaloniki, Greece), and Anastasios Gounaris (Aristotle University of Thessaloniki, Greece) (30sec pitch)(10min)(paper)(poster)
Abstract: Permissioned blockchains are becoming increasingly mainstream and are being considered for solving problems similar to those that databases have traditionally solved, with the main difference that permissioned blockchains distribute trust and can work even with several participants who do not fully trust each other. As a result, there are numerous research proposals at the intersection of databases and blockchains. Sadly, there are still many misconceptions about this technology, which lead to confusion in the community. The main goal of the tutorial is to provide a background on the technology and to contrast it with public, permissionless blockchains. We will familiarize participants with the internals of permissioned blockchains and explain how they can be used in non-cryptocurrency scenarios. Through a hands-on part, participants will “learn by doing” how some of the most promising use cases of permissioned blockchains translate to actual smart contracts. We will focus on a supply-chain-management-like application and, as our target platform, we will use Hyperledger Fabric 1.4 LTS, an open-source, modular, widely used enterprise blockchain platform.
Bio: The tutorial will be held by Zsolt István. He is an Associate Professor at the IT University of Copenhagen. Before that, he was an Assistant Research Professor at the IMDEA Software Institute in Madrid, Spain, with years of experience in databases, distributed systems, and FPGA programming. He holds a PhD and MSc in Computer Science from ETH Zurich, Switzerland and a BSc in Computer Science from UT Cluj-Napoca, Romania. His personal website is at: https://zistvan.github.io.
(#26)KISS - A fast kNN-based Importance Score for Subspaces, Anna Beer (Ludwig-Maximilians-Universität München, Germany), Ekaterina Allerborn (Ludwig-Maximilians-Universität München, Germany), Valentin Hartmann (École Polytechnique Fédérale de Lausanne, Switzerland), and Thomas Seidl (Ludwig-Maximilians-Universität München, Germany) (30sec pitch)(10min)(paper)(poster)
(#156)Path Indexing in the Cypher Query Pipeline, Jochem Kuijpers (Eindhoven University of Technology, Netherlands), George Fletcher (Eindhoven University of Technology, Netherlands), Tobias Lindaaker (Neo4j, Sweden), and Nikolay Yakovets (Eindhoven University of Technology, Netherlands) (30sec pitch)(10min)(paper)(poster)
Abstract: Data profiling is the act of extracting many different types of metadata from a given dataset. This research area has recently thrived due to (i) its simple problem statements, such as “discover all key candidates”, (ii) the high computational complexity of the problems, which is often exponential in the number of columns, (iii) the manifold opportunities for optimization, such as apriori-inspired pruning or data sampling, and (iv) the various application areas for data profiling results, such as query optimization and data cleaning. In the talk, we will first cover the traditional problem statements and use cases and then highlight the algorithmic intuitions behind some solutions. Finally, we will discuss open problems and promising research directions, for both data profiling and its use cases.
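As an illustration of the “discover all key candidates” problem, here is a naive level-wise sketch (function names hypothetical) that finds all minimal unique column combinations of a small in-memory table. Skipping supersets of already-found keys is a simple form of apriori-inspired pruning; the search is still exponential in the number of columns in the worst case, and practical profiling algorithms are considerably more refined.

```python
from itertools import combinations


def is_unique(rows, cols):
    """True if no two rows agree on all columns in `cols`."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True


def minimal_keys(rows, num_cols):
    """Level-wise search for all minimal unique column combinations (key candidates)."""
    keys = []
    for size in range(1, num_cols + 1):
        for cols in combinations(range(num_cols), size):
            # Pruning: a superset of a known key is unique but cannot be minimal.
            if any(set(k) <= set(cols) for k in keys):
                continue
            if is_unique(rows, cols):
                keys.append(cols)
    return keys


rows = [(1, "a", "x"), (2, "a", "y"), (3, "b", "x")]
print(minimal_keys(rows, 3))  # [(0,), (1, 2)]
```

Column 0 alone is a key, and columns 1 and 2 are only unique together; the combinations (0, 1), (0, 2) and (0, 1, 2) are never checked because they contain the key (0,).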
Bio: Prof. Felix Naumann studied mathematics, economics, and computer science at the University of Technology in Berlin. After receiving his diploma (MA) in 1997, he completed his PhD thesis in the area of data quality at Humboldt University of Berlin in 2000. In 2001 and 2002 he worked at the IBM Almaden Research Center on data integration topics. From 2003 to 2006 he was assistant professor for information integration, again at Humboldt University of Berlin. Since 2006 he has held the chair for information systems at the Hasso Plattner Institute (HPI) at the University of Potsdam in Germany. He has been a visiting researcher at QCRI, AT&T Research, IBM Research, and SAP. His research interests include data profiling, data cleansing, and data integration, and he has over 200 scientific publications. In addition to numerous PC memberships for international conferences, he has organized several conferences in various roles, including VLDB 2021 as PC co-chair, and he is a trustee of the VLDB Endowment. More details are at https://hpi.de/naumann/people/felix-naumann.html.
(#34)Diverse Data Selection under Fairness Constraints, Zafeiria Moumoulidou (University of Massachusetts, Amherst, United States), Andrew McGregor (University of Massachusetts, Amherst, United States), and Alexandra Meliou (University of Massachusetts, Amherst, United States) (paper)(13 min)
Session Co-Chairs: Laura Po (University of Modena and Reggio Emilia, Italy) and Federica Rollo (University of Modena and Reggio Emilia, Italy)
(#78)GeoBlocks: A Query-Cache Accelerated Data Structure for Spatial Aggregation over Polygons, Christian Winter (Technical University of Munich, Germany), Andreas Kipf (Massachusetts Institute of Technology, USA), Christoph Anneser (Technical University of Munich, Germany), Eleni Tzirita Zacharatou (Technical University of Berlin, Germany), Thomas Neumann (Technical University of Munich, Germany), and Alfons Kemper (Technical University of Munich, Germany) (30sec pitch)(10min)(paper)(poster)
(#167)Smart City Data Analysis via Visualization of Correlated Attribute Patterns, Yuya Sasaki (Osaka University, Japan), Keizo Hori (Osaka University, Japan), Daiki Nishihara (Osaka University, Japan), Ohashi Sora (Osaka University, Japan), Yusuke Wakuta (Osaka University, Japan), Kei Harada (Osaka University, Japan), Makoto Onizuka (Osaka University, Japan), Yuki Arase (Osaka University, Japan), Shinji Shimojo (Osaka University, Japan), Kenji Doi (Osaka University, Japan), Hongdi He (Shanghai Jiao Tong University, China), and Zhong-ren Peng (University of Florida, USA) (30sec pitch)(10min)(paper)(poster)
(#168)SciNeM: A Scalable Data Science Tool for Heterogeneous Network Mining, Serafeim Chatzopoulos (Athena Research Center, Greece), Thanasis Vergoulis (Athena Research Center, Greece), Panagiotis Deligiannis (Athena Research Center, Greece), Dimitrios Skoutas (Athena Research Center, Greece), Theodore Dalamagas (Athena Research Center, Greece), and Christos Tryfonopoulos (University of the Peloponnese, Greece) (30sec pitch)(10min)(paper)(poster)
(#169)IMCF: The IoT Meta-Control Firewall for Smart Buildings, Soteris Constantinou (University of Cyprus, Cyprus), Antonis Vasileiou (University of Cyprus, Cyprus), Andreas Konstantinidis (Frederick University, Cyprus), Panos Chrysanthis (University of Pittsburgh, USA), and Demetrios Zeinalipour-Yazti (University of Cyprus, Cyprus) (30sec pitch)(10min)(paper)(poster)
Session Chair: Maria Luisa Damiani (University of Milan, Italy)
(#173)Correlation graph analytics for stock time series data, Tong Liu (Free University of Bozen-Bolzano, Italy), Paolo Coletti (Free University of Bozen-Bolzano, Italy), Anton Dignös (Free University of Bozen-Bolzano, Italy), Johann Gamper (Free University of Bozen-Bolzano, Italy), and Maurizio Murgia (Free University of Bozen-Bolzano, Italy) (30sec pitch)(10min)(paper)(poster)
(#176)Visualizing and Exploring Big Datasets based on Semantic Community Detection, Maria Krommyda (National Technical University of Athens, Greece), Konstantinos Tsitseklis (National Technical University of Athens, Greece), Verena Kantere (National Technical University of Athens, Greece), Vasileios Karyotis (Ionian University, Greece), and Symeon Papavassiliou (National Technical University of Athens, Greece) (30sec pitch)(10min)(paper)(poster)
(#178)Exploration and Analysis of Temporal Property Graphs, Christopher Rost (University of Leipzig, Germany), Kevin Gomez (University of Leipzig, Germany), Philip Fritzsche (University of Leipzig, Germany), Andreas Thor (Leipzig University of Applied Sciences, Germany), and Erhard Rahm (University of Leipzig, Germany) (30sec pitch)(10min)(paper)(poster)
(#185)A Tool for JSON Schema Witness Generation, Lyes Attouche (Université Paris-Dauphine, France), Mohamed-Amine Baazizi (Sorbonne Université, France), Dario Colazzo (Université Paris Dauphine - Paris Sciences et Lettres University, France), Francesco Falleni (Università di Pisa, Italy), Giorgio Ghelli (Università di Pisa, Italy), Cristiano Landi (Università di Pisa, Italy), Carlo Sartiani (Università della Basilicata, Italy), and Stefanie Scherzinger (University of Passau, Germany) (30sec pitch)(10min)(paper)(poster)
Abstract: To bridge the gap between users and data, numerous text-to-SQL systems have been developed that allow users to pose natural language questions over relational databases. Recently, novel text-to-SQL systems are adopting deep learning methods with very promising results. At the same time, several challenges remain open making this area an active and flourishing field of research and development. To make real progress in building text-to-SQL systems, we need to de-mystify what has been done, understand how and when each approach can be used, and, finally, identify the research challenges ahead of us. The purpose of this tutorial is to provide a systematic study of the recent advances of deep learning techniques for text-to-SQL translation, and to highlight open problems and new research opportunities for researchers and practitioners in the fields of database systems, natural language processing and deep learning.
Bio: George Katsogiannis-Meimarakis is a research assistant at Athena Research Center in Athens, Greece, where he works on the INODE (Intelligent Open Data Exploration) project, focusing on the text-to-SQL problem. He is a graduate of the Department of Informatics and Telecommunications of the National and Kapodistrian University of Athens, where he completed his thesis with the title “Translating Natural Language to SQL using Deep Learning”. Currently, he is attending an MSc programme on Data Science and Information Technologies with a specialisation in Artificial Intelligence and Big Data.
Dr. Georgia Koutrika is Research Director at Athena Research Center, Greece. She has more than 15 years of experience in multiple roles at HP Labs, IBM Almaden, and Stanford. Her work focuses on data exploration, recommendations, and data analytics, and has been incorporated in commercial products, described in 13 granted patents and 16 patent applications in the US and worldwide, and published in more than 90 papers in top-tier conferences and journals. Her academic activities include: Editor-in-chief for VLDB Journal, PC co-chair for VLDB 2023, associate editor for TKDE, SIGMOD 2021 and VLDB 2022, sponsorship chair for ICDE 2021, and general chair for ACM SIGMOD 2016.
Abstract: Search Optimization Service is a new Snowflake feature that speeds up selective queries on very large tables by orders of magnitude. Search optimization utilizes Snowflake's unique architecture of organizing data in small micro partitions. When Search Optimization is enabled, Snowflake builds efficient indices over micro partitions in the background, and maintains them in a serverless fashion. In this talk, we'll revisit the journey that led to Search Optimization, dive into technical details that make Search Optimization effective at finding the proverbial needle-in-a-haystack, and conclude with some of the research challenges that we are working on today.
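The abstract does not describe Snowflake's actual index structures, so the following is only a hypothetical sketch of the general idea of skipping micro-partitions during a selective lookup. Each partition gets a small Bloom filter (a swapped-in technique chosen for illustration, not necessarily what Snowflake uses), and a point lookup scans only partitions whose filter might contain the value. Bloom filters can report false positives but never false negatives, so no matching row is missed.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, value):
        # Derive several bit positions by hashing the value with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(f"{seed}:{value}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means the value is definitely absent; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_index(partitions):
    """One Bloom filter per partition (hypothetical stand-in for a search index)."""
    index = []
    for rows in partitions:
        bf = BloomFilter()
        for value in rows:
            bf.add(value)
        index.append(bf)
    return index


def point_lookup(partitions, index, value):
    """Return matching values and how many partitions actually had to be scanned."""
    hits, scanned = [], 0
    for rows, bf in zip(partitions, index):
        if not bf.might_contain(value):
            continue  # skip: this partition cannot contain the value
        scanned += 1
        hits.extend(v for v in rows if v == value)
    return hits, scanned


partitions = [list(range(i * 100, (i + 1) * 100)) for i in range(50)]
index = build_index(partitions)
print(point_lookup(partitions, index, 4242))
```

With 100 values per partition and 1024 bits, most non-matching partitions are skipped, so a needle-in-a-haystack lookup touches only a small fraction of the data.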
Bio: Ismail is a Senior Software Engineer at Snowflake’s Berlin office. He has been working on the Search Optimization Service since its inception. He holds a PhD in database systems from TU Dresden (Germany) and an MSc from Grenoble Institute of Technology (France). Ismail holds more than 10 patents and has published his research results in premier database research conferences such as SIGMOD and VLDB.
Stefan is a Senior Software Engineer at Snowflake's Berlin office and has been working on the Search Optimization Service since its inception. Previously, he worked at Data Artisans as one of the original creators of the Apache Flink stream processing framework. He holds a PhD and an MSc in computer science from Saarland University (Germany).
Abstract: Algorithmic rankers take a collection of candidates as input and produce a ranking (permutation) of the candidates as output. The simplest kind of ranker is score-based: it computes a score for each candidate independently and returns the candidates in score order. Another common kind of ranker is learning-to-rank, where supervised learning is used to predict the ranking of unseen candidates. For both kinds of rankers, we may output the entire permutation or only the k highest-scoring candidates, the top-k. Set selection is a special case of ranking that ignores the relative order among the top-k. In the past few years, there has been much work on incorporating fairness and diversity requirements into algorithmic rankers, with contributions coming from the data management, algorithms, information retrieval, and recommender systems communities. In my talk I will offer a broad perspective that connects formalizations and algorithmic approaches across subfields, grounding them in a common narrative around the value frameworks that motivate specific fairness- and diversity-enhancing interventions. I will discuss some recent and ongoing work, and will outline future research directions where the data management community is well-positioned to make lasting impact, especially if we attack these problems with our rich theory-meets-systems toolkit.
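The three output forms of a score-based ranker described in this abstract (full permutation, top-k, and set selection) can be sketched in a few lines of Python. The candidate names and scores are made up for illustration, the function names are hypothetical, and no fairness or diversity intervention is applied.

```python
import heapq


def score_based_ranking(candidates, score):
    """Score each candidate independently and return all of them in descending score order."""
    return sorted(candidates, key=score, reverse=True)


def top_k(candidates, score, k):
    """Only the k highest-scoring candidates, still in score order."""
    return heapq.nlargest(k, candidates, key=score)


def select_set(candidates, score, k):
    """Set selection: the same k candidates, with their relative order discarded."""
    return frozenset(top_k(candidates, score, k))


scores = {"ana": 0.9, "bo": 0.4, "cy": 0.7, "di": 0.1}  # hypothetical scores
print(score_based_ranking(scores, scores.get))  # ['ana', 'cy', 'bo', 'di']
print(top_k(scores, scores.get, 2))  # ['ana', 'cy']
print(select_set(scores, scores.get, 2))
```

`heapq.nlargest` avoids sorting all candidates when only the top-k is needed; fairness- and diversity-enhancing interventions of the kind discussed in the talk would change which candidates end up in that prefix or set.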
Bio: Julia Stoyanovich is an Assistant Professor in the Department of Computer Science and Engineering at New York University's Tandon School of Engineering, and at the Center for Data Science. She is a recipient of an NSF CAREER award and of an NSF/CRA CI Fellowship. Julia’s research focuses on responsible data management and analysis practices: on operationalizing fairness, diversity, transparency, and data protection in all stages of the data acquisition and processing lifecycle. She established the Data, Responsibly consortium, and serves on the New York City Automated Decision Systems Task Force (appointed by Mayor de Blasio). In addition to data ethics, Julia works on management and analysis of preference data, and on querying large evolving graphs. She holds M.S. and Ph.D. degrees in Computer Science from Columbia University, and a B.S. in Computer Science and in Mathematics and Statistics from the University of Massachusetts at Amherst.
(#12)Optimising Fairness Through Parametrised Data Sampling, Vladimiro González-Zelaya (Newcastle University, United Kingdom, Universidad Panamericana, Mexico), Julián Salas (Universitat Rovira i Virgili, Spain), Dennis Prangle (Newcastle University, United Kingdom), and Paolo Missier (Newcastle University, United Kingdom) (30sec pitch)(10min)(paper)(poster)
(#259)Using Landmarks for Explaining Entity Matching Models, Andrea Baraldi (University of Modena and Reggio Emilia, Italy), Francesco Del Buono (University of Modena and Reggio Emilia, Italy), Matteo Paganelli (University of Modena and Reggio Emilia, Italy), and Francesco Guerra (University of Modena and Reggio Emilia, Italy) (30sec pitch)(10min)(paper)(poster)
Session Co-Chairs: Yannis Velegrakis (University of Trento and Utrecht University, Netherlands) and Stefan Manegold (CWI, Netherlands)
Big Sequence Management: Scaling up and Out, Karima Echihabi (Mohammed VI Polytechnic University, Morocco), Kostas Zoumpatianos (LIPADE, Université de Paris, France), and Themis Palpanas (LIPADE, Université de Paris & French University Institute, France)
Abstract: Data series are a prevalent data type that has attracted a lot of interest in recent years. Specifically, there has been explosive interest in the analysis of large volumes of data series in many different domains, both in business (e.g., in mobile applications) and in the sciences (e.g., in biology). In this tutorial, we focus on applications that produce massive collections of data series, and we provide the necessary background on data series storage, retrieval and analytics. We look at systems historically used to handle and mine data in the form of data series, as well as at the state-of-the-art data series management systems that were recently proposed. Moreover, we discuss the need for fast similarity search to support data mining applications, and describe efficient similarity search techniques, indexes and query processing algorithms. Finally, we look at the gaps in modern data series management systems with regard to support for efficient complex analytics, and we argue in favor of integrating summarizations and indexes into modern data series management systems. We conclude with the challenges and open research problems in this domain.
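One common way in which summarizations support fast similarity search is lower-bounding. The sketch below (function names hypothetical) uses Piecewise Aggregate Approximation (PAA), which replaces each fixed-length segment of a series by its mean. Scaled by the square root of the segment length, the Euclidean distance between two PAA summaries never exceeds the true Euclidean distance, so candidates whose bound is already no better than the best distance found can be skipped without losing exactness. It is a simplified illustration, not one of the specific indexes covered in the tutorial.

```python
import math


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    seg_len = len(series) // segments
    return [sum(series[i * seg_len:(i + 1) * seg_len]) / seg_len for i in range(segments)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_lower_bound(q_paa, c_paa, seg_len):
    """Lower bound on the Euclidean distance of the raw series (Cauchy-Schwarz per segment)."""
    return math.sqrt(seg_len) * euclidean(q_paa, c_paa)


def nearest_neighbor(query, collection, segments):
    """Exact 1-NN search that visits candidates in order of increasing lower bound."""
    if len(query) % segments != 0:
        raise ValueError("series length must be a multiple of the number of segments")
    seg_len = len(query) // segments
    q_paa = paa(query, segments)
    # In a real system these summaries would be precomputed and stored in an index.
    bounds = [paa_lower_bound(q_paa, paa(s, segments), seg_len) for s in collection]
    best_idx, best_dist, full_checks = None, math.inf, 0
    for i in sorted(range(len(collection)), key=bounds.__getitem__):
        if bounds[i] >= best_dist:
            break  # remaining bounds are at least as large, so none can be closer
        full_checks += 1
        d = euclidean(query, collection[i])
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist, full_checks


collection = [[math.sin(0.1 * t + p) for t in range(64)] for p in range(10)]
print(nearest_neighbor(collection[3], collection, segments=8))  # (3, 0.0, 1)
```

The returned `full_checks` counts how many raw series had to be compared in full; the remaining candidates were discarded using only their summaries.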
Bio: Karima Echihabi is an Assistant Professor at Mohammed VI Polytechnic University (UM6P) in Morocco. She is interested in scalable data analytics and data series management and has performed an extensive analysis of data series indexes. She holds a PhD degree from Mohammed V University (Morocco) and the University of Paris (France) and a Masters Degree in Computer Science from the University of Toronto. She has worked as a software engineer in the Windows team at Microsoft, Redmond (USA), and the Query Optimizer team at the IBM Toronto Lab (Canada).
Kostas Zoumpatianos is a Software Engineer at Snowflake Computing. He has been a Marie Curie Fellow at the University of Paris and a postdoctoral researcher at Harvard University. He got his PhD from the University of Trento in topics related to indexing and managing large collections of data series. He also holds an M.Sc. in Information Management and a Dipl.Eng. in Information and Communication Systems Engineering from the University of the Aegean in Greece.
Themis Palpanas is a Senior Member of the French University Institute (IUF), a distinction that recognizes excellence across all academic disciplines, and a professor of computer science at the University of Paris (France), where he is director of the Data Intelligence Institute of Paris (diiP), and director of the data management group, diNo. He received the BS degree from the National Technical University of Athens, Greece, and the MSc and PhD degrees from the University of Toronto, Canada. His interests include problems related to data science (big data analytics and machine learning applications). He is the author of 9 US patents and 2 French patents. He is the recipient of 3 Best Paper awards, and the IBM Shared University Research (SUR) Award. He is currently serving on the VLDB Endowment Board of Trustees, and as an Editor in Chief for the BDR Journal. He has served as General Chair for VLDB 2013, and in the program committees of all major conferences in the areas of data management and analysis.
Session Chair: Antoine Amarilli (Télécom Paris, France)
Climate Change Session, Chair: Antoine Amarilli (Télécom Paris, France). External guest: Benjamin Pierce (University of Pennsylvania, USA)
Abstract: This session focuses on the ongoing climate crisis and how it concerns EDBT/ICDT. We will report on the proposed measures for EDBT/ICDT following last year's session (presented at ), and will discuss what this issue implies for our community and which measures could be adopted for the future of the conference.
The session will feature Benjamin Pierce (UPenn, former head of the SIGPLAN climate committee) as an outside guest to broaden our perspective.
Bio: Antoine Amarilli is Associate Professor in Computer Science at Télécom Paris. After obtaining the Parisian Master of Research in Computer Science (MPRI) from the École Normale Supérieure in 2012, he began his thesis at Télécom Paris on the subject « Leveraging the structure of uncertain data » under the supervision of Pierre Senellart. He obtained his PhD in computer science in March 2016, which was awarded the Télécom Paris PhD prize and a Beth Dissertation Award.
His research topics focus on data management and data mining. He leads the DIG team seminar. His research works and talks are available online. He maintains a personal website and blog, as well as the TCS4F initiative on the climate crisis and No free view? No review! on open access to scientific publications.
Benjamin Pierce is Henry Salvatori Professor of Computer and Information Science at the University of Pennsylvania and a Fellow of the ACM. His research interests include programming languages, type systems, language-based security, computer-assisted formal verification, differential privacy, and synchronization technologies. He is the author of the widely used graduate textbooks Types and Programming Languages and Software Foundations. He has served as co-Editor in Chief of the Journal of Functional Programming, as Managing Editor for Logical Methods in Computer Science, and as editorial board member of Mathematical Structures in Computer Science, Formal Aspects of Computing, and ACM Transactions on Programming Languages and Systems, as vice-chair of ACM SIGPLAN, and as a member of ACM Council. He holds a doctorate honoris causa from Chalmers University and in 2020 was awarded the inaugural SIGPLAN Distinguished Educator's Award. He is also the lead designer of the popular Unison file synchronizer and co-developer of the Clowdr virtual conference platform.
(#63)Twin Subsequence Search in Time Series, Georgios Chatzigeorgakidis (Athena Research Center, Greece), Dimitrios Skoutas (Athena Research Center, Greece), Kostas Patroumpas (Athena Research Center, Greece), Themis Palpanas (University of Paris, France), Spiros Athanasiou (Athena Research Center, Greece), and Spiros Skiadopoulos (University of the Peloponnese, Greece) (30sec pitch)(10min)(paper)(poster)
Session Co-Chairs: Stefania Dumbrava (ENSIIE & Institut Polytechnique de Paris, France) and Michael Gubanov (Florida State University, USA)
(#48)Multiple-Source Context-Free Path Querying in Terms of Linear Algebra, Arseniy Terekhov (Information Technologies, Mechanics and Optics University, Russia), Vlada Pogozhelskaya (St. Petersburg State University, Russia), Vadim Abzalov (St. Petersburg State University, Russia), Timur Zinnatulin (St. Petersburg State University, Russia), and Semyon Grigorev (St. Petersburg State University, Russia) (30sec pitch)(10min)(paper)(poster)
(#276)Answer Graph: Factorization Matters in Large Graphs, Zahid Abul-Basher (University of Toronto, Canada), Nikolay Yakovets (Eindhoven University of Technology, Netherlands), Parke Godfrey (York University, Canada), Stanley Clark (Eindhoven University of Technology, Netherlands), and Mark Chignell (University of Toronto, Canada) (30sec pitch)(10min)(paper)(poster)
Session Chair: Yannis Velegrakis (University of Trento, Italy, and Utrecht University, Netherlands)
Big Sequence Management: Scaling Up and Out, Karima Echihabi (Mohammed VI Polytechnic University, Morocco), Kostas Zoumpatianos (LIPADE, Université de Paris, France), and Themis Palpanas (LIPADE, Université de Paris & French University Institute, France)
Abstract: Data series are a prevalent data type that has attracted a lot of interest in recent years. In particular, interest in analyzing large volumes of data series has grown explosively across many domains, both in business (e.g., mobile applications) and in the sciences (e.g., biology). In this tutorial, we focus on applications that produce massive collections of data series, and we provide the necessary background on data series storage, retrieval and analytics. We look at systems historically used to handle and mine data in the form of data series, as well as at the state-of-the-art data series management systems that were recently proposed. Moreover, we discuss the need for fast similarity search to support data mining applications, and describe efficient similarity search techniques, indexes and query processing algorithms. Finally, we examine the gap in the support that modern data series management systems offer for efficient complex analytics, and we argue in favor of integrating summarizations and indexes into these systems. We conclude with the challenges and open research problems in this domain.
Bio: Karima Echihabi is an Assistant Professor at Mohammed VI Polytechnic University (UM6P) in Morocco. She is interested in scalable data analytics and data series management, and has performed an extensive analysis of data series indexes. She holds a PhD degree from Mohammed V University (Morocco) and the University of Paris (France), and a Master's degree in Computer Science from the University of Toronto. She has worked as a software engineer in the Windows team at Microsoft, Redmond (USA), and in the Query Optimizer team at the IBM Toronto Lab (Canada).
Kostas Zoumpatianos is a Software Engineer at Snowflake Computing. He was previously a Marie Curie Fellow at the University of Paris and a postdoctoral researcher at Harvard University. He received his PhD from the University of Trento for work on indexing and managing large collections of data series. He also holds an M.Sc. in Information Management and a Dipl.Eng. in Information and Communication Systems Engineering from the University of the Aegean in Greece.
Themis Palpanas is a Senior Member of the French University Institute (IUF), a distinction that recognizes excellence across all academic disciplines, and a professor of computer science at the University of Paris (France), where he is director of the Data Intelligence Institute of Paris (diiP) and of the data management group, diNo. He received his BS degree from the National Technical University of Athens, Greece, and his MSc and PhD degrees from the University of Toronto, Canada. His interests include problems related to data science (big data analytics and machine learning applications). He is the author of 9 US patents and 2 French patents, and the recipient of 3 Best Paper awards and the IBM Shared University Research (SUR) Award. He currently serves on the VLDB Endowment Board of Trustees and as Editor in Chief of the BDR Journal. He served as General Chair for VLDB 2013 and has served on the program committees of all major conferences in the areas of data management and analysis.