Management and Analysis of Data and Media (MADM)
Staff
- Sapino Maria Luisa (Coordinator)
- Pasteris Paolo (Member)
- K. Selcuk Candan (Collaborator)
- Ardissono Liliana (Collaborator)
- Pensa Ruggero Gaetano (Collaborator)
- Xilun Chen (External Collaborator)
- Shengyu Huang (External Collaborator)
- Jung Hyun Kim (External Collaborator)
- Xinsheng Li (External Collaborator)
- Sicong Liu (External Collaborator)
- Parth Nagarkar (External Collaborator)
- Yash Garg (External Collaborator)
- (Member)
- Poccia Silvestro Roberto (Member)
Contacts
- +39 011 6706745
- mlsapino@di.unito.it
Activity
The research interests of the group span the many challenges of heterogeneous, possibly multimedia, data management, with special attention to the so-called “big data” challenges.
Data collected in most application domains are rich in volume and diversity. For example, in the medical domain, patients’ records include structured data (e.g., blood pressure values, temperature values), images (e.g., CT scans), unstructured text (e.g., the reports written by the specialists), videos (e.g., recordings of invasive exams, such as a coronary angiography), time series (e.g., ECG), and other forms of media. Consequently, generating value out of these rich and diverse data sets shares the 3V challenges ([V]olume, [V]elocity, and [V]ariety) of the so-called “Big Data” applications. We note that the 3Vs are not sufficient: in order to support effective knowledge discovery, we must tackle additional, more specific challenges, including those posed by the [H]igh-dimensional, [M]ulti-modal (temporal, spatial, hierarchical, and graph-structured), and inter-[L]inked nature of most multimedia data, as well as the [I]mprecision of the media features and the [S]parsity of real-world observations. Moreover, since the end users of most multimedia data exploration tasks are [H]uman beings, we need to consider additional fundamental constraints arising from the difficulties they face in providing unambiguous specifications of interest or preference, the subjectivity of their interpretations of results, and their limitations in perception and memory. Last but not least, since a large portion of multimedia data is human-centered, we also need to account for the users’ (and others’) needs for [P]rivacy.
Interestingly, we observe that different domains and disciplines that appear far from each other (such as building energy consumption analysis and the study of infectious disease propagation) can be seen as sharing common underlying models and can benefit from similar fundamental technological innovations.
The group works in tight collaboration with EMITLAB (Enterprise, Media, and Information Technologies Labs) at Arizona State University, with which it also shares research projects and the organization of scientific events.
More specifically, the group is active on the following research topics:
- Scalable techniques for tensor-based data analysis.
- Time series (possibly multi-variate) indexing, classification and querying algorithms.
- “Smart” technological solutions for large-scale heterogeneous data management.
- Modeling users’ activity on social media, in collaboration with RAI-CRIT (RAI Television Research Center), to leverage users’ social activity as an information source for television and radio program recommendation systems.
- Definition of software instruments for improved accessibility to digital documents for visually impaired users.
Scalable social media analysis
MeSoOnTV: a media and social-driven ontology-based TV knowledge management system (Funded by RAI-CRIT)
Searching, browsing and analyzing web content is a far more challenging problem today than in the early days of the Internet, because web content is now multimedia, social and dynamic. Moreover, the concepts referred to by videos, news, comments and posts are implicitly linked by the fact that people on the Web talk about something, somewhere, at some time, and these connections may change as users' perceptions change over time. The goal of the project is to define a model and develop a corresponding system for the integration of the heterogeneous and dynamic data coming from different knowledge sources (broadcasters' archives, online newspapers, blogs, web encyclopedias, social media platforms, social networks, etc.).
MIMOSA: Ontology-driven query system for the heterogeneous data of a SmArt City (Funded by the Compagnia di San Paolo, through the University of Torino)
“Mappe di Comunità 3.0” (Community Maps 3.0) is a participatory GIS prototype platform that enables communities of interest, governments and the general public to interact with multi-dimensional information spaces, both to retrieve and upload geo-referenced information distributed across existing data sources and to discuss the shared content. MIMOSA aims to extend “Mappe di Comunità 3.0” with advanced information search facilities, so that people can interact with the 3D Community Map and select the content to be visualized through a clear and dynamic user interface. We will design and develop a proof-of-concept semantic engine that exploits a multimodal domain ontology, enabling the expression of the heterogeneous semantic content related to the application domains, as well as summaries of the multimedia data and external sources of information, to improve the users’ experience in browsing and searching data. A key component of the semantic engine will be a personalization module, which will combine the available integrated open data, the domain ontology and the available private information characterizing users’ profiles and interaction histories in order to manage a personalized shared information space.
RanKloud: Data Partitioning and Resource Allocation Strategies for Scalable Multimedia and Social Media Analysis (Funded by NSF at ASU)
Today, multimedia data are produced and consumed in massive quantities in a broad range of applications with significant economic and societal benefit, including e-commerce, surveillance, education, web services, and social media. Hence, there is an urgent need for systems to provide highly scalable processing and efficient analysis of large media data collections. The RanKloud prototype system, developed in this research project, focuses on the needs and requirements of applications that deal with large quantities of multimedia data in a cloud-based scalable environment.
Modeling and analysis of the spread of infectious diseases
Data Management for Real-Time Data Driven Epidemic Spread Simulations (Funded by NSF at ASU)
The speed with which recent pandemics have had immense global impact highlights the importance of real-time response and public health decision making, at both local and global levels. Existing software tools for model-driven epidemic computer simulations of disease spreading help predict the geo-temporal evolution of epidemics and assess the impact of non-pharmaceutical control measures and interventions, relying on data and models including social contact networks, local and global mobility patterns of individuals, transmission and recovery rates, and outbreak conditions. If effectively leveraged, models reflecting past outbreaks, existing traces obtained from simulation runs, and real-time observations arriving during an outbreak can be used collectively to obtain a better understanding of the epidemic's characteristics and the underlying diffusion processes, to form and revise models, and to perform exploratory, if-then types of hypothetical analyses of epidemic scenarios. We design and develop an epidemic simulation data management system (epiDMS) which addresses the computational challenges that arise from the need to acquire, model, analyze, index, visualize, search, and recompose, in a scalable manner, the large volumes of data produced by observations and simulations during a disease outbreak.
Understanding the Evolution Patterns of the Ebola Outbreak in West-Africa and Supporting Real-Time Decision Making and Hypothesis Testing through Large Scale Simulations (Funded by NSF at ASU)
Global epidemic propagation occurs at multiple (local and global) scales: individuals within a subpopulation may be infected through local contacts during a local outbreak. These individuals may then carry the infection to a new region of the world, starting a new outbreak. Thus, disease spread simulations require data and models, including social contact networks, local and global mobility patterns of individuals, transmission and recovery rates, and outbreak conditions. Effectively managing the current emergency through real-time and continuous decision making requires computational models specifically tailored to the spatio-temporal dynamics of Ebola, as well as data- and model-driven computer simulations of its spread. Tools that help run and interpret Ebola simulation ensembles (aligned with real-world observations) to generate timely, actionable results are critically needed. Given the urgency of this particular epidemic and the critical need to develop the necessary models and tools specific to Ebola, this project focuses on Ebola transmission dynamics and control, specifically targeting products and processes for this Ebola epidemic. The research will result in novel algorithms and tools specially tailored for officials to continuously assess the impacts of different intervention scenarios and revise estimates based on real-world data, at local and global scales, for the Ebola epidemic.
Data analysis in the context of building energy modelling
E-SDMS: Energy Simulation Data Management System Software (Funded by NSF)
Existing building energy management systems (BEMSs) need to integrate large volumes of data, including (a) continuously collected heating, ventilation, and air conditioning (HVAC) sensor and actuation data, (b) other sensory data, such as occupancy, humidity, lighting levels, air speed and quality, (c) architectural, mechanical, and building automation system configuration data for these buildings, (d) local weather and GIS data that provide contextual information, as well as (e) energy price, consumption, and cost data from electricity (such as smart grid) and gas utilities. We design and develop the energy simulation data management system (e-SDMS) software, addressing the challenges that arise from the need to model, index, search, visualize, and analyze, in a scalable manner, large volumes of multi-variate series resulting from observations and simulations of energy data. e-SDMS will, therefore, fill an important gap in data-driven building design and clean energy (an area of national priority) and will enable applications and services with significant economic and environmental impact.
Internet of Things
Capability Assurance for Smart Living (Funded by Intel Corporation, started in September 14).
This Joint Path Finding (JPF) project's goal is to develop and mature the technologies needed to create and sustain a smart living environment. Specifically, this JPF will study, within one year, the feasibility of the Internet of Things (IOT) as a class of intelligent devices to enable the smart living environment. Intel, ASU, and DCU have joined as a team to carry out this JPF with two objectives: one is to conduct research in IOT related technologies to enable smart living, and the other is to mature technologies through proof-of-concept (POC) demonstrations and pilot deployments. The team will use ASU’s Sun Devil stadium renovation project as the usage scenario to focus the JPF. The anticipated benefit of the JPF is two-fold: to accelerate the deployment of IOT technologies at the Sun Devil stadium and to establish smart living laboratories at ASU and DCU with Intel’s participation and guidance.
TA_SL: Tecnologie Abilitanti per la Sicurezza sul Lavoro (Funded by Regione Piemonte, ended in December 2013)
The project focuses on the design and development of an innovative integrated technological platform, including both hardware and software components, to support, automate, rationalize and streamline the management and assurance of risk prevention policies for workers, with special attention to the risks faced by workers in the construction domain.
Assistive technologies and accessibility to educational material
Methodologies, technologies, materials and activities for accessible and inclusive learning of mathematics.
(Funded by CRT, coordinated by the dept. of Mathematics at the University of Torino.)
Publications
Crowd Sourced Semantic Enrichment (CroSSE) for knowledge driven querying of digital resources
JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2019
Matrix Factorization with Interval-Valued Data
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2019
Robust Multi-Variate Temporal Features of Multi-Variate Time Series
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING, COMMUNICATIONS AND APPLICATIONS, 2018
Smart ground project: A new approach to data accessibility and collection for raw materials and secondary raw materials in Europe
ENVIRONMENTAL ENGINEERING AND MANAGEMENT JOURNAL, 2017
Leveraging Cross-Domain Social Media Analytics to Understand TV Topics Popularity
IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE, 2016
Locality-Sensitive and Re-use Promoting Personalized PageRank Computations
KNOWLEDGE AND INFORMATION SYSTEMS, 2016
Reducing Seed Noise in Personalized PageRank
SOCIAL NETWORK ANALYSIS AND MINING, 2016
Recommending multimedia visiting paths in cultural heritage applications
MULTIMEDIA TOOLS AND APPLICATIONS, 2016
EpiDMS: Data Management and Analytics for Decision Making from Epidemic Spread Simulation Ensembles
THE JOURNAL OF INFECTIOUS DISEASES, 2016
Multiresolution Tensor Decompositions with Mode Hierarchies
ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2014
On Context-Aware Co-Clustering with Metadata Support
JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2012
PhC: Multiresolution Visualization and Exploration of Text Corpora with Parallel Hierarchical Coordinates.
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2012
Narrative-based taxonomy distillation for effective indexing of text collections.
DATA & KNOWLEDGE ENGINEERING, 2012
Multimedia Recommendation and Delivery Strategies
Data Management in Pervasive Systems, 2015
CA-Smooth: Content Adaptive Smoothing of Time Series Leveraging Locally Salient Temporal Features
Proceedings of the 11th International Conference on Management of Digital EcoSystems (MEDES19),
11th International Conference on Management of Digital EcoSystems
2019
IMS-DTM: Incremental Multi-Scale Dynamic Topic Models
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence,
AAAI-2018
2018
Contextually-enriched querying of integrated data sources
Proceedings - IEEE 34th International Conference on Data Engineering Workshops, ICDEW 2018,
34th IEEE International Conference on Data Engineering Workshops, ICDEW 2018
2018
Context-Aware Proactive Personalization of Linear Audio Content
Advances in Database Technology — EDBT 2017,
EDBT: 20th International Conference on Extending Database Technology
2017
Personalized PageRank in Uncertain Graphs with Mutually Exclusive Edges
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval,
SIGIR'17 - The 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
2017
nTD: Noise-Profile Adaptive Tensor Decomposition.
WWW '17 Proceedings of the 26th International Conference on World Wide Web,
WWW '17 - 26th International Conference on World Wide Web
2017
SIMDMS: Data Management and Analysis to Support Decision Making through Large Simulation Ensembles
Proceedings of the 20th International Conference on Extending Database Technology (EDBT '17),
20th International Conference on Extending Database Technology (EDBT '17)
2017
Tracking and analyzing the "second life" of TV content: A media and social-driven framework
CEUR Workshop Proceedings,
2nd International Workshop on Social Media World Sensors, SIDEWAYS 2016
2016
CrowdSourced semantic enrichment for participatory e-Government
Proceedings of the 8th International Conference on Management of Digital EcoSystems - MEDES 2016,
International Conference on Management of Digital EcoSystems, MEDES
2016
2PCP: Two-phase CP decomposition for billion-scale dense tensors
Proc. 32nd IEEE International Conference on Data Engineering, ICDE 2016,
32nd IEEE International Conference on Data Engineering, ICDE 2016
2016
PageRank Revisited: On the Relationship between Node Degrees and Node Significances in Different Applications
Proceedings of the Workshops of the EDBT/ICDT 2016 Joint Conference,
GraphQ: 5th international workshop on Querying Graph Structured Data (Satellite event at EDBT/ICDT 16)
2016
KSGM: Keynode-driven Scalable Graph Matching
CIKM15: Proceedings of the 24th ACM International Conference on Information and Knowledge Management,
The 24th ACM International Conference on Information and Knowledge Management
2015
Leveraging Audio Fingerprinting for Audio Content Synchronization and Replacement
Media Synchronization Workshop (MediaSync) 2015,
Media Synchronization Workshop (MediaSync) 2015 in conjunction with ACM TVX 2015
2015
Audio assisted group detection using smartphones
2015 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2015,
2015 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2015
2015
Leveraging metadata for identifying local, robust multi-variate temporal (RMT) features.
30th International Conference on Data Engineering (ICDE),
ICDE 2014
2014
"Can you really trust that seed?": Reducing the impact of seed noise in personalized PageRank.
Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM14),
IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM14)
2014
Focusing Decomposition Accuracy by Personalizing Tensor Decomposition (PTD)
CIKM '14 Proceedings of the 23rd ACM International Conference on Information and Knowledge Management,
23rd ACM International Conference on Information and Knowledge Management (CIKM14)
2014
A Framework for Recommending Multimedia Cultural Visiting Paths
Proceedings of the 22nd Italian Symposium on Advanced Database Systems, SEBD 2014,
22nd Italian Symposium on Advanced Database Systems (SEBD 2014)
2014
MeSoOnTV: A Media and Social-driven Ontology-based TV Knowledge Management System
Proceedings of the 24th ACM Conference on Hypertext and Social Media - HT '13,
The 24th ACM Conference on Hypertext and Social Media - HT'13
2013
Tracking and analyzing TV content on the web through social and ontological knowledge
EuroITV '13 Proceedings of the 11th European Conference on Interactive TV and Video,
11th European Conference on Interactive TV and Video, EuroITV '13
2013
Hive open research network platform.
Proceedings of the 16th International Conference on Extending Database Technology (EDBT13),
EDBT2013
2013
LR-PPR: locality-sensitive, re-use promoting, approximate personalized pagerank computation.
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management (CIKM13),
CIKM'13
2013
Recommending Multimedia Objects in Cultural Heritage Applications
New Trends in Image Analysis and Processing, ICIAP 2013 Workshops,
2nd International Workshop on Multimedia for Cultural Heritage - MM4CH 2013
2013
STFMap: query- and feature-driven visualization of large time series data sets.
Proceedings of the 21st ACM international conference on Information and knowledge management (CIKM12),
CIKM 2012
2012
sDTW: Computing DTW Distances using Locally Relevant Constraints based on Salient Feature Alignments.
PROCEEDINGS OF THE VLDB ENDOWMENT,
38th International Conference on Very Large Databases
2012
R2DB: A System for Querying and Visualizing Weighted RDF Graphs
Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE 2012),
IEEE 28th International Conference on Data Engineering (ICDE 2012)
2012
Impact neighborhood indexing (INI) in diffusion graphs. CIKM 2012: 2184-2188
Proceedings of the 21st ACM international conference on Information and knowledge management,
CIKM 2012
2012