Accepted Papers

  • DIR – A Semantic Information Resource for Healthcare Datasets
    Jingyi Shi, Mingna Zheng and Yaorong Ge, University of North Carolina, USA
    The lack of an intelligent dataset information system that integrates diverse dataset-related resources and is designed to address the unique demands of target populations is an important gap in health informatics and analytics development. To bridge the gap, DIR, a semantic Dataset Information Resource framework, has been developed to specifically address the learning challenges of entry-level students and researchers in identifying and understanding healthcare datasets for data analytics. The DIR framework leverages Semantic Web technologies and the W3C dataset description standard for knowledge integration and representation in order to enable flexible and complex question answering. At this stage, the DIR prototype includes four major components (dataset metadata, search modules, frequently asked questions and intelligent answers, and blogs) and knowledge of three commonly used large and complex datasets. Initial results demonstrate clear value for health informatics students and researchers. Further development is underway.
  • Holistic Approach to Predicting Students’ Performance in Higher Educational Institutions - A Conceptual Framework
    Olugbenga Adejo and Thomas Connolly, University of the West of Scotland, United Kingdom
    Accurate prediction and early identification of students at risk of attrition are of high concern for higher educational institutions (HEIs). They are of great importance not only to the students but also to educational administrators and institutions, both for improving academic quality and for using the available resources efficiently for effective intervention. However, despite the different frameworks and models that various researchers have used across institutions for predicting performance, only negligible success has been recorded in terms of accuracy, efficiency and reduction of student attrition. This has been attributed to the inadequate and selective use of variables in the predictive models. This paper presents a multi-dimensional and holistic framework for predicting student academic performance and guiding intervention in HEIs. The purpose of the framework is to provide a comprehensive, unbiased and efficient way of predicting student performance, with an implementation based on multi-source data and a database system. The proposed approach is intended to be generalizable and may give predictions at a higher level of accuracy that educational administrators can rely on for providing timely intervention to students.
  • Mutual Information to Interpret the Semantics of Anomalies in Link Mining
    Zakea Il-agure and Belsam Attallah, Higher Colleges of Technology, UAE
    This paper aims to show how mutual information can help provide a semantic interpretation of anomalies in data, characterize those anomalies, and measure the information that one object item X shares with another object item Y. Whilst most link mining approaches focus on predicting link type, link-based object classification or object identification, this research focuses on using link mining to detect anomalies and to discover links and objects among those anomalies. The paper uses a case study to demonstrate how mutual information contributes to interpreting anomalies.
  • Optimization of RocksDB for Redis on Flash
    Keren Ouaknine, Oran Agra and Zvika Guz, Hebrew University of Jerusalem, Israel
    RocksDB is a popular key-value store, optimized for fast storage. With Solid-State Drives (SSDs) becoming prevalent, RocksDB has gained widespread adoption and is now common in production settings. Specifically, various software stacks embed RocksDB as a storage engine to optimize access to block storage. Unfortunately, tuning RocksDB is a complex task, involving many parameters with different degrees of dependencies. As we show in this paper, a highly tuned configuration can improve performance by an order of magnitude over the baseline configuration. We describe our experience optimizing RocksDB for Redis-on-Flash (RoF), a commercial implementation of the Redis in-memory key-value store that uses SSDs as a RAM extension to dramatically increase the effective per-node capacity. RoF stores hot values in RAM and uses RocksDB to store and manage cold data on SSDs. We describe our methodology for tuning RocksDB parameters and present our experiments and findings (including both positive and negative tuning results) on two clouds: EC2 and GCE. Overall, we show how tuning RocksDB improved the database replication time for RoF by more than 11x. We hope that our experience will help others adopt, configure, and tune RocksDB in order to realize its full performance potential.
  • A Model of Extracting Pattern in Social Network Data Using Topic Modelling, Sentiment Analysis and Graph Databases
    Assane Wade and Giovanna Di Marzo Serugendo, University of Geneva, Switzerland
    Social network analysis studies the interactions between users in social media. Social media data comprise the structure of the network and its content (text, multimedia). When studying the combination of these elements, especially text content, the challenge is to reduce the dimensionality of the text using techniques such as topic modelling and sentiment analysis. The reduced text is then combined with the network structure for further tasks such as predictive analytics and pattern recognition. In this paper we propose a method based on graph databases, topic modelling and sentiment analysis to facilitate pattern extraction from social media texts. We apply our model to a Twitter dataset to extract the opinion patterns it contains. The model performed well, and we were able to extract those opinion patterns.
  • A Comparison on Data Model and Performance in NOSQL Database and RDBMS
    Surabhi Dwivedi and Kumari Roshni VS, Centre for Development of Advanced Computing Bangalore, India
    In this paper we present a comparative study of MongoDB, a document-oriented NoSQL database, and PostgreSQL, a relational database management system (RDBMS). We use a scenario to illustrate the data models of the RDBMS and MongoDB, along with the embedding and referencing options available in MongoDB. We present various query operations and analyze query performance for read, update and delete operations on both databases. The backup, replication and MapReduce operations of the two databases are also studied. We found that MongoDB is more efficient at handling read-intensive, horizontally scalable datasets and is more flexible in accommodating rapid changes. PostgreSQL is suitable when tight integrity is required among the datasets and frequent changes to the database are not required. MongoDB is also more suitable for processing huge amounts of data distributed across multiple clusters.
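
The last abstract mentions the embedding and referencing options available in MongoDB. As a rough illustration only, the sketch below contrasts the two document shapes using plain Python dictionaries in place of a real MongoDB collection. The customer and order fields, the sample values, and both helper functions are hypothetical and are not taken from the paper.

```python
# Plain dicts stand in for MongoDB documents; all names and values are hypothetical.

# Embedding: related orders are stored inside the customer document itself.
embedded_customer = {
    "_id": 1,
    "name": "Alice",
    "orders": [
        {"item": "book", "qty": 2},
        {"item": "pen", "qty": 5},
    ],
}

# Referencing: orders are separate documents that point back to the customer by id.
customers = {1: {"_id": 1, "name": "Alice"}}
orders = [
    {"_id": 101, "customer_id": 1, "item": "book", "qty": 2},
    {"_id": 102, "customer_id": 1, "item": "pen", "qty": 5},
    {"_id": 103, "customer_id": 2, "item": "ink", "qty": 1},
]


def items_embedded(customer):
    """Read the ordered items directly from a single embedded document."""
    return [order["item"] for order in customer["orders"]]


def items_referenced(customer_id, order_docs):
    """Collect the ordered items by scanning order documents for a matching customer id."""
    return [o["item"] for o in order_docs if o["customer_id"] == customer_id]


if __name__ == "__main__":
    print(items_embedded(embedded_customer))
    print(items_referenced(1, orders))
```

With embedding, one document read returns the customer together with the orders. With referencing, the orders live apart and must be matched by id, which resembles a foreign-key lookup in a relational schema.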