Elise de Doncker Research Highlights

Biosketch

Elise de Doncker is a professor of Computer Science at WMU. She obtained her doctoral degree in mathematics at the Katholieke Universiteit Leuven (KUL), Belgium. Her research revolves around algorithm design and analysis, particularly in scientific computing and numerical integration, targeted toward high-performance computations and architectures. To date (7/24/20), the Quadpack integration package, developed when she was a student (with R. Piessens, D. K. Kahaner and C. W. Uberhuber), has 1,356,922 downloads from netlib.org, and the Quadpack book has been cited 1,218 times according to Google Scholar. She also pioneered the ParInt package for multivariate integration (with A. Gupta, A. Genz and R. Zanny) for parallel integration on clusters and distributed systems. Perhaps in Renaissance style, she later branched out to other fields, including neural networks, agent-based simulations (of epidemics/pandemics), sentiment analysis, human behavior modeling, and bioinformatics.

She is the Principal Investigator on seven awards from the National Science Foundation. One of the NSF grants funded the "thor" supercomputer cluster, housed in the CEAS Center for High Performance Computing and Big Data (HPCBD), of which she is a Director. She has been sponsored by the "Large Scale Computational Science with Heterogeneous Many-Core Computer" project of MEXT (Ministry of Education, Culture, Sports, Science and Technology) in Japan, where she has a long-standing collaboration with the group of F. Yuasa at the High Energy Accelerator Research Organization (KEK) in Tsukuba, on computations for loop diagrams in high energy physics. For this work she secured access to the PEZY/ExaScaler supercomputers at KEK. Since 2011 she has organized and chaired the Workshop on Large Scale Computational Physics (LSCP), which evolved into the Workshop on Large Scale Computational Science (LSCS), held in 2020 (with F. Yuasa and H. Matsufuru). Apart from her stays in Japan, she spent research stays, sabbaticals and other leaves earlier in her career at Stanford University, Argonne National Laboratory, Caltech, and the Delft University of Technology in the Netherlands.

She has authored or co-authored around 150 publications, many of them with students. In the past year or so, five of her PhD students graduated: A. Almulihi (Summer I 2019), H. Al-Shaikhli (Summer I 2019), O. Olagbemi (Fall 2019), A. Alharbi (Fall 2019), and M. Alrizq (Summer I 2020); she also recently started working with two new PhD students, T. Gharaibeh and J. Rhodes. A synopsis of these students' projects is given below, with selected publications. She further has publications with MS thesis students on topics such as parallel simulations for agent-based pandemic modeling, Monte Carlo simulations on the Intel Xeon Phi, and GPU integral computations in stochastic geometry. She is currently working with an undergraduate student on forecasting for COVID-19.

 

 

Research with Student Collaborations

Recently Graduated

 

Ahmed Sulaiman M. Alharbi 

 
Social Media Sentiment Analysis with a Deep Neural Network: An Enhanced Approach Using User Behavioral Information

Social media sentiment analysis constitutes a fundamental problem with many interesting applications, such as in Business Intelligence, Medical Monitoring, and National Security. Most current social media sentiment classification methods judge the sentiment polarity primarily according to textual content and neglect other information available on these platforms. We propose deep learning-based frameworks that also incorporate user behavioral information associated with a given document (tweet). The frameworks comprise several models based on a variety of neural network architectures, each trained on a specific aspect of user behavior. The frameworks then exploit these multi-aspect learning models to jointly perform the sentiment analysis task. The experimental results demonstrate that going beyond the content of a document is beneficial in sentiment classification, because it provides the classifier with a deeper understanding of the task.
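The idea of fusing text features with behavioral features can be sketched as below. This is a minimal illustration only: the feature choices, dimensions, and weights are hypothetical placeholders, whereas the actual frameworks train a separate neural network per behavioral aspect and combine the learned models.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def joint_sentiment_score(text_feats, behavior_feats, params):
    """One shared hidden layer over the concatenated text and behavior
    features, followed by a sigmoid output giving a polarity score in
    (0, 1).  Weights here are random placeholders, not trained models."""
    W, w_out, b = params
    h = np.tanh(W @ np.concatenate([text_feats, behavior_feats]))
    return float(sigmoid(w_out @ h + b))

text = rng.normal(size=50)       # e.g. a pooled word-embedding of the tweet text
behavior = rng.normal(size=10)   # e.g. posting frequency, like/retweet statistics
params = (0.1 * rng.normal(size=(8, 60)),   # hidden-layer weights
          0.1 * rng.normal(size=8),         # output weights
          0.0)                              # output bias
score = joint_sentiment_score(text, behavior, params)
print(0.0 < score < 1.0)
```

The point of the sketch is only the interface: the classifier sees both a content representation and behavioral signals, so its decision is conditioned on more than the text alone.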

Selected publications:
  • Twitter Sentiment Analysis with a Deep Neural Network: An Enhanced Approach using User Behavioral Information, A S M Alharbi and E de Doncker, Cognitive Systems Research 54 (2019), pp. 50-61, https://doi.org/10.1016/j.cogsys.2018.10.001  
  • Emotional Awareness based Classification Model for Twitter Sentiment Analysis using a Deep Neural Network, A S M Alharbi and E de Doncker, in 21st International Conference on Artificial Intelligence (ICAI’19), Proc. CSCE 2019, pp. 142-145, ISBN 1-60132-501-0.

 

Omofokolakumi E. Olagbemi

 
Scalable Algorithms and Hybrid Parallelization Strategies for Multivariate Integration with ParAdapt and CUDA

The numerical evaluation of integrals finds applications in fields such as High Energy Physics, Bayesian Statistics, Stochastic Geometry, Molecular Modeling and Medical Physics. Erratic integrand behavior due to singularities, peaks or ridges in the integration region requires reliable algorithms and software that not only provide an estimate of the integral with a level of accuracy acceptable to the user, but also perform this task in a timely manner. We developed ParAdapt, numerical integration software based on an adaptive task partitioning strategy, which maps tasks (regions) for evaluation on GPUs (Graphics Processing Units). The resulting methods render the classic framework of the global adaptive scheme suitable in moderate dimensions, say 10 to 25. The new algorithms are scalable, as evidenced by speedup values in the double and triple digits up to very large numbers of subdivisions. An analysis of the various partitioning and parallelization strategies is given.
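The global adaptive scheme underlying this work can be sketched serially as follows. The crude midpoint-based rule pair and all names below are simplified stand-ins: ParAdapt uses proper cubature rule pairs and evaluates many selected regions in parallel on the GPU per step.

```python
import heapq
import itertools
import numpy as np

def adaptive_integrate(f, a, b, tol=1e-6, max_splits=4000):
    """Global adaptive integration sketch over the box [a, b]:
    keep a priority queue of subregions ordered by error estimate,
    repeatedly bisect the worst region along its widest coordinate."""
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    d = len(a)
    tick = itertools.count()  # tie-breaker for the heap
    masks = [np.array([(k >> j) & 1 for j in range(d)], bool)
             for k in range(2 ** d)]

    def estimate(lo, hi):
        # rule pair: midpoint rule vs. composite midpoint on 2^d children;
        # their difference serves as the (crude) error estimate
        mid = (lo + hi) / 2.0
        vol = float(np.prod(hi - lo))
        coarse = f(mid) * vol
        fine = sum(f((np.where(m, mid, lo) + np.where(m, hi, mid)) / 2.0)
                   for m in masks) * vol / len(masks)
        return fine, abs(fine - coarse)

    est, err = estimate(a, b)
    heap = [(-err, next(tick), est, a, b)]
    total, total_err = est, err
    for _ in range(max_splits):
        if total_err <= tol:
            break
        neg_err, _, e, lo, hi = heapq.heappop(heap)
        total -= e
        total_err += neg_err                  # neg_err == -(region error)
        dim = int(np.argmax(hi - lo))         # bisect the widest dimension
        cut = 0.5 * (lo[dim] + hi[dim])
        left_hi, right_lo = hi.copy(), lo.copy()
        left_hi[dim] = cut
        right_lo[dim] = cut
        for clo, chi in ((lo, left_hi), (right_lo, hi)):
            ce, cerr = estimate(clo, chi)
            heapq.heappush(heap, (-cerr, next(tick), ce, clo, chi))
            total += ce
            total_err += cerr
    return total, total_err

# example: integral of exp(x+y) over the unit square, exact value (e-1)^2
exact = (np.e - 1.0) ** 2
val, err_est = adaptive_integrate(lambda x: np.exp(x[0] + x[1]),
                                  [0.0, 0.0], [1.0, 1.0])
```

The GPU variants replace the single pop per step with the selection of a pool of high-error regions whose rule evaluations are then performed in parallel.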

Selected publications:

 

Hasnaa Imad Al-Shaikhli

 
Approximate Algorithms for Motif Discovery in DNA

Motif discovery is the problem of finding common substrings in a set of biological strings. It can thus be applied to finding Transcription Factor Binding Sites (TFBS) that share common patterns (motifs). A transcription factor molecule can bind to multiple binding sites in the promoter regions of different genes, making these genes co-regulated. The Planted (l, d) Motif Problem (PMP) is a classic version of motif discovery, where l is the motif length and d is the maximum allowed mutation distance. The quorum Planted (l, d, q) Motif Problem (qPMP) is a version of PMP where the motif of length l occurs in at least q percent of the sequences with up to d mismatches. This work develops the Strong Motif Finder (SMF) and quorum Strong Motif Finder (qSMF) algorithms and evaluates their performance against established algorithms. SMF outperforms APMotif in speed and prediction accuracy, and matches the best prediction accuracy of MEME (with the OOPS choice), even though SMF is not given a priori information.
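The (l, d) and quorum (l, d, q) conditions themselves can be stated compactly as a brute-force check, which may clarify the problem definitions. This is only the condition being searched for; SMF and qSMF are approximate algorithms whose search strategies avoid this exhaustive scan.

```python
def hamming(s, t):
    """Number of mismatching positions between equal-length strings."""
    return sum(a != b for a, b in zip(s, t))

def occurs(motif, seq, d):
    """True if motif occurs somewhere in seq with at most d mismatches."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))

def is_planted_motif(motif, seqs, d, q=100):
    """Quorum (l, d, q) condition: the motif occurs (within d mismatches)
    in at least q percent of the sequences; q = 100 gives the classic
    planted (l, d) motif condition."""
    hits = sum(occurs(motif, s, d) for s in seqs)
    return 100.0 * hits / len(seqs) >= q

seqs = ["ACGTACGTAC", "TTACGAACGT", "GGGGACGTTT", "ACGATTTTTT"]
print(is_planted_motif("ACGT", seqs, d=1))        # True: within 1 mismatch of all 4
print(is_planted_motif("ACGT", seqs, d=0, q=75))  # True: exact in 3 of 4 sequences
```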

Selected publications:
  • qSMF: An Approximate Algorithm for Quorum Planted Motif Search on ChIP-Seq Data, H Al-Shaikhli and E de Doncker, 2019 IEEE International Conference on Electro/Information Technology, IEEE Xplore, DOI:10.1109/EIT.2019.8834006, https://ieeexplore.ieee.org/document/8834006
  • SMF: Approximate Algorithm for the Planted (l, d) Motif Finding Problem in DNA Sequences, H Al-Shaikhli and E de Doncker, Conf. BIOCOMP (Bioinformatics and Computational Biology), pp. 123-129 (2018), ISBN: 1-60132-471-5, https://csce.ucmss.com/cr/books/2018/LFS/CSREA2018/BIC4274.pdf

 

Ahmed Hassan H. Almulihi

 
High-performance Quasi-Monte Carlo integration and applications

Affected by the "curse of dimensionality", efficient high-dimensional integration is a major problem in scientific computing, and is mostly unsolved without a priori information on the function behavior. Important application areas include high-energy physics, Bayesian statistics, computational finance, and stochastic geometry as applied in robotics, tessellations and imaging from medical data. A major part of this study is the development of high-performance lattice-based algorithms for approximating moderate- to high-dimensional integrals on GPU accelerators. We show that rank-1 as well as embedded (composite) lattice rules on GPUs, possibly with an integral transformation to alleviate the effects of boundary singularities, yield better accuracy and efficiency for various classes of integrals than classic Monte Carlo and adaptive methods. We further parallelized the component-by-component (CBC) construction of Nuyens and Cools (2006) for large rank-1 lattice rules, using the CUDA (cuFFT) Fast Fourier Transform procedure.
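The rank-1 rule at the core of this work is simple to state: with n points and generating vector z, approximate the integral over the unit cube by averaging f at the points {k z / n}, k = 0, ..., n-1 (braces denoting the fractional part). A serial sketch follows; the generating vector used here is the classic 2D Fibonacci choice for illustration, not a CBC-constructed vector as in the actual work, and the GPU and randomly shifted (embedded) variants are omitted.

```python
import numpy as np

def rank1_lattice(f, n, z):
    """Rank-1 lattice rule: Q f = (1/n) * sum_{k=0}^{n-1} f({k z / n})."""
    k = np.arange(n)[:, None]
    pts = (k * np.asarray(z)[None, :] / n) % 1.0   # fractional parts
    return float(np.mean([f(x) for x in pts]))

# smooth 1-periodic test integrand with exact integral 1 over [0,1]^2
f = lambda x: float(np.prod(1.0 + 0.5 * np.cos(2.0 * np.pi * x)))

# Fibonacci lattice: n = 610, z = (1, 377) (consecutive Fibonacci numbers)
q = rank1_lattice(f, 610, (1, 377))
print(abs(q - 1.0) < 1e-9)   # exact for this low-degree trigonometric integrand
```

For a smooth periodic integrand like this one the rule is extremely accurate; for non-periodic integrands or boundary singularities, a periodizing/integral transformation is applied first, as mentioned above.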

Selected publications:

 

Mesfer Alrizq

 
Changing energy consumption patterns based on multi-agent human behavior modeling for analyzing the effects of feedback techniques

With the deployment of smart grid technologies and Advanced Metering Infrastructure, demand side management via feedback is of interest to utility companies and researchers for modeling consumer behavior. Conventional methods of collecting load profile (energy consumption) data, such as surveys and metering, are time-consuming and pose barriers in view of large consumer demand and continuous technological progress, which render the data obsolete in a short time. In response, our work derives an innovative behavior model for energy consumption load profiles. Based on fuzzy logic and activity graphs, the model requires minimal consumer data and can be easily adapted to changes in technology. We demonstrate the accuracy of the model against real-world data. To analyze behavior change, we propose a multi-agent-based system to study the effects of feedback on different types of consumers. Consumer categories are generated based on their behavioral responses to given feedback, and the feedback methods that are most effective for each category are evaluated and identified.
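Purely as an illustration of the fuzzy-logic ingredient, occupant activity can be described by membership functions rather than crisp schedules, and a load profile derived from the resulting activity degrees. The membership functions, activity, and load numbers below are hypothetical, not the model's actual rule base or activity graphs.

```python
def tri(x, a, b, c):
    """Triangular fuzzy membership function: 0 outside [a, c], peak 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def at_home_active(hour):
    """Hypothetical fuzzy degree to which an occupant is home and active."""
    morning = tri(hour, 5, 7, 9)     # membership in "morning routine"
    evening = tri(hour, 17, 20, 23)  # membership in "evening routine"
    return max(morning, evening)     # fuzzy OR (max t-conorm)

# synthetic 24-hour load profile: base load plus activity-driven demand (kW)
load_profile = [round(0.4 + 1.6 * at_home_active(h), 2) for h in range(24)]
```

The appeal of this formulation, as in the model above, is that it needs only a coarse description of consumer habits rather than extensive metered data.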

Selected publications:
  • Changing Energy Consumption Patterns Based on Multi-Agent Human Behavior Modeling for Analyzing the Effects of Feedback Techniques, M Alrizq, E de Doncker and A Fong, in 2019 Power and Energy Conference Illinois (PECI), 2019 IEEE, pp. 1-8, https://ieeexplore.ieee.org/document/8698779
  • A Novel Fuzzy Based Human Behavior Model for Residential Electricity Consumption Forecasting, M Alrizq and E de Doncker, 2018 Power and Energy Conference in Illinois (PECI), 2018 IEEE, pp. 1-7, https://ieeexplore.ieee.org/document/8334984

 

New Students

 

Tasnim Gharaibeh

 
Unsupervised learning with word embeddings

We employ a class of unsupervised deep learning algorithms (Word2vec) to produce word embeddings from a corpus of articles that appeared in the literature. Word2vec models are shallow, two-layer neural networks trained to learn relationships between words. According to the work of Vahe Tshitoyan et al., "Unsupervised word embeddings capture latent knowledge from materials science literature," Nature Vol. 571 (2019), this latent knowledge contains structure-property relationships and can lead to a priori recommendations of materials for functional applications. In collaboration with Dr. P. Ari-Gur (MAE, WMU), we generated various models to support the analysis and prediction of new materials. We intend to apply the method to diverse research topics as a proof of concept.
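The kind of latent relationship such embeddings capture can be illustrated with analogy arithmetic and cosine similarity. The 2D vectors below are hand-built toys chosen so the analogy works; real embeddings are produced by training Word2vec on the article corpus and are typically hundreds of dimensions.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# toy hand-built 2D "embeddings" (illustrative only)
vocab = {
    "king":  np.array([0.9,  1.0]),
    "queen": np.array([0.9, -1.0]),
    "man":   np.array([0.1,  1.0]),
    "woman": np.array([0.1, -1.0]),
}

# analogy arithmetic of the kind trained embeddings are known to support
target = vocab["king"] - vocab["man"] + vocab["woman"]
nearest = max((w for w in vocab if w != "king"),
              key=lambda w: cosine(vocab[w], target))
print(nearest)  # queen
```

In the materials setting, the analogous queries relate compounds to properties (the structure-property relationships referred to above) rather than words to words.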

 

James Rhodes

 
New priority queue data structures for task partitioning algorithms

Task partitioning methods support algorithms that maintain task pools, such as some branch-and-bound methods and adaptive numerical integration. Adaptive region partitioning is applied to produce fine meshes delineating areas of lighting and sharp shadows in image processing, as in ray tracing and radiosity methods, and likewise to partition intensively in the vicinity of difficult function behavior in adaptive integration. We focus on global adaptive integration methods where, at each step, a set of important (high-error) regions is selected and subdivided, and the resulting child regions are processed. Previous methods used a heap or linked-list priority queue and deleted one region at a time, which is inefficient as the heap may get very large. We are testing and analyzing bucket-based data structures, with the goal of improving efficiency without severely affecting the accuracy, which may suffer when only moderately important regions are selected at a step.
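One possible shape of such a bucket-based pool is sketched below: regions are binned by the magnitude of their error estimate (a log2 scale here), and an entire top bucket is removed per step instead of one region at a time. The class name and binning rule are illustrative, not the specific structures under study.

```python
import math
from collections import defaultdict

class ErrorBuckets:
    """Bucket-based priority pool for adaptive integration regions.
    Insertion is O(1); selection returns the whole highest-error bin
    at once, so regions of similar priority are batched together
    (e.g. for parallel subdivision), avoiding one-at-a-time heap pops."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def push(self, error, region):
        self.buckets[self._bin(error)].append((error, region))

    def pop_top_bucket(self):
        """Remove and return all regions in the highest-error bin."""
        top = max(self.buckets)
        return self.buckets.pop(top)

    @staticmethod
    def _bin(error):
        # bin by order of magnitude of the error estimate
        return math.floor(math.log2(error)) if error > 0 else -1024

pool = ErrorBuckets()
for err, name in [(0.3, "A"), (0.25, "B"), (0.6, "C"), (0.01, "D")]:
    pool.push(err, name)

batch = pool.pop_top_bucket()   # highest-error bin: region "C" alone
```

The trade-off named above is visible here: a popped bucket may mix a truly worst region with merely comparable ones, so the selection is only approximately by priority, in exchange for cheap batched access.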