REU Proposal 2019

SURREAL (Site for UndeRgraduate REsearch in mAchine Learning)

Faculty Mentors

Students in our REU program will have the opportunity to work with the following faculty mentors. These mentors and their possible research projects are described below.

Gurman Gill Assistant Professor of Computer Science

Motivation: Chest computed tomography (CT) scans are widely used for the detection and classification of pulmonary diseases. CT scans are three-dimensional (3D) image datasets that capture properties of tissues and organs in the body. Manual interpretation of a large number of these scans by radiologists is time-consuming and can be error-prone, especially when healthcare professionals are carrying a heavy workload. For this reason, there is considerable interest in developing computer-aided diagnostic (CAD) systems that can screen for and/or detect pulmonary diseases automatically, efficiently, and with reduced risk of detection errors, which in turn can help radiologists optimize their diagnostic decisions.

Research Goals: The goal of this project is to develop a CAD technique for automatically classifying lungs as normal or affected by interstitial lung disease (ILD). ILD represents a group of more than 150 disorders of the lung parenchyma. These disorders typically manifest as textured patterns in CT scans, and machine learning techniques can be used to learn and distinguish between the patterns belonging to different diseases. This project will investigate the efficacy of employing convolutional neural networks (CNNs) in two ways: i) using a “transfer learning” approach on input data of different dimensionality: CT image patches (2D), entire CT image slices (2D), or a combination of neighboring CT image slices (2.5D); ii) designing and implementing a CNN “from scratch” to capture the low-level textural features of the lung tissue.
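To make the transfer-learning option concrete, here is a minimal sketch that fine-tunes an ImageNet-pretrained ResNet-18 on 2D CT patches using PyTorch/torchvision. The ct_patches/ directory layout, the two-class setup (normal vs. ILD), and the frozen-backbone training scheme are illustrative assumptions, not the project's prescribed pipeline.

```python
# Sketch: transfer learning for normal-vs-ILD patch classification (PyTorch).
# Assumes a hypothetical directory layout: ct_patches/train/{normal,ild}/*.png
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Load an ImageNet-pretrained ResNet-18 and freeze its convolutional backbone
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)  # new head: normal vs. ILD

# CT patches are grayscale; replicate to 3 channels to match ImageNet input
preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

train_set = datasets.ImageFolder("ct_patches/train", transform=preprocess)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one training epoch shown for brevity
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```

The same skeleton extends to the 2.5D variant by stacking neighboring slices into the input channels instead of replicating one grayscale patch.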

Matty Mookerjee Professor of Geology

Motivation: Within the field-based geosciences, it has become necessary to build and incorporate cyberinfrastructure (CI) technologies to replace some of the analog methodologies (e.g., notebook, pen, and transit compass) used for recording observations, collecting samples, and taking field measurements. These CI systems allow data to live on beyond any single investigator and break down the artificial barriers between subfields within the Earth sciences. To incentivize participation in and contribution to the development of this cyberinfrastructure, an analytical, machine-learning-based environment needs to be built that allows the user to aggregate, visualize, manipulate, and analyze data from connected data repositories. To further incentivize populating these databases, the system will “reward” data providers with a list of links to imagery within the data management system that has features similar to their own. In this way, potential collaborations can be identified that span the typical boundaries dividing our various scholarly communities (e.g., scale, geography of the field area, tectonic regime, instrumentation, etc.).

Research Goals: The goal of this project is to utilize machine learning techniques to automatically extract and label patterns in photomicrographs. Two specific applications will be investigated: i) classifying photomicrographs by whether or not they contain asymmetric, shear-sense-indicating clasts (i.e., sigma or delta clasts and mica fish), and ii) automatically correlating experimentally deformed rocks with naturally deformed rocks that exhibit similar microstructures. In addition to internet searches, we will be mining images from the burgeoning microstructural image repository currently being developed through the NSF-funded project, “EarthCube Data Infrastructure: Collaborative Proposal: A unified experimental-natural digital data system for cataloging and analysis of rock microstructures.”
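One plausible starting point for the similarity-matching application is content-based image retrieval: embed each photomicrograph with a pretrained CNN and return its nearest neighbors under cosine distance. The sketch below assumes PyTorch/torchvision and scikit-learn; the microstructures/ directory and query file name are hypothetical.

```python
# Sketch: retrieve photomicrographs with visually similar microstructures.
from pathlib import Path

import numpy as np
import torch
import torch.nn as nn
from PIL import Image
from sklearn.neighbors import NearestNeighbors
from torchvision import models, transforms

# Pretrained backbone with the classification head removed -> 512-d embeddings
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()
backbone.eval()

prep = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def embed(path):
    """Map one photomicrograph to a feature vector."""
    img = prep(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return backbone(img).squeeze(0).numpy()

# Hypothetical repository of microstructure images
paths = sorted(Path("microstructures").glob("*.jpg"))
features = np.stack([embed(p) for p in paths])

# Index the repository; cosine distance pairs visually similar textures
index = NearestNeighbors(n_neighbors=5, metric="cosine").fit(features)

# Query: the five repository images most similar to a newly uploaded one
dist, idx = index.kneighbors(embed("query_micrograph.jpg").reshape(1, -1))
for d, i in zip(dist[0], idx[0]):
    print(f"{paths[i]}  (cosine distance {d:.3f})")
```

Returning the top matches with their distances mirrors the proposed “reward” mechanism of linking a contributor's image to similar imagery already in the repository.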


Matthew Clark Professor of Geography

Motivation: Research using sounds to monitor individual species spans decades. However, using sound to monitor biodiversity across landscapes is a very recent development. The availability of inexpensive, autonomous sound recording units (ARUs) that detect sounds with sufficient quality has made possible the scaling of research from single locations and organisms to full animal communities across landscapes. ARUs facilitate the survey of large areas for long periods of time, and in doing so generate large amounts of recordings. New developments in our capacity to rapidly process sound data with ever-faster computing systems and complex machine-learning algorithms now permit the detection of species at a site with reasonable accuracy. Beyond single species, the new field of soundscape ecology seeks to monitor biodiversity through the analysis of recorded sounds over time.

Research Goals: The goal of students in this project will be to use machine learning to automatically classify soundscapes into their respective components of biophony (e.g., animal sounds), anthrophony (e.g., cars, airplanes, conversations), and geophony (e.g., wind, rain, moving water). Data will include over 350,000 minutes of recordings from a variety of sites throughout Sonoma County, as recorded by the Soundscapes to Landscapes (S2L) project (soundscapes2landscapes.org). Classified soundscape data will then inform S2L’s investigation of how anthropogenic and geophonic noise influences the strength of acoustic indices in predicting avian diversity.
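As a rough illustration of how individual minutes of audio might be reduced to features and assigned to the three components, the following sketch uses mel-spectrogram statistics from librosa and a random-forest classifier from scikit-learn. All file names and labels are placeholders, not S2L data, and a CNN over spectrograms would be an equally plausible design.

```python
# Sketch: classify one-minute recordings into biophony/anthrophony/geophony.
import librosa
import numpy as np
from sklearn.ensemble import RandomForestClassifier

LABELS = ["biophony", "anthrophony", "geophony"]

def mel_features(path, sr=22050, duration=60.0):
    """Summarize one recording as log-mel-spectrogram statistics."""
    y, sr = librosa.load(path, sr=sr, duration=duration)
    log_mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64))
    # Mean and standard deviation of each mel band -> fixed-length vector
    return np.concatenate([log_mel.mean(axis=1), log_mel.std(axis=1)])

# Placeholder training clips and labels (indices into LABELS), not S2L data
train_paths = ["clips/bio_000.wav", "clips/anthro_000.wav", "clips/geo_000.wav"]
train_labels = [0, 1, 2]

X = np.stack([mel_features(p) for p in train_paths])
clf = RandomForestClassifier(n_estimators=200).fit(X, train_labels)

# Classify a new minute of audio from a hypothetical survey site
pred = clf.predict(mel_features("clips/site42_min117.wav").reshape(1, -1))
print(LABELS[pred[0]])
```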


Lisa Bentley Assistant Professor of Biology

Motivation: Terrestrial Laser Scanning (TLS) has generated a revolution in how we look at forests, as it can measure 3D vegetation structure to potentially millimeter accuracy and precision at plot scales (e.g., 100–1,000 m²) and potentially over extents of several kilometers (Disney 2018). While laser scanning technology was introduced 60 years ago in the entertainment industry, foresters and ecologists have been slow to adopt it because of the cost of the instrument and the time required for data analysis. Currently, forest structure and tree parameters are estimated by combining TLS data with a 3D quantitative structure model (TreeQSM). TreeQSM reconstructs the whole-tree topological structure by fitting cylinders to each segment, but it constrains the underlying representation of architectural complexity. Often, a simple cylinder-fractal structure of trees is enforced, thus losing details of the complex nature of tree architecture.

Research Goals: The first goal of this project is to develop or refine algorithms to automatically extract, from co-registered scans, laser returns related to coarse woody debris on the forest floor as well as returns originating only from tree stems. This work will build upon the Iterative Closest Point (ICP) algorithm used to co-register the scans: all returns close to the ground surface or originating from leafy or fine vegetation will be manually removed, leaving only returns originating from tree stems. The ultimate goal is to develop a machine learning process that surpasses the currently used and accepted manual process for producing this dataset in the CloudCompare software. The second goal of this project is to develop or refine new QSMs to extract quantitative data related to forest structure with low error, even for small branches.
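Since the work builds on ICP, a minimal pairwise co-registration step can be sketched with the Open3D library as below; the scan file names, voxel size, and correspondence threshold are assumptions for illustration, and CloudCompare offers an equivalent interactive workflow.

```python
# Sketch: pairwise co-registration of two TLS scans with point-to-point ICP.
import numpy as np
import open3d as o3d

# Hypothetical scan files exported from the scanner
source = o3d.io.read_point_cloud("scan_position_1.pcd")
target = o3d.io.read_point_cloud("scan_position_2.pcd")

# Downsample to speed up nearest-neighbor correspondence search
source_down = source.voxel_down_sample(voxel_size=0.05)
target_down = target.voxel_down_sample(voxel_size=0.05)

# ICP iteratively matches nearest points and solves for the rigid transform
# that minimizes their squared distances
threshold = 0.10  # maximum correspondence distance in meters
result = o3d.pipelines.registration.registration_icp(
    source_down, target_down, threshold, np.eye(4),
    o3d.pipelines.registration.TransformationEstimationPointToPoint())

print("fitness:", result.fitness)        # fraction of points with a match
print("inlier RMSE:", result.inlier_rmse)

# Apply the estimated transform and merge into one co-registered cloud,
# ready for the stem / debris / vegetation filtering described above
source.transform(result.transformation)
merged = source + target
```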


Ravikumar Balasubramanian Professor of Computer Science

Motivation: Electroencephalography (EEG) is a measurement of potentials that reflect the electrical activity of the human brain. It is a noninvasive signal that can be observed from the scalp using probes placed at selected locations known for strong signals. The EEG is widely used by physicians and scientists to study brain function and to diagnose neurological disorders. The study of the brain’s electrical activity through EEG records is one of the most important tools for the diagnosis of neurological diseases such as epilepsy, brain tumors, head injury, sleep disorders, and dementia, as well as for monitoring the depth of anesthesia during surgery.

Research Goals: We propose projects that span a wide spectrum of work with EEG signals: generating EEG signals, removing noise, extracting features, building classifiers, and applying the classifiers in two potential brain-computer interface (BCI) applications. i) Build a hands-free typing tool: the basic premise of this tool is that the EEG signal generated when a character the user wants to select is shown on the screen can be distinguished from the signal generated when an unwanted character is shown. ii) Classify audio stimuli from EEG recordings: the overall goal of this project is to distinguish between the EEG patterns produced while the subject hears a segment of speech, a segment of music, or no audio. The subject will be provided a short audio stimulus consisting of speech, music, or no audio input, and the EEG signal will be recorded in each case. The goal is to build a classifier that distinguishes between these three stimuli.
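As a minimal sketch of the filter/feature/classifier pipeline for the audio-stimulus task, the code below assumes a single EEG channel sampled at 256 Hz and uses SciPy band-power features with a scikit-learn SVM; the synthetic epochs merely stand in for real recordings.

```python
# Sketch: three-class (speech/music/silence) EEG classification pipeline.
import numpy as np
from scipy.signal import butter, filtfilt, welch
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

FS = 256  # sampling rate in Hz (assumption)

def bandpass(x, low=1.0, high=40.0, fs=FS, order=4):
    """Remove slow drift and high-frequency noise from one EEG channel."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def band_powers(epoch, fs=FS):
    """Average spectral power in the classic EEG bands -> feature vector."""
    bands = [(1, 4), (4, 8), (8, 13), (13, 30)]  # delta, theta, alpha, beta
    f, psd = welch(epoch, fs=fs, nperseg=fs)
    return np.array([psd[(f >= lo) & (f < hi)].mean() for lo, hi in bands])

# Synthetic stand-in data: 90 two-second epochs recorded while the subject
# hears speech (0), music (1), or no audio (2)
rng = np.random.default_rng(0)
epochs = rng.standard_normal((90, FS * 2))
labels = np.repeat([0, 1, 2], 30)

X = np.stack([band_powers(bandpass(e)) for e in epochs])
clf = SVC(kernel="rbf")
print("CV accuracy:", cross_val_score(clf, X, labels, cv=5).mean())
```

The same preprocessing and cross-validation scaffold carries over to the typing-tool project, with event-locked epochs replacing the stimulus-length segments used here.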