Once considered relatively rare, dengue fever is popping up throughout the globe, including the United States. The mosquito-borne virus is having a particularly active year, which some public health officials attribute, at least partially, to a warming climate.
Transmitted to humans through the bite of the female Aedes aegypti mosquito, dengue causes fever, vomiting, headache, muscle and joint pain, as well as skin rashes. Most people infected with the virus recover, but the disease can escalate into lethal complications. And, curiously, while people who’ve recovered from the virus develop immunity to the strain that infected them, they often become more susceptible to infection by different strains of the virus.
Utah State University data scientist Kevin Moon is among a group of researchers, led by Yale University, that recently completed a large-scale study of the virus using single-cell data from biological samples collected from infected people in India. Supported by the National Institutes of Health and the Indo-U.S. Vaccine Action Program, the research team includes scientists from India’s National Institute of Mental Health and Neurosciences. The group published its findings in the Oct. 7 issue of Nature Methods.
“My role in this project included contributing to the development of ‘SAUCIE,’ a data analysis method designed to tackle very large datasets, such as the one collected for this study,” says Moon, assistant professor in USU’s Department of Mathematics and Statistics, who specializes in data science and machine learning. “The team applied SAUCIE to a 20 million-cell mass cytometry dataset with genetic and molecular information from 180 samples collected from 40 subjects.”
SAUCIE, which stands for “Sparse Autoencoder for Unsupervised Clustering, Imputation and Embedding,” is a multi-layered deep neural network that allows researchers to extract detailed information from large quantities of single cells.
“Collecting useful data for this kind of application requires getting information from very large samples of individual cells,” Moon says. “Without a large set, you can’t collect a good representation of the many types of cells, including rare cells.”
But developing computational tools to handle so much information is a challenge.
“That’s where neural networks, like SAUCIE, come in,” Moon says. “Neural networks, constructed from a set of algorithms and modeled loosely after the human brain, are designed to recognize patterns in the data.”
SAUCIE, he says, offers four main capabilities.
“First of all, it clusters data into similar groups which, in this case, allowed the researchers to segment cells into similar groups and ferret out rare cell populations,” Moon says. “Secondly, SAUCIE is good at ‘de-noising’ data.”
That is, SAUCIE refines data, eliminating distracting information.
A third feature of SAUCIE, he says, is batch correction, which eliminates non-biological effects caused by variations in sample collection and analysis.
Finally, SAUCIE enables data visualization.
“This is a powerful analysis tool that allows researchers to visually explore patterns in the data,” Moon says.
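For readers curious about the general idea behind an autoencoder, the short Python sketch below may help. It is not the SAUCIE code and does not attempt clustering or batch correction. It is a deliberately tiny, linear autoencoder trained on made-up data, and every name and number in it (the two synthetic cell populations, `train_autoencoder`, the learning rate) is an assumption chosen for illustration. It shows only two of the ideas described above: squeezing each cell's measurements through a two-number bottleneck, which gives coordinates that could be plotted, and reconstructing the measurements from that bottleneck, which tends to smooth away some noise.

```python
import numpy as np

# Synthetic stand-in for cytometry data: 100 "cells" x 4 "markers",
# drawn from two made-up cell populations plus measurement noise.
rng = np.random.default_rng(0)
population_a = rng.normal(loc=[2.0, 2.0, 0.0, 0.0], scale=0.3, size=(50, 4))
population_b = rng.normal(loc=[0.0, 0.0, 2.0, 2.0], scale=0.3, size=(50, 4))
cells = np.vstack([population_a, population_b])


def train_autoencoder(x, bottleneck=2, steps=2000, lr=0.01, seed=0):
    """Train a linear autoencoder with full-batch gradient descent.

    Hypothetical helper for illustration only. Returns the encoder
    weights, the decoder weights, and the loss recorded at each step.
    """
    init = np.random.default_rng(seed)
    n, d = x.shape
    w_enc = init.normal(0.0, 0.1, size=(d, bottleneck))
    w_dec = init.normal(0.0, 0.1, size=(bottleneck, d))
    losses = []
    for _ in range(steps):
        z = x @ w_enc              # encode: d markers -> bottleneck
        err = z @ w_dec - x        # decode and compare with the input
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        # Gradients of the mean squared reconstruction error.
        grad_dec = (2.0 / n) * (z.T @ err)
        grad_enc = (2.0 / n) * (x.T @ (err @ w_dec.T))
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return w_enc, w_dec, losses


w_enc, w_dec, losses = train_autoencoder(cells)
embedding = cells @ w_enc       # two coordinates per cell, suitable for a scatter plot
denoised = embedding @ w_dec    # reconstruction of each cell from its two coordinates

print("loss at first step:", round(losses[0], 3))
print("loss at last step:", round(losses[-1], 3))
print("embedding shape:", embedding.shape)
```

A real tool of this kind would use many nonlinear layers and far more data. The sketch only conveys why a narrow middle layer forces the network to keep the dominant patterns and discard much of the rest.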
Having the ability to explore the cell data at this level will help researchers better understand the basic biology of how cells respond to the dengue virus from initial infection to the disease’s progression.
“The hope is this information will lead to preventive efforts and therapies for those infected,” Moon says.