Deep Learning Filter Improves Accuracy of Cellular Mutation Detection and Accuracy of Cancer Diagnoses
Next-generation cancer control strategies rely on next-generation gene sequencing (NGS), which opens the door to new techniques and tools to detect mutations and determine treatment for patients. A team of Chinese researchers has proposed a more effective strategy to filter out false positive results, which improves the accuracy and efficiency of cancer diagnosis and treatment.
The research team proposed DeepFilter, a deep learning-based filter to remove false positives in somatic variants from NGS data.
Their study was published on January 6, 2023 in Tsinghua Science and Technology.
Detecting somatic mutations, or alterations in normal tissues, is essential for understanding fatal genetic diseases such as cancer. Next-generation gene sequencing accelerates the search for somatic mutations by splitting DNA/RNA into multiple pieces and identifying sequences in parallel, producing thousands or millions of sequences simultaneously. This technique improves accuracy while reducing sequencing cost and time.
Powerful “calling tools” sift through NGS data and locate tumors or other mutations by comparing the sequences to a reference genome from related tissues in the same individual.
VarDict is a commonly used somatic variant calling tool in clinical research. Previous studies have shown that VarDict achieves higher accuracy rates and detects more true variants than similar calling tools. However, VarDict also generates a higher number of false positives than other callers, which can skew the results.
“An error rate of 1:10,000 in a genome with 3 billion positions would lead to many false calls, which could lead to inaccurate clinical diagnoses,” said Zekun Yin, study author from Shandong University. “However, filtering out true positives can also lead to missed diagnoses.”
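To put the quoted figures in perspective, the short calculation below estimates how many erroneous calls a 1-in-10,000 error rate would imply across roughly 3 billion positions. It is a back-of-the-envelope sketch that assumes every position is equally likely to be miscalled; the function name is illustrative and not part of any tool mentioned here.

```python
# Rough estimate of false calls implied by the quoted error rate.
# Assumption: errors are spread uniformly across all genome positions.

GENOME_POSITIONS = 3_000_000_000  # about 3 billion positions, as quoted
ONE_ERROR_PER = 10_000            # an error rate of 1:10,000


def expected_false_calls(positions: int, one_error_per: int) -> float:
    """Return the expected number of erroneous calls."""
    return positions / one_error_per


false_calls = expected_false_calls(GENOME_POSITIONS, ONE_ERROR_PER)
print(f"Expected false calls: {false_calls:,.0f}")
```

Under these assumptions the estimate comes to about 300,000 false calls per genome, which illustrates why manual review does not scale.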
Typically, researchers manually filter out some of the false positives, a cumbersome and expensive process the Chinese research team set out to mitigate.
“It will save a lot of time and money if we provide an automatic method to effectively filter out most false positives,” said Hao Zhang, author of the study from Shandong University.
Inspired by recent successes integrating machine learning-based methods to call genetic variants from NGS data, the Chinese research team introduced a deep learning-based variant filter. Called DeepFilter, the filter is designed to effectively filter out the false positive variants generated by VarDict while ensuring high call sensitivity.
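The balance between removing false positives and preserving sensitivity can be made concrete with two standard metrics, precision and sensitivity (recall). The sketch below uses entirely hypothetical variant records and a hypothetical filter outcome; it does not reproduce DeepFilter or its reported results.

```python
# Precision and sensitivity of a call set, before and after filtering.
# All variants below are made-up (chromosome, position, ref, alt) tuples.


def precision_and_sensitivity(calls: set, truth: set) -> tuple:
    """Compare called variants against a known truth set."""
    true_pos = len(calls & truth)    # called and real
    false_pos = len(calls - truth)   # called but not real
    false_neg = len(truth - calls)   # real but not called
    precision = true_pos / (true_pos + false_pos) if calls else 0.0
    sensitivity = true_pos / (true_pos + false_neg) if truth else 0.0
    return precision, sensitivity


truth = {("chr1", 100 * i, "A", "G") for i in range(1, 6)}          # 5 real variants
artifacts = {("chr2", 100 * i, "C", "T") for i in range(1, 5)}      # 4 false calls

# Raw caller output: finds 4 of the 5 real variants plus 4 artifacts.
raw_calls = {v for v in truth if v[1] <= 400} | artifacts

# A hypothetical filter that removes 3 artifacts but also 1 real variant.
filtered_calls = {v for v in raw_calls if v not in artifacts or v[1] == 100}
filtered_calls.discard(("chr1", 400, "A", "G"))

raw_metrics = precision_and_sensitivity(raw_calls, truth)
filtered_metrics = precision_and_sensitivity(filtered_calls, truth)
print("raw:     ", raw_metrics)
print("filtered:", filtered_metrics)
```

In this toy case filtering raises precision but lowers sensitivity, which is the trade-off a good filter tries to minimize: remove the artifacts while losing as few real variants as possible.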
DeepFilter treats the task of distinguishing whether a variant is true or false as a binary classification problem. The researchers used three types of data sets to train and test DeepFilter: real-world tumor-normal sample data, a mixture of two reference data sets, and synthetic data.
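To illustrate what framing variant filtering as binary classification means, here is a minimal sketch using plain logistic regression on synthetic data. This is a deliberately simple stand-in, not DeepFilter's architecture: the two features (allele fraction and a scaled quality score), the value ranges, and all function names are assumptions made for the example.

```python
# Binary classification of candidate variants: 1 = true variant, 0 = false positive.
# A plain logistic regression stands in for the deep model; features are hypothetical.
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, bias, x) -> float:
    """Predicted probability that a candidate is a true variant."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(features, labels, epochs=1000, lr=1.0):
    """Fit weights and bias with full-batch gradient descent."""
    weights, bias = [0.0] * len(features[0]), 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(features, labels):
            err = score(weights, bias, x) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x, threshold=0.5) -> int:
    """Return 1 to keep the candidate as a true variant, 0 to filter it out."""
    return 1 if score(weights, bias, x) >= threshold else 0


# Synthetic training data: [allele_fraction, scaled_quality] (made-up ranges).
rng = random.Random(0)
features, labels = [], []
for _ in range(20):
    features.append([rng.uniform(0.3, 0.6), rng.uniform(0.7, 1.0)])  # true variants
    labels.append(1)
    features.append([rng.uniform(0.0, 0.1), rng.uniform(0.2, 0.6)])  # false positives
    labels.append(0)

weights, bias = train_logistic(features, labels)
accuracy = sum(
    predict(weights, bias, x) == y for x, y in zip(features, labels)
) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

Real candidate variants are far harder to separate than this toy data, which is one reason a deep model trained on several kinds of data sets may be preferable to a linear one.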
Experimental results based on synthetic and real NGS data were promising:
“DeepFilter outperformed other filters in terms of false positive variant filtering tasks, which made VarDict more valuable in practical clinical research and greatly facilitated downstream analysis in biological research and patient treatment,” Zhang said.
The team plans to dig deeper into the problem of filtering out false positive variants, looking specifically at the problem of imbalance of positive and negative samples and incorporating other machine learning and deep learning methods for filtering.
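One common way to handle imbalanced positive and negative samples is to weight each class inversely to its frequency in the loss function. The sketch below shows that general idea on made-up labels; it is not a description of the approach the team intends to take, and the 90/10 split is purely hypothetical.

```python
# Inverse-frequency class weights and a weighted binary cross-entropy loss.
# Labels: 1 = true variant, 0 = false positive (hypothetical 10/90 split).
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


def weighted_bce(probs, labels, weights, eps=1e-12):
    """Weighted mean binary cross-entropy over all samples."""
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += weights[y] * loss
        weight_sum += weights[y]
    return total / weight_sum


labels = [0] * 90 + [1] * 10
class_weights = balanced_class_weights(labels)

# A lazy model that gives every candidate a low probability of being real.
lazy_probs = [0.1] * len(labels)
unweighted_loss = weighted_bce(lazy_probs, labels, {0: 1.0, 1: 1.0})
weighted_loss = weighted_bce(lazy_probs, labels, class_weights)
print(class_weights)
print(f"unweighted: {unweighted_loss:.3f}  weighted: {weighted_loss:.3f}")
```

With weighting, the lazy model that ignores the rare class is penalized much more heavily, so training is pushed toward learning the minority class instead of defaulting to the majority.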
“Our ultimate goal is to solve the problem of variant calling efficiency and accuracy and provide a state-of-the-art variant detection tool,” Yin said.
Hao Zhang et al, DeepFilter: A deep learning-based variant filter for VarDict, Tsinghua Science and Technology (2023). DOI: 10.26599/TST.2022.9010032
Provided by Tsinghua University Press
Citation: Deep Learning Filter Improves Accuracy of Cellular Mutation Detection and Accuracy of Cancer Diagnoses (February 1, 2023). Retrieved February 1, 2023 from https://medicalxpress.com/news/2023-02-deep-learning-filter-precision-cell-mutation.html