Gene mapping includes methods to identify positions of a gene on a chromosome and physical
distances between genes. Genome assembly refers to the process of taking a large number of short DNA
sequences and reassembling them to create a representation of the original chromosomes from which the
DNA originated. Technologies to map the entire genome of an organism have been developed in recent
years. The resulting genome maps provide scaffolds for structural variation analysis and mutation frequency detection.
Structural variations (SVs) are defined as regions of DNA approximately 50 bp and larger in size and can
include genomic inversions, duplications, insertions, translocations, and deletions. Identifying SVs is
important for genome interpretation and for determining their relationship to disease. Several short- and
long-read DNA sequencing methods have been developed. These methods aim to determine the nucleotide
sequence of DNA. Optical mapping is a method to linearize single DNA molecules in nanochannels and
construct DNA maps by imaging fluorescently labeled sequence motifs in single DNA molecules.
Optical maps have an advantage over DNA sequencing for large-scale SV detection because they can span
genomic regions that are difficult to read.
Dr. Xiao’s lab has developed a new sequence-specific DNA labeling approach to complement the traditional enzyme motif-based strategy in optical mapping. This two-color mapping allows the breakpoints of genome structural variations to be targeted in repetitive regions, which are hard to resolve with other methods. The task at hand is to process the molecular data from two-color optical mapping of repetitive regions. These data are generated by optical mapping, in which single DNA molecules are aligned to a reference, and SVs are detected by identifying outlier molecule alignments. I will explore different detection strategies and statistical approaches, chosen according to the analysis need and the research hypothesis, with the aim of developing a better pipeline for identifying the second-color label signal in specific regions. This task will require understanding both the algorithms previously developed by the lab and the mapping experiments conducted to obtain the data. The biological structure and function of the target gene region will also need to be studied thoroughly to understand how SVs in that region relate to gene function or regulation. The findings from these studies have the potential to support research in cell and gene therapy development.
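As a rough illustration of what "identifying outlier molecule alignments" could look like, the sketch below compares the distance each aligned molecule reports between two flanking labels with the distance expected from the reference, and flags molecules that deviate by more than a robust noise estimate. This is only a minimal sketch under stated assumptions, not the lab's algorithm: the function name, the input layout (one measured interval per molecule, in kb), the cutoff, and the noise floor are all hypothetical.

```python
from statistics import median


def flag_outlier_molecules(measured_kb, reference_kb, z_cutoff=3.5, min_noise_kb=0.5):
    """Return indices of molecules whose interval size departs from the reference.

    measured_kb  -- interval size reported by each aligned molecule (hypothetical input)
    reference_kb -- interval size expected from the reference map
    z_cutoff     -- hypothetical cutoff on the robust score
    min_noise_kb -- hypothetical floor on the sizing noise, to avoid dividing by ~0
    """
    # Signed size difference of each molecule relative to the reference.
    diffs = [m - reference_kb for m in measured_kb]

    # Median absolute deviation (MAD): a spread estimate that a few
    # SV-carrying molecules barely influence.
    center = median(diffs)
    mad = median(abs(d - center) for d in diffs)

    # 1.4826 * MAD approximates one standard deviation for Gaussian noise.
    noise_kb = max(1.4826 * mad, min_noise_kb)

    # A molecule is an outlier when its deviation from the reference
    # is large compared with the estimated sizing noise.
    return [i for i, d in enumerate(diffs) if abs(d) / noise_kb > z_cutoff]


if __name__ == "__main__":
    sizes = [10.1, 9.9, 10.0, 10.2, 9.8, 15.3, 10.05]
    print(flag_outlier_molecules(sizes, reference_kb=10.0))
```

A median-based noise estimate is used here because a mean and standard deviation would themselves be inflated by the molecules carrying the variant; other statistical approaches may suit the real data better.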
I have previous experience implementing alignment algorithms to process sequencing data through a project in Dr. Xiao’s Computational Bioengineering class. This project aligned sequencing data to a reference sequence and reported parameters such as the number of aligned bases, the length distribution of aligned sequence reads, and the mapping quality score. Through my other biocomputational classes, I have developed strong proficiency in MATLAB programming and its application in a broad range of biomedical fields. My co-ops have allowed me to build a robust skillset in cell and molecular research and manufacturing processes. Above all, I am eager to push myself by learning new concepts and familiarizing myself with the latest technologies in the field. I will also be able to build valuable connections with my lab members, who can offer guidance on future pursuits. This experience will provide me with the training and background I need to achieve my professional goals in a challenging and competitive industry.
Abstract:
Image analysis pipeline for DNA linearization methods
Zeal Jinwala, Dharma Varapula, Ming Xiao
Department of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA 19104
Gene mapping includes methods to identify positions of a gene on a chromosome and physical distances between genes. Genome assembly refers to the process of taking short DNA sequences and reassembling them to create a representation of the original chromosomes from which the DNA originated [1]. Structural variations (SVs) are defined as regions of DNA approximately 50 bp and larger in size and can include genomic inversions, duplications, insertions, translocations, and deletions [2]. Optical mapping is a method to linearize single DNA molecules in nanochannels and construct DNA maps by imaging fluorescently labeled sequence motifs in single DNA molecules. Mapping of linearized DNA molecules is useful in sequence assembly, large structural variant detection, and diagnostics [3]. The goal of this study is to develop a pipeline for batch processing and digitization of DNA linearization images that identifies labels and inter-label distances, which provide the basis for DNA mapping.
References
[1] Lam, Ernest T., et al. “Genome Mapping on Nanochannel Arrays for Structural Variation Analysis and Sequence Assembly.” Nature Biotechnology, vol. 30, no. 8, 2012, pp. 771–776, doi:10.1038/nbt.2303.
[2] Ho, Steve S., et al. “Structural Variation in the Sequencing Era.” Nature Reviews Genetics, vol. 21, no. 3, 2019, pp. 171–189, doi:10.1038/s41576-019-0180-9.
[3] Varapula, Dharma, et al. “A Micropatterned Substrate for On-Surface Enzymatic Labelling of Linearized Long DNA Molecules.” Scientific Reports, vol. 9, 2019.
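The label identification and inter-label distance step named in the abstract can be sketched as follows. This is a minimal illustration, assuming each linearized molecule has already been reduced to a one-dimensional fluorescence intensity profile along its length; the function names, the fixed intensity threshold, and the `bp_per_pixel` conversion factor are hypothetical and would need to be calibrated against the actual imaging setup.

```python
def find_label_positions(profile, threshold):
    """Return pixel indices of local intensity maxima at or above `threshold`.

    `profile` is a hypothetical 1-D list of intensities sampled along one
    linearized DNA molecule.
    """
    peaks = []
    for i in range(1, len(profile) - 1):
        is_bright = profile[i] >= threshold
        # Strictly higher than the left neighbor and at least as high as the
        # right one, so a flat-topped peak is reported once (at its left edge).
        is_local_max = profile[i] > profile[i - 1] and profile[i] >= profile[i + 1]
        if is_bright and is_local_max:
            peaks.append(i)
    return peaks


def inter_label_distances(peaks, bp_per_pixel):
    """Convert gaps between consecutive label positions from pixels to base pairs.

    `bp_per_pixel` is an assumed calibration constant that combines the
    pixel size and the degree of DNA stretching.
    """
    return [(b - a) * bp_per_pixel for a, b in zip(peaks, peaks[1:])]


if __name__ == "__main__":
    profile = [0, 1, 9, 2, 0, 1, 8, 8, 1, 0, 7, 0]
    labels = find_label_positions(profile, threshold=5)
    print(labels)
    print(inter_label_distances(labels, bp_per_pixel=500))
```

A simple threshold-plus-local-maximum rule is used only to keep the example short; real images would likely need background subtraction and sub-pixel peak fitting before the distances are reliable.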


