Show simple item record

dc.contributor.authorLoving, Joshuaen_US
dc.date.accessioned2018-02-23T16:06:51Z
dc.date.available2018-02-23T16:06:51Z
dc.date.issued2017
dc.identifier.urihttps://hdl.handle.net/2144/27172
dc.description.abstractHigh-throughput next-generation sequencing techniques have hugely decreased the cost and increased the speed of sequencing, resulting in an explosion of sequencing data. This motivates the development of high-efficiency sequence alignment algorithms. In this thesis, I present multiple bit-parallel and Single Instruction Multiple Data (SIMD) algorithms that greatly accelerate the processing of biological sequences. The first chapter describes the BitPAl bit-parallel algorithms for global alignment with general integer scoring, which assigns integer weights for match, mismatch, and insertion/deletion. The bit-parallel approach represents individual cells in an alignment scoring matrix as bits in computer words and emulates the calculation of scores by a series of logic operations. Bit-parallelism has previously been applied to other pattern matching problems, producing fast algorithms. In timed tests, we show that BitPAl runs 7 - 25 times faster than a standard iterative algorithm. The second part involves two approaches to alignment with substitution scoring, which assigns a potentially different substitution weight to every pair of alphabet characters, better representing the relative rates of different mutations. The first approach extends the existing BitPAl method. The second approach is a new SIMD algorithm that uses partial sums of adjacent score differences. I present a simple partial sum method as well as one that uses parallel scan for additional acceleration. Results demonstrate that these algorithms are significantly faster than existing SIMD dynamic programming algorithms. Finally, I describe two extensions to the partial sums algorithm. The first adds support for affine gap penalty scoring. Affine gap scoring represents the biological likelihood that it is more likely for gaps to be continuous than to be distributed throughout a region by introducing a gap opening penalty and a gap extension penalty. The second extension is an algorithm that uses the partial sums method to calculate the tandem alignment of a pattern against a text sequence using a single pattern copy. Next generation sequencing data provides a wealth of information to researchers. Extracting that information in a timely manner increases the utility and practicality of sequence analysis algorithms. This thesis presents a family of algorithms which provide alignment scores in less time than previous algorithms.en_US
dc.language.isoen_US
dc.subjectBioinformaticsen_US
dc.subjectBit-parallelen_US
dc.subjectNext generation sequencing analysisen_US
dc.subjectPattern matchingen_US
dc.subjectSequence alignmenten_US
dc.subjectSIMDen_US
dc.titleBit-parallel and SIMD alignment algorithms for biological sequence analysisen_US
dc.typeThesis/Dissertationen_US
dc.date.updated2017-11-21T17:12:13Z
etd.degree.nameDoctor of Philosophyen_US
etd.degree.leveldoctoralen_US
etd.degree.disciplineBioinformatics GRSen_US
etd.degree.grantorBoston Universityen_US


This item appears in the following Collection(s)

Show simple item record