| Field | Value | Language |
|---|---|---|
| dc.contributor.author | Papapetrou, Panagiotis | en_US |
| dc.contributor.author | Benson, Gary | en_US |
| dc.contributor.author | Kollios, George | en_US |
| dc.date.accessioned | 2011-10-20T05:24:30Z | |
| dc.date.available | 2011-10-20T05:24:30Z | |
| dc.date.issued | 2006-10-15 | |
| dc.identifier.uri | https://hdl.handle.net/2144/1887 | |
| dc.description.abstract | The problem of discovering frequent arrangements of regions of high occurrence of one or more items of a given alphabet in a sequence is studied, and two efficient approaches are proposed to solve it. The first approach is entropy-based and uses an existing recursive segmentation technique to split the input sequence into a set of homogeneous segments. The key idea of the second approach is to use a set of sliding windows over the sequence. Each sliding window keeps a set of statistics of a sequence segment that mainly includes the number of occurrences of each item in that segment. Combining these statistics efficiently yields the complete set of regions of high occurrence of the items of the given alphabet. After identifying these regions, the sequence is converted to a sequence of labeled intervals (each one corresponding to a region). An efficient algorithm for mining frequent arrangements of temporal intervals on a single sequence is applied on the converted sequence to discover frequently occurring arrangements of these regions. The proposed algorithms are tested on various DNA sequences producing results with significant biological meaning. | en_US |
| dc.language.iso | en_US | |
| dc.publisher | Boston University Computer Science Department | en_US |
| dc.relation.ispartofseries | BUCS Technical Reports;BUCS-TR-2006-027 | |
| dc.title | Discovering Frequent Poly-Regions in DNA Sequences | en_US |
| dc.type | Technical Report | en_US |
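
The abstract's sliding-window idea (per-window item counts, regions of high occurrence, conversion to labeled intervals) can be illustrated with a minimal sketch. It is not the report's algorithm: the fixed window length `w`, the per-item threshold `min_count`, the rule of merging overlapping or touching qualifying windows into one region, and the function name `high_occurrence_regions` are all hypothetical choices made for illustration. The entropy-based segmentation approach and the interval-arrangement miner are not sketched.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (item label, start, end) with end exclusive; a hypothetical interval encoding
Interval = Tuple[str, int, int]


def high_occurrence_regions(seq: str, w: int, min_count: int) -> List[Interval]:
    """Hypothetical sketch: slide a length-w window over seq, keep item counts
    for the current window, and merge overlapping or touching windows in which
    an item occurs at least min_count times into one labeled interval."""
    if w <= 0 or w > len(seq):
        return []
    alphabet = sorted(set(seq))
    counts = Counter(seq[:w])              # statistics of the first window
    open_start: Dict[str, int] = {}        # item -> start of region being grown
    open_end: Dict[str, int] = {}          # item -> end of last qualifying window
    intervals: List[Interval] = []

    for start in range(len(seq) - w + 1):
        if start > 0:
            # Slide by one: drop the symbol leaving on the left, add the new one.
            counts[seq[start - 1]] -= 1
            counts[seq[start + w - 1]] += 1
        end = start + w
        for item in alphabet:
            if counts[item] < min_count:
                continue
            if item in open_start and start <= open_end[item]:
                open_end[item] = end       # overlaps/touches: extend the region
            else:
                if item in open_start:     # close the previous region first
                    intervals.append((item, open_start[item], open_end[item]))
                open_start[item] = start
                open_end[item] = end

    # Emit regions still open when the last window has been processed.
    for item in open_start:
        intervals.append((item, open_start[item], open_end[item]))
    return sorted(intervals, key=lambda iv: (iv[1], iv[2], iv[0]))


if __name__ == "__main__":
    print(high_occurrence_regions("AAAACGTCGTTTTT", 4, 3))
    # [('A', 0, 5), ('T', 8, 14)]
```

Updating the counts incrementally keeps each slide at constant cost per item, so the scan is O(n·|Σ|) for a sequence of length n over alphabet Σ. The returned list of labeled intervals is the kind of input that the second stage described in the abstract (mining frequent arrangements of intervals) would consume.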