Simplesam
A simple, pure-Python SAM parser with objects for working with SAM records.
Classes to handle alignments in the SAM format.
Reader -> Sam -> Writer
-
class simplesam.Reader(f, regions=False, kind=None, samtools_path='samtools')
Read a SAM/BAM format file as an iterable.
-
__init__(f, regions=False, kind=None, samtools_path='samtools')
Initialize self. See help(type(self)) for accurate signature.
-
__len__()
Returns the number of reads in an indexed BAM file. Not implemented for SAM files.
-
__weakref__
List of weak references to the object (if defined).
-
seqs
Return just the sequence names from the @SQ header records as a generator.
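As a rough illustration of what pulling sequence names out of a header involves, the sketch below scans raw header lines for `@SQ` records and yields their `SN` values. The function name `sq_names` and the sample header are hypothetical, and this is not simplesam's own implementation.

```python
def sq_names(header_lines):
    """Yield the SN (sequence name) value of every @SQ header line."""
    for line in header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "@SQ":
            continue  # skip @HD, @RG, @PG, ... records
        for field in fields[1:]:
            if field.startswith("SN:"):
                yield field[3:]


# Hypothetical header used only for this example.
header = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:248956422",
    "@SQ\tSN:chr2\tLN:242193529",
]
print(list(sq_names(header)))  # ['chr1', 'chr2']
```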
-
-
class simplesam.Sam(qname='', flag=4, rname='*', pos=0, mapq=255, cigar='*', rnext='*', pnext=0, tlen=0, seq='*', qual='*', tags=[])
Object representation of a SAM entry.
-
__getitem__(tag)
Retrieves the SAM tag named “tag” and returns its data. The data type of the tag is interpreted as the proper Python object type.
>>> x = Sam(tags=['NM:i:0', 'ZZ:Z:xyz'])
>>> x['NM']
0
>>> x['ZZ']
'xyz'
-
__init__(qname='', flag=4, rname='*', pos=0, mapq=255, cigar='*', rnext='*', pnext=0, tlen=0, seq='*', qual='*', tags=[])
Initialize self. See help(type(self)) for accurate signature.
-
__len__()
Returns the length of the portion of self.seq aligned to the reference. Unaligned reads will have len() == 0. Insertions (I) and soft-clipped portions (S) will not contribute to the aligned length.
>>> x = Sam(cigar='8M2I4M1D3M4S')
>>> len(x)
16
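The value 16 in the example can be reproduced by summing only the CIGAR operations that consume reference bases. The sketch below assumes those are M, D, N, = and X, as in the SAM specification; the exact set simplesam uses is not stated here, and `aligned_length` is a hypothetical helper.

```python
# CIGAR operations assumed to consume reference bases (per the SAM spec).
REF_OPS = frozenset("MDN=X")


def aligned_length(cigars):
    """Sum the counts of reference-consuming operations.

    `cigars` is an iterable of (count, op) pairs such as ((8, 'M'), (2, 'I')).
    """
    return sum(count for count, op in cigars if op in REF_OPS)


# 8M2I4M1D3M4S: 8 + 4 + 1 + 3 = 16; the 2I and 4S are ignored.
pairs = ((8, "M"), (2, "I"), (4, "M"), (1, "D"), (3, "M"), (4, "S"))
print(aligned_length(pairs))  # 16
```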
-
__setitem__(tag, data)
Stores the SAM tag named “tag” with the value “data”. The data type of the tag is interpreted from the Python object type.
>>> x = Sam(tags=[])
>>> x['NM'] = 0
>>> x['NM']
0
-
__str__()
Returns the string representation of a SAM entry. Corresponds to one line in the on-disk format of a SAM file.
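For orientation, a SAM line is the eleven mandatory fields followed by any optional tags, all separated by tabs. The sketch below builds such a line from plain values; `format_sam_line` is a hypothetical helper, and whether simplesam appends a trailing newline is not covered here.

```python
def format_sam_line(qname, flag, rname, pos, mapq, cigar,
                    rnext, pnext, tlen, seq, qual, tags=()):
    """Join the 11 mandatory SAM fields and any encoded tags with tabs."""
    fields = [qname, flag, rname, pos, mapq, cigar,
              rnext, pnext, tlen, seq, qual, *tags]
    return "\t".join(str(field) for field in fields)


line = format_sam_line("r001", 99, "ref", 7, 30, "8M2I4M1D3M",
                       "=", 37, 39, "TTAGATAAAGGATACTG", "*",
                       tags=["NM:i:0"])
print(line)
```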
-
cigars
Returns the CIGAR string as a tuple.
>>> x = Sam(cigar='8M2I4M1D3M')
>>> x.cigars
((8, 'M'), (2, 'I'), (4, 'M'), (1, 'D'), (3, 'M'))
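One way to produce a tuple of this shape is a regular expression over the CIGAR string. This is only a sketch with a hypothetical `parse_cigar` function; returning an empty tuple for `'*'` is an assumption, not documented simplesam behaviour.

```python
import re

# One or more digits followed by a single CIGAR operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a tuple of (count, op) pairs."""
    if cigar == "*":  # assumption: no alignment information available
        return ()
    return tuple((int(count), op) for count, op in CIGAR_RE.findall(cigar))


print(parse_cigar("8M2I4M1D3M"))
# ((8, 'M'), (2, 'I'), (4, 'M'), (1, 'D'), (3, 'M'))
```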
-
coords
Returns a range of genomic coordinates for the query sequence positions in the gapped alignment.
-
duplicate
Returns True if the read is a PCR or optical duplicate.
-
gapped(attr, gap_char='-')
Return a Sam sequence attribute or tag with all deletions relative to the reference sequence represented as ‘gap_char’ and all insertions relative to the reference sequence removed. The attribute could be Sam.seq, Sam.qual, or any Sam tag that represents an aligned sequence, such as a methylation tag for bisulfite sequencing libraries.
>>> x = Sam(*'r001 99 ref 7 30 8M2I4M1D3M = 37 39 TTAGATAAAGGATACTG *'.split())
>>> x.gapped('seq')
'TTAGATAAGATA-CTG'
>>> x = Sam(*'r001 99 ref 7 30 8M2I4M1D3M = 37 39 TTAGATAAAGGATACTG *'.split(), tags=['ZM:Z:.........M....M.M'])
>>> x.gapped('ZM')
'............-M.M'
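The two results above can be traced by walking the CIGAR operations and either copying, skipping, or padding. The sketch below is a standalone illustration with a hypothetical `gapped_sketch` function; its treatment of soft clips (skipped) and of N (padded like a deletion) is an assumption rather than confirmed simplesam behaviour.

```python
def gapped_sketch(sequence, cigars, gap_char="-"):
    """Project a per-base string onto reference coordinates.

    `cigars` is an iterable of (count, op) pairs.
    """
    out = []
    i = 0  # current index into `sequence`
    for count, op in cigars:
        if op in "M=X":    # aligned bases: copy them through
            out.append(sequence[i:i + count])
            i += count
        elif op in "IS":   # in the read but not the reference: drop (S is an assumption)
            i += count
        elif op in "DN":   # in the reference but not the read: pad (N is an assumption)
            out.append(gap_char * count)
        # H and P consume neither the read nor the reference
    return "".join(out)


cigars = ((8, "M"), (2, "I"), (4, "M"), (1, "D"), (3, "M"))
print(gapped_sketch("TTAGATAAAGGATACTG", cigars))  # TTAGATAAGATA-CTG
print(gapped_sketch(".........M....M.M", cigars))  # ............-M.M
```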
-
mapped
Returns True if the read is mapped.
-
paired
Returns True if the read is paired and each segment properly aligned according to the aligner.
-
passing
Returns True if the read is passing filters, such as platform/vendor quality controls.
-
reverse
Returns True if Sam.seq is being reverse complemented.
-
safename
Return Sam.qname without the paired-end identifier, if it exists.
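A minimal sketch of this kind of name cleanup follows, assuming the paired-end identifier is a trailing `/1` or `/2`. Which suffixes simplesam actually recognises is not stated here, and `strip_pair_suffix` is a hypothetical name.

```python
def strip_pair_suffix(qname):
    """Drop a trailing '/1' or '/2' mate marker, if present (assumed convention)."""
    if qname.endswith(("/1", "/2")):
        return qname[:-2]
    return qname


print(strip_pair_suffix("read42/1"))  # read42
print(strip_pair_suffix("read42"))    # read42
```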
-
secondary
Returns True if the read alignment is secondary.
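The boolean properties of Sam (duplicate, mapped, paired, passing, reverse, secondary) correspond to bits of the SAM FLAG field. The sketch below decodes them with the bit values from the SAM specification. The mapping of each property name to a bit is inferred from the descriptions in this document, and `describe_flag` is a hypothetical helper.

```python
# FLAG bits as defined in the SAM specification.
PROPER_PAIR = 0x2    # each segment properly aligned according to the aligner
UNMAPPED = 0x4
REVERSE = 0x10       # SEQ is reverse complemented
SECONDARY = 0x100
QC_FAIL = 0x200      # not passing filters
DUPLICATE = 0x400    # PCR or optical duplicate


def describe_flag(flag):
    """Return the flag-derived booleans as a dictionary."""
    return {
        "paired": bool(flag & PROPER_PAIR),
        "mapped": not flag & UNMAPPED,
        "reverse": bool(flag & REVERSE),
        "secondary": bool(flag & SECONDARY),
        "passing": not flag & QC_FAIL,
        "duplicate": bool(flag & DUPLICATE),
    }


# 99 = 0x1 + 0x2 + 0x20 + 0x40: a properly paired, mapped, forward-strand first mate.
print(describe_flag(99))
```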
-
tags
Parses the tags string to a dictionary if necessary.
>>> x = Sam(tags=['XU:Z:cgttttaa', 'XB:Z:cttacgttaagagttaac', 'MD:Z:75', 'NM:i:0', 'NH:i:1', 'RG:Z:1'])
>>> sorted(x.tags.items(), key=lambda x: x[0])
[('MD', '75'), ('NH', 1), ('NM', 0), ('RG', '1'), ('XB', 'cttacgttaagagttaac'), ('XU', 'cgttttaa')]
-
-
class simplesam.Writer(f, header=None)
Write a SAM/BAM format file from Sam objects.
-
__weakref__
List of weak references to the object (if defined).
-
-
simplesam.bam_read_count(bamfile, samtools_path='samtools')
Return a tuple of the number of mapped and unmapped reads in a BAM file.
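One plausible way to obtain such counts from an indexed BAM file is to sum the columns of `samtools idxstats`, which prints one tab-separated line per reference: name, length, mapped reads, unmapped reads. Whether bam_read_count does exactly this is an assumption. The sketch below only parses already-captured output, with hypothetical sample text.

```python
def count_from_idxstats(text):
    """Sum the mapped and unmapped columns of `samtools idxstats` output."""
    mapped = unmapped = 0
    for line in text.splitlines():
        if not line.strip():
            continue
        _name, _length, n_mapped, n_unmapped = line.split("\t")
        mapped += int(n_mapped)
        unmapped += int(n_unmapped)
    return mapped, unmapped


# Hypothetical idxstats output; the '*' row holds reads with no reference.
example = "chr1\t1000\t12\t1\nchr2\t500\t3\t0\n*\t0\t0\t7\n"
print(count_from_idxstats(example))  # (15, 8)
```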
-
simplesam.decode_tag(tag_string)
Parse a SAM format tag to a (tag, type, data) tuple. Python object types for data are set using the type code. Supported type codes are: A, i, f, Z, H, B.
>>> decode_tag('YM:Z:#""9O"1@!J')
('YM', 'Z', '#""9O"1@!J')
>>> decode_tag('XS:i:5')
('XS', 'i', 5)
>>> decode_tag('XF:f:100.5')
('XF', 'f', 100.5)
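The three doctest results can be reproduced with a small standalone parser. The sketch below is not simplesam's implementation: it converts only `i` and `f`, and leaving A, Z, H and B values as raw strings is an assumption made for brevity.

```python
def decode_tag_sketch(tag_string):
    """Split 'TAG:TYPE:data' and convert data according to the type code."""
    # maxsplit=2 keeps any ':' characters inside the data intact.
    tag, type_code, data = tag_string.split(":", 2)
    if type_code == "i":
        data = int(data)
    elif type_code == "f":
        data = float(data)
    # A, Z, H and B are left as strings in this sketch (assumption).
    return tag, type_code, data


print(decode_tag_sketch("XS:i:5"))      # ('XS', 'i', 5)
print(decode_tag_sketch("XF:f:100.5"))  # ('XF', 'f', 100.5)
```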
-
simplesam.encode_tag(tag, data)
Write a SAM tag in the format TAG:TYPE:data. Infers the data type from the Python object type.
>>> encode_tag('YM', '#""9O"1@!J')
'YM:Z:#""9O"1@!J'
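Type inference of this kind can be sketched with `isinstance` checks. The mapping below (int to `i`, float to `f`, str to `Z`) is an assumption chosen to match the doctest, and `encode_tag_sketch` is a hypothetical stand-in rather than the library function. Other Python types are rejected here instead of guessed at.

```python
def encode_tag_sketch(tag, data):
    """Format a tag as 'TAG:TYPE:data', picking TYPE from the Python type."""
    if isinstance(data, bool):
        # bool is a subclass of int; refuse it rather than emit 'i:True'.
        raise TypeError("bool values are not handled in this sketch")
    if isinstance(data, int):
        type_code = "i"
    elif isinstance(data, float):
        type_code = "f"
    elif isinstance(data, str):
        type_code = "Z"
    else:
        raise TypeError(f"unsupported tag value type: {type(data).__name__}")
    return f"{tag}:{type_code}:{data}"


print(encode_tag_sketch("NM", 0))        # NM:i:0
print(encode_tag_sketch("XF", 100.5))    # XF:f:100.5
```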
Return a dictionary containing the tags
-
simplesam.tile_region(rname, start, end, step)
Make non-overlapping tiled windows from the specified region in the UCSC-style string format.
>>> list(tile_region('chr1', 1, 250, 100))
['chr1:1-100', 'chr1:101-200', 'chr1:201-250']
>>> list(tile_region('chr1', 1, 200, 100))
['chr1:1-100', 'chr1:101-200']
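The windowing shown in these examples can be written as a short generator: step through the start positions and clip the final window at `end`. The function below is a sketch under the hypothetical name `tile_region_sketch`, not a copy of the library code.

```python
def tile_region_sketch(rname, start, end, step):
    """Yield 'rname:lo-hi' windows of width `step` covering start..end inclusive."""
    for lo in range(start, end + 1, step):
        hi = min(lo + step - 1, end)  # clip the last window at `end`
        yield f"{rname}:{lo}-{hi}"


print(list(tile_region_sketch("chr1", 1, 250, 100)))
# ['chr1:1-100', 'chr1:101-200', 'chr1:201-250']
```

Each string can be passed anywhere a UCSC-style region is accepted, for example as a region argument to samtools.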