
Simple pure Python SAM parser and objects for working with SAM records

Classes to handle alignments in the SAM format.

Reader -> Sam -> Writer

class simplesam.DefaultOrderedDict(default, items=[])[source]
__init__(default, items=[])[source]

Initialize self. See help(type(self)) for accurate signature.

class simplesam.Reader(f, regions=False, kind=None, samtools_path='samtools')[source]

Read SAM/BAM format file as an iterable.

__init__(f, regions=False, kind=None, samtools_path='samtools')[source]

Initialize self. See help(type(self)) for accurate signature.


Returns the number of reads in an indexed BAM file. Not implemented for SAM files.




Parse the header list and return a nested dictionary.
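To illustrate what a nested header dictionary might look like, here is a minimal standalone sketch. The function name `parse_header_lines` and the output layout (record type mapped to a list of KEY:VALUE dictionaries) are assumptions made for illustration; the structure the library actually returns is not specified above and may differ.

```python
from collections import defaultdict


def parse_header_lines(lines):
    """Group SAM header lines by record type, e.g. '@SQ' or '@RG' (hypothetical helper)."""
    header = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        record_type = fields[0]
        if record_type == "@CO":
            # Comment lines carry free text rather than KEY:VALUE fields.
            header[record_type].append("\t".join(fields[1:]))
            continue
        # Every other field has the form KEY:VALUE; split on the first colon only.
        header[record_type].append(dict(f.split(":", 1) for f in fields[1:]))
    return dict(header)


example = parse_header_lines(["@HD\tVN:1.6\tSO:coordinate", "@SQ\tSN:chr1\tLN:1000"])
print(example["@SQ"])
```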


Returns the next Sam object


Return just the sequence names from the @SQ header records as a generator.


Returns an iterator that draws every nth read from the input file. Yields Sam objects.
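Drawing every nth item from an iterable can be sketched with `itertools.islice`. The helper name `every_nth` is hypothetical, and starting at the nth item (rather than the first) is an assumption; the text above does not say which offset the library uses.

```python
from itertools import islice


def every_nth(records, n):
    """Yield every nth item from an iterable, starting with the nth (hypothetical helper)."""
    # islice(iterable, start, stop, step): begin at index n - 1, then step by n.
    return islice(records, n - 1, None, n)


print(list(every_nth(range(1, 11), 3)))  # [3, 6, 9]
```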


Return a generator of UCSC-style regions that tile the genome in windows of the given width.

class simplesam.Sam(qname='', flag=4, rname='*', pos=0, mapq=255, cigar='*', rnext='*', pnext=0, tlen=0, seq='*', qual='*', tags=[])[source]

Object representation of a SAM entry.


Retrieves the SAM tag named “tag” as a tuple: (tag_name, data). The tag data is converted to the corresponding Python object type.

>>> x = Sam(tags=['NM:i:0', 'ZZ:Z:xyz'])
>>> x['NM']
>>> x['ZZ']
__init__(qname='', flag=4, rname='*', pos=0, mapq=255, cigar='*', rnext='*', pnext=0, tlen=0, seq='*', qual='*', tags=[])[source]

Initialize self. See help(type(self)) for accurate signature.


Returns the length of the portion of self.seq aligned to the reference. Unaligned reads will have len() == 0. Insertions (I) and soft-clipped portions (S) will not contribute to the aligned length.

>>> x = Sam(cigar='8M2I4M1D3M4S')
>>> len(x)
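The length calculation can be sketched without the library by summing CIGAR operations. This is not the library's code: the helper name `reference_span` is hypothetical, and counting M, D, N, = and X (the operations that consume reference positions in the SAM specification) is an assumption. The description above only states that I and S are excluded.

```python
import re

# Assumption: operations that consume reference positions per the SAM specification.
REFERENCE_OPS = {"M", "D", "N", "=", "X"}


def reference_span(cigar):
    """Sum the lengths of reference-consuming CIGAR operations (hypothetical helper)."""
    if cigar == "*":
        # An unaligned read has no CIGAR, so its aligned length is 0.
        return 0
    pairs = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(count) for count, op in pairs if op in REFERENCE_OPS)


print(reference_span("8M2I4M1D3M4S"))  # 8 + 4 + 1 + 3 = 16
```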

Return repr(self).

__setitem__(tag, data)[source]

Stores the SAM tag named “tag” with the value “data”. The data type of the tag is interpreted from the Python object type.

>>> x = Sam(tags=[])
>>> x['NM'] = 0
>>> x['NM']

Returns the string representation of a SAM entry. Corresponds to one line in the on-disk format of a SAM file.


Returns the CIGAR string as a tuple.

>>> x = Sam(cigar='8M2I4M1D3M')
>>> x.cigars
((8, 'M'), (2, 'I'), (4, 'M'), (1, 'D'), (3, 'M'))
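A CIGAR string can be split into (length, operation) pairs with a regular expression. The sketch below is a standalone illustration, not the library's implementation; `parse_cigar` is a hypothetical name.

```python
import re


def parse_cigar(cigar):
    """Split a CIGAR string into a tuple of (length, operation) pairs (hypothetical helper)."""
    if cigar == "*":
        # '*' means no alignment information is available.
        return ()
    return tuple((int(count), op) for count, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar))


print(parse_cigar("8M2I4M1D3M"))
```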

Returns a range of genomic coordinates for the query sequence positions in the gapped alignment.


Returns True if the read is a PCR or optical duplicate.

gapped(attr, gap_char='-')[source]

Return a Sam sequence attribute or tag with all deletions in the reference sequence represented as ‘gap_char’ and all insertions in the reference sequence removed. A sequence could be Sam.seq, Sam.qual, or any Sam tag that represents an aligned sequence, such as a methylation tag for bisulfite sequencing libraries.

>>> x = Sam(*'r001      99      ref     7       30      8M2I4M1D3M      =       37      39      TTAGATAAAGGATACTG       *'.split())
>>> x.gapped('seq')
>>> x = Sam(*'r001      99      ref     7       30      8M2I4M1D3M      =       37      39      TTAGATAAAGGATACTG       *'.split(), tags=['ZM:Z:.........M....M.M'])
>>> x.gapped('ZM')
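The gapping logic can be sketched as a walk over the CIGAR operations: copy aligned bases, skip inserted and soft-clipped bases, and emit the gap character for deletions. This standalone function `gapped_sketch` is hypothetical and only approximates the behaviour described above; treating N (skipped region) like a deletion is an assumption.

```python
import re


def gapped_sketch(sequence, cigar, gap_char="-"):
    """Project a read-length string onto reference coordinates (hypothetical helper)."""
    out = []
    i = 0  # current index into the read-length string
    for count, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(count)
        if op in ("M", "=", "X"):
            # Aligned positions: copy the characters through.
            out.append(sequence[i:i + n])
            i += n
        elif op in ("I", "S"):
            # Insertions and soft clips exist only in the read: skip them.
            i += n
        elif op in ("D", "N"):
            # Deletions exist only in the reference: fill with gap characters.
            # Handling N the same way is an assumption of this sketch.
            out.append(gap_char * n)
        # H and P consume neither read nor reference positions.
    return "".join(out)


print(gapped_sketch("TTAGATAAAGGATACTG", "8M2I4M1D3M"))  # TTAGATAAGATA-CTG
```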

Return the relative index within the alignment from a genomic position ‘pos’


Returns True if the read is mapped.


Returns True if the read is paired and each segment properly aligned according to the aligner.


Return the ungapped reference sequence from the MD tag, if present.


Returns True if the read is passing filters, such as platform/vendor quality controls.


Returns True if Sam.seq is reverse complemented.


Return Sam.qname without paired-end identifier if it exists


Returns True if the read alignment is secondary.
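The boolean properties described above correspond to bits in the SAM FLAG field. The following sketch decodes them directly from an integer flag using the bit values defined in the SAM specification; the function `describe_flag` and its dictionary keys are hypothetical and are not part of the library.

```python
# Bit values from the SAM specification's FLAG field.
FLAG_PAIRED = 0x1
FLAG_PROPER_PAIR = 0x2
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_SECONDARY = 0x100
FLAG_QC_FAIL = 0x200
FLAG_DUPLICATE = 0x400


def describe_flag(flag):
    """Decode selected SAM FLAG bits into booleans (hypothetical helper)."""
    return {
        "mapped": not flag & FLAG_UNMAPPED,
        "properly_paired": bool(flag & FLAG_PAIRED) and bool(flag & FLAG_PROPER_PAIR),
        "reverse": bool(flag & FLAG_REVERSE),
        "secondary": bool(flag & FLAG_SECONDARY),
        "passing": not flag & FLAG_QC_FAIL,
        "duplicate": bool(flag & FLAG_DUPLICATE),
    }


# 99 = 0x1 + 0x2 + 0x20 + 0x40: paired, properly paired, mate reversed, first in pair.
print(describe_flag(99))
```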


Parses the tags string to a dictionary if necessary.

>>> x = Sam(tags=['XU:Z:cgttttaa', 'XB:Z:cttacgttaagagttaac', 'MD:Z:75', 'NM:i:0', 'NH:i:1', 'RG:Z:1'])
>>> sorted(x.tags.items(), key=lambda x: x[0])
[('MD', '75'), ('NH', 1), ('NM', 0), ('RG', '1'), ('XB', 'cttacgttaagagttaac'), ('XU', 'cgttttaa')]
class simplesam.Writer(f, header=None)[source]

Write SAM/BAM format file from Sam objects.

__init__(f, header=None)[source]

Initialize self. See help(type(self)) for accurate signature.




Write the string representation of the Sam object.

simplesam.bam_read_count(bamfile, samtools_path='samtools')[source]

Return a tuple of the number of mapped and unmapped reads in a BAM file


Parse a SAM format tag to a (tag, type, data) tuple. Python object types for data are set using the type code. Supported type codes are: A, i, f, Z, H, B

>>> decode_tag('YM:Z:#""9O"1@!J')
('YM', 'Z', '#""9O"1@!J')
>>> decode_tag('XS:i:5')
('XS', 'i', 5)
>>> decode_tag('XF:f:100.5')
('XF', 'f', 100.5)
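A simplified version of this decoding can be sketched as follows. It is not the library's code: `decode_tag_sketch` is a hypothetical name, and only the i and f type codes are converted here; A, Z, H and B values are returned as plain strings, which is a simplification.

```python
def decode_tag_sketch(tag):
    """Split 'TAG:TYPE:data' and convert numeric types (hypothetical helper)."""
    # maxsplit=2 keeps any colons inside the data portion intact.
    name, type_code, data = tag.split(":", 2)
    if type_code == "i":
        return name, type_code, int(data)
    if type_code == "f":
        return name, type_code, float(data)
    # Simplification: A, Z, H and B values are left as strings in this sketch.
    return name, type_code, data


print(decode_tag_sketch("XS:i:5"))
```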
simplesam.encode_tag(tag, data)[source]

Write a SAM tag in the format TAG:TYPE:data. Infers the data type from the Python object type.

>>> encode_tag('YM', '#""9O"1@!J')
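Type inference from the Python object can be sketched like this. The function `encode_tag_sketch` is hypothetical, and the mapping used (int to i, float to f, everything else to Z) is an assumption; the library may handle more types than this sketch does.

```python
def encode_tag_sketch(tag, data):
    """Format a tag as 'TAG:TYPE:data', inferring TYPE from the Python type (hypothetical helper)."""
    if isinstance(data, int):
        type_code = "i"
    elif isinstance(data, float):
        type_code = "f"
    else:
        # Assumption: anything else is written as a printable string.
        type_code = "Z"
    return f"{tag}:{type_code}:{data}"


print(encode_tag_sketch("NM", 0))  # NM:i:0
```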

Return a dictionary containing the tags

simplesam.tile_region(rname, start, end, step)[source]

Make non-overlapping tiled windows from the specified region in the UCSC-style string format.

>>> list(tile_region('chr1', 1, 250, 100))
['chr1:1-100', 'chr1:101-200', 'chr1:201-250']
>>> list(tile_region('chr1', 1, 200, 100))
['chr1:1-100', 'chr1:101-200']
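The tiling shown in these examples can be reproduced with a short generator. This is a sketch consistent with the outputs above, not necessarily how the library implements it; the name `tile_region_sketch` is hypothetical.

```python
def tile_region_sketch(rname, start, end, step):
    """Yield non-overlapping 'rname:start-end' windows of at most `step` bases (hypothetical helper)."""
    for window_start in range(start, end + 1, step):
        # The final window is truncated at `end` when the region is not a multiple of `step`.
        window_end = min(window_start + step - 1, end)
        yield f"{rname}:{window_start}-{window_end}"


print(list(tile_region_sketch("chr1", 1, 250, 100)))
```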
