Sequtils submodule

pyaln.sequtils.sequence_identity(a, b, gaps='y')

Compute the sequence identity between two sequences.

The definition of sequence identity is ambiguous because it depends on how gaps are treated; here, that is controlled by the gaps argument. For details and examples, see this page.

Parameters

a (str) – first sequence, with gaps encoded as “-”
b (str) – second sequence, with gaps encoded as “-”
gaps (str) – defines how gaps are taken into account when comparing sequences pairwise. Possible values:
- ‘y’ : gaps are taken into account and counted as mismatches. Positions that are gaps in both sequences are ignored.
- ‘n’ : gaps are not taken into account. Positions that are gaps in either of the compared sequences are ignored.
- ‘t’ : terminal gaps are trimmed. Terminal gap positions in either sequence are ignored; all other positions are treated as in ‘y’.
- ‘a’ : gaps are treated like any other character; even gap-to-gap matches are scored as identities.

Returns

sequence identity between the two sequences

Return type

float

Examples

>>> sequence_identity('ATGCA',
...                   'ATGCC')
0.8

>>> sequence_identity('--ATC-GGG-',
...                   'AAATCGGGGC',
...                   gaps='y')
0.6
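
The four gaps modes described under Parameters can be compared side by side with a small pure-Python sketch. This is not pyaln's implementation: the function name identity_sketch is hypothetical, the handling of ‘t’ follows one reading of the description above (columns inside the leading or trailing gap run of either sequence are dropped), and returning 0.0 when no columns remain is an assumption.

```python
def identity_sketch(a, b, gaps="y"):
    """Illustrative re-implementation of the four gap modes (not pyaln code)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned (same length)")
    if gaps not in ("y", "n", "t", "a"):
        raise ValueError("gaps must be one of 'y', 'n', 't', 'a'")

    pairs = list(zip(a, b))

    if gaps == "t":
        # Drop columns lying in the leading or trailing gap run of either sequence.
        start = max(len(s) - len(s.lstrip("-")) for s in (a, b))
        end = min(len(s.rstrip("-")) for s in (a, b))
        pairs = pairs[start:end]

    if gaps == "n":
        # Ignore every column where at least one sequence has a gap.
        pairs = [(x, y) for x, y in pairs if x != "-" and y != "-"]
    elif gaps in ("y", "t"):
        # Ignore only gap-to-gap columns; gap-to-residue counts as a mismatch.
        pairs = [(x, y) for x, y in pairs if not (x == "-" and y == "-")]
    # gaps == "a": keep every column, so gap-to-gap counts as an identity.

    if not pairs:
        return 0.0  # assumption: no comparable columns means identity 0.0

    matches = sum(x == y for x, y in pairs)
    return matches / len(pairs)


if __name__ == "__main__":
    for mode in "ynta":
        print(mode, identity_sketch("--ATC-GGG-", "AAATCGGGGC", gaps=mode))
```

For the pair used in the second example, this sketch gives 0.6 for ‘y’ and ‘a’ (6 identities over 10 columns), 1.0 for ‘n’ (6 over the 6 gap-free columns), and 6/7 for ‘t’ (the two leading and one trailing gap columns are dropped, leaving 7).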

Note

To compute sequence identity efficiently among many sequences, use score_similarity() instead.

pyaln.sequtils.weighted_sequence_identity(a, b, weights, gaps='y')

Compute the sequence identity between two sequences, weighting different positions differently.