Sequtils submodule
- pyaln.sequtils.sequence_identity(a, b, gaps='y')
Compute the sequence identity between two sequences.
The definition of sequence identity is ambiguous because it depends on how gaps are treated; here, this is controlled by the gaps argument. For details and examples, see this page
- Parameters
a (str) – first sequence, with gaps encoded as “-”
b (str) – second sequence, with gaps encoded as “-”
gaps (str) – defines how gaps are taken into account when comparing the two sequences. Possible values:
    - ‘y’ : gaps are considered and counted as mismatches. Positions that are gaps in both sequences are ignored.
    - ‘n’ : gaps are not considered. Positions that are gaps in either sequence are ignored.
    - ‘t’ : terminal gaps are trimmed. Terminal gap positions in either sequence are ignored; all other positions are treated as in ‘y’.
    - ‘a’ : gaps are treated like any other character; even gap-to-gap matches are scored as identities.
- Returns
sequence identity between the two sequences
- Return type
float
Examples
>>> sequence_identity('ATGCA',
...                   'ATGCC')
0.8
>>> sequence_identity('--ATC-GGG-', 'AAATCGGGGC', gaps='y')
0.6
Note
To compute sequence identity efficiently among many sequences, use score_similarity() instead.
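The four gap modes described above can be illustrated with a small pure-Python sketch. This is not pyaln's implementation: the function name identity_sketch and the helper _core_span are hypothetical, and returning 0.0 when no position is compared is an assumption made here for illustration only.

```python
def _core_span(seq):
    """Return [start, end) of seq after excluding its terminal gap runs."""
    start = len(seq) - len(seq.lstrip("-"))
    end = len(seq.rstrip("-"))
    return start, end


def identity_sketch(a, b, gaps="y"):
    """Hypothetical re-implementation of the documented gap modes."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if gaps not in {"y", "n", "t", "a"}:
        raise ValueError("gaps must be one of 'y', 'n', 't', 'a'")

    start, end = 0, len(a)
    if gaps == "t":
        # Drop columns that fall in the terminal gap run of either sequence.
        (sa, ea), (sb, eb) = _core_span(a), _core_span(b)
        start, end = max(sa, sb), min(ea, eb)

    matches = compared = 0
    for x, y in zip(a[start:end], b[start:end]):
        if gaps == "n" and "-" in (x, y):
            continue  # 'n': skip any column containing a gap
        if gaps in ("y", "t") and x == "-" and y == "-":
            continue  # 'y' and 't': skip gap-to-gap columns only
        compared += 1
        matches += x == y  # a gap against a residue counts as a mismatch

    # Assumption: report 0.0 when nothing is left to compare.
    return matches / compared if compared else 0.0
```

On the pair from the Examples, '--ATC-GGG-' against 'AAATCGGGGC', this sketch gives 6/10 for 'y' and 'a', 6/6 for 'n' (the four gapped columns are skipped), and 6/7 for 't' (the two leading and one trailing gap columns are trimmed, while the internal gap remains a mismatch).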
- pyaln.sequtils.weighted_sequence_identity(a, b, weights, gaps='y')
Compute the sequence identity between two sequences, weighting different positions differently.
The definition of sequence identity is ambiguous because it depends on how gaps are treated; here, this is controlled by the gaps argument. For details and examples, see this page
- Parameters
a (str) – first sequence, with gaps encoded as “-”
b (str) – second sequence, with gaps encoded as “-”
weights (list of float) – list of weights. Any iterable with the same length as the two input sequences (including gaps) is accepted. The final score is divided by the sum of the weights, excluding positions that are not considered, as defined by the gaps argument.
gaps (str) – defines how gaps are taken into account when comparing the two sequences. Possible values:
    - ‘y’ : gaps are considered and counted as mismatches. Positions that are gaps in both sequences are ignored.
    - ‘n’ : gaps are not considered. Positions that are gaps in either sequence are ignored.
    - ‘t’ : terminal gaps are trimmed. Terminal gap positions in either sequence are ignored; all other positions are treated as in ‘y’.
    - ‘a’ : gaps are treated like any other character; even gap-to-gap matches are scored as identities.
- Returns
weighted sequence identity between the two sequences
- Return type
float
Examples
>>> weighted_sequence_identity('ATGCA',
...                            'ATGCC', weights=[1, 1, 1, 1, 6])
0.4
>>> weighted_sequence_identity('ATGCA',
...                            'ATGCC', weights=[1, 1, 1, 1, 1])
0.8
Note
To compute sequence identity efficiently among many sequences, use score_similarity() instead.
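The weighting rule can be read as: sum the weights of the considered positions that match, then divide by the sum of the weights of all considered positions. The sketch below illustrates that reading. It is not pyaln's code: the name weighted_identity_sketch is hypothetical, and returning 0.0 when the considered weights sum to zero is an assumption.

```python
def weighted_identity_sketch(a, b, weights, gaps="y"):
    """Hypothetical illustration of weighted identity with the four gap modes."""
    weights = list(weights)  # accept any iterable, as the docs describe
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("a, b and weights must all have the same length")
    if gaps not in {"y", "n", "t", "a"}:
        raise ValueError("gaps must be one of 'y', 'n', 't', 'a'")

    def core_span(seq):
        # [start, end) of seq once its terminal gap runs are excluded
        return len(seq) - len(seq.lstrip("-")), len(seq.rstrip("-"))

    start, end = 0, len(a)
    if gaps == "t":
        (sa, ea), (sb, eb) = core_span(a), core_span(b)
        start, end = max(sa, sb), min(ea, eb)

    matched = total = 0.0
    for i in range(start, end):
        x, y, w = a[i], b[i], weights[i]
        if gaps == "n" and "-" in (x, y):
            continue  # position not considered: its weight is excluded
        if gaps in ("y", "t") and x == "-" and y == "-":
            continue  # gap-to-gap column: its weight is excluded
        total += w
        if x == y:
            matched += w

    # Assumption: report 0.0 when the considered weights sum to zero.
    return matched / total if total else 0.0
```

With weights [1, 1, 1, 1, 6], the four matching positions of 'ATGCA' and 'ATGCC' contribute 4 out of a total weight of 10, which reproduces the 0.4 in the Examples. With uniform weights the result reduces to the unweighted identity.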