Python: module Fasta

Fasta

Classes



fasta
sfasta

class fasta

    A class to manage multi-fasta data. Each sequence is an instance of the sfasta class. Various operators are defined to merge and combine fasta collections and subsets.

Methods defined here:

__add__(self, other)
fasta.__add__(other): concatenate two multi-fasta objects into a new one

__and__(self, other)
intersection of two multi-fasta objects (based on Ids)

__contains__(self, other)
x in y # where the match is based on sequence Ids

__delitem__(self, key)
destruction of one sequence: del data["myId"]

__eq__(self, other)
x == y  # True if x and y correspond to the same collection of Ids
e.g.:
    c = a + b
    d = c - a
    d == b  # True

__getitem__(self, key)
accessor to one sequence: data["myId"]

__init__(self, fname='', setName=None, verbose=0)
fname: if fname is a string, it is treated as a filename; if it is a list, it is treated as a list of lines to parse.
setName: a name for the set.
Example: data = fasta("toto.fst")

__len__(self)
The number of sequences in the collection

__or__(self, other)
merge two multi fasta objects, based on sequence Ids

__repr__(self)

__setitem__(self, id, seq)
insertion of one sequence: data["myId"] = seq # where seq is a sfasta instance

__sub__(self, seq)
c = x - y  # remove from x the sequences also present in y; the result is returned in c
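
The Id-based operators listed above behave much like set operations on sequence identifiers. The following is a minimal standalone sketch, assuming a collection can be modelled as a plain dict mapping Id to sequence string. The helper names (fasta_and, fasta_or, fasta_sub, same_ids) are hypothetical and are not part of the Fasta module. How the real class resolves Id collisions when merging is not stated in this page, so the sketch simply keeps the left operand's entry.

```python
def fasta_and(x, y):
    """Intersection: keep the entries of x whose Id also appears in y."""
    return {k: v for k, v in x.items() if k in y}

def fasta_or(x, y):
    """Merge by Id; on a collision the entry from x is kept (assumption)."""
    merged = dict(y)
    merged.update(x)
    return merged

def fasta_sub(x, y):
    """Difference: drop from x every entry whose Id appears in y."""
    return {k: v for k, v in x.items() if k not in y}

def same_ids(x, y):
    """Equality in the sense of __eq__: same collection of Ids."""
    return set(x) == set(y)

a = {"seq1": "ACGT", "seq2": "GGCC"}
b = {"seq3": "TTAA"}
c = fasta_or(a, b)     # stands in for c = a + b when the Ids are disjoint
d = fasta_sub(c, a)    # d = c - a
print(same_ids(d, b))  # True
```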

ids(self)
fasta.ids: return a list of Ids of the sequences

load(self, fname, verbose=0)
To load a generic fasta file from disk.

out(self, f=sys.stdout, Ids=None, oneLine=True, step=80, upper=False, lower=False, star=False, pretty=False)
A sequence formatter.
    f: the file descriptor to write the data content to
    Ids: a selection of Ids to write. If None, everything is output.
    oneLine: write each sequence on one line; otherwise break at step
    step: if oneLine is False, lines are broken every step characters
    upper: force uppercase
    lower: force lowercase
    star: add a star at the end of the sequence
    pretty: split each line into groups of 10 letters separated by blanks
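
The formatting options can be illustrated with a small standalone function. This is only a sketch of the behaviour described above: format_sequence is a hypothetical name, not part of the Fasta module, and the order in which options are applied (case change, then star, then line breaking, then grouping) as well as the precedence of upper over lower are assumptions.

```python
def format_sequence(seq, oneLine=True, step=80, upper=False,
                    lower=False, star=False, pretty=False):
    """Return seq formatted according to the documented options."""
    if upper:                      # assumption: upper wins if both are set
        seq = seq.upper()
    elif lower:
        seq = seq.lower()
    if star:                       # terminal star, appended before wrapping
        seq += "*"
    if oneLine:
        lines = [seq]
    else:                          # break the sequence every `step` letters
        lines = [seq[i:i + step] for i in range(0, len(seq), step)]
    if pretty:                     # groups of 10 letters separated by a blank
        lines = [" ".join(l[i:i + 10] for i in range(0, len(l), 10))
                 for l in lines]
    return "\n".join(lines)

print(format_sequence("acgt" * 6, oneLine=False, step=10, upper=True, star=True))
# ACGTACGTAC
# GTACGTACGT
# ACGT*
```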

parse(self, lines, verbose)
Perform the actual parsing of lines, i.e. a series of lines of the form:
    > Id comment   OR   >Id comment
    dataline
    dataline
Each sequence is a dictionary of id, comment and sequence.
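
The line format described above can be parsed along the following lines. This is a standalone sketch, not the module's actual code: parse_fasta_lines is a hypothetical name, and the dictionary keys id, cmt and s are borrowed from the sfasta description further down this page.

```python
def parse_fasta_lines(lines):
    """Turn FASTA lines into a list of {"id", "cmt", "s"} dictionaries."""
    records = []
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Accept both "> Id comment" and ">Id comment".
            parts = line[1:].strip().split(None, 1)
            seq_id = parts[0] if parts else ""
            cmt = parts[1] if len(parts) > 1 else ""
            current = {"id": seq_id, "cmt": cmt, "s": ""}
            records.append(current)
        elif current is not None:
            # Data lines are appended to the most recent header.
            current["s"] += line
    return records

example = ["> id1 first comment", "ACGT", "TTGA", ">id2", "GGCC"]
for record in parse_fasta_lines(example):
    print(record["id"], record["s"])
# id1 ACGTTTGA
# id2 GGCC
```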

splitwrite(self, fileExt='.fst', path='./', Ids=None, oneLine=True, step=80, upper=False, lower=False, star=False, pretty=False)
fasta.splitwrite(): split the output into one file per sequence.
    path: the directory to write in (./)
    fileExt: file extension to use (.fst)
    Ids: if not None, only these sequences will be output.
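
A rough standalone sketch of the one-file-per-sequence idea follows. The function name split_write is hypothetical, the collection is modelled as a plain dict of Id to sequence, and naming each file as the Id followed by fileExt is an assumption, since this page does not say how the real method builds file names.

```python
import os
import tempfile

def split_write(records, path="./", fileExt=".fst", Ids=None):
    """Write each selected sequence to its own file; return the file names."""
    written = []
    for seq_id, seq in records.items():
        if Ids is not None and seq_id not in Ids:
            continue                                  # not in the selection
        fname = os.path.join(path, seq_id + fileExt)  # assumed naming scheme
        with open(fname, "w") as handle:
            handle.write(">%s\n%s\n" % (seq_id, seq))
        written.append(fname)
    return written

with tempfile.TemporaryDirectory() as tmp:
    files = split_write({"seq1": "ACGT", "seq2": "GGCC"}, path=tmp, Ids=["seq2"])
    print([os.path.basename(f) for f in files])  # ['seq2.fst']
```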

subSet(self, theList, verbose=0)
return a new instance corresponding to the subset of Ids in theList.

write(self, fname, fmode='w', Ids=None, oneLine=True, step=80, upper=False, lower=False, star=False, pretty=False)
fasta.write: write a collection (subset or all) of sequences to a file.
    fname: filename
    fmode: one of "w", "a", etc.
    Ids: if None, all sequences are output; otherwise only the Ids in the list are output.
The remaining formatting parameters are propagated to fasta.out().

class sfasta

    sfasta: a class to manage a single fasta sequence. Data is organized as a dictionary of:
        id: the sequence identifier
        cmt: the comment (after the id)
        s: the sequence

Methods defined here:

__add__(self, other)
sfasta.__add__(other): concatenate two sequences into a new one.
    c = a + b  # c is the sequence of a followed by the sequence of b, merged into one

__contains__(self, other)
sfasta.__contains__(other): does the sequence contain some subsequence?
    if x in y:  # is x a subsequence of y?

__eq__(self, other)
sfasta.__eq__(): are the sequences identical?
    a == b  # are the sequences strictly identical?

__init__(self, id=None, seq=None, cmt=None, verbose=0)
id: sequence id
cmt: comment on the "> id" line, after the id
seq: the sequence itself

__len__(self)
return sequence length

__repr__(self)
Flat representation of sequence

__sub__(self, other)
sfasta.__sub__(other): remove the exact occurrence of other from the sequence.
    c = a - b  # c is the sequence of a from which b has been removed
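
The sequence-level operators can be mimicked with a small stand-in class. MiniSeq below is hypothetical and is not the sfasta implementation; in particular, whether the real subtraction removes only the first occurrence or every occurrence is not stated here, and this sketch removes every one via str.replace. Keeping the left operand's id and comment in the result is also an assumption.

```python
class MiniSeq:
    """Hypothetical stand-in for sfasta: an id, a comment and a sequence."""

    def __init__(self, id=None, seq=None, cmt=None):
        self.id, self.seq, self.cmt = id, seq or "", cmt

    def __add__(self, other):
        # Concatenation: sequence of self followed by sequence of other.
        return MiniSeq(self.id, self.seq + other.seq, self.cmt)

    def __contains__(self, other):
        # Subsequence test on the raw sequence strings.
        return other.seq in self.seq

    def __eq__(self, other):
        # Strict identity of the sequence strings (ids are ignored here).
        return self.seq == other.seq

    def __sub__(self, other):
        # Remove exact occurrences of other's sequence (all of them here).
        return MiniSeq(self.id, self.seq.replace(other.seq, ""), self.cmt)

    def __len__(self):
        return len(self.seq)

a = MiniSeq("a", "ACGT")
b = MiniSeq("b", "GG")
c = a + b
print(len(c))        # 6
print(b in c)        # True
print((c - b) == a)  # True
```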

cmt(self, cmt=None)
return comment, or assign it

id(self, id=None)
return id, or assign it

out(self, f=sys.stdout, oneLine=True, step=80, upper=False, lower=False, star=False, pretty=False)
A sequence formatter.
    out: outputs the formatted content of the sequence
    oneLine: write the whole sequence on one line; otherwise break at step
    step: if oneLine is False, lines are broken every step characters
    upper: force uppercase
    lower: force lowercase
    star: add a star at the end of the sequence
    pretty: split each line into groups of 10 letters separated by blanks

s(self, seq=None)
return the sequence string (if seq is None), or assign it (if seq is specified)

write(self, fname, fmode='w', oneLine=True, step=80, upper=False, lower=False, star=False, pretty=False)
write: write the sequence to fname, using fmode (one of the usual "w", "a", etc.)