| |
- fasta
- sfasta
class fasta |
|
a class to manage multi-fasta data
each sequence is an instance of the sfasta class
Various operators are defined to merge, combine fasta collections and subsets. |
|
Methods defined here:
- __add__(self, other)
- fasta.__add__ (other)
concatenate two multifasta objects into a new one
- __and__(self, other)
- intersection of two multifasta objects (based on Ids)
- __contains__(self, other)
- x in y # where the match is based on sequence Ids
- __delitem__(self, key)
- destruction of one sequence: del data["myId"]
- __eq__(self, other)
- x == y # True if x and y correspond to the same collection of Ids
e.g.:
c = a + b
d = c - a
d == b # True
- __getitem__(self, key)
- accessor to one sequence: data["myId"]
- __init__(self, fname='', setName=None, verbose=0)
- fname: if fname is a string, it is treated as a filename
if it is a list, it is treated as a list of lines to parse.
setName: a name for the set.
data = fasta("toto.fst")
- __len__(self)
- The number of sequences in the collection
- __or__(self, other)
- merge two multi fasta objects, based on sequence Ids
- __repr__(self)
- __setitem__(self, id, seq)
- insertion of one sequence: data["myId"] = seq # where seq is a sfasta instance
- __sub__(self, seq)
- c = x - y # remove from x the sequences also present in y; the result is returned in c
- ids(self)
- fasta.ids:
return a list of Ids of the sequences
- load(self, fname, verbose=0)
- To load a generic fasta file from disk.
- out(self, f=sys.stdout, Ids=None, oneLine=True, step=80, upper=False, lower=False, star=False, pretty=False)
- A sequence formatter.
f: the file object to write the data to
Ids: a selection of Ids to write. If None, everything is output.
oneLine: write each sequence on one line; otherwise break lines every step characters
step: if oneLine is False, lines are broken every step characters.
upper: force uppercase
lower: force lowercase
star: add a star at the end of each sequence
pretty: split each line into groups of 10 letters separated by a blank
- parse(self, lines, verbose)
- perform the actual parsing of lines,
i.e. a series of lines such as:
> Id comment OR >Id comment
dataline
dataline
Each sequence is a dictionary of
id
comment
sequence
- splitwrite(self, fileExt='.fst', path='./', Ids=None, oneLine=True, step=80, upper=False, lower=False, star=False, pretty=False)
- fasta.splitwrite():
This splits the output into one file per sequence.
path: the directory to write in (./)
fileExt: file extension to use (.fst)
Ids: if not None, only these sequences will be output.
- subSet(self, theList, verbose=0)
- return a new instance corresponding to the subset of Ids in theList.
- write(self, fname, fmode='w', Ids=None, oneLine=True, step=80, upper=False, lower=False, star=False, pretty=False)
- fasta.write:
write a collection (subset or all) of sequences to file
fname: filename
fmode: one of "w", "a", etc
Ids: if None, all sequences are output; otherwise only the Ids in the list are output
The remaining formatting parameters are passed on to fasta.out()
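The Id-based operators listed above behave like set operations on the sequence Ids. As a rough illustration only, the sketch below reproduces those semantics with plain Python dictionaries mapping Ids to sequence strings. The helper names (concat, intersect, merge, subtract, same_ids) are hypothetical stand-ins and not part of the fasta class, and the way duplicate Ids are resolved is an assumption.

```python
# Stand-in for a multi-fasta collection: a dict mapping Id -> sequence string.
# These helpers only mimic the documented operator semantics; they are
# hypothetical and are not the fasta class implementation.

def concat(x, y):
    """Like x + y: sequences of x and of y (assumed: y wins on duplicate Ids)."""
    out = dict(x)
    out.update(y)
    return out

def intersect(x, y):
    """Like x & y: keep the sequences of x whose Id also appears in y."""
    return {k: v for k, v in x.items() if k in y}

def merge(x, y):
    """Like x | y: union by Id (assumed: x wins on duplicate Ids)."""
    out = dict(y)
    out.update(x)
    return out

def subtract(x, y):
    """Like x - y: drop from x the sequences whose Id appears in y."""
    return {k: v for k, v in x.items() if k not in y}

def same_ids(x, y):
    """Like x == y: compare the sets of Ids only."""
    return set(x) == set(y)

a = {"seq1": "ACGT", "seq2": "GGCC"}
b = {"seq3": "TTAA"}
c = concat(a, b)
d = subtract(c, a)
print(same_ids(d, b))            # True, as in the c = a + b; d = c - a example
print(sorted(intersect(c, a)))   # ['seq1', 'seq2']
```

With the real classes the same checks would be written with the operators themselves (c = a + b, d = c - a, d == b).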
|
class sfasta |
|
sfasta: a class to manage a single fasta sequence
data is organized as a dictionary of:
id: the sequence identifier
cmt: (comment after id)
s: the sequence |
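To make this record layout concrete, here is a minimal sketch of how lines in the "> Id comment" / ">Id comment" format could be turned into dictionaries with the id, cmt and s keys. The function name parse_fasta_lines is hypothetical, and details such as joining data lines without separators and skipping blank lines are assumptions rather than the module's confirmed behaviour.

```python
def parse_fasta_lines(lines):
    """Hypothetical helper: turn fasta-formatted lines into id/cmt/s dictionaries."""
    records = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            # Header line: accept both "> Id comment" and ">Id comment".
            fields = line[1:].strip().split(None, 1)
            current = {
                "id": fields[0] if fields else "",
                "cmt": fields[1] if len(fields) > 1 else "",
                "s": "",
            }
            records.append(current)
        elif current is not None:
            # Data line: append it to the sequence of the current record.
            current["s"] += line
    return records

lines = [">seq1 first test sequence", "ACGT", "TTGA", "> seq2", "GGCC"]
records = parse_fasta_lines(lines)
for rec in records:
    print("%s | %s | %s" % (rec["id"], rec["cmt"], rec["s"]))
```

Data lines that appear before any header are silently dropped in this sketch; a stricter parser might raise an error instead.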
|
Methods defined here:
- __add__(self, other)
- sfasta.__add__ (other)
concatenate two sequences into a new one.
c = a + b # c is the sequence of a followed by the sequence of b, merged into one
- __contains__(self, other)
- sfasta.__contains__(other) :
does the sequence contain a given subsequence?
if x in y: # is x a subsequence of y?
- __eq__(self, other)
- sfasta.__eq__() :
are the sequences identical?
a == b # are the sequences strictly identical?
- __init__(self, id=None, seq=None, cmt=None, verbose=0)
- id: sequence id
cmt: comment on the "> id" line, after the id
seq: the sequence itself
- __len__(self)
- return sequence length
- __repr__(self)
- Flat representation of sequence
- __sub__(self, other)
- sfasta.__sub__(other) :
remove the exact occurrence of other from the sequence
c = a - b # c is the sequence of a from which b has been removed
- cmt(self, cmt=None)
- return comment, or assign it
- id(self, id=None)
- return id, or assign it
- out(self, f=sys.stdout, oneLine=True, step=80, upper=False, lower=False, star=False, pretty=False)
- A sequence formatter.
out: outputs the formatted content of the sequence
oneLine: write the whole sequence on one line; otherwise break lines every step characters
step: if oneLine is False, lines are broken every step characters.
upper: force uppercase
lower: force lowercase
star: add a star at the end of the sequence
pretty: split each line into groups of 10 letters separated by a blank
- s(self, seq=None)
- return the sequence string (if seq is None), or assign it (if seq is specified)
- write(self, fname, fmode='w', oneLine=True, step=80, upper=False, lower=False, star=False, pretty=False)
- write: write the sequence to the file fname, opened with mode fmode (one of the usual "w", "a", etc.)
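The formatting options shared by out and write can be illustrated with a small stand-alone function. This is a sketch under stated assumptions, not the module's code: format_sequence is a hypothetical name, and the order in which the options are applied (case, then star, then line breaking, then grouping by 10) is a guess.

```python
def format_sequence(seq, oneLine=True, step=80, upper=False, lower=False,
                    star=False, pretty=False):
    """Hypothetical formatter mimicking the documented options.

    Returns the formatted sequence as a list of output lines.
    """
    if upper:
        seq = seq.upper()
    elif lower:
        seq = seq.lower()
    if star:
        seq += "*"  # add a star at the end of the sequence
    if oneLine:
        lines = [seq]
    else:
        # Break the sequence into chunks of `step` characters.
        lines = [seq[i:i + step] for i in range(0, len(seq), step)]
    if pretty:
        # Split each line into groups of 10 letters separated by a blank.
        lines = [" ".join(l[i:i + 10] for i in range(0, len(l), 10))
                 for l in lines]
    return lines

seq = "acgt" * 6  # 24 letters
for line in format_sequence(seq, oneLine=False, step=20,
                            upper=True, star=True, pretty=True):
    print(line)
# ACGTACGTAC GTACGTACGT
# ACGT*
```

Returning a list of lines instead of writing to a file object keeps the sketch easy to check; the documented methods write to f or fname instead.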
| |