Software:
The software implements a multistep approach for classifying and clustering biological sequences and structures via compression.
The links below provide the source code, the executables, and the datasets used for the experiments described in the manuscript.
The software is free software, released under the GNU General Public License published by the Free Software Foundation.
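To illustrate the general idea of comparing sequences via compression, here is a minimal sketch of one widely used compression-based dissimilarity, the normalized compression distance (NCD). It is not taken from the package: the choice of zlib as compressor, the function names and the toy sequences are all assumptions made for illustration, and the manuscript should be consulted for the measures and compressors the software actually uses.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical toy inputs, not drawn from the datasets listed on this page.
s1 = b"ACGTACGTTTGACCA" * 20
s2 = b"ACGTACGTTTGACCA" * 19 + b"ACGTACGATTGACCA"
s3 = b"MKVLAAGIVGLLLAQ" * 20

# Smaller values suggest more shared structure between the two inputs.
print("s1 vs s2:", ncd(s1, s2))
print("s1 vs s3:", ncd(s1, s3))
```

A matrix of such pairwise values can then be passed to any standard clustering method. A general-purpose compressor like zlib is only a rough stand-in for compressors designed for biological data.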
Downloads of the software:
-
Entire package for different platforms (zip format or tar.gz format).
-
Source code for the gcc compiler (Linux/Unix), including the Compression Boosting Library (P. Ferragina, R. Giancarlo, G. Manzini, 2005) (zip format or tar.gz format)
(compilation tested on: FreeBSD 6.1-RELEASE i386, Linux Ubuntu 5.10 Kernel 2.6.15.4 i686, Linux Slackware 10.2 Kernel 2.6.15.4, Mac OS X 10.4.8 Kernel Darwin 8.8 x86).
-
Binary executable files for Cygwin/Windows (zip format or tar.gz format) (compiled and tested under Cygwin, version 2.05).
-
Binary executable files for Linux/Unix, i386 architecture (zip format or tar.gz format)
(tested on: FreeBSD 6.1-RELEASE i386, Linux Ubuntu 5.10 Kernel 2.6.15.4 i686, Linux Slackware 10.2 Kernel 2.6.15.4).
-
Binary executable files for Mac OS X (zip format or tar.gz format) (tested on Mac OS X 10.4.8 Kernel Darwin 8.8 x86).
-
Documentation file.
Links to datasets:
Links to the datasets used in the experiments are provided here. More details about the datasets are provided in the manuscript.
-
CK-36-PDB dataset consisting of 36 amino acid sequences (zip or tar.gz format)
-
CK-36-REL dataset consisting of 36 complete TOPS strings with contact map (zip or tar.gz format)
-
CK-36-SEQ dataset consisting of 36 TOPS strings of secondary structure elements (zip or tar.gz format)