Tuesday, August 28, 2007

Vinga Entropic Profiles

Instructions for using the software accompanying the BMC Bioinformatics manuscript:

The software to calculate Rényi entropic profiles is delivered in two forms:

1) An open-source Matlab toolbox written in m-code.

2) Stand-alone executables for Windows, Mac and Linux. Each environment requires that a runtime server application (MCR) be downloaded and installed. If you already have Matlab installed, note that you will most likely still need to install the appropriate MCR: the stand-alone executables were produced for MCR 2006a, so unless you have Matlab version 2006a you will also need to install the MCR listed below for your operating system. No MathWorks/Matlab licenses are needed.

2.1. WINDOWS: first install MCRinstaller 2006a for Windows and then unzip the files in renyi_bin_win.zip.

2.2. LINUX: first install MCRinstaller 2006a for Linux and then unzip the files in renyi_bin_lin.zip (coming soon).

2.3. MACINTOSH: first install MCRinstaller 2006a for Mac and then unzip the files in renyi_bin_mac.zip (coming soon).

The executable is run like the m-code function vinga_entropic, except that the input arguments are passed space-delimited. For example, instead of using

vinga_entropic('m4.seq',6,12,253)

one would use, at the command line,

vinga_entropic m4.seq 6 12 253
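Because the compiled executable receives every argument as a character string (see the NOTE in the help text below), the numeric arguments have to be converted before use. The following is a minimal Python sketch of that conversion, not the toolbox's actual code; the helper names as_number and parse_args are hypothetical, and returning integral values such as "6" as integers is an assumption.

```python
def as_number(value):
    """Return numeric input unchanged; parse numeric strings (hypothetical helper)."""
    if isinstance(value, (int, float)):
        return value
    number = float(value)
    # Assumption: integral values such as "6" come back as int, others as float.
    return int(number) if number.is_integer() else number


def parse_args(argv):
    """Map space-delimited command-line strings to (fastafile, N, phi, position)."""
    if len(argv) != 4:
        raise SystemExit("usage: vinga_entropic fastafile N phi position")
    fastafile, n, phi, position = argv
    return fastafile, as_number(n), as_number(phi), as_number(position)
```

For example, parse_args(["m4.seq", "6", "12", "253"]) gives ("m4.seq", 6, 12, 253), the same values as the m-code call vinga_entropic('m4.seq',6,12,253).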

The detailed help text for this function is reproduced below. Note that the executable produces several export files: the figures in PDF and PNG formats, and an XML file with the numerical results of the calculations.

help on vinga_entropic:

VINGA_ENTROPIC is the main function of the toolbox for processing sequences
Syntax: vinga_entropic(fastafile,N,phi,position)
Description: this is the main function that calls all others and is the
right function to compile for sequence processing. The numerical results,
figures and intermediate calculations will be stored with names built
from the fasta file name. For example, if the fastafile is m4.seq, all
graphs and figures will be saved as m4_*.pdf and m4_*.png.
****Input arguments:
fastafile - text file with sequence to be analysed (in FASTA format)
N - kernel resolution parameter
phi - kernel smoothing parameter
position - position of a particular symbol in the original sequence, for local study
****Output:
figure files with all the graphical results (see webpage)
XML page with numerical data

EXAMPLE: vinga_entropic('m4.seq',6,12,253)

NOTE: The compiled version takes char inputs, therefore the type conversion is
checked and corrected if needed
-----------------------------------------------------------------
Authors: Susana Vinga and Jonas S Almeida
Reference: "Local Rényi entropic profiles of DNA sequences"
BMC Bioinformatics (submitted)
Version: 2007.08.27
Webpage: http://algos.inesc-id.pt/~svinga/ep/
-----------------------------------------------------------------
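The naming rule in the help text (m4.seq giving m4_*.pdf and m4_*.png) amounts to taking the stem of the FASTA file name and appending a suffix. A minimal Python sketch, assuming outputs go to the current directory; the function name output_names and the "profile" suffix are hypothetical, since the actual suffixes used by the toolbox are not listed here.

```python
from pathlib import Path


def output_names(fastafile, suffix, extensions=("pdf", "png")):
    """Build output file names from the FASTA file stem (hypothetical helper)."""
    stem = Path(fastafile).stem  # "m4.seq" -> "m4"; any directory part is dropped
    return [f"{stem}_{suffix}.{ext}" for ext in extensions]


print(output_names("m4.seq", "profile"))  # ['m4_profile.pdf', 'm4_profile.png']
```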


Please write to Susana Vinga <svinga@vinci.inesc-id.pt> if you need any assistance with the installation or want to report bugs.

Monday, January 22, 2007

Local Renyi

Additional material for the submitted manuscript:
Local Rényi entropic profiles of DNA sequences
Susana Vinga (a, b, *) and Jonas S. Almeida (c, d)
J. Theor. Biol. (submitted)
a Instituto de Engenharia de Sistemas e Computadores: Investigação e Desenvolvimento (INESC-ID), R. Alves Redol 9, 1000-029 Lisboa, Portugal
b Departamento de Bioestatística e Informática, Faculdade de Ciências Médicas, Universidade Nova de Lisboa (FCM/UNL), Campo dos Mártires da Pátria 130, 1169-056 Lisboa, Portugal
c Dept. Biostatistics and Applied Mathematics, Univ. Texas MD Anderson Cancer Center, unit 447, 1515 Holcombe Blvd, Houston TX 77030-4009, USA
d Biomathematics Group, Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa (ITQB/UNL), R. Qta. Grande 6, 2780-156 Oeiras, Portugal
E-mail addresses: svinga at algos inesc-id pt (SV), jalmeida at mdanderson org (JSA).
*Corresponding author

DNA Datasets
Download text files (in FASTA format) with all the DNA sequences used in this study [seq.zip].

Sequence name: brief description
m3: random sequence with inserted motif of length L=3, 'ATC'
m4: random sequence with inserted motif of length L=4, 'ATCG'
m5: random sequence with inserted motif of length L=5, 'ATCGA'
Es: experimental promoter regions of B. subtilis (see the paper for a full description)
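The page does not say how the synthetic sequences m3, m4 and m5 were generated. As a hypothetical illustration only, a comparable test sequence can be built by drawing symbols uniformly from ACGT and overwriting the motif at chosen positions, and the motif can then be counted back. The function names, the uniform composition and the insertion positions are all assumptions, not the procedure used for the datasets above.

```python
import random


def random_with_motif(length, motif, positions, seed=0):
    """Uniform random DNA of the given length with motif written at each position.

    Hypothetical generator; not the procedure used for m3, m4 or m5.
    """
    rng = random.Random(seed)  # seeded so the sequence is reproducible
    seq = [rng.choice("ACGT") for _ in range(length)]
    for start in positions:
        if start < 0 or start + len(motif) > length:
            raise ValueError(f"motif does not fit at position {start}")
        seq[start:start + len(motif)] = motif  # overwrite with the motif's characters
    return "".join(seq)


def count_occurrences(seq, motif):
    """Count overlapping occurrences of motif in seq."""
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


s = random_with_motif(200, "ATCG", [10, 100, 150], seed=1)
print(count_occurrences(s, "ATCG"))  # at least 3: the inserted copies plus any chance hits
```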

MATLAB source code
Current version: 1 (Jan. 19, 2007). Future upgrades will be posted here.
See an application example and the help text of each function. (NOTE: because these files were generated automatically, some graphs appear different from those in the manuscript.)
Download all m-code MATLAB functions in entropicprofile.zip, which includes the following files:

File name: brief description
readfasta.m: Reads sequences from FASTA format files into MATLAB struct variables
count_repeat.m: Counts L-tuple repetitions for each position in the input DNA sequences
fill_kernel3D.m: Calculates the probability density estimation matrix (KM) with the fractal kernel; calls kernel_analytical.m
kernel_analytical.m: Closed form for the fractal kernel calculation
normKM.m: Normalizes the KM estimates
find_scale.m: Finds the scale at which the maxima and minima of KM occur
local_study.m: Analyses a specific user-defined position/symbol
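To illustrate what the first two functions do, here is a Python sketch of FASTA parsing and per-position L-tuple counting. It is not a port of readfasta.m or count_repeat.m: the dict layout (standing in for the MATLAB struct), the convention that the k-tuple at position i is the one ending at i, and the zero stored when that tuple would run past the start of the sequence are all assumptions.

```python
from collections import Counter


def parse_fasta(lines):
    """Parse FASTA text lines into a list of {'header', 'sequence'} dicts.

    Lines before the first '>' header are ignored in this sketch.
    """
    records, header, chunks = [], None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append({"header": header, "sequence": "".join(chunks)})
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append({"header": header, "sequence": "".join(chunks)})
    return records


def read_fasta(path):
    """Read and parse a FASTA file from disk."""
    with open(path) as handle:
        return parse_fasta(handle)


def count_repeats(seq, max_len):
    """counts[i][k-1] = occurrences in seq of the k-tuple ending at position i.

    Occurrences may overlap; entries are 0 when i < k - 1.
    """
    counts = [[0] * max_len for _ in seq]
    for k in range(1, max_len + 1):
        tally = Counter(seq[j:j + k] for j in range(len(seq) - k + 1))
        for i in range(k - 1, len(seq)):
            counts[i][k - 1] = tally[seq[i - k + 1:i + 1]]
    return counts


records = parse_fasta([">toy", "ATC", "ATC"])
print(count_repeats(records[0]["sequence"], 3))
# [[2, 0, 0], [2, 2, 0], [2, 2, 2], [2, 1, 1], [2, 2, 1], [2, 2, 2]]
```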
Links
Rényi continuous entropy | Alfréd Rényi's biography | MATLAB site
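For reference, these are the standard textbook definitions of the Rényi entropy of order α for a discrete distribution p = (p_1, ..., p_n), its Shannon limit, and the continuous quadratic (α = 2) form for a density f. They are given as background, not as a statement of the exact estimator used in the manuscript.

```latex
H_\alpha(p) = \frac{1}{1-\alpha} \log \sum_{i=1}^{n} p_i^{\alpha},
\qquad \alpha > 0,\ \alpha \neq 1

\lim_{\alpha \to 1} H_\alpha(p) = -\sum_{i=1}^{n} p_i \log p_i

H_2(f) = -\log \int f(x)^2 \, dx
```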
Suggestions & comments: svinga at algos inesc-id pt
Created: 2007 Jan 19. Last update: 2007 Jan 19.

Thursday, January 18, 2007

Fractal Density Kernel for Iterated Maps

A fractal density kernel for iterated maps of biological sequences was recently identified:

Almeida, J.S. and S. Vinga (2006) Computing distribution of scale independent motifs in biological sequences. Algorithms for Molecular Biology 1:18. [PMID:17049089].

The corresponding Matlab toolbox is available at http://genechaos.blogspot.com/2006/10/density-kernel-toolbox.html.
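Iterated maps of DNA sequences are commonly built with the chaos game representation (CGR). The following minimal Python sketch shows the standard CGR iteration, assuming the common corner assignment A=(0,0), C=(0,1), G=(1,1), T=(1,0) and a start at the center of the unit square; the toolbox linked above may use other conventions. The helper cell (a hypothetical name) illustrates a standard CGR property: two points fall in the same depth-k sub-square whenever the last k symbols leading to them coincide.

```python
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr(seq, start=(0.5, 0.5)):
    """Chaos game representation: move halfway toward each symbol's corner."""
    x, y = start
    points = []
    for symbol in seq.upper():
        cx, cy = CORNERS[symbol]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cell(point, depth):
    """Index of the sub-square of side 2**-depth that contains point."""
    scale = 2 ** depth
    return (int(point[0] * scale), int(point[1] * scale))


# The last points of "GA" and "TA" share the depth-1 cell (same last symbol, A)
# but not the depth-2 cell (the preceding symbols G and T differ).
ga, ta = cgr("GA")[-1], cgr("TA")[-1]
print(cell(ga, 1) == cell(ta, 1), cell(ga, 2) == cell(ta, 2))  # True False
```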

Wednesday, December 20, 2006

Membership

For consistency's sake, let's agree that one of us (me, Jonas) is consulted before adding more members with Admin permissions. On the other hand, all Admin members can add guest members without consulting anyone. Of course, nothing prevents any of the Admins from taking over should I be run over by a bus, for example.

NNC (Nonparametric Nonlinear Correlation)

NNC (Nonparametric Nonlinear Correlation) is a method proposed in our manuscript under submission, "A nonparametric approach to detect nonlinear correlation in gene expression", by Yian A. Chen, Jonas S. Almeida, Adam J. Richards, Peter Müller, Raymond J. Carroll, and Baerbel Rohrer. The following supplementary information is available:
  1. Supplementary Results ([link to supplementary material on the Publisher's site]).
  2. Matlab Toolbox.
  3. Tutorial.
  4. BioinformaticStation stand alone module (coming soon).
  5. Python code for transcriptional regulatory network visualization.

Mission Statement

MB is a web-based collaborative resource for a distributed group of researchers working on theory development, algorithm identification and web-based deployment of applications in Mathematical and Computational Biology and Bioinformatics. Its primary use is hosting white papers and supplementary material for published papers, including original software libraries.