Simulated data for two (imperfect) palindromic motifs
There are two subdirectories, each containing simulated data
for one palindromic motif. The dataset of each motif consists
of the following two files:
1) simu.mf -- training true sites
2) simu.Test.fa -- testing sites
Simulation Parameters
1) Base Frequencies
We use a uniform distribution in this simulation; that is,
the frequency of each base at each position is 0.25.
False and true sites are simulated under the same uniform
distribution, but true sites are palindromic sites generated
with the pairing probabilities given in 2).
2) Pairing Probabilities
The probability of forming a complementary base pair (A-T or G-C)
between two corresponding positions of each palindromic motif
is given in the following table:
    Position    |  Pairing probability
 Pos-1 | Pos-2  |  motif_A | motif_B
----------------------------------------------------
   0   |  11    |   0.99   |  0.90
   1   |  10    |   0.95   |  0.85
   2   |   9    |   0.90   |  0.75
   3   |   8    |   0.65   |  0.65
   4   |   7    |   0.50   |  0.50
   5   |   6    |   0.25   |  0.25
----------------------------------------------------
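
Illustrative sketch (not the original simulation code)
The following Python sketch shows one way sites with the pairing
probabilities above could be generated. It is not the program used to
produce this dataset. It rests on one assumption that this README does
not state: when a position pair does not form a complementary pair, the
second base is drawn uniformly from the three non-complementary bases.
Under that assumption every position keeps a marginal base frequency of
0.25, and a pairing probability of 0.25 (positions 5 and 6) equals the
chance level for two independent uniform bases. All function and
variable names are hypothetical.

```python
import random

BASES = "ACGT"
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Pairing probabilities copied from the table above, ordered by Pos-1 = 0..5.
PAIRING_PROB = {
    "motif_A": [0.99, 0.95, 0.90, 0.65, 0.50, 0.25],
    "motif_B": [0.90, 0.85, 0.75, 0.65, 0.50, 0.25],
}


def simulate_true_site(pair_probs, rng):
    """Return one site of width 2 * len(pair_probs) (12 for these motifs)."""
    width = 2 * len(pair_probs)
    site = [""] * width
    for i, p in enumerate(pair_probs):
        j = width - 1 - i  # partner position: 0<->11, 1<->10, ..., 5<->6
        left = rng.choice(BASES)  # uniform base frequencies
        if rng.random() < p:
            right = COMPLEMENT[left]
        else:
            # ASSUMPTION: uniform over the three non-complementary bases.
            others = [b for b in BASES if b != COMPLEMENT[left]]
            right = rng.choice(others)
        site[i], site[j] = left, right
    return "".join(site)


def simulate_false_site(width, rng):
    """Return one site with independent, uniformly distributed bases."""
    return "".join(rng.choice(BASES) for _ in range(width))


def pairing_rate(sites, i):
    """Fraction of sites whose positions i and (width - 1 - i) are complementary."""
    j = len(sites[0]) - 1 - i
    return sum(COMPLEMENT[s[i]] == s[j] for s in sites) / len(sites)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed for reproducibility
    sites = [simulate_true_site(PAIRING_PROB["motif_A"], rng) for _ in range(10000)]
    for i in range(6):
        print(i, 11 - i, round(pairing_rate(sites, i), 3))
```

Run as a script, the sketch prints the observed pairing rate for each
position pair of motif_A, which should approach the motif_A column of
the table as the number of simulated sites grows.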
Prepared by Weichun Huang, weichun.huang@duke.edu