Order validating methods

We can test how good this map is using a "verification algorithm". The usual method of flipping markers inside a sliding window (the ripple command in MapMaker) is called flips in CarthaGene. It takes three parameters: the size of the flipping window, a printing threshold on the difference in loglikelihood with respect to the best map, and a flag that says whether the flips command must be repeated when a better map is found. Here we use a window of 4 markers; every map whose loglikelihood is better than, or within 1.0 LOD unit of, the loglikelihood of the best map is printed, and the flips command is repeated on the improved map if such a map is found.

CG> flips 4 1 1

Repeated Flip(window size : 4, threshold : 1.00).


Map -1 : log10-likelihood =   -70.86
-------:
 Set : Marker List ...
   1 : L029 L010 L078 T035 D022 L001 A059 A079 M030 M232 T018 M076 M237 A03...

         2   2 2 1 2     1 2
 2 3 6 8 3 4 7 8 3 2 7 9 8 5 9  log10
 0 8 2 5 9 2 7 4 2 0 5 9 6 5 4    -70.86
[1 0 3 2]- - - - - - - - - - -     -0.00
[- - 3 2]- - - - - - - - - - -     -0.00
[1 0 - -]- - - - - - - - - - -     -0.00
 - - - - -[- - 3 2]- - - - - -     -0.00
 - - - - - - - - -[- - 3 2]- -      0.00
 - - - - - - - - - - -[1 0 3 2]      0.00
 - - - - - - - - - - -[- - 3 2]      0.00

Here we see that flipping markers in the original best map has not improved the loglikelihood. We also see that several alternative orders with the same loglikelihood exist. As said before, this is due to the existence of double markers. For example, the line [1 0 - -]- - - - - - - - - - -     -0.00 says that swapping the first and second markers of the map does not change the likelihood. These are markers 20 and 38 (read vertically in the top rows). We can ask for the names of these two markers:

CG> mrkname 20
L029
CG> mrkname 38
L010
These two markers were indeed detected as double by the mrkdouble command. Ideally, all the previous searches should be repeated after merging the strongly linked double markers.
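
To make the reading of the flips output concrete, here is a minimal Python sketch (not part of CarthaGene) that decodes the vertical header into marker identifiers and applies one printed row to the marker order. It assumes a layout inferred from the listing above: two characters per marker column, and digits inside the brackets that index the markers of the window in their original order, with - meaning "unchanged". The names decode_ids and apply_flip are hypothetical.

```python
# Sketch only: the row layout is inferred from the flips listing, not from
# CarthaGene's documentation. decode_ids and apply_flip are hypothetical names.

HEADER = [
    "         2   2 2 1 2     1 2",
    " 2 3 6 8 3 4 7 8 3 2 7 9 8 5 9  log10",
    " 0 8 2 5 9 2 7 4 2 0 5 9 6 5 4    -70.86",
]


def decode_ids(header_lines, n_markers):
    """Read each marker id top to bottom from its character column."""
    ids = []
    for i in range(n_markers):
        col = 2 * i + 1  # assumed: one separator character, then one digit
        digits = "".join(
            line[col]
            for line in header_lines
            if col < len(line) and line[col].isdigit()
        )
        ids.append(int(digits))
    return ids


def apply_flip(order, row):
    """Return the order described by one flips row such as '[1 0 - -]- - ...'."""
    start = row.index("[") // 2   # first position inside the window
    end = row.index("]") // 2     # first position after the window
    window = order[start:end]
    new_order = list(order)
    for pos in range(start, end):
        symbol = row[2 * pos + 1]
        if symbol != "-":         # '-' keeps the marker where it was
            new_order[pos] = window[int(symbol)]
    return new_order


ids = decode_ids(HEADER, 15)
swapped = apply_flip(ids, "[1 0 - -]- - - - - - - - - - -     -0.00")
print(ids[:2], "->", swapped[:2])   # prints [20, 38] -> [38, 20]
```

Under these assumptions, the row used in the example exchanges markers 20 and 38, which matches the reading given in the text.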

Another validation procedure is the polish command. It takes each marker successively and tries to insert it in all possible intervals. The variation in loglikelihood is reported for each marker and each destination interval.

CG> polish

Local map analysis:

        L029  L010  L078  T035  D022  L001  A059  A079  M030  M232  T018  M...
      ---------------------------------------------------------------------...
 L029 |-----   0.0  12.6   9.5  10.0  27.3  27.3  32.3  26.6  29.5  26.9  3...
 L010 |  0.0 -----  10.5   7.8   9.0  25.2  25.5  30.8  25.0  27.9  25.6  3...
 L078 |  7.6  10.5 -----   0.0   3.3  24.0  24.5  29.9  24.0  26.9  25.2  3...
 T035 |  7.8  10.5   0.0 -----   3.3  22.7  23.4  28.9  23.0  25.9  24.3  3...
 D022 |  4.4   6.8   1.9   3.3 -----  10.4   9.9  14.1  11.1  14.5  12.4  1...
 L001 | 14.6  26.4  19.5  22.7  10.4 -----   2.1   9.0   5.0  10.4  10.9  2...
 A059 | 15.9  28.2  21.3  24.8  12.5   2.1 -----   6.9   5.4  13.3  13.4  2...
 A079 | 17.9  31.5  24.8  28.7  16.7   7.5   6.9 -----   0.0   9.6  10.5  2...
 M030 | 17.7  31.4  24.8  28.7  16.7   7.5   6.9   0.0 -----   9.6  10.4  1...
 M232 | 16.0  29.5  22.8  26.5  16.0   7.2  10.9   8.1   9.6 -----   2.4  1...
 T018 | 15.1  29.2  22.5  26.3  16.8   8.7  13.0  10.2  11.9   2.4 -----  1...
 M076 | 17.8  34.0  27.9  32.7  22.7  15.7  21.2  18.8  20.9  12.9  11.9 --...
 M237 | 17.8  34.0  27.9  32.7  22.7  15.7  21.2  18.8  20.9  12.9  11.9   ...
 A036 | 20.3  37.5  31.7  37.0  27.4  21.6  28.4  26.1  28.8  21.6  21.8  1...
 M034 | 20.3  37.5  31.7  37.0  27.4  21.6  28.4  26.1  28.8  21.6  21.8  1...
      ---------------------------------------------------------------------...
We again see the effect of the "double" markers (LOD differences of 0.0). Otherwise, all markers seem to be relatively firmly placed, with high LODs, all above 2.1.
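
The procedure behind polish can be sketched in a few lines of Python. This is an illustration of the idea only, not CarthaGene's implementation: the true criterion is the multipoint loglikelihood, which is replaced here by a hypothetical toy cost (the summed distance between adjacent markers placed on a line), and the table is indexed by insertion slot rather than by destination marker. All names and positions are made up for the example.

```python
# Illustration only: toy_cost is a hypothetical stand-in for the (negated)
# multipoint loglikelihood that CarthaGene actually evaluates.

def polish_table(order, cost):
    """For each marker, report the cost change of re-inserting it in every slot."""
    base = cost(order)
    table = {}
    for i, marker in enumerate(order):
        rest = order[:i] + order[i + 1:]          # map without this marker
        deltas = []
        for slot in range(len(rest) + 1):         # every possible interval
            candidate = rest[:slot] + [marker] + rest[slot:]
            deltas.append(cost(candidate) - base)
        table[marker] = deltas
    return table


# Hypothetical marker positions on a line, in arbitrary units.
POSITIONS = {"a": 0, "b": 1, "c": 3, "d": 6}


def toy_cost(order):
    """Sum of distances between adjacent markers: smaller is better."""
    return sum(abs(POSITIONS[x] - POSITIONS[y]) for x, y in zip(order, order[1:]))


table = polish_table(["a", "b", "c", "d"], toy_cost)
for marker, deltas in table.items():
    print(marker, deltas)
```

In this toy setting a marker's own slot always gives a change of 0, and a firmly placed marker shows large changes everywhere else. A change of 0 in a different slot would play the role of the 0.0 entries produced by double markers.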

Let us look in detail at the best map, map 12.

CG> maprintd 12

Map 12 : log10-likelihood =   -70.86, log-e-likelihood =  -163.17
-------:

Data Set Number  1 :

      Markers        Distance    Cumulative  Distance   Theta       2pt
Pos  Id name         Haldane     Haldane     Kosambi    (%age)       LOD

  1  20 L029           0.0 cM      0.0 cM      0.0 cM     0.0 %     18.1
  2  38 L010           5.9 cM      5.9 cM      5.6 cM     5.6 %     13.1
  3  62 L078           0.0 cM      5.9 cM      0.0 cM     0.0 %     21.4
  4  85 T035           2.5 cM      8.5 cM      2.5 cM     2.5 %      9.6
  5 239 D022          11.5 cM     19.9 cM     10.4 cM    10.2 %      6.4
  6  42 L001           1.1 cM     21.0 cM      1.1 cM     1.1 %     19.9
  7 277 A059           2.2 cM     23.3 cM      2.2 cM     2.2 %     18.4
  8 284 A079           0.0 cM     23.3 cM      0.0 cM     0.0 %     21.7
  9 132 M030           3.4 cM     26.7 cM      3.3 cM     3.3 %     16.0
 10 220 M232           1.1 cM     27.8 cM      1.1 cM     1.1 %     17.8
 11  75 T018           4.7 cM     32.5 cM      4.5 cM     4.5 %     12.8
 12 186 M237           0.0 cM     32.5 cM      0.0 cM     0.0 %     19.9
 13  99 M076           5.9 cM     38.4 cM      5.6 cM     5.6 %     13.0
 14 255 A036           0.0 cM     38.4 cM      0.0 cM     0.0 %     21.4
 15  94 M034        ----------              ----------
                      38.4 cM                 36.2 cM


       15 markers, log10-likelihood =   -70.86
                   log-e-likelihood =  -163.17
This map will be kept as the best comprehensive map of the group in this tutorial. In practice, longer local searches should be run and other verification procedures should be used.
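
The distance columns of the map listing can be related to the recombination fraction Theta through the standard Haldane and Kosambi mapping functions, and the two likelihood figures differ only by the base of the logarithm. The short Python sketch below applies these textbook formulas. It is not taken from CarthaGene's code, the function names are hypothetical, and because the printed values are rounded, recomputed figures can differ from the listing in the last digit.

```python
import math


def haldane_cm(theta):
    """Haldane map distance in cM for a recombination fraction theta < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * theta)


def kosambi_cm(theta):
    """Kosambi map distance in cM for a recombination fraction theta < 0.5."""
    return 25.0 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))


def log10_to_ln(log10_likelihood):
    """Convert a log10-likelihood to a natural-log likelihood."""
    return log10_likelihood * math.log(10.0)


# Theta = 5.6 % is the value printed on the L010 row of the listing.
print(f"{haldane_cm(0.056):.1f} cM Haldane, {kosambi_cm(0.056):.1f} cM Kosambi")
# prints: 5.9 cM Haldane, 5.6 cM Kosambi
```

The same row of the listing reports 5.9 cM (Haldane) and 5.6 cM (Kosambi). Applying log10_to_ln to -70.86 gives about -163.16, which agrees with the printed -163.17 up to the rounding of the log10 value.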

Thomas Schiex 2009-10-27