Exon CGH array design

The dataset comes from $ 16$ arrays hybridized with DNA from subjects with epilepsy, autism, heart defects and mental disorders. Each experiment was performed on the 180 K exon-targeted oligonucleotide array.

Figure 2: 180 K exon array - example of probe coverage of short and long exons.
Image v8_exon_example

The design of the chip involved two stages. First, a prototype covering only exonic and microRNA regions was constructed. The main aim at this stage was to develop an array that allows detection of DNA copy number changes affecting a single exon. Therefore, each exon was to be covered by the same number of oligos. For a given set of $ 1714$ selected genes (including those related to epilepsy, autism, heart defects, mental disorders and other known pathologies) it was decided that each exon would be covered by approximately $ 6$ probes. All oligos were taken from the $ 24$ M Agilent database of preselected probes. In long exons, for which more than $ 6$ probes were available, the selected probes were distributed uniformly. For short exons, with fewer than $ 6$ probes available, the missing oligos were selected from the surrounding region, balancing proximity to the exon against symmetry around it. Note that oligos may overlap with each other by at most $ 20\%$ of their length. Figure 2 presents an example of probe coverage for short and long exons. Figure 3 shows the distribution of probes in exons and in exon neighborhoods ($ \pm 500$ bp).
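
The selection rule described above can be illustrated with a minimal sketch. It is not the procedure actually used for the chip: the function names, the representation of a probe by a single position in bp, the alternating left/right rule for short exons and the reuse of the $ 500$ bp neighborhood as the search window are all assumptions made for illustration, and the $ 20\%$ overlap constraint is not modelled.

```python
def pick_uniform(inside, start, end, n):
    """Long exon: choose the n probes closest to n evenly spaced targets."""
    remaining = list(inside)
    width = (end - start) / n
    chosen = []
    for i in range(n):
        target = start + (i + 0.5) * width  # centre of the i-th equal slice
        best = min(remaining, key=lambda p: abs(p - target))
        remaining.remove(best)
        chosen.append(best)
    return sorted(chosen)


def select_probes(start, end, candidates, n=6, flank=500):
    """Return about n probe positions for the exon [start, end].

    `candidates` is a hypothetical list of available probe positions (bp).
    """
    inside = [p for p in candidates if start <= p <= end]
    if len(inside) >= n:
        return pick_uniform(inside, start, end, n)

    # Short exon: keep every exonic probe, then borrow from the flanks.
    chosen = list(inside)
    # Flank probes ordered from nearest to farthest from the exon.
    left = sorted((p for p in candidates if start - flank <= p < start),
                  reverse=True)
    right = sorted(p for p in candidates if end < p <= end + flank)
    n_left = n_right = 0
    while len(chosen) < n and (left or right):
        if not right:
            take_left = True
        elif not left:
            take_left = False
        elif n_left != n_right:
            take_left = n_left < n_right  # symmetry: top up the lagging side
        else:
            take_left = (start - left[0]) <= (right[0] - end)  # proximity
        if take_left:
            chosen.append(left.pop(0))
            n_left += 1
        else:
            chosen.append(right.pop(0))
            n_right += 1
    return sorted(chosen)
```

For a short exon spanning 1000 to 1100 bp with candidate positions 700, 950, 990, 1020, 1080, 1110, 1300 and 1500, `select_probes(1000, 1100, ...)` keeps the two exonic probes and adds two probes from each side, nearest first.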

Figure 3: 180 K exon array - the number of exons (y-axis) that contain 0, 1, 2, ... probes (x-axis).
Image v8_exon_distr

Moreover, every second probe was taken from the opposite strand to obtain better performance. An example output of an aCGH experiment for the two designs (one and two DNA strands) is presented in Figure 4.

Figure 4: 180 K exon array, exemplary aCGH experiment for chromosome 1 in both designs (one and two DNA strands).
Image v8
The prototype coverage was twice as dense as that desired in the final version. A set of hybridizations was performed with the prototype version. The performance score of each probe was computed as follows. Segmentation was performed on the data from these experiments. The empirical cumulative distribution function of the distribution $ \cal F$ of log-ratio deviations from their segment means was estimated from all log-ratios in all experiments. Given a probe and its deviations from the means of the segments to which it belongs, we assign as its score the p-value of the two-sided Kolmogorov-Smirnov test of the hypothesis that these deviations were sampled from $ \cal F$.
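
A minimal sketch of such a score is given below, assuming that segmentation has already been performed and that the deviations of each probe from its segment means are available as plain lists. The function names and the input dictionary are hypothetical. The p-value uses the asymptotic Kolmogorov series with the usual small-sample correction, which is only an approximation when a probe has few observations, and the pooled empirical distribution is treated as the reference distribution $ \cal F$.

```python
import bisect
import math


def ks_statistic(sample, sorted_pool):
    """Two-sided KS distance between a sample and the pooled ECDF."""
    xs = sorted(sample)
    m = len(xs)
    n = len(sorted_pool)
    d = 0.0
    for i, x in enumerate(xs):
        # Pooled ECDF at x: fraction of pooled deviations <= x.
        f = bisect.bisect_right(sorted_pool, x) / n
        d = max(d, (i + 1) / m - f, f - i / m)
    return d


def kolmogorov_pvalue(d, m):
    """Asymptotic two-sided p-value for KS distance d and sample size m."""
    lam = (math.sqrt(m) + 0.12 + 0.11 / math.sqrt(m)) * d
    if lam < 0.2:
        return 1.0  # the series is numerically 1 in this range
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * total))


def probe_scores(deviations_by_probe):
    """Map each probe to its KS p-value against the pooled deviations.

    `deviations_by_probe` is a hypothetical dict: probe id -> list of
    log-ratio deviations from segment means, over all experiments.
    """
    pool = sorted(d for devs in deviations_by_probe.values() for d in devs)
    return {probe: kolmogorov_pvalue(ks_statistic(devs, pool), len(devs))
            for probe, devs in deviations_by_probe.items()}
```

Under this sketch, a probe whose deviations follow the pooled distribution receives a score close to 1, whereas a probe with systematically shifted or unusually dispersed deviations receives a score close to 0.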

The next step involved combining the prototype design with a backbone, i.e. probes placed uniformly across the genome. Densely covered regions (exonic regions with double coverage) were thinned with a heuristic approach that considered the previously assigned scores and the uniformity of the resulting coverage (the sizes of the introduced gaps).
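
The text does not specify the thinning heuristic, so the following is only one possible sketch of a procedure of this kind. It greedily removes probes from a region until a target number remains, each time dropping the probe for which the sum of its score and the normalized gap created by its removal is smallest. The function name, the additive cost and the `gap_weight` parameter are assumptions, not part of the original design.

```python
def thin_probes(positions, scores, n_keep, gap_weight=1.0):
    """Greedily thin one region down to n_keep probes.

    positions: probe positions (bp); scores: per-probe scores in [0, 1],
    higher meaning better behaved. The two boundary probes are always kept.
    """
    if n_keep < 2:
        raise ValueError("n_keep must be at least 2")
    probes = sorted(zip(positions, scores))
    if not probes:
        return []
    span = (probes[-1][0] - probes[0][0]) or 1  # avoid division by zero

    while len(probes) > n_keep:
        def cost(i):
            # Gap that would open between the neighbours if probe i is dropped.
            gap = probes[i + 1][0] - probes[i - 1][0]
            return probes[i][1] + gap_weight * gap / span

        # Drop the interior probe that is cheapest to lose:
        # a low score and a small resulting gap.
        cheapest = min(range(1, len(probes) - 1), key=cost)
        del probes[cheapest]
    return [pos for pos, _ in probes]
```

With equally spaced probes the lowest-scoring interior probe is removed first, and with equal scores the probe whose removal opens the smallest gap is removed first. Increasing `gap_weight` favours uniformity of coverage over probe quality.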