Download e-book Microarrays for an Integrative Genomics


Instead of the distance between two features, the percentage of overlap between them may be employed to match the features from two platforms. A gene is matched to the DNA copy number feature with which it has the highest percentage of overlap. This approach has been used by, among others, De Menezes et al. Table 4 describes the steps of the approach, while Figure 4 visualizes the key problem.

Figure 4. Overlap matching: the DNA copy number feature with the maximum percentage of overlap in sequence with the gene is sought.
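The overlap criterion can be illustrated with a minimal Python sketch. It is not the sigaR implementation: the function names are hypothetical, features are assumed to lie on the same chromosome as the gene, coordinates are assumed to be inclusive, and the percentage of overlap is assumed to be measured relative to the length of the gene.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end) on one chromosome, end inclusive


def overlap_fraction(gene: Interval, feature: Interval) -> float:
    """Fraction of the gene's length covered by the feature (assumed definition)."""
    g_start, g_end = gene
    f_start, f_end = feature
    shared = min(g_end, f_end) - max(g_start, f_start) + 1
    return max(shared, 0) / (g_end - g_start + 1)


def match_by_overlap(gene: Interval, features: List[Interval]) -> Optional[int]:
    """Index of the feature with the largest overlap, or None if nothing overlaps."""
    best_index, best_fraction = None, 0.0
    for index, feature in enumerate(features):
        fraction = overlap_fraction(gene, feature)
        if fraction > best_fraction:
            best_index, best_fraction = index, fraction
    return best_index


# Toy coordinates, for illustration only.
features = [(100, 199), (180, 400), (900, 1000)]
print(match_by_overlap((150, 249), features))  # feature 1 covers 70% of the gene
print(match_by_overlap((500, 600), features))  # None: the gene stays unmatched
```

The second call shows why the procedure can be conservative: a gene that no feature overlaps is simply left without a DNA copy number feature.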

The overlap procedure may be considered rather conservative, matching too few probes. This could be because the features of the two platforms have a rather disjoint coverage of the genome. There may be valid biological grounds for this, but the disjoint coverage may also cause relatively few genes to be assigned a DNA copy number feature. The overlapPlus approach aims to tackle this. A gene may also span a genomic region that is interrogated by multiple DNA copy number features. The overlap matching procedure then selects just one of these features and assigns its DNA copy number data to the gene.

Potentially relevant information on the DNA copy number of the gene is then ignored. Table 5 describes the steps of the overlapAny algorithm. The disadvantages of the overlap matching procedure translate directly to the overlapAny approach. As the name suggests, the overlapPlus matching procedure extends the overlap approach. To this end, overlapPlus alters the objective of feature matching: the aim is no longer to match features of both platforms, but to assign to each gene on the expression array the correct corresponding DNA copy number. This is achieved by first applying the overlap matching procedure.

The interpolation is warranted by the discrete nature of the underlying biological phenomenon. This interpolation principle has been proposed by, among others, Autio et al. Table 6 details the steps of the overlapPlus algorithm. In both panels of the accompanying figure, no feature overlaps with the gene. In the top panel, the overlapPlus approach would interpolate the DNA copy number data between features j-1 and j, as there is no breakpoint between them. In the bottom panel, however, features j-1 and j are separated by a breakpoint, and the overlapPlus procedure will not interpolate.
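The interpolation rule for genes without an overlapping feature can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the sigaR code: the function name is hypothetical, features are assumed to be sorted and non-overlapping on one chromosome, `segmented` holds one row of segmented values per feature (one value per sample), and a breakpoint is taken to be any difference between the two flanking rows.

```python
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[int, int]  # (start, end) on one chromosome


def overlap_plus_fallback(
    gene: Interval,
    features: Sequence[Interval],
    segmented: Sequence[List[float]],
) -> Optional[List[float]]:
    """DNA copy number signature interpolated from the flanking features j-1 and j.

    Returns None when the gene is not flanked on both sides, when some
    feature overlaps the gene (plain overlap matching applies instead),
    or when a breakpoint separates the flanking features.
    """
    g_start, g_end = gene
    left = [i for i, (_, end) in enumerate(features) if end < g_start]
    right = [i for i, (start, _) in enumerate(features) if start > g_end]
    if not left or not right:
        return None
    j_prev, j_next = left[-1], right[0]
    if j_next != j_prev + 1:
        return None  # a feature in between overlaps the gene
    if segmented[j_prev] != segmented[j_next]:
        return None  # breakpoint between j-1 and j: do not interpolate
    return list(segmented[j_prev])


# Toy data: three features, two samples.
features = [(100, 200), (500, 600), (900, 1000)]
segmented = [[0.0, 0.5], [0.0, 0.5], [0.0, -0.4]]
print(overlap_plus_fallback((300, 400), features, segmented))  # [0.0, 0.5]
print(overlap_plus_fallback((700, 800), features, segmented))  # None
```

Because the decision depends on `segmented`, the same gene may be matched in one data set and left unmatched in another, even when both were generated on the same platform.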

A drawback of the overlapPlus approach is that it uses, in addition to the feature annotation information, the experimental data to assess the presence of a breakpoint. This makes the resulting matching dataset-dependent: the matching may be different for subsets of the dataset. The sigaR-package offers three extensions to the distanceAny and overlapAny procedures. These extensions concern the case where multiple DNA copy number features match to a gene, in which case distanceAny and overlapAny take a weighted average.

Instead of averaging, the first extension selects, in line with the ACEit-package [11], the most extreme (operationalized as the largest absolute deviation from zero) segmented DNA copy number signal. This is done per sample individually. Consequently, the resulting DNA copy number signature may comprise data from all matched features.
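A minimal Python sketch of this per-sample selection is given below. The function name and the row-per-feature data layout are assumptions made for illustration; they are not taken from sigaR or ACEit.

```python
from typing import List, Sequence


def most_extreme_per_sample(segmented: Sequence[Sequence[float]]) -> List[float]:
    """For each sample, keep the matched feature's value furthest from zero.

    `segmented` has one row per matched DNA copy number feature and one
    column per sample (assumed layout). Ties go to the first feature.
    """
    n_samples = len(segmented[0])
    return [
        max((row[sample] for row in segmented), key=abs)
        for sample in range(n_samples)
    ]


# Toy data: three matched features, three samples.
matched = [
    [0.1, -0.8, 0.0],
    [0.4, -0.2, 0.0],
    [-0.3, 0.5, 0.0],
]
print(most_extreme_per_sample(matched))  # [0.4, -0.8, 0.0]
```

In the toy output the first sample takes its value from the second feature and the second sample from the first feature, which illustrates how the resulting signature can combine data from several matched features.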

However, this approach may also increase the chance of a false discovery. The second extension introduces an additional step prior to the weighted average. It may happen that one or more of the samples exhibits a breakpoint (a change in the segmented values) within the set of matched DNA copy number features. This extension splits the set of matched features at all breakpoints occurring within it in any of the samples, before proceeding to the weighted average. As a result, within the matched DNA copy number - gene expression data set, a gene may appear multiple times: each time with the same expression signature (the vector of expression values of that gene over the samples), but with a different DNA copy number signature (the vector of DNA copy numbers over the samples).
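The splitting step of the second extension can be sketched in Python as follows. The function name is hypothetical, the matched features are assumed to be ordered by genomic position, and `segmented` is assumed to hold one row of segmented values per matched feature.

```python
from typing import List, Sequence


def split_at_breakpoints(segmented: Sequence[List[float]]) -> List[List[int]]:
    """Group consecutive matched features that are not separated by a breakpoint.

    A breakpoint is taken to be a change in the segmented value of at
    least one sample between two neighbouring features.
    """
    groups = [[0]]
    for index in range(1, len(segmented)):
        if segmented[index] != segmented[index - 1]:
            groups.append([index])  # breakpoint in some sample: start a new set
        else:
            groups[-1].append(index)
    return groups


# Toy data: four matched features, two samples; sample 2 has a breakpoint
# between the second and the third feature.
matched = [[0.0, 0.5], [0.0, 0.5], [0.0, -0.4], [0.0, -0.4]]
groups = split_at_breakpoints(matched)
print(groups)  # [[0, 1], [2, 3]]

# The gene would enter the matched data set once per group.
signatures = [matched[group[0]] for group in groups]
print(signatures)  # [[0.0, 0.5], [0.0, -0.4]]
```

Under these assumptions all features within a group share the same segmented values, so any weighted average over a group reduces to the signature of one of its members.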

When interested in cis-effect detection via univariate gene-wise analysis, this may increase the multiple testing correction. The third and last modification extends the previous one. Now the set of matched DNA copy number features is split into a collection of sets, each containing only a single DNA copy number feature. Again, in the final matched data set the same gene may be present multiple times. The multiplicity correction may further increase compared to the previous modifications.

These extensions do not affect the actual assignment of DNA copy number features to genes and produce only minor changes to the DNA copy number data summary. They are therefore not considered in the remainder. Five data sets have been downloaded to compare the matching procedures. Data set 1, referred to as the Chin data set [29], is a breast cancer study in which both the genome and the transcriptome of the samples were profiled.

Details on the data sets are found in Table 7. Features of the platform pairs that produced the five data sets are matched by the following procedures: distance, distanceAny, overlap, overlapAny, and overlapPlus. Note that the label procedure is not included, as it is not applicable to the Chin and Taylor I data sets: there would be no matching, because the labels of BACs (the DNA copy number probes of the Chin data set) and of microRNA probes need not map to a gene label. The results of applying the matching procedures, as implemented in the sigaR-package, to the five data sets are presented in Table 7.

The distanceAny procedure resolves this drawback by limiting the search for a matching DNA copy number feature to a subdomain of the genome. However, the number of matched gene expression features then falls dramatically. As an alternative to the distanceAny procedure, one may use the overlap or overlapAny procedure to circumvent its drawbacks.


But they perform poorly. This is because there simply are no more overlapping features between the two platforms. A relaxation of the requirement of nonzero overlap between features from both platforms is offered by the overlapPlus procedure. This works out nicely for the Taylor I data set, for which the percentage of matched gene expression features increases. However, the overlapPlus procedure makes use of the experimental data (breakpoints), which implies that the matching may be different between data sets generated on the same platform.

As seen from the above, the distanceAny and overlapAny procedures come with a tuning parameter (the separation distance and the percentage of overlap, respectively), which affects the number of matched features. Not directly obvious, but no less important, the tuning parameter also determines the total number of DNA copy number features used in the construction of the matched gene dosage signature (the vector of DNA copy number values of one genomic location over samples). Whereas distance, overlap and overlapPlus eventually select a single feature from the DNA copy number array, the distanceAny and overlapAny procedures potentially select more than one feature, and their data are aggregated into a matching signature.

Hence, the latter two procedures make more use of the experimental data. To contrast these high-level comparisons of the matching procedures, we show the consequences of employing a particular matching procedure at the level of an individual gene. It quickly becomes obvious that the matched DNA copy number features differ in number across matching procedures. More interesting, perhaps, is how the coverage of the gene varies between these sets of matched DNA copy number features. The distance matching procedure selects a single DNA copy number feature close to the middle of the gene.
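The contrast between selecting a single nearest feature and selecting all features within a window can be sketched in Python. The sketch rests on assumptions that are not taken from the source: distances are measured between interval midpoints, all features lie on the gene's chromosome, and the function names and the `window` parameter are hypothetical stand-ins for the distance and distanceAny procedures and their tuning parameter.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # (start, end) on one chromosome


def midpoint(interval: Interval) -> float:
    return (interval[0] + interval[1]) / 2


def match_by_distance(gene: Interval, features: Sequence[Interval]) -> int:
    """Index of the single feature whose midpoint is closest to the gene's."""
    return min(
        range(len(features)),
        key=lambda i: abs(midpoint(features[i]) - midpoint(gene)),
    )


def match_by_distance_any(
    gene: Interval, features: Sequence[Interval], window: float
) -> List[int]:
    """Indices of all features within `window` of the gene's midpoint."""
    return [
        i
        for i, feature in enumerate(features)
        if abs(midpoint(feature) - midpoint(gene)) <= window
    ]


# Toy coordinates, for illustration only.
gene = (1000, 2000)
features = [(0, 100), (1400, 1500), (1900, 2100), (50000, 50100)]
print(match_by_distance(gene, features))           # 1
print(match_by_distance_any(gene, features, 600))  # [1, 2]
print(match_by_distance_any(gene, features, 100))  # [1]
print(match_by_distance(gene, [(50000, 50100)]))   # 0: a distant feature still matches
```

The last call shows the unrestricted search always returning a match, however remote, while the two window sizes show how the tuning parameter changes the number of features that contribute to the aggregated signature.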

Too large a coverage, e. Too small a coverage may assign it a rather noisy DNA copy number signature. The middle ground, with a reasonable coverage of the gene as shown by the overlapAny and distanceAny (with a small window size) procedures, seems an acceptable compromise. The consequences of choosing a matching procedure also reveal themselves in the DNA copy number data, as matching procedures either select different features or utilize different ways of summarizing data from multiple features.

The vast majority of genes have DNA copy number signatures that vary little to nothing between the matching procedures (Figure 7). Occasionally, however, there is a data point that is affected in a more serious manner by the choice of matching procedure. Figure 8 shows that the distanceAny method has one data point (indicated by the orange circle) that deviates from its counterpart in the other matched DNA copy number signatures. In this particular case, it is due to the large window size chosen, and the problem vanishes if the window size is decreased.

The red line is the best-fitting piece-wise linear spline, as obtained from the method described in [30]. The vertical dashed lines separate the samples with a loss from those with a normal copy number, those with a normal copy number from those with a gain, and those with a gain from those with an amplification. This suggests that the best matching procedure yields the highest correlations between the two molecular levels.

For many genes, the matching procedures yield identical correlations. Even when focussing on those genes with correlations varying over the matching procedures, the differences are often small. To provide some insight into which procedure yields the highest correlations, we compare the correlations of the matching procedures in a pairwise fashion.

To this end, we simply count how many times matching procedure A yields a higher correlation than matching procedure B, and vice versa. For the Chin data set, the distance and distanceAny methods give the best results (more correlations that exceed those of other procedures than vice versa). The distance and distanceAny methods are followed by the overlapAny procedure, and finally, but not too far behind, the overlap and overlapPlus procedures.
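The pairwise counting can be sketched in Python as follows. The function name is hypothetical, the correlation values are toy numbers rather than results from the paper, and the correlations are assumed to be listed in the same gene order for every procedure.

```python
from typing import Dict, List, Tuple


def pairwise_wins(
    correlations: Dict[str, List[float]]
) -> Dict[Tuple[str, str], int]:
    """Count, for each ordered pair (A, B), the genes where A's correlation exceeds B's."""
    wins = {}
    for proc_a, corr_a in correlations.items():
        for proc_b, corr_b in correlations.items():
            if proc_a != proc_b:
                wins[(proc_a, proc_b)] = sum(a > b for a, b in zip(corr_a, corr_b))
    return wins


# Toy correlations for three genes matched by both procedures.
toy = {
    "distance": [0.5, 0.3, 0.7],
    "overlap": [0.5, 0.2, 0.8],
}
counts = pairwise_wins(toy)
print(counts[("distance", "overlap")])  # 1
print(counts[("overlap", "distance")])  # 1
```

Genes with identical correlations under both procedures, such as the first toy gene, count for neither side, so the two counts of a pair need not sum to the number of genes.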

A similar picture emerges from the Taylor II data set (results not shown). No clear winner emerges from this comparison, but it points to either the distanceAny (with a small window size) or the overlapAny procedure. Or, put differently, in the light of the results presented in Table 8, this points to procedures capable of matching multiple DNA copy number features to the same gene, which together have a reasonable coverage of that gene.

Finally, we illustrate the effect of matching on downstream analysis.


We assess the cis-effect of a DNA copy number aberration on the expression levels of the genes mapping to it. The associated workflow is portrayed in Figure 9. Table 10 reports the number of significant genes for each combination of data set and matching procedure (the Taylor I and II data sets are excluded for being uninformative: neither yielded anything significant). This number is reported for the whole set of genes matched by each procedure (the size of this set can be found in Table 7), but also for the restricted set containing only those genes that are matched by all procedures.

This order is concordant with the matching result: the more matched genes, the more discoveries.

This may obscure the comparison of the methods. Moreover, as pointed out before, the distance and distanceAny (with a large window size) procedures may match genes to DNA copy number features located elsewhere on the genome. This raises doubts over the interpretation of significant associations. In the restricted set of genes, the number of discoveries is constant over the methods, with the overlapAny procedure having one additional finding.

This could be interpreted as the additionally matched genes being assigned an unrelated DNA copy number signature. In summary, this comparison of downstream analyses suggests that, at least in data sets generated on a high-resolution DNA copy number platform, the overlapAny procedure may be preferred.

Figure 9. Flowchart of cis-effect analysis.

Matching of the features from different high-throughput platforms is an important preprocessing step for bioinformatic analyses of integrative genomics studies. We have described, reviewed and implemented (in the sigaR-package) the most widely used matching procedures found in the literature.

Application of the matching procedures to five data sets generated on different platforms revealed that 1) the number of features matched varies considerably between the matching procedures, and 2) the choice of matching procedure may even affect (although usually only to a minor degree) the DNA copy number signature (the vector of DNA copy number values over the samples) assigned to a gene. These observations, which have their consequences for any downstream integrative analysis, facilitate an informed decision on the matching procedure of choice. The matching procedures have shown little difference in the number of features matched and have very little impact on downstream analysis results, in the several examples shown.

These results rely on correct pre-processing, of which copy number data segmentation is an important aspect. It should be kept in mind that, although overall results may be robust to the choice of matching procedure, this may not be true for all genes, as Figure 8 illustrates. We recommend starting the matching with the overlapAny procedure.

This may be conservative in some cases, but it certainly has the clearest and most undisputed physical interpretation for matching. If this yields satisfactory results, the task is done. Otherwise, the remaining unmatched features may be handled either by the distanceAny procedure (with not too big a window around the gene) or by the overlapPlus procedure.

Reference: Chin et al. Preprocessing: Pre-processing of both DNA copy number and gene expression data used here was as described in [ 32 ], with the additional steps of segmentation and calling via the R-package CGHcall [ 33 ], using default settings on the normalized data.


The annotation information of both datasets was updated as described below. As this is unlikely to be true and correct information is essential for matching to be performed adequately, annotation information for BAC clones from Ensembl was used to update the information. For BAC clones in the Chin data, we obtained updated start and end positions. The Chin gene expression array data contained probe sets. Using the Bioconductor package hguplus2. Some of those were allocated to more than one chromosome, in which case we took the first values for chromosome, start and end encountered in the data table.

Reference: Verhaak et al. Replicated samples were not taken into account. The Affymetrix gene expression array contains probe features, of which could be matched to a genomic location. After removal of probes mapping to the Y chromosome, probes for the gene expression data were left. The Agilent copy number platform consists of probe features of which all features could be matched to a genomic location. Segmentation and calling were performed using default settings except for undo. The Agilent gene expression array is a custom design on the K platform with 90K unique probes. Only of the unique probes could be matched to genomic locations using Ensembl.

Reference: Taylor et al. The files were imported into R and the data filtered for the aforementioned quality criteria and subsequently merged with the recent annotation data provided by Agilent eArray. The data was segmented and called using the CGHcall-package [ 33 ] using default settings. The resulting data consists of oligo probes autosomes only. The log 2 expression values were normalized using RMA [ 37 ] and the resulting data matrix filtered for QA criteria provided by Agilent that make sure that only meaningful expression values are kept in the dataset.

The final data object contained the expression values of miRNAs (autosomes only) from 49 samples. Preprocessing of the Affymetrix Human Exon 1. Briefly, Affymetrix Power Tools was used to read the raw data (CEL files with hybridisation fluorescence intensities) along with the latest version of annotation files, and to normalise the gene-level data using the Robust Multichip Average (RMA) algorithm [37].


After quality filtering, the result is stored in an ExpressionSet-object with summarized log2-ratios of genes (autosomes only) from 49 samples. For the R-code below it is assumed that the sigaR-package plus its dependencies have been activated, and that the cghCall and ExpressionSet objects (called CNdata and GEdata, respectively) have been loaded (Tables 11, 12, 13, 14, 15, 16 and 17).

WNvW conceived and carried out the project, and wrote the paper. All authors read and approved of the manuscript.

Nature , — Cancer Res , — PNAS , — PNAS , 36 — Gut , 79— Oncogene , — Cancer Res , 69 9 — PLoS One , 5 4 :e Edited by: Jonassen I, Kim J. Berlin: Springer; — Cancer Informatics , 1: 10— Bioinformatics , — Statistical Applications in Genetics and Molecular Biology
