TCGA: Genome-Wide Analysis of Expression Quantitative Trait Loci in Breast Cancer – Nicholas Knoblauch


Nicholas Knoblauch:
So the title of my talk is "Genome-Wide Analysis of eQTL in Breast Cancer." But really what I'm talking about today is the interaction between genotype and phenotype, more specifically the interaction between germline genotype and breast cancer phenotype. The genome-wide association study (GWAS) is the widely used method for investigating this relationship between genotype and phenotype on a genomic scale. Breast cancer has been widely studied with GWAS, and if we look at the GWAS catalog, we see about 50 risk alleles that can predict risk of breast cancer. A question you might ask is "How do we infer the mechanism of these risk alleles?" Or, "How do these alleles lead to an increased risk of breast cancer, and how can we understand how variation at these loci has a functional consequence?" In the eQTL framework, we treat gene expression as a phenotype, measured with gene expression profiling methods such as RNA-seq or microarrays. We can easily measure tens of thousands of features simultaneously, and this facilitates investigating the functional consequences of genetic variation at these loci.

Our eQTL analysis consisted of three parts: our germline genotype data, our tumor gene expression data, and our ER status data. These came from 382 TCGA invasive breast cancer cases from Caucasian individuals. Our germline SNP data came from an Affy 6.0 SNP array, and our expression data came from an Agilent 244K custom array.
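As a minimal sketch of how these three inputs combine in the linear model described later in the talk (expression regressed on genotype, with ER status as a covariate), the Python below computes a t-statistic for the genotype term for every SNP-transcript pair at once. It follows the general residualize-then-correlate idea behind matrix-based eQTL tools such as MatrixEQTL, as we understand it, but it is not that package's code. The function name `eqtl_tstats`, the 0/1/2 minor-allele dosage coding, and the 0/1 ER coding are assumptions for illustration.

```python
import numpy as np


def eqtl_tstats(genotypes, expression, covariates):
    """Per-pair t-statistics for the genotype term in
    expression ~ intercept + covariates + genotype.

    genotypes:  (n_snps, n_samples) dosages, assumed 0/1/2 minor-allele counts
    expression: (n_genes, n_samples) expression values
    covariates: (n_covariates, n_samples), e.g. one 0/1 ER-status row (assumed coding)
    Returns an (n_snps, n_genes) array of t-statistics.
    """
    n = genotypes.shape[1]
    # Nuisance design: intercept plus covariates, orthonormalized via QR.
    C = np.column_stack([np.ones(n), covariates.T])
    Q, _ = np.linalg.qr(C)

    def residualize_and_scale(X):
        # Remove the part of each row explained by intercept and covariates,
        # then scale each row to unit length.
        R = X - (X @ Q) @ Q.T
        return R / np.linalg.norm(R, axis=1, keepdims=True)

    G = residualize_and_scale(genotypes.astype(float))
    E = residualize_and_scale(expression.astype(float))
    # Partial correlation for every SNP-gene pair in one matrix product.
    r = G @ E.T
    # Residual degrees of freedom: n minus (intercept + covariates + genotype).
    dof = n - C.shape[1] - 1
    return r * np.sqrt(dof / (1.0 - r**2))
```

Because the covariates are projected out once and every pair is then scored by a single matrix product, the same idea can be applied block by block to very large numbers of tests. A monomorphic SNP (zero residual variance) would produce NaN here, so such markers would need to be filtered out first.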
We took the roughly one million loci from the Affy 6.0 array and, through imputation, ended up with about 8 million loci for the analysis. Imputation estimates genotypes at ungenotyped markers using a genotype reference panel; in this case, the reference panel was 1000 Genomes. We used BEAGLE to infer haplotypes for unrelated individuals and minimac to perform the actual imputation. That got us to about 16 million SNPs, and we then kept the 8 million most variable. Here's a plot of the first two principal components of our genotype data; our 382 cases came from the red cluster you can see here.

So we represented the interaction between
gene expression and genotype with a linear model with parameters for genotype and for ER status, which is our covariate. We used the R package MatrixEQTL to implement the eQTL analysis. MatrixEQTL uses large matrix operations to speed up testing every SNP-transcript pair, of which we had about 1.2 trillion, which is a lot. Along with using ER as a covariate, we also ran eQTL detection in ER-positive cases alone and in ER-negative cases alone.

Of the roughly 8 million SNPs, we found that about 140,000 were significant eQTLs. We also found that none of the 51 breast cancer risk alleles from the GWAS catalog were detected as eQTLs. So we see here that there does not seem to be an association between risk allele status and eQTL status.

So another way we can think about our results
is as a bipartite graph, in which each eQTL is represented as a locus pointing to a quantitative trait. Thinking about it this way, we can compute the in-degree of our quantitative traits, that is, how many loci per quantitative trait. The other way of looking at it is out-degree: how many quantitative traits per locus. We can also look at connected regions of the graph, that is, which quantitative traits are connected to one, two, or three SNPs, and so on.

Here we have the in-degree distribution of our quantitative traits. Most transcripts interact with one or two loci, and a small number of transcripts interact with a large number of loci. Here's the other side of that: the out-degree distribution of loci, where we see the same sort of thing, with a small number of loci interacting with a large number of transcripts. These are the quantitative traits with the highest in-degree, and we see some interesting things. Prolactin is known to play a role in breast biology, and MEN1 has been implicated in a variety of cancers.

Another way we can visualize this
is by taking a rolling mean of eQTLs across the genome, starting with chromosome one and going all the way to the X chromosome.

The last thing I wanted to talk about is the ER-dimorphic eQTLs. As I said earlier, we ran the eQTL analysis in ER-positive cases alone and in ER-negative cases alone, as well as with ER as a covariate. We found 32 eQTLs whose effects had opposite signs in the ER-positive and ER-negative cases. These are the six genes whose transcripts are affected by these eQTLs, and several of them seem to have roles in apoptosis, which I think warrants further investigation.

To summarize: of the 1.2 trillion SNP-transcript tests, about 375,000 eQTLs were found. Risk allele status really does not predict eQTL status, but ER status can interact with the direction of an eQTL's effect. Finally, it does seem that germline genotype can lend insight into breast cancer phenotype.

I'd like to thank my boss, Andy Beck, and, from the Harvard School of Public Health, Aditi Hazra, Pete Kraft, John Quackenbush, and Connie Chen. Since there's plenty of time, I'll take questions.

[applause]

Raju Kucherlapati:
Thank you. Questions for Nicholas? Is there any correlation, you know, between these SNPs that were identified and when you do genomic DNA and SNPs?

Nicholas Knoblauch:
I'm sorry?

Raju Kucherlapati:
I mean, these are all obtained from expression profiling, right?

Nicholas Knoblauch:
The SNPs are from... yeah, SNP genes, I think [spelled phonetically].

Raju Kucherlapati:
Yeah. Okay.

Male Speaker:
So the eQTLs that are dimorphic between ER positive and ER negative, were they generally going... like, for example, IGF1 receptor, was that more highly expressed in the ER negative, or was there a negative correlation or a positive correlation? Does it...

Nicholas Knoblauch:
Right, between, like, the minor allele...

Male Speaker:
Yeah, which direction was it, ER positive versus ER negative, on the sets of them? Were they consistent across, or...

Nicholas Knoblauch:
Right, so it seemed that most of the apoptosis-related transcripts were lower with the minor allele in the ER negative, I believe, and the converse in the ER positive. Does that make sense?

Male Speaker:
Okay. All right. I think I got it. Okay, thanks. I'll talk to you later.

Raju Kucherlapati:
Matthew?

Matthew Meyerson:
Sure, I was very curious about your result, which is, at least naively thinking, surprising: that the germline risk alleles are not associated with eQTLs.

Nicholas Knoblauch:
Right.

Matthew Meyerson:
And do you think... does this suggest alternative hypotheses for the role of these germline alleles in promoting cancer, other than being modulators of expression?

Nicholas Knoblauch:
Right. So I think it's entirely possible that these SNPs may lead to cancer, but then, in cancer, they do not predict any change in expression; that seems to be probably the most likely explanation, but...

Female Speaker:
I have a question. I want to understand: did you use adjacent normal tissue or the tumor tissue to look at the gene expression?

Nicholas Knoblauch:
Gene expression is from tumor tissue.

Female Speaker:
So these could be affected by the stage of the tumor. Did you do analysis by stage?

Nicholas Knoblauch:
We didn't do analysis by stage. We really only broke it down by ER status, to keep the sample size large. But, yeah, that certainly can play a role.

Raju Kucherlapati:
One last question.

Male Speaker:
Yes, in the case of ER-associated eQTLs, have you checked whether separating premenopausal and postmenopausal cases could change things? Estrogen levels vary with menopausal status, and that could affect gene expression.

Nicholas Knoblauch:
Yeah, absolutely. We haven't really looked at anything besides ER status, but going forward we'd like to gather a number of different covariates we could use.

Raju Kucherlapati:
Thank you, thank you very much.
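The talk's graph view of the results (each eQTL as an edge from a locus to a quantitative trait, summarized by in-degree, out-degree, and connected regions) can be made concrete with a short Python sketch. This is an illustration under assumptions, not the analysis code behind the talk: the input is assumed to be a list of significant (SNP, transcript) pairs, and the function names `degree_summary` and `connected_regions` are hypothetical.

```python
from collections import Counter, defaultdict


def degree_summary(eqtl_pairs):
    """eqtl_pairs: iterable of (snp_id, transcript_id) significant eQTLs."""
    pairs = set(eqtl_pairs)
    in_degree = Counter(t for _, t in pairs)   # loci per quantitative trait
    out_degree = Counter(s for s, _ in pairs)  # quantitative traits per locus
    # Degree distributions: how many transcripts have 1, 2, 3, ... loci, and vice versa.
    in_dist = Counter(in_degree.values())
    out_dist = Counter(out_degree.values())
    return in_degree, out_degree, in_dist, out_dist


def connected_regions(eqtl_pairs):
    """Connected components of the bipartite SNP-transcript graph, via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for snp, transcript in eqtl_pairs:
        # Tag node types so a SNP and a transcript sharing an ID stay distinct.
        a, b = find(("snp", snp)), find(("tx", transcript))
        if a != b:
            parent[a] = b

    groups = defaultdict(lambda: {"snps": set(), "transcripts": set()})
    for node in list(parent):
        kind, name = node
        key = "snps" if kind == "snp" else "transcripts"
        groups[find(node)][key].add(name)
    return list(groups.values())
```

On a toy list of made-up IDs such as `[("snpA", "tx1"), ("snpB", "tx1"), ("snpC", "tx1"), ("snpC", "tx2"), ("snpD", "tx3")]`, `tx1` has in-degree 3, `snpC` has out-degree 2, and the graph splits into two connected regions. Union-find is used here simply because it finds components in near-linear time without building an explicit adjacency list.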

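For the ER-dimorphic eQTLs mentioned in the talk, one way to flag SNP-transcript pairs whose effect has opposite signs in the separate ER-positive and ER-negative runs is sketched below. The talk does not state the exact criterion, so this assumes a pair must pass a per-stratum p-value cutoff in both strata and have effect estimates of opposite sign; the `alpha` default is a hypothetical placeholder, and `er_dimorphic_pairs` is a hypothetical name.

```python
import numpy as np


def er_dimorphic_pairs(beta_pos, p_pos, beta_neg, p_neg, alpha=1e-5):
    """Indices of SNP-transcript pairs whose genotype effect flips sign between strata.

    beta_*, p_*: (n_snps, n_transcripts) effect estimates and p-values from
    separate ER-positive and ER-negative eQTL runs.
    alpha: hypothetical per-stratum significance cutoff (an assumption).
    """
    significant_in_both = (p_pos < alpha) & (p_neg < alpha)
    # A negative product of signs means the effect directions disagree.
    opposite_sign = np.sign(beta_pos) * np.sign(beta_neg) < 0
    snp_idx, tx_idx = np.nonzero(significant_in_both & opposite_sign)
    return list(zip(snp_idx.tolist(), tx_idx.tolist()))
```

Requiring significance in both strata is a conservative choice; a formal genotype-by-ER interaction term in a single model would be an alternative way to test for the same kind of dimorphism.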