MD vs. Machine: Artificial intelligence in health care

Good evening. I want to welcome all
of you who are here tonight here in Boston on
our campus and those of you who are watching from around
the world on our live stream. I’m happy to share with you that
for the first two seminars of 2019, we had more than 20,000
people from around the world join our Longwood Seminar
classroom from Boston and from as far away as the
United Kingdom, South Korea, Pakistan, Egypt, Italy,
Brazil, and Australia. So to all of you, welcome. And I hope you’re
joining us again tonight. Tonight, our
Mini-Med School will feature artificial intelligence
and the tremendous potential it holds to revolutionize
health care. There is one remaining
seminar this year. Please join us on Tuesday,
April 30, for Why Sleep Matters. And we always have great
attendance for our sleep program, so do come early. So now for a few
brief announcements. If there is anyone watching
tonight, a business or science leader who may be
with us, we want you to be aware of a
four-day executive education course called Inside the
Health Care Ecosystem. Zak Kohane, one of
tonight’s speakers, will be among the faculty
teaching this course. Details can be found on
the web link on the screen. Now on the screen
you’ll see information related to obtaining
certificates of completion and professional
development points. So those of you who joined
us for the first two seminars and who are here
with us tonight, you’re entitled to
a certificate that says you completed
the Longwood Seminars. Our speakers will
be taking questions at the end of their
talk, so I ask you– if you’re in the audience,
you have a little card. Please pass it to a
member of our staff who will be circulating
up and down the aisle. If you’re watching
on the live stream, we want your questions as well. So please write your questions
in the comments section of Facebook and YouTube. And when you post
your question, we’d love to know where
you are viewing from. So please write the
country or the city from which you’re watching. And now please, silence
all electronic devices, but do not turn
them off because we want you to join our
Twitter conversation by using #HMSMiniMed. So please write your
comments and thoughts as you’re watching our program. It’s difficult, isn’t
it, to remember a time when technology and computers
did not exist and play a major role in our lives. My children never
lived in a world without personal computers. Technology has defined
their lives and ours. The impact of machine
learning and technology is dramatically transforming
our lives across many spheres, but importantly, never more than
in the practice of medicine. So how reliable are
computers in making decisions about our health? Looking into the future, what
are the many possibilities? How can our ability to rapidly
analyze vast amounts of data offer clinical tools to
diagnose disease, identify best treatment options, and
predict outcomes for patients? It has been said that
our intelligence is what makes us human, and
AI extends our humanity. We’re going to find out
more about that tonight. Tonight we’ll learn
more about the symbiosis of human and
machine intelligence from our expert Harvard faculty. Tonight we have with us
Brett Beaulieu-Jones, a research fellow in biomedical
informatics at Harvard Medical School. Katherine Liao is an
associate professor of medicine and assistant
professor of bioinformatics at Harvard Medical School,
associate physician, Division of Rheumatology, Immunology,
and Allergy at Brigham and Women’s Hospital,
and director of the applied bioinformatics core
at the VA Boston Health Care System. But we’ll begin with
our moderator and one of the world’s foremost experts
on all things AI, Zak Kohane, who is the Marion V.
Nelson Professor and Chair of the Department of Biomedical
Informatics at Harvard Medical School. Please join me in welcoming
our expert faculty. Thank you. [APPLAUSE]

Thank you, Gina. And I’m very excited to see
how many of you showed up to hear us talk about this. So we are privileged
to be living in an era where something
transformational, something genuinely new has
happened, and it’s happened in the span of my life. So when I was an MD-PhD student
getting my PhD in computer science, artificial
intelligence then meant we were going to
hand-code, using programming, the style of diagnosis
and treatment selection that we saw doctors perform. What’s happened since,
and in the last 10 years, is we’ve learned how to
use various computer science techniques
to let the data
itself directly inform us which patterns
are important. And so just as you
can now automatically search for cat
pictures on Facebook, you can automatically classify
pathology images of tumors and actually say whether it
looks like this kind of cancer or that kind of cancer
with performance that is as good and often better
than pathologists in the best academic health centers. So that’s a very exciting time. But the topic of my 20 minutes–
and I will try to get it done before 20 minutes because I’m
looking forward to having this moderated discussion
with all of you– what I’m going to
be talking about is the opportunity for new
medicines, for new treatments. Because I think in the end,
as patients, what we really are hoping for
are new treatments to help us suffer less and to
have the lives we want to have. So the most obvious
thing to ask would be, is artificial intelligence
going to transform the way we develop drugs? And the answer is it may well. And so shown here on the
slide is one of my colleagues formerly from Stanford,
Daphne Koller, who is a professor
of computer science. And those of you
who are teachers should know that
when she was still a professor of computer
science at Stanford, she started the
Coursera online course behemoth that’s been very
successful and disruptive in its own way. But she’s now had several
other careers after that, and she’s now
leading a new startup called Insitro, which asks the
question– using a lot of data out of our health care system
and a lot of data out of animal studies and chemical
studies, can we actually come up with new drugs? And we’ll see. We don’t know the
answer to it yet. And actually, that’s not going
to be the point of my talk because maybe this
process will succeed, but I can tell you that our
experience as a community is that drug development
is really, really hard, and often things that
make a lot of sense end up not working
in the clinic. But this may in fact
work, and we’ll see. But that’s not what I’m
here to talk to you about. I’m here to talk to you about
something quite different. And as always, in 2019, it’s
better to start with a story than with a bunch of numbers. Here’s a story. It’s a six-year-old
child who was doing fine. And then he was no longer
walking and no longer talking. He had been walking and
talking, and then he stopped. He saw many doctors. No answer. And so he was
referred to a network that I have the privilege of being
part of, the Undiagnosed Disease Network, where we take
patients who are undiagnosed, we do whole genome
sequencing on them. We look at every single one
of the three billion letters in their genome,
figure out what’s different from
reference human beings, and then refer this
patient to the right expert throughout the United States. Shown here are only
seven academic centers. The network currently includes 12
academic health centers. And through this network,
we referred this patient, we did the analysis,
and we found that this patient had a mutation
in a gene that has an almost unpronounceable name– GTP cyclohydrolase 1. I had never heard of it
until I saw this case. But what does this gene do? It takes a bunch of
chemicals and turns them into neurotransmitters. The chemicals allow your
neurons to talk to one another and make your brain work. And because this is
deficient and is not making enough neurotransmitters
from the pre-existing chemicals in your brain, this child
was really losing milestones. Not only not
progressing– losing. And what’s amazing is once
we knew what the cause was, we could just give
this child a bunch of compounds that get
easily transformed into these neurotransmitters
like L-DOPA, folinic acid, and 5-hydroxytryptophan. And what’s so amazing
is that within months of starting this therapy,
which is just things to eat, this child started
walking and talking again. That’s amazing to me. And let’s think about
what really happened here. We combed through billions
of bases, went through thou– what am I talking about? Millions of records of what
diseases are associated with which mutation, something
that no matter how ambitious you are in medical school, you
will never be able to learn. It’s sometimes hard to get us doctors
to be appropriately humble. But the point is,
this allowed us to zoom in onto that mutation
and treat this child. There’s a couple of other
interesting things that I found, which is that we
published an article in the New England Journal of Medicine
about our network, Undiagnosed Disease Network,
and it turns out that a third of the
patients already came in having their
genome sequenced. So it’s not the data. It’s what you do with it. And having the right
programs to analyze them is the augmented intelligence,
the artificial intelligence that will help us
be better doctors. So that’s one view of how
artificial intelligence will allow us to create
new treatments simply by identifying what’s wrong
by sifting through millions of facts and saying, that’s
what’s wrong with this patient, and that will make clear
what the treatment should be. But there are other things that
can be done for new treatments. It’s important to say
for those of you who are with me in Boston, as
the sun is finally coming out after this long winter,
we’re going to be out and showing a lot of
skin, which we probably shouldn’t be doing because
it actually allows the sun to damage our skin and cause
what’s becoming a growing problem of melanoma, skin
cancer that can be deadly if you don’t catch it. But it turns out the same
artificial intelligence techniques that I described
before that allow you to find the cat in a
huge pile of images can also be used to look at moles or
spots on your skin and say, that’s not a mole, that’s a
melanoma– that’s not a birth spot, that’s a melanoma. And why is that important? Because a scientist at
Stanford, using images that you can just take
with your smartphone, whether it’s your
Android or your iPhone, can allow you to take a
picture of these spots and then immediately
have a diagnosis of whether this is something
that you need to get taken out. And guess what? If you take it
out when it’s still superficial, the clinical course
is much different than if you let it stay. And on average, people who have
been diagnosed with melanoma have known about this
spot at least a year. But it takes time to
be seen by a doctor. Even those of us
who are doctors have a tough time getting seen
by doctors in a timely way. So think about the difference
it makes for so-called secondary prevention, which is– primary prevention
would be sunblock to prevent the cancer from
happening in the first place. Secondary prevention
is identifying the mole as being malignant and therefore
should be removed early before it becomes metastatic. So there again,
just by using this, we’re jump-starting the way
that AI can not only augment doctors– I want to point out
to you a theme that will be familiar to those
of you who have smartphones. Makes you, the patient,
part of the solution. Because waiting for
doctors to diagnose us is probably the wrong move. Doctors are overtaxed
in time and bureaucracy, and they’re thinking about
many, many things. But you are thinking about
yourself, hopefully, more than they are. And so if we give you the
tools so that you can actually decide in a much
more acute way, I’ve got to see a doctor now
because this thing says I have potentially cancer,
then we’re actually making a new treatment. I’m going to start wrapping
up by telling you a story. There are a lot of words here, but
don’t feel like you have to read the words because
I’ll tell you the story. This is a story of a
friend of mine who– well, the son of
a friend of mine, who’s actually a professor
here at Harvard Medical School. His child was diagnosed
at age 3 and 10 months, almost four years of age,
with something called colitis. This is inflammation
of your gut. And you determine that by
putting a tube up the rectum, looking around, and seeing
inflamed tissues. You take a piece of the
tissue lining your colon, you look at it
under a microscope, and say, wow that looks
like inflammation. That is inflammatory
bowel disease. And there’s two types of
inflammatory bowel disease, Crohn’s disease and
ulcerative colitis. And I will spare you the
details in the interest of time, but I can tell you
that this child did great on very mild
anti-inflammatory agents for 10 years until puberty. And then in puberty,
as often happens with these kids, the
disease flared up. And this child, who was
doing fine until that point, started pooping every hour. And when you poop every
hour, you’re not sleeping. Therefore, you’re
not going to school. And so my friend’s kid was
just no longer going to school, lying in bed, no energy,
pooping every hour, in pain. And every medication
that we used that is– and here we are in the middle
of the best academic health center. Forgive me for those of you who
are at other academic health centers. But potentially the best
academic health center, and nothing worked. Not steroids. Not the antibiotics. Not the first-generation
monoclonal antibodies. Not the second-generation
monoclonal antibodies. No expense spared. Nothing worked. And everybody was
pushing him and his wife to go for something
which was reasonable, which is to get his colon
removed, so-called colectomy. Now, for those of you
who are as old as I am, you might not remember how
bad it was to be a teenager, but let me remind you. It’s tough to be a teenager. And to be 14 years old and
then have surgery and then have a bag with
stool in it, even for at least a few months, is really,
really not a great thing. And even after you
remove the colon, sometimes there’s a little
bit of inflammation left, so you still need
to be on the drugs. So it’s not an ideal situation. So we’re pushing it off. But eventually,
everybody convinced us that the surgery had to be done. So we’re five weeks
away from surgery. And so my friend asked me– Zak– so my name is Isaac Kohane,
but my nickname is Zak. He said, Zak, what
about a crazy analysis that your graduate students
showed me the other day? And what it was– and these are– I’m showing
the pictures of the students and postdocs who did it,
none of whom has an MD. And that’s very important. All have PhDs in
computer science. These individuals,
we took a bunch– we had taken a bunch of
samples from patients, and we’d measured
which genes were up or down in these patients who
presented with bowel problems. And what we found was that
there was one subgroup that ended up being healthy. And we show them here in red. And then there was
another subgroup that had ended up having
inflammatory bowel disease, shown here by
the blue and green dots. So the point is, just by looking
at which genes were up or down, we could tell that they
had inflammatory bowel disease without looking
under the microscope as regular doctors had to do. That’s not the interesting part. Here’s the interesting
and somewhat crazy thing we did that my
friend had asked me about. We said, what if we divide
this patient population in two and ask ourselves,
which drugs can push the genes to make them
much more like the healthy kids? In other words,
the genes that are high in the gut of
these unhealthy kids, can we make them go down? And the genes that are
down, can we push them up? And so we went through
a large database of drugs that are
known to affect genes, and we were able to show, sure
enough, that the drugs that are known– like azathioprine–
that are known to work for inflammatory
bowel disease, do seem to push these kids
who are sick towards healthy. But that was just an
experiment, a talk that we gave. But he, my friend, asked
me to do this for his kid. So we had a biopsy from his gut
from when he flared up, and we did this analysis. And then these
postdocs and students did the analysis I described,
and they came to me and they said, Zak, the top drug
that works best for this kid is indirubin. I said, indirubin? What the heck is that? I never learned about
that in medical school. So I did what you should do
and what I tell students to do, which is use Google. And so I looked it
up, and it turns out indirubin is part of a purple
thing called indigo which is made by bacteria that,
when they chew through things in your gut– food, for example– they
make this purple byproduct that’s available as a
supplement over in a store. And forgive me
those of you who are Chinese speaking
because I’m going to massacre the pronunciation. It’s also known in
Chinese as Qing Dai. And so then I did
the next thing that I tell medical
students to do, which is look up whether there have been any
studies using this drug, Qing Dai or indigo, for
ulcerative colitis. But I warned them that you can
always find in some journal some good effect for
some supplement, so not to put a lot of weight on it. So sure enough, we found
a journal that’s in China. And this is–
forgive me if you’ve published in this journal. It’s a third-tier journal. And they had found that there
was a good response to therapy in these kids, in these
individuals with Qing Dai. So I called my friend, and I
thought that, when I said indigo, he was
going to say the same thing as I did– what the heck is indigo? Instead, he said Zak, that’s
really interesting, because he had been asking around the world
about what to do with his kid, and there was a group
in Israel that, in addition to the standard
Western medicine, was giving indigo
as a supplement to every single patient. But he had dismissed it. Why was he going to give
a supplement to his kid? He’s a Harvard-trained doctor. He’s not going to
believe in supplements. But he said, maybe
we should actually try it now that your
analysis suggests that. And so I said, OK, let’s do it. He says, how do we
get good indigo? Because if you
don’t know already, any supplement, depending
on where you get it, can be either 100% that
compound or 0% that compound. So I said, just get the Israeli
clinic to FedEx it to you. So he did it. And the amazing
thing that happened is within two weeks,
this child who had been pooping
every hour, went down to pooping three or
four times a day. And that was three years ago. Still no colectomy. He’s doing great. If we had not done this,
he would be minus a colon and God knows what else. And I want to point out,
this is not a party trick that any doc could do. It was three graduate students
using these AI techniques, combing through these
large databases of drugs affecting genes that actually
came up with this result. And so when I tell– this is
part of a longer story which I can’t bore you with where
I talk about whether or not people need an MD degree
to advance medical science. But the punchline is– no. [LAUGHTER] Speaking about
treatments, I just want to say that, just
in case you’re a surgeon, you should not feel too
self-assured that you’re not going to be dealt out of the
game as well, or at least not have a useful assistant. There are now already
some studies showing– this is, again,
just in pigs– where suturing done on the
gut of these pigs using artificial intelligence
to identify where the gap is in the gut and sewing it shows
that, in fact, these things can, as you’d expect,
be much more even in the spacing
between the stitches and also have much
tighter seals. This is basically
pushing water through and seeing how much it leaks. It does much, much better. And you know what? We’ve only started. This is only going
to get better. And so even without
developing new drugs, with AI, we’re going
to be able to find the right diagnosis for you. We’re going to be able to find
which of our existing drugs is the right drug for you. We’re going to be
able to improve the performance of
doctors, like surgeons, but also for many other tasks
that doctors can do, where we can make them better. We can make them be the
best doctor they can be. And with that,
thank you very much. We go on to our next poll. [APPLAUSE]

Good evening. I’m Brett Beaulieu-Jones. I’m actually a postdoc
in Zak’s group, so it’s a little bit
strange to have your boss and your mentor open for you. [LAUGHTER] Totally appropriate. So I get to play a little
bit of the bad cop. But first, I want to
start out by saying I truly believe in the
potential for AI for medicine. I want to echo all the
sentiments that Zak laid forth. We will be able to
figure out what’s working in medicine,
what’s not working, find things where we’re
missing treatments and need better treatments. And there are patients who
are being poorly treated now. As well as areas where
we’re wasting resources, we’re spending money on
ineffective treatments, among a huge number
of other things. And then identifying
patients who are the best fit for specific
drugs and many other questions. In some of my work, we did some
deep learning on ALS patients. And so this was across 23
different clinical trials done all over the world, so with a
wide variety of different data sets, different data
elements collected. And in this, we are able
to consistently identify a cluster at the top where
the darkest red indicates the people who had
the shortest survival. This cluster was clinically
interesting to some of our collaborators,
and they’re now continuing to look for
patients among this cluster. So I do want to start by
saying I truly believe in AI and in some of
the things that it can do before diving into one
of the key issues with it. So there’s all of
this promise, but we do have to remember that it
is driven by historical data. It’s driven by the
current practices. Machine learning learns from
the actions of people today. It’s the things that
have happened over years. And so if we are learning
from people who are biased or systems that are biased,
the machine learning model is not going to be
able to magically get rid of those biases. It may even have the ability
to exacerbate these biases, because if we are now taking
something that currently exists, predicting
it in the future and making decisions
based off of this, we may just continue to deviate
further and further from what is right. So as a example of
this to lay this out, we have two groups
of people here. There are green people
and there are blue people. And they happen to smoke a lot. For whatever reason,
they’re still smoking. Because of this,
many of them develop lung cancer. Unfortunately for
the green people, money is the same
color as them, and they have trouble seeing it and
they drop it on the ground. Blue people are able to
hold onto their money, and because of this are
much richer on average. So because of this,
they’re able to afford a new treatment that works well
and can actually treat them. And when we do this, and if we
train a model on this scenario, the question is, what
is the model learning? And one thing that
it might learn is that green people
can’t actually receive this treatment. It will see that because
they can’t afford it, they never actually
receive the treatment. And this will mean
that it will never recommend the treatment
for green people, and it will never know
whether it works or not. And it will create this cycle
where we won’t actually know the answer to that question. If we want to get a little
bit more realistic here and take a population
of people where there are some green people
who have better eyes and can see their
money and hold onto it, and they all receive a drug that
works in about 20% of people– not all of them. But 75 blue people
receive the drug, and three green people
receive the drug, and it works in
about 20% of people. There’s still greater
than a 50% chance that it never works in this
population of green people. So under this
situation, we might learn something even worse. The model might
learn that the drug doesn’t work in green people. We might be biased by the
small sample, where the machine learning model is never seeing a
successful case because there’s such a small sample of
people who are actually receiving the drug. And this could be even worse
than never recommending it because it might say that
it’s a bad recommendation. So the question is whether
this is a realistic situation. It’s a toy example
that we put together to illustrate this point. And we know that
people aren’t green and people don’t
carry cash anymore. But if we start to
look at the real world and some actual cases,
we can see differences among things such as insurance. Insurance can be the gateway
to receiving treatment. It can give you– it
can really lay out what options you can have. It can lead to disparity
of health care. It will determine what things
are realistic treatment options for you. A couple of the key things that
I’d like to point out here, first of all, is that
among the Medicaid and self-care populations,
in 200 million inpatient admissions,
people who self-identified as black were twice as
likely to have Medicaid or self-insurance,
self-insurance meaning they don’t have insurance. They’re paying
for it themselves. This is one example
within these two categories,
but in this database we can’t even look at other racial
groups because in some areas of the country,
the numbers are so low that if you
look at that group, it risks privacy
for the individuals. There’s a risk that you could
actually re-identify people within that population. So there are a lot of
groups, even in a data set as big as this, that we may
not even be able to study. So what does this translate to? One of the things that
is a shocking statistic was something that
the CDC put together between 1987 and 2014, which
showed that black women had mortality during pregnancy
at more than three times the rate of white women. And when we take
this into research and start to look at
other areas and try to get back to
different things that are going to be training these
artificial intelligence models, one example is in
genetic studies. And there are two main takeaways
I want to draw from this figure that I know can be a
little bit hard to see. The first is that the European
population represents about 80% of the genetic tests that have
been performed and associated and are indexed for
researchers to work with. And if we look at potentially
the most interesting genetic group,
the African group, because of the long history
in Africa and the way that different migration
patterns happened, it only represents 2% of
the genetic tests that are available for researchers. Similarly, if we look at
clinical trial participation by race, the US FDA reports
that 86% of clinical trial participants are white. So what does this tell us? It tells us that we
have a pretty good idea of whether things are working or
not among the white population. And among other populations,
we have much smaller sample counts. So all of a sudden, that group
of three green people receiving a drug becomes a lot
more realistic as we have these smaller sample
counts where we may not be able to tell if a
drug is working or not among that population. What does this lead
to in the real world? Here’s one example. So the government of New Zealand
put in place a computer vision algorithm to recognize
people’s faces to determine whether their
pictures were of adequate quality for passport photos. This man uploaded a photo to
it and got a message saying that his eyes are closed. So if this was you, how
does it make you feel? And this is the
case where, likely– it’s New Zealand. Again, there’s probably a bias
in the training population of the algorithm,
and it just doesn’t work for this particular case. Another example
is an algorithm that was developed
by a private company to predict the risk of
recidivism, the risk that a criminal would re-offend
and commit another crime after leaving jail. If we look at this, it sounds
like a really noble goal. We know that humans are biased. We know that judges are biased. We know that there are different
people in different places. And so maybe we can take
it all, turn it into math, use data to power
our decisions, and we can take out the human element. It sounds like an
incredibly noble goal. But when we look
at the algorithm, we start to notice some
interesting trends. Among the people who
do not re-offend, if we look at the
predicted risk, we find that these are all
people who did not re-offend, and black defendants were
given a risk score of double what white defendants were. If we look at this
from another angle and take the group
that was deemed to be at low risk of
re-offending, black defendants, again, were about half. So this is looking at it from
the other angle, where now they re-offended at about half the
rate in the same risk group as white defendants. So what can be done? So we need to start
to think about, how can we fix some
of these problems? How can we recognize
bias and work on it to illuminate the issues? And so the easiest
solution would be, let’s remove race
from the classifier. Let’s not pass race
in as a variable. This is something that sounds
like a very easy solution to this question. This was something
that has been tried. A famous example of
this is that Amazon had an algorithm to
score job applicants. And as they were using this, one
of the things that they noticed is it consistently ranked
male applicants higher than female applicants. So their answer to that was,
let’s stop gender from being passed in as an input. And what they then found
was that all of a sudden, the algorithm was
picking out people who used words such as “executed”
and “performed” in their CVs or resumes and
ranking them higher. And when you look
at it, those terms were used much more
frequently by men than women. And so it was essentially
getting around the fact that you were no longer passing
gender and learning it in a different way. And a lot of this was built
up because, obviously, there are gender inequality
issues in the tech industry. And if you’re training it on
historical data where there are more men than
women, you continue to see this pattern
over and over again. So where do we start? We have to think
about AI and machine learning starting from
how we frame the problem. We have to think
about it like, if we are talking to a salesperson
and giving them a task, and they have two groups of
people they could possibly sell to, and we tell them
that if they sell to one group they’re going to get double
the commission of selling to the other group, what’s
that salesperson going to do? They’re going to immediately
sell to the group where they get
double the commission and fully optimize to that. They’ll completely ignore
the other group, no matter how important it is
to your business. And we have to think
about AI algorithms as if they are that salesperson. They’re going to solve the task
that you put in front of them. Unfortunately, it can be
really hard to define that task to be a holistic, wide
range view of things where you’re considering
all the other possibilities. In this case, it could be
trying to eliminate bias. It can be really hard to
mathematically frame bias. Another thing that
we need to look at is we need to ensure that
the population that something is being used on actually
matches the training population. So this is the example of the
New Zealand passport image. But if we are
looking at a training population and a
real population here, and we say that these
are two distributions, and these actual graphs don’t
mean anything other than to say they’re different groups– And we look at it and we
train on this red group, and then we see a person from
the real population who is otherwise very average–
they’re right in the middle of the actual population– and we train on this,
would we really expect the algorithm to work? Would we expect
the model to work? And so this starts
at the basis of, where are we getting
the training data from? And so one thing that I’d like
to bring that back of telling all of these– and I don’t mean to
fear-monger because I do think AI can actually help
with a lot of this stuff. So one of the things you
can do is because we can now look at this, we
can mathematically model bias in these systems. We can say, what happens if we
change the gender of someone? What if we change
the race of somebody? What if we change
different factors and we look at the
output of a model to see what is actually driving
the AI, the machine learning model’s decision? The other thing
that we need to do is eliminating bias
is going to require a much more inclusive scientific
and medical community. It’s going to
require that we make sure that the studies
that we do are achieving a more diverse group. And this is something
that is very easy to criticize but in
practice can be very hard, because scientists are
looking for the smallest sample size that they can get to
determine whether an effect is real or not. And the best way to
do that is to get people who are very
similar to each other, because then you’re
measuring one effect. You don’t have other
potential effects going on. And so I see the need
to counter biases as potentially a tool
for us all to argue for more inclusive, larger
studies where we can look at some of these factors. And so with that, I would
like to thank you all for coming. I do want to say– [APPLAUSE] Really quickly,
there are two things that I think, as a researcher,
you can really appreciate. And the first is
that we would hope to actually build
something or come to some conclusion that actually
has an impact in a patient’s life. And the other is
that people actually care about what you do. So something like
this truly does mean a lot coming from
this side, so thank you. [APPLAUSE] Slides going to switch. Just waiting for the
slides to come on. Well, good evening everyone. My name is Kat Liao. I’m actually a rheumatologist
at Brigham and Women’s Hospital. And I actually see patients,
but I also, almost a– over a decade ago
started working with Zak. And since then, we’ve
been doing a lot of work on clinical applications of AI. So I might be taking
a slightly deeper dive into the nuts and bolts of what
we’re doing in these research projects. So hopefully I’ll
keep you all awake. So let’s see. So I’d actually like to
start with a cab drive story. So I called a cab because I
needed a ride to South Station last month. And I got in the cab,
and I got a chatty cabby. He says, what do you do? And I said, well, I’m a
doctor, and I also do research. And he said, well
you know, actually, just didn’t have a great
experience with one of the hospitals in Boston. And so what happened is
he had a recent cancer diagnosis made on biopsy. And in the first
hospital, he was told he had a pretty severe
high-grade cancer on biopsy when they looked at his cells. And he, like everyone,
rightfully so, went to another hospital
and got a second opinion. And there they said,
you have moderate-grade. You definitely have a
cancer, but you may only need six weeks of
chemotherapy and not the 12 weeks of chemotherapy
and radiation that was recommended
by the first hospital. And so he actually went back
to both institutions and said, hey, there is this
difference of opinion. And so the pathologists,
the doctors that review the slides from
the biopsy, re-reviewed it. They actually had somebody
else review the slides, and they came to the same
difference in opinion. And he asked me, how
could this happen? How could something
like this happen? In my head, I was thinking, it
actually happens all the time. And that’s because, as many
of you are probably aware, there’s a lot of gray
areas in clinical medicine. And so what I’m showing you
here is a complete cartoon, but of cells. This is a normal cell, and
this would be an abnormal cell that you would see
in high-grade cancer. But oftentimes, people
have a lot of things in between– gray area. So you might say this is normal. This is mildly abnormal,
moderately abnormal, and highly abnormal. And I don’t know
exactly what happened. I didn’t get involved
in that case. But I could see how he could
have a difference in opinion because things like this
happen all the time. So let’s say the cab driver,
he had a biopsy done, they looked at the cells, and it
was 50/50, right in the middle. So those physicians,
those pathologists, have to pick one or the other. And that has to do with
practice or opinion when you don’t
have a lot of data. And in fact, in many
situations, in this gray zone, there is no right answer. The reason there’s
a gray zone is because we don’t know
what the best answer is. But from this story, you
can tell the implications for this patient are very
different based on how the data were interpreted. So one hospital said,
you need 12 weeks of chemotherapy and
radiation, and the other said, you need 6 weeks. And he said, 12 weeks would
put me out of a job. I’d have such a hard time. It would really just affect
my life in such a big way, and I can’t believe it
can be so different. And so ultimately, the cab
driver did undergo treatment at hospital two. He had chemotherapy
for six weeks. He was doing very well. But in reality, we
actually need more time to know if this was
actually adequate therapy. So I want you to hold
this story in your mind, and this theme will come up
again, themes from this story, when we talk about how
we might be applying AI in clinical medicine. And so why AI for
clinical medicine? I’d say it’s a very exciting time. You heard from Zak and Brett
about all these technologies that are changing. For me as a physician, I started
training with paper charts. So a classic case of a
72-year-old man comes into the hospital
with his daughter, and his daughter’s
like, I think– he’s confused. He can’t tell us anything. And the daughter
says, I think he might have had a
stroke three years ago and was admitted
at this hospital. So what that meant when
I was an intern was that I go down to the basement. I request the charts. I get a stack this high. And I’m trying to flip
through it to find out where in this past three to five
years was he admitted and why. And so as you can tell,
that’s very labor intensive. Just for one patient, it’s very
hard to recreate that history and synthesize the data. Then, if you take
it a step further, on the research
side, when you’re trying to learn about
relationships between diseases or how a treatment
may impact an outcome or may be good to
prevent stroke, you have to do
these chart reviews for thousands of patients. And in fact, before
now, we literally had teams of people reviewing
stacks and stacks of paper charts to figure out who had
a stroke, who had high blood pressure, who is on what drug to
figure out these relationships. Now, with electronic
health data, I might say that we
almost have too much data. We’re drowning in the data
deluge, where we actually can’t find the information we need. The good thing is it’s
in there somewhere. And obviously, this
is why EHRs are here. It’s the opportunity to improve
the efficiency of health care. But as physicians,
now when someone comes into the hospital,
someone says, it’s all on the computer,
and I say, I know, but I can’t find it. And so our goal now is, how
do we get this information out of there? And particularly for medicine,
when we think about research, there’s a lot of information
for us to understand, again, the relationship
between diseases. What treatments are effective? And it really has enabled us
to do these large population studies and change the way
and the types of questions we can ask. But before we can do that,
we have to figure out who has what disease. And so Brett and Zak both went
through some applications of AI in medicine. And what I’m going
to focus on is the one I think as physicians
we think about the most, is how can AI help us
make the diagnosis? And assist in making
the diagnosis, or actually predict that someone
is going to get the disease? And what I want to hammer home
is that before we can do that, we have to figure out,
in all these data, how do we define who
has what disease? And I see the research studies–
this is the realm where I live– as a first step. And in fact, the clinical
Electronic Health Record data has enabled us to try
to ask this question. You don’t want to test
AI on the patient. You don’t want to be the
test subject in the clinic to see if AI is working. But the clinical EHR
data gets you as close as you can get to the patient
without actually testing it on the patient or
ourselves, and that’s because this is all the
data that’s generated as part of clinical care. And so this phenotyping, or
knowing who has what disease, is really the foundation
for useful applications in making the diagnosis
as well as all the studies we do asking about– does a treatment work? What are the side effects? What kinds of– does smoking
increase risk of lung cancer? Which we know it does. So why is making the
diagnosis so hard to do, and why is it so
hard to teach AI? So phenotypes are
actually a spectrum. So phenotypes themselves
are measurable attributes. And so they can be
physical characteristics, such as eye color. Or it can be certain
diseases, such as stroke and rheumatoid arthritis. So for stroke, someone can have
a small blockage of an artery and have damage of a few brain
cells, have a facial droop, get to the hospital in time, get
treatment, completely recover. That’s a stroke. Another patient with
a stroke is someone who had a blockage
of a major artery, massive damage to
the brain cells, and complete paralysis
on the left side. That’s also a stroke. So I’m a rheumatologist. Many of my patients
have a condition called rheumatoid
arthritis, the most common inflammatory joint disease. There is a blood test
that’s associated with rheumatoid arthritis
called rheumatoid factor. So someone with positive
rheumatoid factor, two swollen joints,
and about an hour of morning stiffness,
that’s rheumatoid arthritis. Another case, on
the extreme, you can have negative blood
tests of rheumatoid factor, have five swollen
joints, and complete destruction of the joints. That’s also
rheumatoid arthritis. So these are– as you
can tell, the spectrum comes in many
different combinations and characteristics. And it’s hard to– as humans,
I think our intuition– we can integrate all these data and
say, this person has a stroke and this person has RA. But how do you teach
a machine that? Do you have to give it all
the different combinations? It’s very hard to explain that. The other challenge is,
where do you do that cut? I showed you the
spectrum of the cells, and you have to make a cut
to say, this is abnormal, and this is normal. In every disease, you
have the spectrum, and somebody has
to decide at what point that you say
someone has a disease and needs this treatment versus
they don’t have the disease and perhaps you
don’t need treatment. And so this is where
I wanted to just make the point that artificial
intelligence is very different from
human intelligence. Working with this
kind of technology, it’s very different, and the
goals are very different. So in medicine
right now, at least in terms of trying to
understand the diagnoses, we’ve been using something
called machine learning. And I’m sure many
of you probably– I think they use
this word in ads now. When I’m driving to work
listening to the radio, they say, machine learning
for this and that. This is a technology that we’ve
been using to try to see– can this machine learning,
artificial intelligence, help us to make better
diagnoses and more accurate diagnoses sooner? And as Brett and Zak mentioned,
it requires data to train. So you can’t just give it
data and say, OK, intuit. Like a human, you
can give someone data and say, OK, figure
out who has RA. You have to say who you think
has rheumatoid arthritis and have it train on that. And I’m actually going
to go through some of the gory details of
this in the next slide. So I’m going to give you a
real scenario that we went through almost a decade ago– over a decade ago now. And that was Zak had– he was very visionary. He said, OK, we’ve got all
these Electronic Health Records coming on. There’s all this data in there. We should be using
it for research. And so he got a
bunch of us together, clinical researchers
such as myself, but also bioinformaticians,
biostatisticians, people working in natural
language processing. Said, there’s all this data. Now figure out how to
do something with it. And so at the time, we
had seven million patients in Electronic Health Records. And as a researcher, I was
interested to know, who has– I wanted to study
rheumatoid arthritis, so the first step was trying to
identify who has the disease. In the general
population, it’s 1%. So it literally is like looking
for a needle in a haystack. And so those of you who
have some familiarity with the medical field,
you’re probably saying, well, why don’t you just use
diagnosis billing codes, because they’re called
diagnosis codes? And so what we did is we
started and we randomly selected 100 patients with
at least one code for RA. And what we found– we had
three rheumatologists review the charts, and we found out
only 19 of the 100 actually had RA. So you can’t do any study
with this if you’re only 20% correct. I just want to say,
it’s not because people are miscoding on purpose. The way billing works is
when someone comes in, when you go in to
see a physician, something has to be billed. You’re ruled out. You’re being assessed for x. You’re being assessed for heart
disease, for RA, for stroke. It doesn’t mean you have it,
but you need that code to say, this is what you’re
being worked up for. So then we said, OK, well
let’s do three codes. That got us to about 50%. So it’s almost a coin
toss at this point. And you imagine,
if you’re trying to do a study understanding
the association between whether a treatment
is effective and the outcome– you’re trying to understand if
it’s effective for preventing, like let’s say a stroke,
and you’re only 50% correct, you’ll never see a signal. The other thing I want to point
out here is in this exercise, we took 100 random patients,
and what we were doing is we were slicing and dicing. We were saying, OK, we
have codes and medications, and how can you get
some kind of algorithm or very simple algorithm
that’s accurate in defining the disease? And this is where things
were over a decade ago in how we were
defining conditions for studies in large data sets. And you’re limited to
maybe about 5 to 10, because after that, there’s
too many combinations for you to manage. So let’s talk about how machine
learning might help us here. And so I’m showing you
one data set first. This is a very small
data set of data you can typically pull out of
the Electronic Health Records. You have an ID, age, gender,
diagnosis code, and a lab. On the right side
here, I have what we would call a gold standard. This is what a
physician we review the charts of these
eight patients and say, you have or have
not this disease. So for this particular
group of eight patients, there’s only one patient. You can’t train on this. This is not something
that machine learning can help you with because
there’s not enough data. And as Brett was
mentioning earlier with the clinical trials
data and the people who were being included
in the studies, if you don’t have
enough people, you don’t have the
right training sets. This is a terrible training set. So let’s go to the next one. So now we have
another training set. Eight people. 50% have this disease. And if you look closely, you
might say, OK, most of these are women. So this disease is– let’s say this is
rheumatoid arthritis, which is what I modeled it after. It’s mostly women. Most people have the
diagnosis code, and if this lab, we’ll say it’s rheumatoid
factor, is roughly above 30, you have a good chance of this
person having the disease. So we as humans can handle this. There’s literally four
variables on here. But you are limited in how well
you can define a disease when you only have four variables. Now, the beauty
of the EHR is now you have thousands if not–
depends on what you use. You can have millions if
you include the genetics. And so let’s say a typical
training set has 200 patients. So you have 200 rows. But now you have, on the
columns, 500 to 1,000 columns. And so even if you had
people reviewing the charts– because I could– the physicians can say– the clinical experts
can say, reading the notes, who has what
disease, because that’s part of the training. But we can’t see the pattern. There’s just too
much data in there. And this is really
where machine learning has been very helpful to us. We just can’t process
all that data. So I don’t have to
spend a lot of time on this slide, why getting
the phenotypes right is important,
especially when you’re going to use it in the clinic. So there’s no question
that misdiagnosis in clinic has just tremendous
impact on the patient. But misclassification
in research is also really detrimental. So if you don’t
get it right, you don’t see the relationships. Again, I use the
example of stroke. If you’re looking at the
relationship between blood pressure– high blood pressure
we know is related to stroke. But if you can only classify
stroke right 50% of the time, you’re just seeing noise. You’re not going to
see that association. You’re not going
to know that you need to target blood
pressure to reduce the risk of future stroke. And so that really–
this need to get either the diagnosis or
the phenotype correct, is really important
because it’s what, as we call it, powers the study. Your study has no power
to see any relationships if the data are too noisy. And I know this has already come
up, that the algorithms really rely on these training sets. The training sets have to
reflect the population you’re going to be running it on. And it also relies
on the reviewers. Those gold standards– when I
talked about this chart review here, the machine
is trying to mimic, is trying to predict what
you tell it to predict. It’s not going to
go beyond that. There’s no intuition there. So I wanted to share a little
bit of what we learned in terms of using machine learning
in clinical research using the Electronic
Health Record data. So I’m not going to go
into this in detail. This is probably version
12 of what we’ve worked on in trying to start
with the EMR data and getting to this probability
or this phenotype yes/no. And what I want to point
to in the center here is that we found that machine
learning methods have actually been very useful
and very well suited to dealing with the
complexity of the EHR data and helping us to accurately
define the disease. And that at the center here,
you have the gold standard. So we still have about– you start with a set
of 200 to 400 patients where you pull out hundreds
of variables or columns. But you review the charts on
these patients and you train. You have the machine train
on this gold standard and find the pattern. Then you take that
mathematical model developed based on
that pattern and run it on the EMR of now
millions of patients. And that’s how you get this
yes/no, who has what disease. But right now, it’s
for researchers only. And that’s because
there are a lot of things that we can’t study using data. There are lots of things
going on in the clinic that are not captured in the
Electronic Health Record data. So there are some
challenges to translating AI into the clinical setting. I know there are many
people working on this now. We already talked
about the training set. Who are going to be
the clinical experts? Who’s going to define
the gold standard? And adapting to new diagnoses,
new inputs, and new therapies. Brett mentioned you’re
training these algorithms at one point in time. How do you know it’s going to
be useful 10 years from now? How do you reassess it? When do you retrain it? And the stakes vary
very differently depending on the situation. Are you using it
for screening, where you’re then going to have– it’s going to be
very sensitive, it’s going to capture anyone
who possibly has a disease, and then you confirm
it with a physician? Or is it going to be the
actual diagnostic tool? And last but not
least, as a clinician, I think a lot about, how are
we going to use these tools? Ultimately, the
clinical team is going to be responsible for the
final diagnosis and treatment. And when we make
that decision, it’s not based simply on an answer. It’s not like, you
have this disease. It’s– you have this condition. Here are the treatments. But what’s all the
other stuff going on? What are your other
medical issues? What are your other
social factors? Can you tolerate this
type of chemotherapy? So those kinds of almost
more intuitive or I would say data that aren’t
captured in the EHR are very important in making
decisions for treatment. And so this theme
I think has come up is that I think that the
research that we’re doing, the research on the
clinical EHR data, may mirror how we might move
into the clinical realm. So what I showed you,
this is very much what we call a semi-supervised,
automated pipeline where you move through processes. And I showed you
that machine learning and artificial intelligence
is at the center. But what we found,
taking this algorithm, implementing it in
multiple other institutions and across 20 to 30 diseases
now, you need a check. You need a human check. And each of these stars is an area
where things went very wrong because of some blip
in the data, something that the machine’s not going
to know intuitively that’s not supposed to be there. And so each of
these steps is where we’ve built in human checks. And right here, this is a check
to say, where do we threshold? Where did we say
someone has a disease or doesn’t have the disease? And I do, I strongly
believe that we’re going to need a
similar paradigm when this AI comes into the clinic. And so in summary, I
hope I’ve demonstrated how it could be a
powerful tool to assist us in clinical medicine, where
it’s not necessarily replacing a lot of the things
we do, but it’s able to do other things such
as integrate large volumes of information that we
simply can’t process. But it is limited
by the training data and how good the reviewers are. But ultimately, this is
might be a cool new tool, but we shouldn’t use
it unless it actually, if we bring it into a
clinic, if it actually improves how we take
care of patients, that it actually improves care. And so I believe that
you have to combine the artificial intelligence
with a human intelligence, because any diagnosis
and downstream treatment has large implications
for patients. And so we still have a
lot of future work ahead that may need to be actually
tested in the clinic. Medicine changes over time. How often should we
be reassessing it? So I just took my board
exam, which we have to take every couple of years. I get reassessed. I think the machines
need to be reassessed. And in fact, the algorithm
that we developed 10 years ago with the early
studies with Zak, we are reassessing it now
to see how well it runs. It was built on historic data. Now we have a new EMR. We’ve got new treatments. How well is it working? And then, ultimately,
the responsibility is with the human
clinical care team, and that in this
rapidly changing world, that team needs to
understand how this AI came to that decision or
that result and how to integrate it into the care. So with that, I’d like to
thank you for your time. [APPLAUSE] Thank you very much for
those very good [INAUDIBLE] I’ll grab a water, actually. Think I left my water here. I’ll grab this one. So this is I think the more
interesting part of the session today, which is where
we get questions from the audience,
which you have been kind enough to forward to me. If you start getting
bored of the questions that we’ve selected, I
will entertain hands up. First of all, let’s go with
the most important comment. This is not mine. Fashion Police says,
nice shoes Gina. [LAUGHTER] But that same comment– but the same card has an
easy-to-answer question. It says, essentially, are there
MRI data sets linked to cancer, linked to genetics, so we could
do machine learning on those? And it’s an easy answer
because in fact there is a data set available
courtesy of your tax dollars. The National Cancer
Institute has something called the Cancer
Genome Anatomy Project, where you have MRI images, CT scan
images, and pathology images, and the genomics both of the
individuals and their tumors and a variety of
other measurements. So you can– there are
whole fields of research that could be done with that. I’m going to now start
picking on my colleagues here. So there is a
question about– we got Brett, which is since
one way of looking at blacks is a skin color, so why not
factor that out of analysis, and wouldn’t we be better off? Yeah. I think the first
example is going to be similar to the
example where Amazon tried to do that with gender. But the other
thing is skin color is not necessarily indicative
of [INAUDIBLE] genetics, but it’s highly
correlated with them. And so it can be
a useful feature. It can be helpful in
actually diagnosing a disease or picking a treatment without– especially when you don’t
have the genetic test done. The other side, I
think, is that it can be a marker in certain
areas for socioeconomic status and other markers, where
we see the differences between insurance
and other things like that that do play
a key role in outcomes. Thank you. So we received a question
via YouTube from Ireland, and also a few
questions that are very much like this one
from local audience members. And they basically
are asking, are we going to be put out of a job,
the diagnostic radiologists, the pathologists, and
the ophthalmologists and the dermatologists? And so let me tell you a
little story first of all. So I saw an AI program
that was published with a study describing
it, published in the Journal of the
American Medical Association. I called up my cousin,
my first cousin, who’s a very proud ophthalmologist. I said, ha! Look, we’re going to replace– what do you think
of this program that can look at a picture
of an eye and just diagnose in a few microseconds
whether you have retinopathy or not? And he said, fantastic. This is actually great. I hate looking at those images. I’d much rather be in the
operating room doing surgery and have an AI program do that. Meanwhile, I’m
seeing more patients, I’m getting more money,
and I’m having more fun. So that’s one version of it. I’d say the big picture
is if the doctor is not seeing the patient at all,
it becomes much easier to replace them. So you may or may not know
that if you get an X-ray done in several hospitals
in the United States, those x-rays get
interpreted while we’re sleeping during the daytime
in India and Australia by doctors who
are very competent but have never seen us. That kind of expertise
can be completely replaced by computing. And so from my
perspective, and I think it’s a growing
understanding, is we value the human contact,
not just for the warm and fuzzy part, but because of what
Dr. Liao was talking about, which is we know how to
weigh not only the diagnosis but what are the things
they’re going to tolerate? What are the things that
you might want to balance? By the way, Kat, I’m just
impressed by the conversations that you have with
the cab drivers. They never– [LAUGHTER] –never want to
talk to me about– I get the chatty cabbies. Yeah. So the short answer is for those
doctors who don’t see patients at all, they’re at a much
higher risk of being replaced. Doctors that see patients
have a lot of value that will cause them to be
sought after for some time to come. So there’s been
several questions that I’ll direct to you,
Kat, about essentially looking at these programs as
if they were diagnostic tests. They talk about things
like false positives and false negatives. How do we think of
these programs in terms of how well they perform? You already hinted at
this issue by saying you want to update
the algorithm that we did for rheumatoid arthritis. But this is a very
interesting question because the Food and Drug
Administration, the FDA, has just approved
two AI programs, one for the retina and the
other one was for chest x-rays. Yeah. And it’s already approved. So the question is, will
it continue to be updated? And the question that I’m
having for you is how do you think about how to evaluate
what are the performance metrics? Yeah. I think, at least to
start, we should probably evaluate them similarly to how
we evaluate current humans. And that is with– and it might not be
exactly the same, but reassessment of
these models over time, and making sure–
adding new inputs to see if– against
gold standards, meaning with humans as the
gold standard– to see they continue to meet those
benchmarks as a start. That’s a start. And medicine’s going to
change, so just retesting it on medicine, on real data,
I think will be part of it. Yeah. I know that Kat talked
about the diagnostic codes being a part of that. The diagnostic codes
completely changed in 2015. So that’s the type
of example which will break a current algorithm
that will require retraining. Many of the, I think,
harder ones to catch are going to be much
quieter than that. That was an easy one because
it’s something that everybody sees coming and can adapt to. So there’s an
interesting question from Gainesville, Florida. They say, can AI be used
to train doctors, nurses, and other health care workers? And here’s an interesting thing. Because of privacy concerns or
appropriate privacy concerns, we can’t share a lot of data. But I don’t know if you’ve
seen on the internet these things whereby I can say,
I want to see Kat as a blonde or I want to see her
in a different dress, or I can see her rendered
in a certain painter style. And so these deep learning
outcomes can not only recognize, they can
generate images. So for example, we can generate
millions of broken bone images. We can generate
millions of skin lesions that are actually not
anybody’s skin lesions but look exactly like it. So we can provide a lot
more training materials that previously have been very,
very limited because of privacy concerns, and frankly
because some people view them as their intellectual property. So Kat? Yes? This is an interesting
question, and there are several questions
on this theme, which the theme is privacy threats. Large data sets. Who’s watching them? For what purpose? This is one version of it. Is there any fear that
the patient info, data, is shared over the internet,
can be hacked into and shared with the wrong people
or misused by others, like insurance companies? Well, I think that’s
always a fear. And so at our
institutions, they really take this very seriously. And so their data’s
behind firewalls. They’re locked in
these server farms. So this is taken very seriously. They do the best they can. And for research
purposes and for– so research is one level. There is a whole set of
rules of how we don’t even use the actual patient numbers
when we’re doing the analysis. We can’t send out data with
even dates on there because that might help
to identify patients. And so for clinical care,
there’s another level to that. So I think the
health providers are doing as much as we can to
prevent that from happening. One thing I’ll add
to that is this is not a new
threat, necessarily. There’s actually a
federal government site that tracks breaches
of over 500 patients. And if you look at it,
you’ll see, shockingly, that more than 50% of these
are hard copy breaches. And it’s people having left
hard copies of patient records or other things in places
that they shouldn’t, or just losing them. It’s not always something
where it’s been clearly taken by somebody else. But I would raise that
this is not a new thing, but I think it’s an
incredibly important thing. Yeah. The fact is, if you walk
into most hospitals wearing a white coat and look like
you know what you’re doing, you could walk out
with a lot of data. [LAUGHTER] And that’s just a reality. But also, I want
to point out this is something that you should
think about as citizens. Studies were done and the public
was asked, who are you worried about seeing your data? So unsurprisingly, they said, I
don’t want commercial companies to see my data. And they also said,
and this surprised me, I don’t want public
health to see my data. But I want researchers
to see my data. But the irony is the
commercial companies have contractual
rights to your data. Public health authorities have
a legal right to see your data. The only group that has major
blocks to seeing your data is the researchers. And so it’s like
we’re, on the one hand, putting this huge dam to
prevent data leakage, where on the side, it’s just flowing
out to these other parties that we may not want
it to be slid into. By the way, I actually–
here’s an interesting factoid for my audience. Indigo was the principal blue
dye for hundreds of years. I did not know that. One of the major crops
in British India. Great Duke Ellington song,
had a song, “Mood Indigo.” Share that for
cultural edification. [LAUGHTER] This is a question for you, Kat. And I think it’s raised and
it’s legit by the taxi story. You see, stories are important. Is it better to go with the
highest course of treatment in order for a much
better outcome based on the patient diagnosis
when in the gray area? Yes. Yeah. And I actually had
this conversation with the taxi driver. I said, I think
that one hospital might have wanted to go
err on the side of caution. But chemotherapy is not– every treatment comes
with a side effect. So he could have
neuropathy, which is losing the sensation in
his toes and fingers. So yes, in general
we do think that way, but it isn’t always the
case because– especially when you’re working
with very toxic drugs. And for this
particular cab driver, he was afraid that 12 weeks
of chemotherapy and radiation would mean that
he loses his job. And if six weeks was
enough and he kept his job, then that is a really
big difference for him. Thank you. This is also a good question. Why did you get into this area
of artificial intelligence meets medicine? So you, Kat,
started in medicine. You started in artificial
intelligence, computer science. So you both answer it,
and then I’ll answer it. Why don’t you start? Yeah, I can start. So I was actually
working in consulting for financial services doing
something that actually felt pretty meaningful, and
it was assessing damages from the mortgage crisis
in 2008 and trying to figure out who was
wrongly foreclosed on and which individuals were
harmed so that the banks could be made to pay some form
of restitution to them. After about a year of doing
this, everything got settled. Everything ended. Everybody got even payouts. People who were more
wronged than others got no more than
the average person. And it all felt
kind of worthless. And so in thinking about
where I wanted to be and where I wanted to try
to be applying skills, medicine was the
natural next route. And what was your
first hook into that? I did study computational
biology as an undergrad. I had initially
thought that I was– my parents are
probably disappointed I didn’t go the MD route, and
fortunately, my younger brother did, so they’re content. [LAUGHTER] I was actually watching
a neurosurgery, and ended up getting kicked
out of the operating room because I thought I
was going to pass out. [LAUGHTER] And this is at the
age of 18 where I was a testosterone filled
young man who wouldn’t leave on his own without the
neurosurgeon actually asking me to leave
because he didn’t want to operate on me next. [LAUGHTER] Love it. Kat, you’ve had time to
think about this answer. Well, maybe not as exciting. So I was trained as a clinician. I thought I would
mainly see patients. Got into research. And then, a lot of my–
as a clinical researcher, my questions come
from the clinic. And I realized that
there were some questions I couldn’t answer. So I’m a rheumatologist. I study a lot of autoimmunity. And I said, we need to
look at bigger data sets, and we need to know
a lot of diagnoses and really look at really
complex relationships. And we just couldn’t do it at
the time I was coming through. So when I heard about Zak’s
project and the scope of it and the amount of data,
working with millions of patients, I got really excited
and jumped on board. And now you’re a
great leader in it. So I’ll answer for me. I had no one in my
family who was a doctor, so I didn’t know what
medicine was about, and I didn’t have
any mentor telling me how to figure that out. So I just applied
to medical school, got into medical school. And then I realized after
the first year, wow, this is a very noble profession. It’s a profession. It’s a trade. But it’s not really a
science, and I thought I was going into science. So then I panicked,
and I dropped out the ambitious way, which is
I dropped out and got my PhD in computer science. And then I went
back to medicine, and I’ve completed my training
in pediatric endocrinology. And all the while,
I started seeing all the holes in medicine,
all the mistakes that are being made, all the
slowness that’s happening, all the things that
make Netflix look better than medicine in terms of
recommending the next step. And frankly, it made me enraged. And so I channeled that rage
into grant writing, which is something that I’ve
become quite good at, and started research groups and
research in this area, which
young people like the two you just heard. All right. So let’s– ah. There was a question,
a reasonable question. Hey, can I get
that iPhone program that allows me to recognize
melanoma lesions on my skin or other people’s skin? And the short
answer is this thing really works, it
was really deployed, and speaks to another question
that we got from the audience. Anybody want to guess why
it’s not yet available? [INAUDIBLE] What? They don’t know
how to [INAUDIBLE] [LAUGHTER] Getting close. Unfortunately, cynicism might
be the order of the day. It’s who is going to be
medically legally liable when this thing makes a mistake. You need a company behind this. And some random
Stanford researcher is not going to say, hey, use
this, and if it works for you, send me a card. Because the cab driver
is going to say, hey, you made me do
this therapy because– and it turned out I
didn’t have melanoma. And so you really have to
have, A, a company that takes on medical
legal liability, that educates physicians about it,
and that gets FDA approval. Big, big challenges. And those challenges are as big
as the scientific challenge, perhaps bigger than the
scientific challenge of getting the software distributed. One quick question. Will AI be able to
detect pancreatic cancer? Any of you want
to deal with that? Come on. Punting to you, Zak. [INAUDIBLE] So my answer is I
don’t believe we’re measuring the
things that we would need to measure in order
to be able to diagnose pancreatic cancer. Right now, we tend
to measure things that are associated
with pancreatic cancer very, very late, like
right up at diagnosis or after diagnosis. I could imagine a
future where, if you’re genetically prone to
have pancreatic cancer, we’ll measure a bunch of
things like circulating cells. But this is not an AI question,
it’s a measurement question, in my opinion. So I think we’ve really
answered all the questions, and there’s nothing wrong
with ending before time. Is there– [INAUDIBLE] Any other– I will entertain– yes, a question from– What if it turns out that
Boeing designed the software? Well, that’s a very good point. So I actually shared a
very sad story from– who– Ralph Nader. So Ralph Nader’s grandniece
was on one of the flights that crashed with a 737 MAX. And what really happened
will be determined, but we know some
things that were true, which is the designers put a lot
of faith in automated controls and made it very
hard for the pilots to go with their intuition. So on the one hand, yes, pilots
get drunk, they fall asleep. And doctors make mistakes,
and doctors fall asleep and they get drunk. And so you create
software to avoid that. But what you’re also
doing is making it harder for doctors and pilots
to use their intuition. And so if you’re a great
doctor and a good doctor and an alert doctor,
you may not be enabled. You may be prevented from doing
the right thing because there’s something very confusing
going on that’s not intuitive. And so the plane was
actually actively fighting the responses because
a program had been imposed. And Ralph Nader’s
comment is his grandniece died because of some
hubristic assumption that the computer was
always going to be right. And I think it is a
good cautionary tale. And I think it is
a reason why it may be that
computers and AI will be used to watch
for errors, will be used to make
automated diagnoses, but I in my own
care of my family and myself will
always hope that there is a smart, intuitive,
commonsensical doctor who’s at the helm, and that she’s
making sure that something obvious and stupid– Because AI programs
can be very, very good at what they’re
doing, but they’re not intelligent in the
sense of human beings. And so for example,
one of my students just published a
paper in Science where you take an image
of a retina or of a mole, and you just add a
little noise to it. And to you and me, it looks
like the same picture, so you can still make the same
diagnosis as you would before. But the person who
added the noise knows something about
the computer program, so that little bit of noise
completely confuses the program and it completely changed
this diagnosis from melanoma to not melanoma or vice versa. The point is,
these programs look like they think like us,
they don’t think like us, they certainly don’t
have common sense. So just because someone can–
a machine that can play chess at the grandmaster level
is still not going to be the machine that can
tell you reliably– do you want this
treatment that’s at a higher risk long term
but is more likely to get you to your daughter’s wedding,
or this other treatment which is higher risk
for the short term, but overall a better
chance of survival? That’s a human kind
of judgment question that maybe one day, in
a science fiction sense, computers will be able to
do, but we’re far from that. Right now, we’re
in this amazing era where things that human beings
don’t do well, like look at images and see that little
spot that maybe was missed on a mammogram that might
be associated with cancer, looking at pathology
images and making sure that you don’t miss any
of the cancer cells. It’s very good at
that kind of detailed work in a very high throughput,
systematic, reliable way. Because again, remember what
I said at the beginning. Pathologists are not– will
disagree with one another on a same sample
maybe 30% of the time, but when it comes
to decision making, you’re really on target
to bring up the 737. We should not put
ourselves in the position where the computer program
is deciding on therapy. With that, thank you. [APPLAUSE]

