By Lance Eliot, the AI Trends Insider
One of the most vital tenets of science is the principle of reproducibility.
When a scientific result is reported, how do we know that it is something generalizable and not just a happenstance, one-time fluke, or possibly even merely the outcome of a mistake made in performing the science? We can have greater belief in the result if it is possible to reproduce the science. Reproducibility consists of several factors, perhaps foremost that the effort, when independently replicated, will achieve either the same or similar results.
If I claim to have invented a perpetual motion machine, and I demonstrate it in my lab, but doing so only for myself and perhaps some chosen lab assistants, I would want to tell the world of this amazing breakthrough. The rest of the world might be very elated to know that finally a perpetual motion machine has been created. Just think of how it could change the world. Others that have tried to devise a perpetual motion machine are excited too, but also a bit wary since they were unable to achieve the same grandiose outcome on their own. Naturally, they would want to know how this perpetual motion machine works, and they would want to try and reproduce it in their own labs.
Suppose that I then reproduce the effort in my own lab and come back to the world to say that yes, it really does work. Voila! But should you believe that it really does work? Notice that the reproducibility was not independently carried out. Instead, the same researcher or developer merely claimed that they were able to reproduce the result. The researcher or developer might be entirely sincere and genuinely believe that they reproduced the results. What the rest of the world doesn’t yet know is whether it is really bona fide, or maybe an unfortunate mistake made repeatedly, or maybe an attempt at tricking the world into believing something that just isn’t true.
Cold Facts About Cold Fusion
So that you don’t think something like this is an absurd notion, I’d like to remind you of the now classic case of desktop cold fusion.
For those of you who were around in the late 1980s or have studied the history of science, you might know that in 1989 two chemists at the University of Utah claimed they had been able to generate cold fusion. The chemists, Stanley Pons and Martin Fleischmann, said that they had produced excess heat that had to be the result of a nuclear process, a miniature nuclear reaction as it were. The equipment used was relatively inexpensive, easy to assemble, and pretty much what you could do in your own backyard (albeit with a bit of effort). It became one of the biggest news stories of the year and was heralded as an incredible breakthrough.
Did anyone try to reproduce cold fusion? You betcha.
Scientists all around the world scrambled to see if they could also achieve cold fusion. Some did so to help prove that the Utah scientists were right and that the discovery was valid. Others did so to disprove the Utah claims and to ensure that the world would not be misled. Some did so just out of curiosity as to how it works and what else it might lead us to.
Unfortunately, many of the details about how the Utah scientists achieved their result were kept somewhat secret, which made it difficult to attempt a reproduction. The number of reproduction efforts reporting that the result could not be reproduced grew quickly, while the few that initially claimed success were later chagrined to withdraw their results after further assessment. Later on, various sources of error were discovered in the original work, which tarnished it even more. A real kill shot occurred when it was shown that no nuclear reaction byproducts had been detected as a result of the alleged cold fusion (the byproducts should have appeared if the cold fusion worked as claimed).
Bottom-line: No one has yet truly reproduced it, and so it now sits on the junk heap of what some consider unproven, unsound, infamous scientific work.
Maybe cold fusion is an aberration and we shouldn’t focus on one isolated incident involving scientific reproducibility. In that case, you might want to consider reading an article in the notable journal Science in 2015 that reported a research effort by independent scientists to reproduce 100 of the most prominent studies in psychology. According to their independent work, they could successfully reproduce only about 39% of the studies. There have been various similar efforts that have tried to reproduce other well-known and well-accepted scientific studies and been unable to do so entirely.
Please don’t toss out the baby with the bath water on this topic. Anyone who reaches the conclusion that all science is fake news needs to revisit that kind of mentality, since it doesn’t make any sense here. There is lots and lots of science that is fully bona fide. Some of it has been reproduced, some of it has not been, and for the portion that has not been reproduced you cannot conclude it is therefore invalid. You can only say that it hasn’t been reproduced.
This too can be misleading because oftentimes the same kind of study is not specifically reproduced, and instead there are other studies that build upon the results of the original study. Thus, you could assert that those extensions beyond the original study should presumably not have held up if they were based on an invalid core. Conversely, if the extension studies are valid and being reproduced, you could assert that the original core was valid.
The potentially unpleasant aspect of this involves circumstances where further down the road it is somehow shown that the original work was invalid, which can then call into question all of the other work that followed upon it. Fruit of the poisonous tree, as they say. Knocking down a house of cards, some might suggest.
Reproducibility Is Not So Alluring
Why isn’t every piece of scientific work subjected to reproducibility?
That’s an easy answer – there’s not much incentive for doing reproducibility studies. As a scientist, you usually are measured by how much new science you create. Spending your time on simply reproducing someone else’s work is not going to get you much mileage. If you showcase that the original work is valid, you aren’t breaking new ground and instead merely adding to the belief that the original work was novel and sound. If you showcase that the original work was invalid, the odds are that you’ll immediately be under attack by the original researchers and others that believe in the soundness of the original effort. Until others jump on your bandwagon of the invalid nature of the work, you are likely to be an outcast in the science community.
Situations like cold fusion are unusual in that any kind of incredible breakthrough is going to immediately draw attention from powerful forces in support and in opposition. There’s nothing more disconcerting and motivating than this: if you’ve spent most of your scientific life trying, unsuccessfully, to achieve a perpetual motion machine, and then someone claims they did so, you would put your entire energy into wanting to prove them wrong. How could they have done it, when you labored for forty years and could not do so? Their claims must be false. You’ll gain notoriety for having shown it to be false, and also silence those who might look at you and otherwise say that you wasted those forty years looking in the wrong place to find that breakthrough.
Furthermore, anyone with such an investment in an area of inquiry is also going to want to know whether the breakthrough genuinely works, so I don’t want to overly emphasize the downside of things.
How does this reproducibility apply to AI self-driving cars?
At the Cybernetic Self-Driving Car Institute, we are urging the auto makers and tech firms to make available the AI self-driving car efforts they are undertaking, as best they can, and especially urge academic researchers and commercial research outfits to publish the details of their efforts, so that a form of reproducibility can be undertaken overall.
At this juncture, there is almost no reproducibility taking place in the AI self-driving car realm.
Work by most AI developers in self-driving cars is either considered proprietary and not revealed publicly, or those doing the work are under such hectic schedules that they have no time to “waste” on reproducibility and are instead trying to get these systems out the door, so to speak.
For the auto makers and tech firms, it is a tough call as to whether to showcase the innards of their AI self-driving car systems. Each of those firms is spending millions upon millions of dollars to develop AI self-driving car capabilities. Why should they just hand it over to anyone that wants it? The proprietary nature of what they are doing creates enormous value for their firm, and they rightfully should be able to seek a payoff on the tremendous investments they are making.
I say this because there are some AI developers that are decrying the secretive nature of these firms. But who can blame firms that have made such huge investments? There is a race to see who gets to the moon first, so to speak, and these private entities are betting the farm on getting there. It makes no logical sense for them to simply hand out their secret sauce. Intellectual Property (IP) in the AI self-driving car field is king.
On the other hand, there is the argument that if each of them develops their respective AI self-driving car capabilities in isolation from each other, and if there isn’t some means for others to independently verify what they are doing, how can we be satisfied that what these self-driving cars will do on our public roadways is safe?
There are some that argue that for these AI self-driving cars to be allowed onto the public roadways, the auto makers and tech firms should be forced to open the kimono. If they want our roads to test on, they need to share what they’ve got. Otherwise, get off our roads. The counter-argument is that we might not ever see true self-driving cars if the auto makers and tech firms need to use only private test areas or even government-sponsored proving grounds. The amount of driving experience that a self-driving car can gain on a proving ground is considered a tiny fraction of what it can gain in the real world of everyday driving.
It’s the classic madcap desire for new innovation that is being weighed against the costs of getting there.
Academic Research Reproducibility
Even the academic AI researchers are being accused of less than forthcoming reveals about their work. For many of the studies on Machine Learning (ML) and the use of neural networks, there are often claims of incredible breakthroughs in vision recognition or speech recognition or whatever, and yet the actual neural network is not made available for anyone else to try and independently verify. Are we to take at face value whatever is reported by the researchers?
Sometimes, the researchers will point to the fact that their work was published in a peer-reviewed journal. These researchers then assert that this shows their work must be valid. Not so fast. For most peer reviews, the reviewer is not trying to actually reproduce the results of the study they are reviewing. Instead, the peer reviewers are supposed to examine the study and try to ascertain whether it seems sound and valid based on whatever happens to be presented by the researchers. Peer reviewers who are experts in the field have a body of knowledge to help judge whether the nature of the study and the outcome seem plausible. But they are almost never doing an actual reproduction prior to deciding whether to accept the research for publication.
There are studies that suggest peer reviews aren’t necessarily as rigorous as one might assume or hope for. There are often inherent biases by peer reviewers. Suppose the prevailing wisdom is that the world is flat. Suppose you are an expert in the flat earth field. You get a research study that further supports that the world is flat. The odds of you rejecting it are low. Suppose you get a study that says the earth is round. You might reject it due to the assertion that the author is obviously not aware of the acceptance that the earth is flat. And so it goes. I am not saying all peer reviewers should be painted with this same brush, and I’d like to emphasize that many peer reviewers do an amazing job and try to remain impartial as they do so.
Studies of the body of peer reviews also have tended to suggest that oftentimes the peer reviewer either is not well versed in analyzing the statistics found in scientific studies, or does not take the time to assess the statistics used. This means that a scientific study might get a free pass on how the statistics were done, which could also mean that the results are not statistically significant and thus could be considered questionable or even possibly invalid. Indeed, there is a program called Statcheck, produced out of Tilburg University, which analyzes the statistics reported in science studies and attempts to find potential errors in how the statistics were used. This has brought controversy, including cases where the results of the assessment were posted online without allowing the original authors to rebut what was being posted.
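To make the idea concrete, here is a minimal sketch, in Python, of the sort of consistency check such a tool performs, namely recomputing the p-value implied by a reported test statistic and flagging any mismatch. This is not the actual Statcheck tool (which is an R package); the function name and tolerance are illustrative assumptions.

```python
# Illustrative sketch of a Statcheck-style consistency check (the real
# Statcheck is an R package; this Python version is for illustration only).
from scipy import stats

def check_t_test(reported_t, df, reported_p, tolerance=0.005):
    # Recompute the two-tailed p-value implied by the reported t statistic
    # and degrees of freedom, then compare it to the reported p-value.
    recomputed_p = 2 * stats.t.sf(abs(reported_t), df)
    consistent = abs(recomputed_p - reported_p) <= tolerance
    return recomputed_p, consistent

# Hypothetical example: a paper reports t(28) = 2.10, p = .04
recomputed, ok = check_t_test(reported_t=2.10, df=28, reported_p=0.04)
print(f"recomputed p = {recomputed:.3f}, consistent with reported p: {ok}")
```

Even this toy check illustrates the point: the reviewer does not need the raw data to catch a reported p-value that cannot follow from the reported test statistic.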
For AI self-driving cars, there have been some initial efforts to go the open source route with the AI system components, including posting the source code and the neural network models being used. Generally, these are not the full effort and are often more of a scaled-down version of what is taking place. Again, you can hardly blame the auto makers and tech firms for not wanting to go the open source route for their AI self-driving cars. See my piece on the caveats of open source for AI self-driving cars for further details: https://aitrends.com/selfdrivingcars/caveats-open-source-self-driving-cars/
Analogy to Pharmaceuticals
If a pharmaceutical company wants to bring a new life-saving drug to the marketplace, they need to undergo a rigorous, costly, time consuming, and secrets-revealing process. Some say that we should treat AI self-driving cars in the same manner. You can certainly see the comparison in the sense that a new drug can mean life-or-death for people. Likewise, an AI self-driving car can mean life-or-death for people. Would you be willing to take a drug that was untested and had not gone through a rigorous process?
Of course, the scope of the drug safety question is perhaps even narrower than that of the AI self-driving car safety question. If you as an individual take an unproven drug and it kills you, that’s one life lost. If an AI self-driving car is unsafe, it can kill the human occupants, human pedestrians, and humans in other cars. If a drug is distributed into society and we discover it’s bad, generally it can be pulled from shelves and word will be spread to avoid taking the drug. If a self-driving car is unsafe, you can’t so easily pull it from use, since the people that are depending upon it for mobility would then be somewhat immobilized.
The counter-argument by AI developers and auto firms is that via OTA (Over-The-Air) updates, you can readily make an unsafe AI self-driving car into presumably a safe one. This use of an electronically beamed set of changes is unlike what could be done about a bad drug. There’s no means to do anything about a bad drug other than get rid of it and issue a new drug. An unsafe AI self-driving car, presumably, can be remotely transformed into a safe one.
This strident belief in OTA as a solver of safety issues for AI self-driving cars is not quite the savior it might seem to be. See my piece on OTA for further info: https://aitrends.com/selfdrivingcars/air-ota-updating-ai-self-driving-cars/
The fact that a particular AI self-driving car is unsafe needs to first be discovered, and it could be that a lurking bug in the AI of self-driving cars causes those cars to generate numerous deaths before it can be figured out what the problem is. Even once the problem is figured out, a bug fix needs to be crafted in order to do the OTA with it. The AI self-driving cars then need to receive and enable the OTA fix. It could be that the fix causes other unsafe conditions to arise. Suppose there are many such bugs, and they are only being discovered one at a time; this unsafe AI self-driving car would then repeatedly undergo a cycle of killing people, then an effort to find the bug, an effort to fix it, an effort to possibly fix the fix, and so on. For an unsafe AI self-driving car, this can just keep repeating over and over again.
And, I’ve already repeatedly called for more transparency in the AI self-driving car field – see my piece on transparency: https://aitrends.com/selfdrivingcars/algorithmic-transparency-self-driving-cars-call-action/
How could this transparency work? There are mechanisms being formulated to allow for collecting together a body of code and its artifacts, and then making them available for others to try using. For example, the Jupyter Notebook is a promising open-source web application for binding together code, visualizations, text, models, machine learning results, and other artifacts. This provides a means to embody the work. Venues such as GitHub are great places for hosting such work.
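As a rough illustration of what “embodying the work” might look like in practice, here is a minimal sketch of the kind of reproducibility scaffolding one could publish alongside a notebook or repository: pinned random seeds, recorded library versions, and a checksum of the released model so others can verify they are evaluating the same artifact. The file names and fields are assumptions made purely for illustration, not any particular project’s layout.

```python
# A minimal sketch of reproducibility scaffolding that could accompany a
# published notebook or repository; file names and fields are hypothetical.
import hashlib
import json
import random
from pathlib import Path

import numpy as np

def set_seeds(seed=42):
    # Pin pseudo-random seeds so an independent party rerunning the
    # experiment draws the same random numbers.
    random.seed(seed)
    np.random.seed(seed)

def fingerprint(path):
    # Publish a checksum of the released model file so others can verify
    # they are evaluating exactly the artifact that was reported.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

if __name__ == "__main__":
    set_seeds(42)
    model_path = Path("model_weights.bin")                 # hypothetical artifact
    model_path.write_bytes(np.random.rand(10).tobytes())   # stand-in for real weights
    manifest = {
        "seed": 42,
        "numpy_version": np.__version__,
        "model_sha256": fingerprint(model_path),
    }
    Path("reproducibility_manifest.json").write_text(json.dumps(manifest, indent=2))
    print(manifest)
```

A manifest like this is cheap to produce, and it gives an independent party a fighting chance of rerunning the work and confirming they obtained the same artifact and results.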
Some say that the AI self-driving car field should be treated like the airline industry and that there should be various rules and regulations about safety in the same manner as are tracked for airplanes. For example, the Flight Data eXchange (FDX) is a special de-identified aggregated collection of data that captures safety-related events about airplanes. It contains data from over 100 airports and allows for analysis of safety aspects. The AI self-driving car field is still in its infancy and nowhere near the maturity of the airline industry, but it would seem instructive to learn from the airlines and try to proactively get ourselves ready for the safety aspects. Maybe we need to have an AI Self-Driving Car Data Exchange (AISDCX).
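To give a flavor of what such an exchange might collect, here is a hypothetical sketch of a de-identified safety event record. Since no AISDCX exists, every field name here is an assumption made purely for illustration, not an actual standard.

```python
# Hypothetical sketch of a de-identified safety event record for a notional
# AI Self-Driving Car Data Exchange (AISDCX); all field names are assumptions.
from dataclasses import dataclass, asdict
from enum import Enum

class EventSeverity(Enum):
    NEAR_MISS = "near_miss"
    MINOR = "minor"
    MAJOR = "major"

@dataclass
class DeidentifiedSafetyEvent:
    event_id: str             # random identifier, no link to a specific vehicle or maker
    severity: EventSeverity
    roadway_type: str         # e.g., "highway", "urban_arterial"
    weather: str              # e.g., "clear", "rain"
    disengagement: bool       # whether a human takeover occurred
    contributing_factor: str  # e.g., "sensor_occlusion", "unexpected_pedestrian"

event = DeidentifiedSafetyEvent(
    event_id="a3f9c2",
    severity=EventSeverity.NEAR_MISS,
    roadway_type="urban_arterial",
    weather="rain",
    disengagement=True,
    contributing_factor="sensor_occlusion",
)
print(asdict(event))
```

The key design choice, as with FDX, is that the record is aggregated and stripped of identifying details, so firms could contribute safety data without handing over their proprietary secret sauce.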
You might say that the states that are allowing AI self-driving cars are requiring the auto makers and tech firms to report safety-related data, and thus we’ve already solved this aspect. Not quite. Overall, the nature of the reporting has been considered relatively insufficient and not at all like the FDX kind of approach. Some say that if we burden the auto makers and tech firms with overly onerous reporting and regulatory requirements, it will kill the goose that laid the golden egg. Others say it is something that comes with the turf.
See my piece on AI self-driving car safety reporting tradeoffs: https://aitrends.com/business-applications/disingenuous-disengagements-reporting-ai-self-driving-cars/
See my piece about the status of regulations on AI self-driving cars: https://aitrends.com/selfdrivingcars/assessing-federal-regulations-self-driving-cars-house-bill-passed/
In the academic realm of AI self-driving car research, we are gradually seeing more acceptance of the pre-publishing or pre-print approach of posting research prior to it being subjected to the rigors (and possible delays) of peer review. Some applaud this as a means to get new science into the hands of the community as quickly as possible. This is especially important in an area like AI self-driving cars, wherein advances in machine learning are moving very quickly and it would be handy to leverage the work of others right away.
We are also seeing more of a call for computer science researchers to post their code and models, rather than just referring to their work in their write-ups. Doing so will aid the reproducibility factor. It will also aid those that are trying to stand taller on the shoulders of others, and aid those neophytes coming up the ranks that want to learn from others.
Irreproducibility is an important limiter on the progress of science and engineering. For those of you who are new to the subject, you might find of interest the rather large body of research about the value of reproducibility as essential to innovation (easily Googled); for those of you who have been steeped in the reproducibility topic for years, there is a smattering of rather hilarious looks at irreproducibility, including the long-standing Journal of Irreproducible Results (JIR), which publishes made-up scientific studies (it’s funny stuff!), and even some of the more staid and bona fide journals will occasionally publish intentionally irreproducible stories as an April Fool’s prank.
Though I enjoy a good laugh from time-to-time, irreproducibility is a real problem facing the AI self-driving car field and we’re going to have to tackle it, one way or another.
Copyright 2018 Dr. Lance Eliot
This content is originally posted on AI Trends.