03/19/2012

From Information to Knowledge: Also in Science

There was a time when access to information was the main obstacle to knowledge. Books were kept in remote monasteries, where the few who could read them spent most of their time copying old manuscripts instead of creating new, original knowledge. Today the problem is just the opposite. Scholars are under such pressure to publish their results that we suffer from a flood of information. The number of scientific papers, books and e-content is growing so rapidly that they are becoming noise, making it really difficult to see any signal of new, original knowledge.

The information explosion is a recurrent topic that emphasizes the role of the internet in making information so widely, rapidly and inexpensively available that extracting useful knowledge is almost impossible. It is the quality, not the quantity, of the information that matters, from news to entertainment, from travel guides to restaurant recommendations.

In the case of Science the story is different, although the bottom line is the same. Scientific papers are not widely available; on the contrary, they are guarded in the "virtual libraries" of the publishing houses, which grant access only to those who pay (and this is not cheap) to get into the sancta sanctorum of scientific information, the scholarly journals.

Interestingly enough, those who pay to get access to these papers are usually the same ones who wrote them for free. Even research that has been paid for with taxpayer money and done by scientists with contracts or fellowships from public sources is in private hands. Open-access scientific literature is one attempt to change this dynamic, but the impact factors of open journals are still low, which makes it unlikely that the best research will be published there. Publishing houses are an important part of selecting, managing, and archiving scientific information, and they should make enough money to maintain high quality standards. A plausible compromise between profitability and accessibility would be to make papers freely available a few years after publication. Some special issues or invited articles could also be made freely available as a way to attract new readers. By making "old" information available, more people would visit these websites. This would reach a new market sector, those who would pay only for selected recent papers, while still keeping the institutional clients subscribed.

Universal access to scientific information is only one side of the problem. The main one, in my opinion, is the explosion in new information, motivated, at least in part, by the pressure on scholars to publish their results in order to get funding, a promotion, or the recognition of their peers. Paper engineering, that is, the art of getting a paper published in the best possible journal by presenting the information in a way that seems (but is not) new, valuable and interesting, is one of the main problems of Science these days. Although plagiarism is a more serious matter, paper engineering is far more common and contributes more to the scientific noise that slows down the advancement of Science.

What are the solutions to both reduce noise and extract valuable knowledge from the massive amount of scientific information available nowadays? Less is more. Fewer, higher-quality papers are the only way to increase the signal-to-noise ratio. The indicators used to fund new projects or to promote a young professor should all be based on quality and impact. A large quantity of poor-quality papers should be penalized rather than treated as neutral or positive, as it is today. Some normalized indicators, which value quality over quantity, are useful in this direction. As for how to extract valuable knowledge, information visualization and smart search are the new challenge for companies that in the past focused on making information accessible, organized and simple. Google Scholar, Google Patents and Google Books are all valuable efforts in this direction, but companies that are used to managing, organizing and extracting information efficiently and rapidly could be doing much more to extract knowledge from information.

Let's use an example. A group of recent graduates decides to commercialize the technology they developed during their Ph.D. work on a new solar panel. They want to know what the competing technologies are, their current yields, and the main barriers to market. They will need to read hundreds of papers to get an approximate idea of whether they have something competitive. Part of the problem is the noise created by hundreds of papers that add very little, but another part is that we do not have tools that allow us to extract simple and practical knowledge from the vast information available on this or, in fact, any other topic. Quick, useful and inexpensive access to scientific information is critical for many high-tech start-ups trying to solve the technical problems associated with the commercialization of their technology. Many of the answers are already published but buried deep in the scientific literature.

Current software allows searching by title, author, content or affiliation. Even the use of keywords is not that satisfactory. All these programs and websites will present you with a list of publications to read, but no answers, no visual information, no practical knowledge. Extracting knowledge from scientific information is a major challenge, but also an urgent need if we want to navigate from information to knowledge, also in Science.

