Tornado (Torn, Analyzed & Dashboard Organized) is a tool for processing large volumes of documents that literally cuts them into pieces, transforming unstructured data (Word, PDF, email, PPTX, Excel, images) into structured or semi-structured data for further analysis. The tool uses a Deep Learning model to extract the basic elements of any document, such as text, images, tables and equations. These elements are then processed and stored in specific formats (txt, xml, csv, jpeg, png), together with their respective metadata, so the original document can be reconstructed from them. With semi-structured data in hand, it is then possible to apply Artificial Intelligence algorithms and models to different applications. Examples: using the images to train a classification or clustering model; concatenating tables or text data from similar documents to generate dashboards; and using the text of thousands of documents in natural language processing, accessed through a cognitive search engine that lets users ask questions or examine term frequency and the similarity between documents.
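The idea of storing each extracted element with enough metadata to rebuild the document can be illustrated with a minimal Python sketch. It is not Tornado's actual data model: the `Element` record, its fields (`kind`, `page`, `position`, `content`) and the `reconstruct` function are hypothetical names chosen only for illustration, and the sketch assumes that page number and position on the page are enough to recover reading order.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical record for one piece cut out of a document."""
    kind: str      # "text", "image", "table" or "equation"
    page: int      # page the element was extracted from
    position: int  # reading order of the element within its page
    content: str   # the text itself, or the path of the stored file


def reconstruct(elements):
    """Return the elements in their original reading order.

    Assumption: (page, position) metadata is sufficient to restore order.
    """
    ordered = sorted(elements, key=lambda e: (e.page, e.position))
    return [f"[{e.kind}] {e.content}" for e in ordered]


if __name__ == "__main__":
    pieces = [
        Element("table", 2, 0, "results.csv"),
        Element("text", 1, 1, "intro.txt"),
        Element("image", 1, 0, "cover.png"),
    ]
    for line in reconstruct(pieces):
        print(line)
```

Keeping the metadata next to each stored file is what allows the pieces to be analyzed separately (for example, all tables together) and still be put back in order later.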
Identified on December 31, 2019, the SARS-CoV-2 coronavirus is new to science. Doctors, biologists and researchers from around the world are studying it in order to understand how its transmission works and to create a vaccine. The cure, however, is just one of several open questions that scholars still have about the virus. To support research on the coronavirus (Covid-19), the Laboratory of Applied Computational Intelligence (ICA) has made available a search engine for English-language academic articles based on currently available scientific evidence. All of these articles are related to the study of the coronavirus and include case reports, transmission routes, environmental factors and explorations of treatment strategies. The materials come from academic databases such as Elsevier. The search engine works much like Google. In the default configuration, after the user types the subject or keyword of interest in the search field, the results are ranked by a relevance criterion. Articles can be found in the following ways:
based on one or more keywords;
based on one or more sentences.
The entire information retrieval process consists of identifying, in the set of articles (corpus), the documents that meet the user's need. The user can choose to search based on the semantic similarity between the terms of interest or on the frequency with which these terms appear in the complete document or in the abstract. By default, the tool shows the most relevant English-language results first, and you can filter them by year of publication. By clicking on one of the displayed titles, you can view the full article. In this way, the user can thoroughly examine the displayed articles on the coronavirus, depending on the options selected. Additionally, we offer a question-answering system: the user asks a question in natural language and receives an answer together with the context of the document in which it occurs. At the moment, the service only supports questions and answers in English.
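The frequency-based search option and the year filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ICA search engine's implementation: the `Article` record, the `search` function and its `field` and `year` parameters are hypothetical, relevance is taken to be the raw count of query terms in the chosen field, and the semantic-similarity option is not shown.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical article record; field names are illustrative."""
    title: str
    year: int
    abstract: str
    body: str


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_frequency_score(query, text):
    """Count how many times the query terms occur in the text."""
    counts = Counter(tokenize(text))
    return sum(counts[term] for term in tokenize(query))


def search(articles, query, field="abstract", year=None):
    """Rank articles by query-term frequency in `field` ("abstract" or "body").

    If `year` is given, only articles published in that year are kept.
    Articles that contain none of the query terms are dropped.
    """
    scored = []
    for article in articles:
        if year is not None and article.year != year:
            continue
        score = term_frequency_score(query, getattr(article, field))
        if score > 0:
            scored.append((score, article.title))
    # Highest score first; ties keep their original corpus order.
    return sorted(scored, key=lambda pair: -pair[0])


if __name__ == "__main__":
    corpus = [
        Article("A", 2020, "Coronavirus transmission: how coronavirus spreads", ""),
        Article("B", 2019, "Transmission of respiratory viruses", ""),
        Article("C", 2020, "Vaccine development overview", ""),
    ]
    print(search(corpus, "coronavirus transmission"))
    print(search(corpus, "coronavirus transmission", year=2019))
```

Raw term counts favor long documents, so a production system would more likely use a normalized weighting scheme; the sketch keeps the simplest form to show how the choice of field and the year filter interact with ranking.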
Timeline: The Standing Committee on Emerging Infectious Diseases and 21st Century Health Threats of the National Academies of Sciences, Engineering, and Medicine (NASEM) and the World Health Organization (WHO) have identified nine scientific questions that are vital to facing this international crisis. These questions include the study of virus transmission and incubation, risk factors for contracting COVID-19, the origin of the virus and the appropriate medical practice for treating the disease. The full list of challenge tasks is available on the Kaggle website. It is difficult for people to manually review thousands of articles and summarize their findings. Recent advances in technology can be useful here. One of the most immediate and impactful applications of Artificial Intelligence (AI) is helping scientists, academics and technologists find the right information in a sea of scientific articles, so that research can advance more quickly.
To contribute to this initiative, we have organized a timeline of articles that can help answer these questions. We believe that sharing information is essential to boosting our ability to respond to the coronavirus pandemic.