Developing a Natural Language Processing (NLP) algorithm for research services is fascinating. It asks the AI engineer to merge many perspectives to deliver a complete tool. New NLP algorithms hit the market constantly, and the researchers developing them need to be aware of their limitations and the conditions under which they were developed. The tables are turned when researchers in various fields use a research-service tool built on NLPs. There are therefore aspects of machine learning (ML) a researcher must know in order to present a tool that supports researchers and research offices. Imagine this task as a crystal whose number of surfaces you don’t know: an N-dimensional diamond. Every time you turn the crystal, you discover a new perspective on how it can be seen, how it can be used, and how it fails in certain cases.
In this blog, I will share my experiences and opinions after several months of developing an NLP tool for scientifyRESEARCH, a research funding database. I will outline the major principles that scientifyRESEARCH follows in arriving at its novel AI-driven software for research grants.
Complete overview of research funding data
Data is one of the things on this Earth that is almost infinite, so providing a complete summary of a class of data is a complex task. The task here therefore needs to be reworded: instead of providing a complete view, we need to provide the most accurate description of the data. This is because datasets are also living – they change over time.
Specifically, at scientifyRESEARCH we focus on the grants that are advertised on the Internet every day. Many are not kept up to date by the funders, and many have details hidden within difficult-to-read texts. The task is to bring them all together in one place, following the same schema, and then provide the original source of the data alongside a complete overview, for transparency.
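To make the idea concrete, here is a minimal sketch of what such a shared schema could look like in Python. The `Grant` dataclass, its field names, and the `normalize` helper are illustrative assumptions for this post, not scientifyRESEARCH's actual data model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class Grant:
    """Illustrative shared schema; field names are assumptions only."""
    title: str
    funder: str
    deadline: Optional[date]   # None if rolling or unspecified
    amount: Optional[int]      # in the funder's currency, if stated
    eligibility: list[str] = field(default_factory=list)
    source_url: str = ""       # original advertisement, kept for transparency

def normalize(raw: dict) -> Grant:
    """Map one scraped advertisement onto the shared schema."""
    return Grant(
        title=raw.get("title", "").strip(),
        funder=raw.get("funder", "unknown"),
        deadline=raw.get("deadline"),
        amount=raw.get("amount"),
        eligibility=raw.get("eligibility", []),
        source_url=raw["url"],  # always keep the original source
    )
```

Whatever the real schema looks like, the point is the same: every advertisement, however messy its original text, ends up in one predictable shape with a link back to its source.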
Linking up the trends in grant advertisements from previous years is another powerful aid for researchers, helping them forecast and plan future research steps. The history of the grant ads is thus a living memory for the NLP algorithm at scientifyRESEARCH.
Predictions guided by the latest amendments and principles
Data is living – it metamorphoses into new forms and reflects current trends on any given date. An NLP model aware of these changes can therefore adapt to them and deliver the best results and valuable insights.
Such changes are geographically unique and sometimes peculiar to certain research areas, so a single NLP giving the best results for all research fields might not be possible. One needs to be attentive and fine-tune algorithms for specific niche areas, as in the sketch below.
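As a toy illustration of the one-model-per-field idea, this sketch fits a separate lightweight classifier per research area. It uses a classical scikit-learn pipeline as a stand-in for real NLP fine-tuning, and the training texts are invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented examples; a real system would train on curated, labelled grant texts.
training_data = {
    "life_sciences": (
        ["cancer biology grant", "genomics fellowship", "quantum detector call"],
        [1, 1, 0],  # 1 = relevant to this field, 0 = not relevant
    ),
    "physics": (
        ["quantum computing call", "particle detector funding", "cell biology award"],
        [1, 1, 0],
    ),
}

# One model per niche area instead of a single global model.
models = {}
for research_field, (texts, labels) in training_data.items():
    pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
    pipeline.fit(texts, labels)
    models[research_field] = pipeline

# Each field's model scores new grant texts independently.
print(models["physics"].predict(["superconducting qubit research grant"]))
```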
This also helps the users of our database with fast searches: researchers can use the tool to find research grants that match their unique eligibility requirements. So even if substantial effort goes into developing the algorithms, the results can be satisfying enough to justify it. Of course, we need to be judicious with how we divide our time: for any time-hungry task, we must weigh how much improvement it would actually deliver.
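Here is a minimal sketch of such an eligibility-aware search, assuming grants and researcher profiles are represented as simple tags (the tag names are invented for the example):

```python
def matching_grants(grants: list[dict], researcher_tags: set[str]) -> list[dict]:
    """Keep grants whose stated requirements the researcher satisfies.
    A grant with no stated requirements is treated as open to everyone."""
    return [g for g in grants if set(g["eligibility"]) <= researcher_tags]

grants = [
    {"title": "Early Career Fellowship", "eligibility": ["postdoc"]},
    {"title": "Open Innovation Fund", "eligibility": []},
    {"title": "Senior PI Award", "eligibility": ["professor"]},
]

print([g["title"] for g in matching_grants(grants, {"postdoc", "EU-based"})])
# -> ['Early Career Fellowship', 'Open Innovation Fund']
```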
The scientific approach to NLPs: vetting responses as a human
Every NLP algorithm comes with a bunch of limitations, and developers need to be aware of and transparent about them. In many cases, vetting the responses or outputs of NLP-based software is necessary – even if the process is automated, human intervention and monitoring are required.
This guards against bias in the NLP's responses. For example, a researcher in the Southern Hemisphere should not only see the grants that are around them but, equally, the ones they are eligible for in the Northern Hemisphere. Likewise, a researcher's ethnic background should not determine which grants they are shown as eligible for unless required by the funding agency (some grants are exclusively available to minority groups in research).
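One way to encode that principle is to filter on a sensitive attribute only when the funder itself imposes the restriction. The sketch below is an assumption about how such a rule could be written, not scientifyRESEARCH's implementation:

```python
def visible_grants(grants: list[dict], researcher: dict) -> list[dict]:
    """Only funder-imposed restrictions narrow the results; attributes such
    as hemisphere or ethnicity never filter anything on their own."""
    visible = []
    for grant in grants:
        restrictions = grant.get("restricted_to", {})  # funder-imposed only
        if all(researcher.get(attr) == value
               for attr, value in restrictions.items()):
            visible.append(grant)
    return visible

grants = [
    {"title": "Global Health Grant", "restricted_to": {}},
    {"title": "Indigenous Researchers Award",
     "restricted_to": {"minority_group": "indigenous"}},
]

researcher = {"hemisphere": "southern", "minority_group": None}
print([g["title"] for g in visible_grants(grants, researcher)])
# -> ['Global Health Grant']  (the researcher's hemisphere plays no role)
```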
In the end, every step of this workflow requires sufficient documentation of the decision-making. Stale predictions and statements about research funding also need to be vetted by humans, e.g., expired focus points of funding bodies in rapidly developing research arenas.
How can NLP algorithms help researchers?
Writing any research grant is a mammoth task. It takes a lot of time and emotional energy to formulate a well-written proposal based on our work so far and our imagination for future plans, backed by facts. There is rarely anything that can supplement this process. An algorithm that not only shows the best-matching research grants for an individual researcher but also presents alternative grants that one could apply for is very welcome, boosting researchers' and institutions' chances of performing well-funded research.
Missed a deadline? Not to worry. Based on insights from past data, the NLP would be able to predict when the next round of grants will open and show the amount of funding one can apply for. The model would alert the researcher when it is time to polish their draft and prepare for the next round. Researchers will have a research assistant to help them through the grant-writing marathon.
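As a back-of-the-envelope illustration, a naive version of that forecast could simply extrapolate from the average gap between past opening dates; a production model would of course be more sophisticated:

```python
from datetime import date, timedelta

def predict_next_opening(past_openings: list[date]) -> date:
    """Naive forecast assuming roughly regular funding cycles."""
    ordered = sorted(past_openings)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    average_gap = sum(gaps) // len(gaps)
    return ordered[-1] + timedelta(days=average_gap)

# Invented opening dates for an annual call.
openings = [date(2020, 3, 1), date(2021, 3, 3), date(2022, 2, 27)]
print(predict_next_opening(openings))  # -> 2023-02-26
```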
Research funding information is often buried under loads of paperwork or reachable only through networking in the right circles. For early career researchers, such information is so invaluable that a grant database which can provide this insight easily is a must-have. When networking is costly, fair competition seldom occurs. An NLP tool can easily work across boundaries to present equal opportunities and make a real difference in the global research ecosystem.