Russian-Taiwanese project number 17-57-52011 «Research and Development of Human-Computer Interaction System for Virtual Environment Based on Set of Screens and Depth Sensors». Project Manager Chuvilin K.V.

This project is aimed at developing virtual environment technologies, visualization and human-computer interaction methods and algorithms, and their applications to a wide class of complex problems in basic science that are solved with the help of human-computer complexes. Previously, with the support of the Russian Foundation for Basic Research (RFBR), our team created a serial sample of the HE system, which was successfully applied in a number of projects in basic science, industry and education. In addition, a universal optical motion capture system was developed under the RFBR competition of scientific projects carried out by leading youth teams. The goal of this project is to study virtual environment technologies and to create on their basis a complex that implements a human-machine interface and makes it possible to measure experimentally the degree of perception of virtual space and the accuracy of positioning and manipulation. Particular attention is paid to the ultra-high-resolution imaging system and to the natural interface based on capturing hand motor skills. The relevance of reengineering virtual environment technologies is due to the new challenges of our time. The constantly growing complexity of the physical phenomena studied in scientific and engineering disciplines requires new approaches and powerful technology for processing and analyzing large amounts of complex data. The constantly growing complexity of machines and mechanisms, buildings and structures, transport systems and technological processes likewise requires new approaches to design and prototyping. The ever-increasing complexity of human-operated machines, mechanisms, objects and processes requires new methods and tools for staff training. The current moment is characterized by a catastrophic growth in the amount of information that must be processed in order to sustain the progress of modern civilization.
The volume of data is growing faster than the performance of computers, which, following Moore’s law, doubles every one and a half years. The resolution of visualization systems grows several times more slowly than processor performance. To create a high-resolution screen, a group of projectors or monitors is used. A typical example of an ultra-high-resolution composite screen is the virtual environment installation at Stony Brook University, NY, USA, whose 4 walls contain 416 LCD monitors. At a resolution of 2560×1440 pixels per monitor, the total resolution of the installation exceeds 1.5 gigapixels and holds the world record. Of the variety of virtual environment tools, this project focuses on two main ones: (1) improving the natural interface for manipulating virtual objects; (2) improving the quality of displaying virtual scenes by increasing the resolution of display devices and taking gaze tracking into account. These tools are key to the virtual environment, judging by its definition: virtual environment technology is defined as “interactive real-time graphics with three-dimensional models, combined with specialized display technology that immerses the user in the model world, with direct manipulation of objects in the model space”. The original approach to implementing motion capture is to use a combination of groups of depth sensors, including mobile ones placed directly on the operator. To increase the resolution of the imaging system, it is proposed to combine a group of liquid crystal monitors into a single panoramic screen. The research methods we propose have not previously been applied in Russia or Taiwan. This project aims to develop completely new methods of visualization in a virtual environment at a qualitatively new level of resolution and to incorporate dynamic scenes, using a robust mathematical foundation, proven engineering solutions and the latest tools such as RGB-D cameras.
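As a quick check of the resolution figure quoted above for the 416-monitor installation, the arithmetic can be sketched as follows. The variable names are illustrative; the monitor count and per-monitor resolution are the values given in the text.

```python
# Values taken from the text: 416 LCD monitors at 2560x1440 pixels each.
monitors = 416
width_px, height_px = 2560, 1440

# Pixels on a single monitor and on the whole composite screen.
pixels_per_monitor = width_px * height_px
total_pixels = monitors * pixels_per_monitor

# Express the total in gigapixels (1 gigapixel = 10**9 pixels).
total_gigapixels = total_pixels / 1e9

print(f"{pixels_per_monitor:,} pixels per monitor")
print(f"{total_pixels:,} pixels in total ({total_gigapixels:.2f} gigapixels)")
```

The total comes to roughly 1.53 gigapixels, which agrees with the "exceeds 1.5 gigapixels" statement.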
Project participants from the Russian side: Chuvilin K.V. (Project Manager), Khlamov M.A., Pestrikov V.I., Afanasyev V.O. Project participants from the Taiwanese side: Timothy K. Shih (co-director of the project from the Taiwanese side), Hon-Hang Chang, Ankhtuya Ochirbat, Enkhtogtokht Togootogtokh, Kai-Jen Cheng, Yun-Wen Huang, Yu-Ting Cheng.

17-07-00762 «Research and development of a security system based on nature-like technologies, visual analytics and multidimensional data warehouses». Project Manager Kopylov G.I.
Situation awareness is important in technology for the prevention and elimination of natural and man-made emergencies. To this end, at this stage of the project, the concepts of subject and object are introduced, and the structure of the subject’s activity in a situation is described. In a simple model, a situation includes objects, subjects, and indicators. Objects are all active participants, things and informational elements that can influence the conditions in a situation, exist independently, have independent value, and can participate in several situations. Subjects are objects from whose point of view the situation is considered. As a rule, the conditions in the situation and the achievement of a certain result are important to the subject. Indicators are characteristics that describe the situation, the participation of objects in it, and the state of its conditions. In a more complex model, at each moment of time the subject is included in a set of interaction situations, a situational environment in which the subject’s behavior is realized. That is, the subject is included in several parallel streams of situations. Currently, the human infrastructure is being virtualized. The considered approach to representing the conditions under which objects act in situations can serve as a “meta-level” for studying human behavior in modern conditions, can help ensure the interoperability of newly created infrastructure elements and technologies, and can be one of the conceptual foundations for designing and using new information technologies. The boundary between the real and the virtual in a situational environment is shifting towards a larger virtual component. This trend is accompanied by the formation of metasystems that support the new functionality of the virtual world: the interoperability of the properties of things, their communication, and the mechanisms for managing this infrastructure.
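The simple situation model described above (objects, subjects, indicators, and a subject's situational environment) can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions: the class names, field names and example situations are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass, field


@dataclass
class SituationObject:
    """Any participant, thing or informational element (hypothetical class)."""
    name: str
    is_virtual: bool = False


@dataclass
class Situation:
    """A situation holds objects, subjects (a subset of objects) and indicators."""
    name: str
    objects: list = field(default_factory=list)
    subjects: list = field(default_factory=list)
    indicators: dict = field(default_factory=dict)

    def add_object(self, obj, as_subject=False):
        # Every subject is also an object of the situation.
        if obj not in self.objects:
            self.objects.append(obj)
        if as_subject and obj not in self.subjects:
            self.subjects.append(obj)


def situational_environment(subject, situations):
    """Return the situations in which the given object acts as a subject."""
    return [s for s in situations if subject in s.subjects]


# Hypothetical example: one operator is a subject in two parallel situations.
operator = SituationObject("operator")
sensor_feed = SituationObject("sensor feed", is_virtual=True)

shift = Situation("control room shift")
drill = Situation("emergency drill")
chat = Situation("online chat")

shift.add_object(operator, as_subject=True)
shift.add_object(sensor_feed)
shift.indicators["alarm_active"] = False
drill.add_object(operator, as_subject=True)
chat.add_object(sensor_feed)

environment = situational_environment(operator, [shift, drill, chat])
print([s.name for s in environment])
```

Here the same object participates in several situations, and the subject's situational environment is simply the set of situations in which it is a subject.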
The virtual component can include all types of objects in a situation. These are subjects, part of whose everyday and work activity is shifting to the virtual sphere; groups of people who are increasingly involved in social networks, virtual settlements and other types of virtual activity; and things that are acquiring an increasingly significant virtual component. At the same time, the implementation of axioms in virtualized situations has some peculiarities. For example, the condition-axioms of space and time function there differently than in the real world. Participants in a virtualized situation often have no information about where other objects are located or when they acted in the situation. This creates difficulties in the behavior of subjects and new risks in their activities. Also at this stage of the project, visual analytics for modeling the nuclide composition in thorium-based molten-salt nuclear reactors was considered. This simulation system is designed to analyze the nuclides produced in a thorium-based nuclear reactor and to help ensure the safety of such reactors. Project participants: Kopylov G.I. (Project Manager), Reingold L.A.

17-07-00833 «Architecting and IAS Prototype to Demonstrate the Solutions of NPP APCS Based on Virtual Environment». Project Manager Zakharushkin V.F.
The installation and commissioning of complex electronic equipment for the automated process control systems (APCS) of nuclear power plants (NPPs) is a fairly complex process that requires highly skilled personnel. An NPP APCS typically has a large number of subsystems (more than 30) and a large amount of heterogeneous equipment (sensors, switches, etc.). To carry out maintenance, NPP maintenance personnel must follow the instructions for the relevant work. The volume of operational documentation reaches tens of thousands of sheets. As a result, even qualified repairmen have to spend considerable time searching for the relevant documentation, getting acquainted with it, and checking that their actions comply with the procedures it specifies. The difficulty lies not so much in finding and selecting the specific instructions for servicing the required equipment as in actually following them, that is, in providing operational information support for personnel actions and in checking that those actions comply with the technological process specified by the manufacturer of the equipment (software). At present, the developers of NPP automated process control systems do not provide such facilities to service personnel. The problem becomes especially acute during repair, replacement of equipment or maintenance work when the NPP personnel are located far from the manufacturer or service centers. This is especially true for NPPs abroad, in developing countries, because having the manufacturer’s qualified technical specialists on site or calling them out is difficult or very expensive, and the qualifications of local specialists are not sufficient to solve problems quickly. The use of virtual and augmented reality technology in creating and implementing automated control systems for NPPs is a new and promising direction.
During the second stage of work (January-December 2018), in accordance with the application, mock-ups and models were implemented for a system that visualizes control information of the NPP automated control system (the “Augmented Reality-PP” system). To demonstrate how augmented reality technology can be applied in automation systems, technological processes for servicing NPP APCS equipment that is part of the Typical Program-Technical Means (TPTS) manufactured by N.L. Duhov are proposed. Structurally, the “Augmented Reality-PP” system consists of two software applications for visualization devices: Epson BT-350 augmented reality glasses and an electronic tablet. Both types of devices run Android OS version 5.1 or higher. The designed mock-up of the information system makes it possible to improve the visualization of equipment maintenance, the training of repairmen, and the control of technological processes. Project participants: Zakharushkin V.F. (Project Manager), Solomentsev Ya.K., Khlamov M.A.

17-07-01475 «Region Resilience to Cascading Failures across Interdependent Critical Infrastructure Lifeline Networks». Project Manager Kirillov I.A.
Understanding the nature and patterns of the various types of interdependencies between the individual systems of a region’s critical infrastructure (power generation, telecommunications, gas, transportation, water, etc.) is necessary to identify and minimize risks, mitigate consequences and recover effectively from the cascading spread of functional failures and structural damage. This problem is of particular relevance for regions that are isolated (for one reason or another) and have limited capacity to restore or replace assets and to use the reserves and resources of neighboring regions. The goal of the project is the development, verification and validation of a quantitative method of risk-informed resilience assessment (measures of structural-functional and adaptive sustainability) of the engineering and technical infrastructure of a particular region with respect to systemic accidents induced or caused by multiple hazardous effects of natural, man-made and anthropogenic character (combined or sequential). In the second phase of the project (2018), two analytical tools were developed for the probabilistic description, analysis, and quantification of the resilience of interdependent lifeline networks. First, a general framework for risk-informed quantitative assessment of resilience was proposed, which explicitly takes into account both the stochastic nature of the hypothetical accidents themselves and the uncertainties in management decisions and technical actions in the post-accident period. Second, a specific engineering method was proposed for the probabilistic description of scenarios, for use in their systematic identification and classification and in preparing strategies for managing the risks and resilience of critical infrastructure, taking into account the stochastic nature of the dependencies between assets that lead to cascading accidents.
The need for practical application of the developed tools is illustrated by the example of integrated safety assurance of a transport hub for LNG in the Arctic region. Project participants: Kirillov I.A. (Project Manager), Berberova M.A., Meshcherin S.A., Panteleev V.A.

18-07-00225 «Research and development of the “Universal Dictionary of Images” to create a human-machine interface». Project Manager Rykov V.V.
The relevance of the project is evidenced by the popularity of the subject on the Internet: a web search on the project topic yields 18,900 results. Its novelty is shown by a more specific search: the exact query «Universal Dictionary of Images» returns «No results found for» «Universal Dictionary of Images» and a single result, M24.RU – 10 unknown: monsters, stairs, funnels and keys … www.m24.ru/articles/112668/ … The proposed project involves creating a bank of existing images that are widely used and can become a means of interethnic communication for people who have no other channel for exchanging information. Such images include common gestures and orientation signs on roads, in transport, on the street, and in public places and institutions. The bank will include both individual images and combinations of them that make up a single conceptual complex (how to behave at a table, in transport, at a stadium, etc.). It should be noted that the proposed dictionary is designed for interpersonal human communication; computer recognition of images is not the main goal of the project. This orientation toward interpersonal communication enables us to select images for the dictionary and, no less important, to set practically feasible tasks at each stage of its creation. For example, at the first stage we will set the following boundaries: select one hundred dictionary entries for several groups of images, provide them with dictionary explanations in Russian, English and one more language of potential users, divide them into the appropriate groups, etc. This will give us working experience that will later become established skills in creating dictionary entries. It will also give us concrete parameters for future work: which performers and materials we will need, the time frames for the second and subsequent stages, etc.
It will be easy for us to collect one hundred such subjects, to present them in computer form, and to offer them as a model of the future dictionary, which will be constantly replenished, both by improving existing entries and by creating new dictionary units and even entire thematic complexes. Project participants: Rykov V.V. (Project Manager), Zolotarev O.V., Mestetsky L.M., Orlov D.E., Solomonick A.B., Tiras Kh.P., Khakimova A.Kh.

18-07-00909 «Research and development of methods and algorithms for assessing the cross-linguistic semantic similarity of texts for the analysis of their ideological influence». Project Manager Khakimova A.Kh.
The project is devoted to the fundamental scientific problem of semantic modeling, within which a methodology is developed for evaluating the semantic similarity of arbitrary texts in different languages. The study is based on the hypothesis that the proximity of the vector representations of terms in a semantic space can be interpreted as semantic similarity in a cross-lingual environment. The index of semantic textual similarity (ISTS) will be constructed taking into account the presence of terms and ideas with close semantic representations. Each text will be associated with a vector in a single multilingual semantic vector space, and the semantic similarity of texts will be measured by the proximity of the corresponding vectors. The vectors will be constructed with Word2Vec and NASARI, using multilingual linguistic resources such as WordNet, Wikipedia, BabelNet, etc. Similar phrases and semantic equivalents will be identified automatically with the linguistic processor developed by the team of authors, together with thematic analysis methods (LDA, LSA, ARTM) and the method of constructing an associative domain portrait (APPO), which is based on statistical methods and distributive semantics. A technique is being developed for constructing a dynamically replenished multilingual collection of documents published on the network using the methods of distributive semantics (APPO). We propose a quantitative indicator, the Index of Semantic Textual Similarity, that measures the degree of semantic similarity of multilingual texts on the basis of identified implicit interlanguage semantic links. The ISTS is calculated from an indicator introduced by the authors, a measure of the similarity of any two texts. Its parameters are set on the basis of correlation with the presence of a formal reference between documents. The measure of semantic similarity reflects the presence of terms, phrases and word combinations common to the two texts.
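The idea that each text maps to a vector in one multilingual space, with similarity measured by the proximity of vectors, can be illustrated with a minimal sketch. It assumes cosine similarity as the proximity measure and averaging of term vectors as the text representation; neither choice is stated in the project description, and the tiny embedding table below is invented for illustration rather than produced by Word2Vec or NASARI.

```python
import math

# Hypothetical toy vectors in a shared multilingual space (illustration only).
TOY_EMBEDDINGS = {
    "reactor": [0.90, 0.10, 0.00],
    "реактор": [0.88, 0.12, 0.02],
    "safety": [0.70, 0.30, 0.10],
    "безопасность": [0.68, 0.33, 0.08],
    "dictionary": [0.00, 0.20, 0.90],
    "image": [0.10, 0.10, 0.80],
}


def text_vector(tokens, embeddings):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    dims = len(next(iter(embeddings.values())))
    total = [0.0] * dims
    count = 0
    for token in tokens:
        vec = embeddings.get(token)
        if vec is None:
            continue
        for i, value in enumerate(vec):
            total[i] += value
        count += 1
    if count == 0:
        return total
    return [value / count for value in total]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


english = text_vector(["reactor", "safety"], TOY_EMBEDDINGS)
russian = text_vector(["реактор", "безопасность"], TOY_EMBEDDINGS)
unrelated = text_vector(["dictionary", "image"], TOY_EMBEDDINGS)

sim_related = cosine_similarity(english, russian)
sim_unrelated = cosine_similarity(english, unrelated)
print(f"related: {sim_related:.3f}, unrelated: {sim_unrelated:.3f}")
```

With these toy vectors the English and Russian texts on the same topic score close to 1, while the unrelated text scores much lower, which is the behavior the hypothesis relies on.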
The optimal parameters of the algorithm for identifying implicit links are selected on a thematic collection by maximizing the correlation between explicit and implicit links. The authors proposed and partially tested the hypothesis that the optimal parameters of the algorithm for calculating implicit references are close across different text corpora and subject areas. Based on the results of processing a multilingual collection of texts in a specific subject area, optimal parameters for the ISTS calculation algorithm will be found; the algorithm will then be applied to texts on other subjects, with expert refinement of the optimal parameters. The proposed ISTS will improve the retrieval of meaningful texts and primary sources, and automatic retrieval of documents from the Internet will eliminate the laborious manual evaluation of the significance of texts. Cross-lingual determination of semantic textual similarity is an important step towards detecting and evaluating cross-lingual plagiarism, and research in this direction is rare. In addition, the project addresses the following tasks: automated assessment and improvement of machine translation systems; cross-cultural analysis of communicative strategies; development of intelligent Internet technologies; increasing the effectiveness of semantic search by accurately modeling the similarity of sentence meanings; identification of unified terms and phrases for several languages; ranking of cross-language word pairs by their semantic similarity or relatedness; automated formation of multilingual thesauri and interactive subject-oriented encyclopedias. The methodology was partially tested by the project participants in the creation and analysis of a collection of scientific articles on computer graphics and was presented at the CyberWorlds 2017 conference (Great Britain).
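Selecting parameters by maximizing the correlation between explicit and implicit links can be sketched as a simple search. The example below assumes a single parameter (a similarity threshold above which a document pair counts as an implicit link) and the Pearson correlation coefficient; the project does not specify either, and the document pairs are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if one is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_threshold(pairs, candidates):
    """Pick the threshold whose implicit links correlate best with explicit ones.

    pairs: list of (similarity, has_explicit_reference) for document pairs.
    """
    explicit = [flag for _, flag in pairs]
    best_threshold, best_corr = None, -2.0
    for threshold in candidates:
        implicit = [1 if sim >= threshold else 0 for sim, _ in pairs]
        corr = pearson(implicit, explicit)
        if corr > best_corr:
            best_threshold, best_corr = threshold, corr
    return best_threshold, best_corr


# Hypothetical document pairs: (similarity score, 1 if a formal reference exists).
PAIRS = [
    (0.92, 1), (0.85, 1), (0.80, 1), (0.75, 1), (0.60, 0),
    (0.55, 0), (0.40, 1), (0.30, 0), (0.10, 0), (0.05, 0),
]

best_threshold, best_corr = select_threshold(PAIRS, [0.2, 0.5, 0.7, 0.9])
print(best_threshold, round(best_corr, 3))
```

A real parameter search would cover more than one parameter and a much larger collection; the sketch only shows the selection criterion.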
The methodology was also partially tested by the project participants in the KEYWEN encyclopedia of key concepts, which performs directed extraction of encyclopedic information from the Internet. The project is based on the DECL toolkit created and developed by the applicants, which is used in the construction of logical analytical systems (DIES, Crime, Summary, Antiterror) and semantic-oriented systems of knowledge extraction (Semantix, etc.). Project participants: Khakimova A.Kh. (Project Manager), Zolotarev O.V., Klokov A.A., Kuznetsov K.I., Maravin A.A., Potapova Z.E., Protasov V.I., Rodina I.V., Sokolov E.G., Charnine M.M.

18-07-01111 «Research and development of linguistic and statistical methods and algorithms of automatic generation of multilingual associative and hierarchical portrait of the subject area for supplement of ontologies, for discovering of significant documents and promising directions». Project Manager Zolotarev O.V.
The project is aimed at solving the fundamental scientific problem of semantic modeling, within the framework of which a technique is developed to automatically identify translation links (translational correspondences), as well as hierarchical, synonymous and associative links, from Internet texts and to build Multilingual Linguistic-Statistical Associative Hierarchical Portraits of Various Subject Areas (MAHPSA), in particular for autonomous uninhabited underwater vehicles (ANPA). Taking multilingual and heterogeneous resources into account gives a more complete picture of what is happening in the subject area and makes it possible to identify the sources of ideas, the speed and direction of their dissemination, relevant documents and promising directions. The solution is based on an integrated approach combining statistical methods, corpus linguistics and distributive semantics. It is implemented in a technology that involves developing linguistic-statistical mechanisms for forming a MAHPSA, which is a dictionary of significant domain terms whose elements are organized into synonymic series (synsets), including translation correspondences, and are connected by associative and hierarchical links. A MAHPSA is created automatically on the basis of statistical analysis of large volumes of texts from the Internet. The hierarchical links included in a MAHPSA form a polyhierarchy and a classifier that facilitate searching and navigation in the multilingual ANPA subject area. The proposed methodology also includes integrating various MAHPSAs with multilingual linguistic resources (WordNet, Wikipedia, BabelNet, etc.) to obtain the largest multilingual ontology with relevant knowledge and improved coverage of terminology in the subject domains.
The combined (integral) ontology contains a hierarchy of synonym sets (synsets) of multilingual terms, including Russian ones, and serves as the basis for constructing a single multilingual vector space in which the semantic proximity of multilingual texts, synsets and terms can be evaluated, similarly to NASARI and MUFFIN. Translational correspondences between the multilingual synonyms of a MAHPSA are built using Word2Vec technology. The integral ontology makes it possible to calculate integral multilingual statistics and trends in the use of terms and ideas, and thus to predict the spread of ideas between languages and determine promising directions. The measure of semantic proximity of multilingual documents makes it possible to identify implicit references between documents and to identify significant documents, which is necessary for gathering high-quality information from the open Internet and building large, relevant multilingual corpora of the subject domain. Thus, increasing the size and quality of the integral ontology will make it possible to build a better measure of the similarity and subject matter of texts, and extracting knowledge from those texts will in turn further increase the size and quality of the integral ontology. Based on the hierarchy of categories, the texts of scientific articles in a number of subject areas (including ANPA) are processed, and trends in the use of new concepts and ideas are revealed, integrating knowledge from various languages to determine promising directions. This technique makes it possible to solve a wide class of problems both in cognitive semantics and in information retrieval, since in most cases related to contextual search a MAHPSA can replace or supplement a multilingual thesaurus or ontology of the subject domain, whose manual compilation is very laborious.
The methodology was partially tested in the KEYWEN key words encyclopedia developed by the authors of the project and carrying out the directed extraction of multilingual encyclopedic information from the Internet. The project also relies on the original DECL tool environment created and developed by the project applicants, which has found wide application in the construction of logical analytical systems (DIES, Crime, Summary, Antiterror) and semantic-oriented systems of knowledge extraction (Semantix, etc.). Project participants: Zolotarev O.V. (Project Manager), Galina I.V., Gurov A.S., Matskevich A.G., Popchenko O.V., Raskatova M.V., Rodina I.V., Tezadova F.M., Sharapova L.V., Charnine M.M.

19-07-00455 «Development of models, algorithms and software for solving the problems of safety and risk assessment at nuclear power plants in case of beyond-design accidents with emission of thermal neutron sources with low flux density». Project Manager Berberova M.A.
This project is aimed at developing models, algorithms and software for carrying out activities to improve safety and reduce risks in the design of new and operating nuclear power plants. The principal novelty of the project is the development of a methodical apparatus for the assessment of radiation risk at nuclear power plants in the most dangerous (beyond design) accidents with the release of thermal neutron sources with low flux density. Nuclear reactors based on the use of fission energy of heavy nuclei are powerful sources of gamma radiation and neutrons. The project is aimed at computer modeling and development of new methods, algorithms and software for solving problems of safety and risk assessment at nuclear power plants in the most dangerous (beyond design) accidents with emission of thermal neutron sources with low flux density. To implement the project, it is necessary: to develop a methodological approach for solving the problems of assessing the doses of external and internal exposure and assessing the damage to the population living around the NPP in the most dangerous (beyond design) accidents with the release of thermal neutron sources with low flux density; make calculations for the population, given its age composition. Based on these decisions, measures will be proposed to reduce the risk and improve the safety of nuclear power plants. Project participants: Berberova M.A. (Project Manager), Andreev V.V., Andreev N.G., Andreeva O.V., Zolotarev S.S., Ozdoeva A.Kh., Orekhova E.E., Tarasova N.P., Tarasova Yu.S., Chernyavsky K.I.

19-07-00844 «New methods of formation and application of multidimensional visual models for the representation, processing, analysis, interpretation and use of big data». Project Manager Zakharova A.A.
Identifying implicit regularities in data and presenting big data in a form accessible for further analysis is the ultimate goal of applying various scientific methods in fundamental research. The modern level of information technology makes it possible to raise the processing and visualization of big multisensory data to an entirely new level, thanks to access to significant computing resources and the use of all means of visual analysis. The problem of data analysis in fundamental research is widely known and is actively addressed with existing statistical processing methods, but such analysis becomes difficult when the number of parameters considered increases, when complex nonlinear dependencies are studied, and at the junction of fundamental sciences with poorly formalized areas of knowledge. In such cases, identifying cause-effect relationships, the mechanisms and parameters of data interaction, and forecasting factors remains an unresolved and highly demanded task, because it determines the quality and speed of the search for solutions. In weakly formalized areas of knowledge, the search for points at which the compared parameters best match the formulated problem is complicated by the difficulty of representing many of the investigated parameters in numerical or logical form. In such cases, the use of visual analytics in considering non-numerical factors helps to overcome semantic and ontological differences between related areas of knowledge and to synthesize new knowledge that is not available through direct statistical processing. This project proposes to study the possibilities of processing publicly available data at the junction of fundamental sciences, visual analytics and information technologies, with the subsequent creation of a methodology and a new-generation tool for analyzing big multisensory data.
The purpose of this project is to develop a methodology, and the mathematical apparatus that implements it, based on a set of methods and algorithms for structuring, processing and storing large amounts of multidimensional data of complex systems, and to create on this basis a system for processing and storing such data with the possibility of interactive visual analysis. The significance of the study lies both in systematizing approaches to the problem of developing high-performance storage and processing of multidimensional heterogeneous data and in demonstrating the effectiveness of these approaches for optimizing all components of such software systems for a specific applied task in the analysis of big multisensory data under specified requirements. The novelty of the project lies in the creation of a methodology for conflict-free integration and structuring of large amounts of heterogeneous data in poorly formalized tasks, which provides the opportunity for more complete research and analysis. The resulting information structure will make it possible to exclude from the data stream those components that are redundant for the formulated research objective. A methodology of adaptive analysis and aggregation of multidimensional data is proposed, which makes it possible to obtain analysis results when the set of sources changes, as well as a method for estimating the state of a multifactor system based on the synchronization and study of independent data sources for the task of big multisensory data analysis. As an approbation of the new models and methods, the project proposes a practical implementation of a prototype system in the form of a complex of visual models, methods and software support tools for the analysis and visualization of multisensory parameters and for the estimation and prediction of system functioning.
Accordingly, it is possible to propose a software system that will help in presenting and understanding complex biological processes through the collection and aggregation of heterogeneous multivariate information and the use of cognitive and visual analysis tools on it. Project participants: Zakharova A.A. (Project Manager), Zavyalov D.A., Islamov R.T., Klimenko V.S., Podvesovsky A.G., Nebaba S.G., Shklyar A.V.

19-07-00857 «Research and development of methods and algorithms for the semantometric evaluation of the impact of scientific articles in a multilingual and multidisciplinary environment by building the «Index of Integral Scientific Influence»». Project Manager Charnine M.M.
The project addresses the fundamental scientific problem of semantic modeling. Within this framework, a methodology for assessing the quality of scientific articles is being developed on the basis of a probabilistic model of the impact of a scientific article on citations and ideas in subsequent articles, and on the basis of a model of an idea as a set of key terms and phrases of similar meaning in a multilingual semantic space. The problem of assessing the quality of scientific output is becoming increasingly acute, and new approaches to it are in demand for assessing the general direction of development of world science, for forecasting and planning fundamental and applied research, and for assessing the contribution of individual researchers and scientific schools to domestic and world science. Many existing methods for assessing the impact and quality of scientific articles rely on citation counts, which become indicative only a considerable time after publication. In addition, the commercialization of science has made it possible to abuse metric indicators in ways unrelated to the quality of the work. Current indicators of the effectiveness of research publications (Bibliometrics, Altmetrics, Webometrics, etc.) evaluate the impact of a research document only on the basis of external data (data on authors and place of publication). Few studies take the full content of documents into account. As a more advanced complement to scientometric and bibliometric indicators, the project proposes computational semantic analysis of full-text publications. 
The proposed methodology uses a new indicator of the quality of a scientific article, the Index of Integral Scientific Influence (IINV), which is calculated automatically from implicit contextual references to the document and is related to the statistical probability of the expected appearance of direct bibliographic references. To measure cross-language semantic similarity, we are creating our own multilingual resource similar to BabelNet, the knowledge architect Keywen, which structures information based on megalommes and neural networks. This allows us to find more accurate sources of ideas regardless of language, to identify an earlier stage of an idea's appearance, and to detect its confirmation in multilingual sources. Interlanguage links carry more weight (significance) than references within one language, and interdisciplinary links carry more weight than references within one discipline. By processing multilingual information in different scientific fields, we obtain integral (multilingual, interdisciplinary) statistics that allow us to predict more accurately the dynamics of explicit and implicit citation of ideas, phrases and documents. The semantic similarity of texts is determined using grammatical transformations, translation programs and synonym substitution, the authors' method of constructing an associative portrait of a subject domain, the Word2Vec method, and neural networks that reveal the similarity of terms and phrases on the basis of associative links. The probabilistic model of the dependence of the number of direct citations on the number of implicit references and their parameters is built on the basis of a linguistic processor that detects implicit references and can be tuned using machine learning. 
The solution is based on an integrated approach that combines statistical methods, neural networks, corpus linguistics and distributional semantics, and is implemented in a technology that involves the development of linguistic and statistical mechanisms for forming the IINV. This technique makes it possible to solve a wide class of problems in both cognitive semantics and information retrieval, for example searching for ideas, assessing the quality of scientific articles, and ranking sites. In addition, the project covers the following tasks: monitoring new ideas and assessing their prospects; analysis of the continuity of scientific ideas; detection of interlanguage text borrowing; development of intelligent Internet technologies; detection and extraction of high-quality information from the multilingual Internet space. The methodology was partially tested in the knowledge architect KEYWEN, a software complex that provides targeted retrieval of meaningful information from the Internet environment. The project builds on the authors' own DECL tool environment, which represents the discovered hierarchical and associative links in the form of an ontology for comparing hierarchical structures and links in different languages; on work on detecting interlingual semantic similarity on the basis of megalommes; and on the linguistic processor BREF, created by the author team, which automatically extracts bibliographic references from individual scientific publications. Project participants: Charnine M.M. (Project Manager), Gurov A.S., Klokov A.A., Kuznetsov K.I., Protasov V.I., Rodina I.V., Sokolov E.G., Khakimova A.Kh.

19-07-01167 «Development of methods of intellectual analysis of scientific publications for the monitoring and long-term forecast of the priority directions of development of preventive and personalized medicine». Project Manager Khakimova A.Kh.
At present, developing methods for identifying promising trends in scientific research and technology has become an essential component of forming a national science policy. Because of the significant increase in the volume of publications in the medical sciences, tracking the evolution of research and forecasting future research trends are of great importance for specialists. The project addresses the fundamental scientific problem of semantic modeling using neural networks, within which a method for analyzing scientific data is being developed as a tool of predictive analytics (prediction of future trends and promising areas of research, identification of hot topics, assessment of forecast accuracy). To solve this problem, an interdisciplinary methodology is being developed that involves neural networks, probabilistic models, statistics, machine learning, corpus linguistics, distributional semantics, and visual analysis. The goal of the project is to develop the theoretical and technological foundations for a predictive analytics tool for scientific publications, based on deep machine learning methods using neural networks and on a multilingual ontology that contains a hierarchy of terms and concepts built with topic modeling methods. The predictive analytics tool is designed to identify promising areas of research (research fronts, hot topics) and to build long-term forecasts of their future trends, including an assessment of forecast accuracy. The scientific novelty of the proposed project is the development of new methods for calculating trends in scientific fields based on advanced methods of physical and mathematical modeling and the latest computer programs, including topic modeling methods, semantic vector representation of texts, recurrent neural networks, probabilistic models, statistics, machine learning, corpus linguistics, distributional semantics, and visual analysis. 
This will increase the speed, accuracy and reliability of future trend forecasts. The scientific novelty of the project also lies in the creation of a new toolkit based on modern neural networks and topic modeling, with an ontological approach that allows new relevant vocabulary to be identified dynamically. The scientific significance of the project includes the development of original mathematical and linguistic predictive models for the analysis of multilingual scientific texts in the field of preventive and personalized medicine in order to identify promising areas of research. The use of such computational models will make it possible to use the Internet as a current text collection; to overcome language barriers on the basis of a model that represents and formalizes ideas as a set of terms in a multilingual semantic space; and to use a multilingual ontology as a lexical basis that is dynamically updated using neural networks. The predictive analytics developed within the project, including a long-term forecast of future trends and an assessment of its accuracy, will make it possible to identify sustainable trends and directions of development in the early stages of their emergence; to explore the features of a newly emerging trend (by measuring their impact on the accuracy of the forecast); and to explore the dynamics of scientific developments and the emergence of new research directions. Project participants: Khakimova A.Kh. (Project Manager), Zolotarev O.V., Klokov A.A., Kuznetsov K.I., Maravin A.A., Rodina I.V., Sokolov E.G., Charnine M.M.