-
Extraction of Research Objectives, Machine Learning Model Names, and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis
Authors:
S. Nishio,
H. Nonaka,
N. Tsuchiya,
A. Migita,
Y. Banno,
T. Hayashi,
H. Sakaji,
T. Sakumoto,
K. Watabe
Abstract:
Machine learning is widely utilized across various industries. Identifying the appropriate machine learning models and datasets for specific tasks is crucial for the effective industrial application of machine learning. However, this requires expertise in both machine learning and the relevant domain, leading to a high learning cost. Therefore, research focused on extracting combinations of tasks,…
▽ More
Machine learning is widely utilized across various industries. Identifying the appropriate machine learning models and datasets for specific tasks is crucial for the effective industrial application of machine learning. However, this requires expertise in both machine learning and the relevant domain, leading to a high learning cost. Therefore, research focused on extracting combinations of tasks, machine learning models, and datasets from academic papers is critically important, as it can facilitate the automatic recommendation of suitable methods. Conventional information extraction methods from academic papers have been limited to identifying machine learning models and other entities as named entities. To address this issue, this study proposes a methodology extracting tasks, machine learning methods, and dataset names from scientific papers and analyzing the relationships between these information by using LLM, embedding model, and network clustering. The proposed method's expression extraction performance, when using Llama3, achieves an F-score exceeding 0.8 across various categories, confirming its practical utility. Benchmarking results on financial domain papers have demonstrated the effectiveness of this method, providing insights into the use of the latest datasets, including those related to ESG (Environmental, Social, and Governance) data.
△ Less
Submitted 21 August, 2024;
originally announced August 2024.
-
Community Detection and Growth Potential Prediction Using the Stochastic Block Model and the Long Short-term Memory from Patent Citation Networks
Authors:
Kensei Nakai,
Hirofumi Nonaka,
Asahi Hentona,
Yuki Kanai,
Takeshi Sakumoto,
Shotaro Kataoka,
Elisa Claire Alemán Carreón,
Toru Hiraoka
Abstract:
Scoring patent documents is very useful for technology management. However, conventional methods are based on static models and, thus, do not reflect the growth potential of the technology cluster of the patent. Because even if the cluster of a patent has no hope of growing, we recognize the patent is important if PageRank or other ranking score is high. Therefore, there arises a necessity of deve…
▽ More
Scoring patent documents is very useful for technology management. However, conventional methods are based on static models and, thus, do not reflect the growth potential of the technology cluster of the patent. Because even if the cluster of a patent has no hope of growing, we recognize the patent is important if PageRank or other ranking score is high. Therefore, there arises a necessity of developing citation network clustering and prediction of future citations. In our research, clustering of patent citation networks by Stochastic Block Model was done with the aim of enabling corporate managers and investors to evaluate the scale and life cycle of technology. As a result, we confirmed nested SBM is appropriate for graph clustering of patent citation networks. Also, a high MAPE value was obtained and the direction accuracy achieved a value greater than 50% when predicting growth potential for each cluster by using LSTM.
△ Less
Submitted 23 April, 2019;
originally announced April 2019.
-
Community Detection and Growth Potential Prediction from Patent Citation Networks
Authors:
Asahi Hentona,
Takeshi Sakumoto,
Hugo Alberto Mendoza España,
Hirofumi Nonaka,
Shotaro Kataoka,
Toru Hiraoka,
Kensei Nakai,
Elisa Claire Alemán Carreón,
Masaharu Hirota
Abstract:
The scoring of patents is useful for technology management analysis. Therefore, a necessity of developing citation network clustering and prediction of future citations for practical patent scoring arises. In this paper, we propose a community detection method using the Node2vec. And in order to analyze growth potential we compare three ''time series analysis methods'', the Long Short-Term Memory…
▽ More
The scoring of patents is useful for technology management analysis. Therefore, a necessity of developing citation network clustering and prediction of future citations for practical patent scoring arises. In this paper, we propose a community detection method using the Node2vec. And in order to analyze growth potential we compare three ''time series analysis methods'', the Long Short-Term Memory (LSTM), ARIMA model, and Hawkes Process. The results of our experiments, we could find common technical points from those clusters by Node2vec. Furthermore, we found that the prediction accuracy of the ARIMA model was higher than that of other models.
△ Less
Submitted 23 April, 2019;
originally announced April 2019.