Multimodal Stock Price Prediction: A Case Study of the Russian Securities Market

Khubiev, Kasymkhan; Semenov, Mikhail

doi:10.25209/2079-3316-2025-16-1-83-130

arXiv:2503.08696 (q-fin)

[Submitted on 5 March 2025]

Authors: Kasymkhan Khubiev, Mikhail Semenov

Abstract: Classical asset price forecasting methods rely primarily on numerical data, such as price time series, trading volumes, limit order book data, and technical analysis indicators. However, the news flow plays a significant role in price formation, which makes multimodal approaches that combine textual and numerical data to improve prediction accuracy highly relevant. This paper addresses the problem of forecasting financial asset prices using a multimodal approach that combines candlestick time series and textual news flow data. A unique dataset was collected for the study, comprising time series for 176 Russian stocks traded on the Moscow Exchange and 79,555 financial news articles in Russian. Textual data were processed with the pre-trained models RuBERT and Vikhr-Qwen2.5-0.5b-Instruct (a large language model), while the time series and vectorized text data were processed with an LSTM recurrent neural network. The experiments compared models based on a single modality (time series only) with models based on two modalities, as well as various methods for aggregating text vector representations. Prediction quality was evaluated using two key metrics: Accuracy (prediction of the direction of price movement: up or down) and Mean Absolute Percentage Error (MAPE), which measures the deviation of the predicted price from the true price. The experiments showed that incorporating the textual modality reduced the MAPE value by 55%. The resulting multimodal dataset is valuable for the further adaptation of language models to the financial sector. Future research directions include optimizing textual modality parameters, such as the time window, sentiment, and chronological order of news messages.
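As an illustration of the two evaluation metrics named in the abstract, the following is a minimal sketch, not the authors' implementation. The function names, the use of the previous closing price as the reference for the direction of movement, and the toy price values are all assumptions made for this example.

```python
from typing import Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, returned in percent."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average of |true - predicted| / |true| over all steps, scaled to percent.
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def direction_accuracy(
    prev_close: Sequence[float],
    y_true: Sequence[float],
    y_pred: Sequence[float],
) -> float:
    """Share of steps where the predicted move (up vs. not up) matches the actual move.

    Assumption: direction is measured relative to the previous closing price.
    """
    if not (len(prev_close) == len(y_true) == len(y_pred)) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum((t > c) == (p > c) for c, t, p in zip(prev_close, y_true, y_pred))
    return hits / len(y_true)


# Hypothetical toy prices, not taken from the paper's dataset.
prev_close = [100.0, 102.0, 101.0, 105.0]
actual = [102.0, 101.0, 105.0, 104.0]
predicted = [101.0, 103.0, 104.0, 103.0]

print(f"MAPE: {mape(actual, predicted):.2f}%")
print(f"Direction accuracy: {direction_accuracy(prev_close, actual, predicted):.2f}")
```

The two metrics answer different questions: a model can track the price level closely (low MAPE) and still misjudge the direction of small moves, which is presumably why the abstract reports both.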

Comments:	NSCF-2024, Program Systems: Theory and Applications
Subjects:	Statistical Finance (q-fin.ST); Machine Learning (cs.LG); Computational Finance (q-fin.CP)
Cite as:	arXiv:2503.08696 [q-fin.ST]
	(or arXiv:2503.08696v1 [q-fin.ST] for this version)
	https://doi.org/10.48550/arXiv.2503.08696
Journal reference:	http://psta.psiras.ru:8081/ru/2025/1_83-130
Related DOI:	https://doi.org/10.25209/2079-3316-2025-16-1-83-130

Submission history

From: Kasymkhan Khubiev
[v1] Wed, 5 Mar 2025 21:20:32 UTC (653 KB)
