Self-Supervised Log Parsing

Nedelkoski, Sasho; Bogatinovski, Jasmin; Acker, Alexander; Cardoso, Jorge; Kao, Odej

Computer Science > Machine Learning

arXiv:2003.07905v1 (cs)

[Submitted on 17 Mar 2020]

Title: Self-Supervised Log Parsing

Authors: Sasho Nedelkoski, Jasmin Bogatinovski, Alexander Acker, Jorge Cardoso, Odej Kao

Abstract: Logs are extensively used during the development and maintenance of software systems. They collect runtime events and allow tracking of code execution, which enables a variety of critical tasks such as troubleshooting and fault detection. However, large-scale software systems generate massive volumes of semi-structured log records, posing a major challenge for automated analysis. Parsing semi-structured records with free-form text log messages into structured templates is the first and crucial step that enables further analysis. Existing approaches rely on log-specific heuristics or manual rule extraction. They are often specialized for certain log types, which limits their performance and generalization. We propose a novel parsing technique called NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling (MLM). In the process of parsing, the model extracts summarizations from the logs in the form of a vector embedding. This allows the coupling of the MLM as pre-training with a downstream anomaly detection task. We evaluate the parsing performance of NuLog on 10 real-world log datasets and compare the results with 12 parsing techniques. The results show that NuLog outperforms existing methods in parsing accuracy, with an average of 99%, and achieves the lowest edit distance to the ground truth templates. Additionally, two case studies are conducted to demonstrate the ability of the approach for log-based anomaly detection in both supervised and unsupervised scenarios. The results show that NuLog can be successfully used to support troubleshooting tasks. The implementation is available at https://github.com/nulog/nulog.
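
To make the idea of treating parsing as masked token prediction more concrete, the following is a minimal sketch and not the NuLog implementation. It assumes a toy stand-in for the learned model: instead of a neural MLM, a hypothetical `CountingMaskPredictor` memorizes which tokens appear between a given left and right neighbour. The decision rule (a token that is predicted with high confidence is kept as part of the template, otherwise it becomes a `<*>` wildcard), the `threshold` value, and the example log lines are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

WILDCARD = "<*>"  # placeholder for variable parts of a log message (assumed notation)


def contexts(tokens):
    """Mask each token in turn and yield (context, target) pairs.

    The context here is only the immediate left and right neighbour,
    which is a deliberate simplification of what a real MLM sees.
    """
    padded = ["<s>"] + list(tokens) + ["</s>"]
    for i in range(1, len(padded) - 1):
        yield (padded[i - 1], padded[i + 1]), padded[i]


class CountingMaskPredictor:
    """Hypothetical toy replacement for a trained masked language model."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, lines):
        # Self-supervised: the training targets are the masked tokens themselves.
        for line in lines:
            for context, target in contexts(line.split()):
                self.counts[context][target] += 1
        return self

    def probability(self, context, token):
        seen = self.counts.get(context)
        if not seen:
            return 0.0
        return seen[token] / sum(seen.values())


def extract_template(line, predictor, threshold=0.9):
    """Keep confidently predicted tokens; replace the rest with a wildcard."""
    out = []
    for context, token in contexts(line.split()):
        p = predictor.probability(context, token)
        out.append(token if p >= threshold else WILDCARD)
    return " ".join(out)


# Made-up log lines that share one template and differ in one parameter.
logs = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Connection from 10.0.0.3 closed",
]
predictor = CountingMaskPredictor().fit(logs)
templates = [extract_template(line, predictor) for line in logs]
print(templates[0])
```

In this sketch the constant tokens are always recoverable from their neighbours, while the IP address is not, so all three lines collapse to the same template. A counting model like this breaks down when two variable tokens are adjacent or when a template is rare, which is one reason a learned model that generalizes over contexts would be preferable.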

Subjects:	Machine Learning (cs.LG); Software Engineering (cs.SE)
Cite as:	arXiv:2003.07905 [cs.LG]
	(or arXiv:2003.07905v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2003.07905

Submission history

From: Sasho Nedelkoski
[v1] Tue, 17 Mar 2020 19:25:25 UTC (1,216 KB)
