Detecting Events and Patterns in Large-Scale User Generated Textual Streams with Statistical Learning Methods

Lampos, Vasileios

Computer Science > Machine Learning

arXiv:1208.2873 (cs)

[Submitted on 13 Aug 2012]

Title: Detecting Events and Patterns in Large-Scale User Generated Textual Streams with Statistical Learning Methods

Authors: Vasileios Lampos

Abstract: A vast amount of textual content streamed on the web is influenced by events or phenomena emerging in the real world. The social web is an excellent modern example, where unstructured user-generated content is published on a regular basis and, in most cases, freely distributed. This Ph.D. thesis deals with the problem of inferring information, or patterns in general, about events emerging in real life based on the contents of this textual stream. We show that it is possible to extract valuable information about social phenomena, such as an epidemic or even rainfall rates, by automatically analysing the content published in social media, and in particular Twitter, using statistical machine learning methods. An important intermediate task is the formation and identification of features that characterise a target event; we select and use those textual features in several linear, non-linear and hybrid inference approaches, achieving good performance in terms of the applied loss function. By further examining this rich data set, we also propose methods for extracting various types of mood signals, revealing how affective norms (at least within the social web's population) evolve during the day and how significant events emerging in the real world influence them. Lastly, we present some preliminary findings showing several spatiotemporal characteristics of this textual information, as well as the potential of using it to tackle tasks such as the prediction of voting intentions.

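Comments:	Ph.D. thesis, 238 pages, 9 chapters, 2 appendices, 58 figures, 49 tables
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Applications (stat.AP); Machine Learning (stat.ML)
Cite as:	arXiv:1208.2873 [cs.LG]
	(or arXiv:1208.2873v1 [cs.LG] for this version)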
	https://doi.org/10.48550/arXiv.1208.2873

Submission history

From: Vasileios Lampos
[v1] Mon, 13 Aug 2012 18:59:54 UTC (4,698 KB)
