MH-FSF: A Unified Framework for Overcoming Benchmarking and Reproducibility Limitations in Feature Selection Evaluation

Rocha, Vanderson; Kreutz, Diego; Canto, Gabriel; Bragança, Hendrio; Feitosa, Eduardo

arXiv:2507.10591 (cs)

[Submitted on 11 Jul 2025]

Title: MH-FSF: A Unified Framework for Overcoming Benchmarking and Reproducibility Limitations in Feature Selection Evaluation

Authors: Vanderson Rocha, Diego Kreutz, Gabriel Canto, Hendrio Bragança, Eduardo Feitosa

Abstract: Feature selection is vital for building effective predictive models, as it reduces dimensionality and emphasizes key features. However, current research often suffers from limited benchmarking and reliance on proprietary datasets. This severely hinders reproducibility and can negatively impact overall performance. To address these limitations, we introduce the MH-FSF framework, a comprehensive, modular, and extensible platform designed to facilitate the reproduction and implementation of feature selection methods. Developed through collaborative research, MH-FSF provides implementations of 17 methods (11 classical, 6 domain-specific) and enables systematic evaluation on 10 publicly available Android malware datasets. Our results reveal performance variations across both balanced and imbalanced datasets, highlighting the critical need for data preprocessing and selection criteria that account for these asymmetries. We demonstrate the importance of a unified platform for comparing diverse feature selection techniques, fostering methodological consistency and rigor. By providing this framework, we aim to significantly broaden the existing literature and pave the way for new research directions in feature selection, particularly within the context of Android malware detection.

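Comments:	11 pages; 4 figures; 5 tables; submitted to JBCS
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Performance (cs.PF)
MSC classes:	68T01
ACM classes:	I.2
Cite as:	arXiv:2507.10591 [cs.LG]
	(or arXiv:2507.10591v1 [cs.LG] for this version)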
	https://doi.org/10.48550/arXiv.2507.10591

Submission history

From: Diego Kreutz
[v1] Fri, 11 Jul 2025 17:53:37 UTC (88 KB)
