Transferable polychromatic optical encoder for neural networks

Choi, Minho; Xiang, Jinlin; Wirth-Singh, Anna; Baek, Seung-Hwan; Shlizerman, Eli; Majumdar, Arka

doi:10.1038/s41467-025-61338-4


arXiv:2411.02697 (cs)

[Submitted on 5 Nov 2024]

Title: Transferable polychromatic optical encoder for neural networks

Authors: Minho Choi, Jinlin Xiang, Anna Wirth-Singh, Seung-Hwan Baek, Eli Shlizerman, Arka Majumdar

Abstract: Artificial neural networks (ANNs) have fundamentally transformed the field of computer vision, providing unprecedented performance. However, these ANNs for image processing demand substantial computational resources, often hindering real-time operation. In this paper, we demonstrate an optical encoder that can perform convolution simultaneously in three color channels during image capture, effectively implementing several initial convolutional layers of an ANN. Such optical encoding results in a ~24,000-fold reduction in computational operations, with state-of-the-art classification accuracy (~73.2%) in a free-space optical system. In addition, our analog optical encoder, trained on CIFAR-10 data, can be transferred to the ImageNet subset High-10 without any modifications and still exhibits moderate accuracy. Our results demonstrate the potential of hybrid optical/digital computer vision systems, in which the optical frontend pre-processes an ambient scene to reduce the energy consumption and latency of the whole computer vision system.
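As an illustration only, the per-channel convolution described in the abstract can be modelled numerically. The following is a minimal sketch, assuming an incoherent imaging model in which each color channel of the scene is convolved with its own kernel (a point spread function). The function names `conv2d_same` and `optical_encode`, the array shapes, and the zero-padding choice are hypothetical and are not taken from the paper.

```python
import numpy as np

def conv2d_same(channel, kernel):
    """True 2D convolution with zero padding; output has the input's shape.

    Assumes an odd-sized kernel so that it has a well-defined center.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    h, w = channel.shape
    padded = np.pad(channel, ((ph, ph), (pw, pw)))
    flipped = kernel[::-1, ::-1]  # flip the kernel: convolution, not correlation
    out = np.zeros((h, w), dtype=float)
    # Accumulate one shifted, weighted copy of the image per kernel tap.
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + h, j:j + w]
    return out

def optical_encode(rgb, psfs):
    """Hypothetical model of the optical frontend.

    rgb:  array of shape (H, W, 3), the scene.
    psfs: array of shape (3, k, k), one kernel per color channel.
    Returns an array of shape (H, W, 3), the encoded sensor image.
    """
    encoded = [conv2d_same(rgb[..., c], psfs[c]) for c in range(3)]
    return np.stack(encoded, axis=-1)

# Usage: a delta kernel in every channel leaves the scene unchanged.
scene = np.random.default_rng(0).random((8, 8, 3))
delta = np.zeros((3, 3, 3))
delta[:, 1, 1] = 1.0
encoded = optical_encode(scene, delta)
```

In the actual system the convolution happens in the optics during image capture, so the digital backend would not perform this step; the sketch only shows what operation the encoder stands in for.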

Comments:	21 pages, 4 figures, 2 tables
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics)
Cite as:	arXiv:2411.02697 [cs.CV]
	(or arXiv:2411.02697v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.02697
Journal reference:	Nat Commun 16, 5623 (2025)
Related DOI:	https://doi.org/10.1038/s41467-025-61338-4

Submission history

From: Minho Choi
[v1] Tue, 5 Nov 2024 00:49:47 UTC (47,537 KB)
