[1] D. Narayanan, M. Shoeybi, J. Casper, P. LeGresley, M. Patwary, V. A. Korthikanti, D. Vainbrand, P. Kashinkunti, J. Bernauer, B. Catanzaro, A. Phanishayee, M. Zaharia. Efficient large-scale language model training on GPU clusters using megatron-LM**. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-15(2021)**.

[2] E. Strubell, A. Ganesh, A. McCallum. Energy and policy considerations for modern deep learning research**. Proc. AAAI Conf. Artif. Intell., 34, 13693-13696(2020)**.

[3] A. Chatelain, A. Djeghri, D. Hesslow, J. Launay, I. Poli. Is the number of trainable parameters all that actually matters?**(2021)**.

[4] N. C. Thompson, K. Greenewald, K. Lee, G. F. Manso. The computational limits of deep learning**(2020)**.

[5] G. Wetzstein, A. Ozcan, S. Gigan, S. Fan, D. Englund, M. Soljačić, C. Denz, D. A. B. Miller, D. Psaltis. Inference in artificial intelligence with deep optics and photonics**. Nature, 588, 39-47(2020)**.

[6] B. J. Shastri, A. N. Tait, T. F. de Lima, W. H. P. Pernice, H. Bhaskaran, C. D. Wright, P. R. Prucnal. Photonics for artificial intelligence and neuromorphic computing**. Nat. Photonics, 15, 102-114(2021)**.

[7] H. Zhou, J. Dong, J. Cheng, W. Dong, C. Huang, Y. Shen, Q. Zhang, M. Gu, C. Qian, H. Chen, Z. Ruan, X. Zhang. Photonic matrix multiplication lights up photonic accelerator and beyond**. Light Sci. Appl., 11, 30(2022)**.

[8] X. Xu, M. Tan, B. Corcoran, J. Wu, A. Boes, T. G. Nguyen, S. T. Chu, B. E. Little, D. G. Hicks, R. Morandotti, A. Mitchell, D. J. Moss. 11 TOPS photonic convolutional accelerator for optical neural networks**. Nature, 589, 44-51(2021)**.

[9] T. Wang, S.-Y. Ma, L. G. Wright, T. Onodera, B. Richard, P. L. McMahon. An optical neural network using less than 1 photon per multiplication**. Nat. Commun., 13, 123(2022)**.

[10] Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, M. Soljacic. Deep learning with coherent nanophotonic circuits**. Nat. Photonics, 11, 441-446(2017)**.

[11] A. N. Tait, T. F. de Lima, E. Zhou, A. X. Wu, M.-A. Nahmias, B. J. Shastri, P. R. Prucnal. Neuromorphic photonic networks using silicon photonic weight banks**. Sci. Rep., 7, 7430(2017)**.

[12] J. Feldmann, N. Youngblood, C. D. Wright, H. Bhaskaran, W. H. P. Pernice. All-optical spiking neurosynaptic networks with self-learning capabilities**. Nature, 569, 208-214(2019)**.

[13] F. Stelzer, A. Röhm, R. Vicente, I. Fischer, S. Yanchuk. Deep neural networks using a single neuron: folded-in-time architecture using feedback-modulated delay loops**. Nat. Commun., 12, 5164(2021)**.

[14] J. Feldmann, N. Youngblood, M. Karpov, H. Gehring, X. Li, M. Stappers, M. Le Gallo, X. Fu, A. Lukashchuk, A. S. Raja, J. Liu, C. D. Wright, A. Sebastian, T. J. Kippenberg, W. H. P. Pernice, H. Bhaskaran. Parallel convolutional processing using an integrated photonic tensor core**. Nature, 589, 52-58(2021)**.

[15] N. Mohammadi Estakhri, B. Edwards, N. Engheta. Inverse-designed metastructures that solve equations**. Science, 363, 1333-1338(2019)**.

[16] X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, A. Ozcan. All-optical machine learning using diffractive deep neural networks**. Science, 361, 1004-1008(2018)**.

[17] T. Zhou, X. Lin, J. Wu, Y. Chen, H. Xie, Y. Li, J. Fan, H. Wu, L. Fang, Q. Dai. Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit**. Nat. Photonics, 15, 367-373(2021)**.

[18] D. Verstraeten, B. Schrauwen, M. D’Haene, D. Stroobandt. An experimental unification of reservoir computing methods**. Neural Netw., 20, 391-403(2007)**.

[19] G. B. Huang, H. Zhou, X. Ding, R. Zhang. Extreme learning machine for regression and multiclass classification**. IEEE Trans. Syst. Man Cybern. B, 42, 513-529(2012)**.

[20] G. Van der Sande, D. Brunner, M. C. Soriano. Advances in photonic reservoir computing**. Nanophotonics, 6, 561-576(2017)**.

[21] D. Brunner, M. C. Soriano, C. Mirasso, I. Fischer. Parallel photonic information processing at gigabyte per second data rates using transient states**. Nat. Commun., 4, 1364(2013)**.

[22] Q. Vinckier, F. Duport, A. Smerieri, K. Vandoorne, P. Bienstman, M. Haelterman, S. Massar. High-performance photonic reservoir computer based on a coherently driven passive cavity**. Optica, 2, 438-446(2015)**.

[23] L. Larger, A. Baylón-Fuentes, R. Martinenghi, V. S. Udaltsov, Y. K. Chembo, M. Jacquot. High-speed photonic reservoir computing using a time-delay based architecture: million words per second classification**. Phys. Rev. X, 7, 011015(2017)**.

[24] D. Ballarini, A. Gianfrate, R. Panico, A. Opala, S. Ghosh, L. Dominici, V. Ardizzone, M. D. Giorgi, G. Lerario, G. Gigli, T. C. H. Liew, M. Matuszewski, D. Sanvitto. Polaritonic neuromorphic computing outperforms linear classifiers**. Nano Lett., 20, 3506-3512(2020)**.

[25] A. Saade, F. Caltagirone, I. Carron, L. Daudet, A. Dremeau, S. Gigan, F. Krzakala. Random projections through multiple optical scattering: approximating kernels at the speed of light**. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6215-6219(2016)**.

[26] J. Bueno, S. Maktoobi, L. Froehly, I. Fischer, M. Jacquot, L. Larger, D. Brunner. Reinforcement learning in a large-scale photonic recurrent neural network**. Optica, 5, 756-760(2018)**.

[27] P. Antonik, N. Marsal, D. Brunner, D. Rontani. Human action recognition with a large-scale brain-inspired photonic computer**. Nat. Mach. Intell., 1, 530-537(2019)**.

[28] A. Röhm, L. Jaurigue, K. Lüdge. Reservoir computing using laser networks**. IEEE J. Sel. Top. Quantum Electron., 26, 7700108(2020)**.

[29] U. Paudel, M. Luengo-Kovac, J. Pilawa, T. J. Shaw, G. C. Valley. Classification of time-domain waveforms using a speckle-based optical reservoir computer**. Opt. Express, 28, 1225-1237(2020)**.

[30] M. Miscuglio, Z. Hu, S. Li, J. George, R. Capanna, P. M. Bardet, P. Gupta, V. J. Sorger. Massively parallel amplitude-only Fourier neural network**. Optica, 7, 1812-1819(2020)**.

[31] S. Sunada, A. Uchida. Photonic neural field on a silicon chip: large-scale, high-speed neuro-inspired computing and sensing**. Optica, 8, 1388-1396(2021)**.

[32] M. Borghi, S. Biasi, L. Pavesi. Reservoir computing based on a silicon microring and time multiplexing for binary and analog operations**. Sci. Rep., 11, 15642(2021)**.

[33] X. Porte, A. Skalli, N. Haghighi, S. Reitzenstein, J. A. Lott, D. Brunner. A complete, parallel and autonomous photonic neural network in a semiconductor multimode laser**. J. Phys. Photon., 3, 024017(2021)**.

[34] D. Pierangeli, G. Marcucci, C. Conti. Photonic extreme learning machine by free-space optical propagation**. Photon. Res., 9, 1446-1454(2021)**.

[35] U. Teğin, M. Yıldırım, İ. Oğuz, C. Moser, D. Psaltis. Scalable optical learning operator**. Nat. Comput. Sci., 1, 542-549(2021)**.

[36] D. Pierangeli, G. Marcucci, C. Conti. Neuromorphic computing device using optical shock waves**. OSA Nonlinear Optics, NTh1A-3(2021)**.

[37] Z. Denis, I. Favero, C. Ciuti. Photonic Kernel machine learning for ultrafast spectral analysis**. Phys. Rev. Appl., 17, 034077(2022)**.

[38] M. Rafayelyan, J. Dong, Y. Tan, F. Krzakala, S. Gigan. Large-scale optical reservoir computing for spatiotemporal chaotic systems prediction**. Phys. Rev. X, 10, 041037(2020)**.

[39] S. Ortín, M. C. Soriano, L. Pesquera, D. Brunner, D. San-Martín, I. Fischer, C. R. Mirasso, J. M. Gutiérrez. A unified framework for reservoir computing and extreme learning machines based on a single time-delayed neuron**. Sci. Rep., 5, 14945(2015)**.

[40] R. Mirek, A. Opala, P. Comaron, M. Furman, M. Król, K. Tyszka, B. Seredyński, D. Ballarini, D. Sanvitto, T. C. H. Liew, W. Pacuski, J. Suffczyński, J. Szczytko, M. Matuszewski, B. Piętka. Neuromorphic binarized polariton networks**. Nano Lett., 21, 3715-3720(2021)**.

[41] A. Lupo, S. Massar. Parallel extreme learning machines based on frequency multiplexing**. Appl. Sci., 12, 214(2021)**.

[42] G. Marcucci, D. Pierangeli, C. Conti. Theory of neuromorphic computing by waves: machine learning by rogue waves, dispersive shocks, and solitons**. Phys. Rev. Lett., 125, 093901(2020)**.

[43] H. Zhang, M. Gu, X. D. Jiang, J. Thompson, H. Cai, S. Paesani, R. Santagati, A. Laing, Y. Zhang, M. H. Yung, Y. Z. Shi, F. K. Muhammad, G. Q. Lo, X. S. Luo, B. Dong, D. L. Kwong, L. C. Kwek, A. Q. Liu. An optical neural chip for implementing complex-valued neural network**. Nat. Commun., 12, 457(2021)**.

[44] A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, C. Potts. Learning word vectors for sentiment analysis**. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 142-150(2011)**.

[45] M. Belkin, D. Hsu, S. Ma, S. Mandal. Reconciling modern machine-learning practice and the classical bias–variance trade-off**. Proc. Natl. Acad. Sci. USA, 116, 15849-15854(2019)**.

[46] M. S. Advani, A. M. Saxe, H. Sompolinsky. High-dimensional dynamics of generalization error in neural networks**. Neural Netw., 132, 428-446(2020)**.

[47] S. Mei, A. Montanari. The generalization error of random features regression: precise asymptotics and double descent curve**. Commun. Pure Appl. Math., 75, 667-766(2020)**.

[48] K. Babić, S. Martinčić-Ipšić, A. Meštrović. Survey of neural text representation models**. Information, 11, 511(2020)**.

[49] A. Ashrafi. Walsh–Hadamard transforms: a review**. Advances in Imaging and Electron Physics, 1-55(2017)**.

[50] M. Soltanolkotabi, A. Javanmard, J. D. Lee. Theoretical insights into the optimization landscape of overparameterized shallow neural networks**. IEEE Trans. Inf. Theory, 65, 742-769(2019)**.

[51] D. Pierangeli, M. Rafayelyan, C. Conti, S. Gigan. Scalable spin-glass optical simulator**. Phys. Rev. Appl., 15, 034087(2021)**.

[52] D. Hesslow, A. Cappelli, I. Carron, L. Daudet, R. Lafargue, K. Müller, R. Ohana, G. Pariente, I. Poli. Photonic co-processors in HPC: using LightOn OPUs for randomized numerical linear algebra**(2021)**.

[53] M.-A. Miri. Integrated random projection and dimensionality reduction by propagating light in photonic lattices**. Opt. Lett., 46, 4936-4939(2021)**.

[54] T. Mikolov, K. Chen, G. Corrado, J. Dean. Efficient estimation of word representations in vector space**(2013)**.

[55] A. U. Rehman, A. K. Malik, B. Raza, W. Ali. A hybrid CNN-LSTM model for improving accuracy of movie reviews sentiment analysis**. Multimedia Tools Appl., 78, 26597(2019)**.