Photonics Research, Volume. 11, Issue 8, 1408(2023)

FreeformNet: fast and automatic generation of multiple-solution freeform imaging systems enabled by deep learning

Boyu Mao1, Tong Yang1,2,*, Huiming Xu1, Wenchen Chen1, Dewen Cheng1,3, and Yongtian Wang1,2
Author Affiliations
  • 1Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
  • 2Beijing Key Laboratory of Advanced Optical Remote Sensing Technology, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
  • 3e-mail: cdwlxk@bit.edu.cn

    Using freeform optical surfaces in lens design can lead to much higher system specifications and performance while significantly reducing volume and weight. However, because of the complexity of freeform surfaces, freeform optical design using traditional methods requires extensive human effort and sufficient design experience, while other design methods have limitations in design efficiency, simplicity, and versatility. Deep learning can solve these issues by summarizing design knowledge and applying it to design tasks with different system and structure parameters. We propose a deep-learning framework for designing freeform imaging systems. We generate the data set automatically using a combined sequential and random system evolution method. We combine supervised learning and unsupervised learning to train the network so that it has good generalization ability for a wide range of system and structure parameter values. The generated network FreeformNet enables fast generation (less than 0.003 s per system) of multiple-solution systems after we input the design requirements, including the system and structure parameters. We can filter and sort solutions based on a given criterion and use them as good starting points for quick final optimization (several seconds for systems with small or moderate field-of-view in general). The proposed framework presents a revolutionary approach to the lens design of freeform or generalized imaging systems, thus significantly reducing the time and effort expended on optical design.

    1. INTRODUCTION

    Optical design and imaging optics play an important role in technological and social development. In their long history, imaging systems have mainly consisted of spherical and aspherical elements because of their rotational shape and ease of fabrication; however, their aberration correction ability is limited, particularly in nonsymmetric systems. To overcome the limitations of traditional spherical and aspherical systems, nonrotationally symmetric freeform optical surfaces can be used, which improve system performance and specifications while reducing the volume and number of elements. The use of the freeform optical surface is considered to be a revolution in imaging optical design [1,2]. In the last 15 years, the development of advanced fabrication technologies has promoted the use of freeform optics in many fields, such as astronomical telescopes [3], head-mounted and head-up displays [4–8], cameras [9,10], off-axis imagers [11,12], and imaging spectrometers [13–15].

    Advanced freeform surfaces may improve the performance of imaging optics, but they also significantly increase design difficulty and time cost because of the complexity of the surface shape and the nonsymmetric system structure, in addition to the scarcity of existing reference systems and the difficulty of understanding freeform optics. Traditional optical design or lens design methods usually start by finding a proper starting point and then performing multiparameter optimization. The starting point is typically obtained by searching the literature or accessing the lens databases within optical design software. However, this process takes a lot of time and may not find a feasible starting point, especially for freeform optical systems. Without a good starting point, the design process relies on extensive human effort and design skills, and extensive time may be spent on tedious trial and error, especially for beginners in optical design with limited or no knowledge of aberration theory or other design experience. Nodal aberration theory has been used to guide the design and optimization of freeform imaging systems [16–18]. Direct or point-by-point design methods have been proposed to construct systems based on given design requirements [19–25]. However, these methods also have limitations in design efficiency, simplicity, and generality, especially for a system with advanced system specifications such as a wide field-of-view. The above-mentioned design methods need to be tailored to a certain design task. For other design tasks, the methods may need to be reapplied, and even the optimization strategy may need to be adjusted accordingly. In addition, the time cost for a single design task is high. Deep learning (DL) can be considered as a solution to these issues, as it can effectively summarize design knowledge and apply this knowledge to design tasks with a wide range of system and structure parameters. In 2019, Côté et al. used DL to obtain lens design databases to produce high-quality starting points for coaxial spherical objectives [26]. This approach was then improved by the introduction of more design forms [27,28]. However, the above framework is limited to coaxial spherical systems. In 2019, Yang et al. [29] proposed a preliminary design framework for freeform reflective imaging systems, which Chen et al. [30] improved by increasing the range of system specifications. However, the design method is limited because only one solution can be generated, which may not be optimal for fulfilling the design requirements on system structure. Realizing ultrafast generation of multiple-solution freeform imaging systems based on given design requirements on system and structure parameters is challenging but of great significance, as it can dramatically improve optical design efficiency and reduce human effort.

    In this study, we propose a deep-learning framework for the intelligent design of freeform imaging optics. The design framework is viable for generalized off-axis reflective, refractive, and catadioptric systems with multiple freeform surfaces. We generate the training data set automatically using a combined sequential and random freeform system evolution method. We further propose a special feedback strategy and use it to improve diversity of the systems in the data set. We also combine supervised learning and unsupervised learning based on freeform surface differential ray tracing to obtain a neural network with high performance. The generated network, which we call FreeformNet, enables the fast generation (less than 0.003 s per system) of multiple-solution freeform imaging systems after we input the design requirements that include the system and structure parameters. We can filter and sort the output systems based on a given criterion and use them as good starting points for the quick final optimization (several seconds for systems with small or moderate field-of-view in general).

    We used the design of a freeform off-axis three-mirror imaging system with wide range and advanced system specifications to demonstrate the effect of FreeformNet. The efficiency of the freeform optical design improved significantly and human effort was minimized. The proposed framework provides a new and generalized approach for complicated imaging optical design, thus facilitating the development of revolutionary and generalized optical design software.

    2. METHODS

    A. Overall Design Framework

    The design framework uses a deep neural network (DNN) to generate the freeform systems. The DNN has multiple hidden layers between the input and output layers, which can achieve a complex mapping relationship from the input space to the output space. For the starting point generation task of freeform imaging optical systems, the inputs are specific system parameters and structure parameters, and the outputs are one or more freeform surface imaging systems that meet the design requirements, which can be used for subsequent optimization. This mapping can be considered as design knowledge, as the network “knows” the systems corresponding to the design inputs. This knowledge is obtained through combined supervised and unsupervised learning during DNN training. Figure 1 shows the entire design framework. For supervised learning, a considerable number of freeform systems with different system and structure parameters, in addition to good imaging performance, are required as the training data set. A combined sequential and random freeform system evolution method is proposed and used to automatically generate the fundamental data set for the pretraining of the DNN in supervised learning mode. A feedback strategy is further used to enrich the data set and improve the supervised training result of the DNN. Unsupervised learning based on differential ray tracing is then introduced. Combinations of various system and structure parameters are input into the DNN to obtain the output systems, and their imaging performance and constraints are integrated into the total loss function. Supervised and unsupervised training are combined and cooperate to obtain the final DNN with good performance and generalization ability. The use of a feedback strategy and unsupervised learning reduces the pressure of obtaining extensive systems for constructing data sets. When network training is completed, the DNN can be considered to have enough knowledge to output systems: one or more systems can be output quickly and directly according to certain design requirements, and the systems can be regarded as good design starting points for quick further optimization.

    Figure 1. Whole optical design framework based on deep learning.

    B. Fundamental Data Set Generation

    A freeform imaging optical system can be characterized by system parameters, structure parameters, and surface parameters. System parameters describe the field-of-view (FOV), aperture value, and imaging size. Structure parameters include the position and tilt angle of each surface, and surface parameters describe the shape of each surface in the optical system. As a possible network framework, the network accepts system parameters as input and then outputs structure and surface parameters to form only one specific system. Thus, one-to-one mapping can be realized. However, the imaging performance of this specific output system is not optimal in general, and the system structure may not meet the designer’s needs. Consequently, the application of this framework is limited. To provide designers with more systems that enable choice and can be filtered or sorted, it is necessary to achieve one-to-multiple mapping, that is, multiple systems should be output when one system parameter combination (or a combination of selected system and structure parameter values) is input. To achieve this goal, structure parameters can also be included in the network input. For the above-mentioned complex one-to-multiple freeform imaging system design framework, the number of input and output parameters is large, and the ranges of the input system and structure parameter values are wide; hence, to obtain a good one-to-multiple network model, previous experience is essential to guide machine learning, particularly in the early phase of the network training process. Therefore, supervised learning should be used, and high-quality freeform systems with combinations of various system and structure parameter values have to be obtained as the “labeled” data set.

    The fundamental data set generation framework is shown in Fig. 2. Before the data set is generated for the design framework of one type of system, several representative system parameters that can fully describe the system specifications should be selected. For freeform imaging optical systems, the FOV, effective focal length (EFL), F-number (F#), and entrance pupil diameter (ENPD) are typically used; however, these parameters are not necessarily used simultaneously because some of them are correlated. Additionally, some parameters can be fixed to a specific value (e.g., EFL), and other systems with various parameter values can be obtained by scaling. Thus, the dimension of the system parameter space, in addition to the number of systems in the data set, can be reduced. The system parameter combination can be expressed as the vector $\Phi = [\Phi_1, \Phi_2, \ldots, \Phi_m, \ldots, \Phi_M]$, where Φm denotes a specific parameter type, such as FOV and ENPD. For example, the system parameter combination Φ of the design example given in Section 3 is [XFOV, YFOV, ENPD], where XFOV and YFOV represent the FOV in the x and y directions, respectively. We use a vector $\varphi = [\varphi_1, \varphi_2, \ldots, \varphi_m, \ldots, \varphi_M]$ to denote the specific value of the vector Φ, where φm is the value of parameter Φm. Similarly, the surface shape of a specific system can be characterized by the vector $X = [X_1, X_2, \ldots, X_v, \ldots, X_V]$, where Xv denotes a specific surface parameter (e.g., surface coefficient). The type of freeform surface is not limited. The locations and tilts of the freeform surfaces can be considered as structure parameters. In this study, only the common case in which the system is symmetric about the YOZ plane is considered. The y and z coordinates of the vertices of the surfaces are considered as structure parameters. If the tilt term (y term) is not used in the surface expression, the partial derivative of the surface with respect to y at the surface vertex will be zero, and the surface normal direction at the vertex will coincide with the local z axis of the surface. If the chief ray of the central field is further required to intersect each surface at its vertex, then the surface normal as well as the tilt angle of this surface can be calculated using the coordinates of the vertices of this surface and of the preceding and following surfaces. Therefore, the surface tilt angle need not be added to the input structure parameters, which reduces the complexity of DNN training. For the output structure parameters, as the chief ray of the central field may not intersect each surface exactly at its vertex for the systems generated by the DNN, the surface tilt angle is added as an output structure parameter in order to fully describe the freeform system. The structure parameter combination is denoted by $\Psi = [\Psi_1, \Psi_2, \ldots, \Psi_t, \ldots, \Psi_T]$. We use vectors χ and ψ to denote the specific values of the vectors X and Ψ, respectively. In summary, as shown in Fig. 3, the full input parameters to the DNN are φ and ψ (excluding the surface tilt angles). The full output parameters are χ and ψ. When χ and ψ are obtained, along with the input φ, the output system is fully defined.

    Figure 2. Illustration of the fundamental data set generation and feedback strategy.

    Figure 3. Illustration of input and output parameters of DNN. The superscripts i, o, and tar denote input, output, and target, respectively. The output and target surface and structure parameter values are used to construct the mean square error (MSE) and then construct the supervised loss function Lsuper.

    Systems with various system and structure parameter values should be generated as the data set for supervised learning. After the characteristic parameters are selected, it is necessary to determine the appropriate parameter space for parameter value selection, which is a high-dimensional space for system and structure parameters. The system parameter space is abbreviated as SPΦ and the structure parameter space as SPΨ. For selected system parameters, the SPΨ corresponding to SPΦ is vast; hence, it is impossible to fully sample and generate systems in this space. Additionally, the imaging performance of many systems may be low and the system structure may be invalid (irregular or having extensive light obstruction). The solution is to divide the entire parameter space into subspaces. For different subspaces of the SPΦ, their corresponding subspaces of the SPΨ differ. The subspace of the SPΨ cannot be randomly selected either; otherwise, the above issues will also appear. To address this issue, we propose a method for determining the subspace by using a reference system with good imaging performance, whose φ and ψ are at the center of the system and structure parameter subspace, as shown in Fig. 4.

    Figure 4. Sketch of the parameter space and subspace pair SPΦ(i) and SPΨ(i). A 2D space (only two system parameters and two structure parameters are considered, respectively) is plotted here for clarity, but the actual parameter spaces are high-dimensional. Four subspaces are plotted here as an example. The subspaces plotted in the same color form a subspace pair SPΦ(i)-SPΨ(i). φ and ψ of reference system RSYS(i) are at the center of SPΦ(i) and SPΨ(i).

    After the range of φm ($1 \le m \le M$) is determined, this range is divided into Im segments, and the length of each segment is Lm. Thus, SPΦ is divided into $I = \prod_{m=1}^{M} I_m$ subspaces SPΦ(i) ($1 \le i \le I$) and each SPΦ(i) has a corresponding SPΨ(i). A subspace pair is denoted by SPΦ(i)-SPΨ(i), and all subspace pairs are combined to form the entire parameter space. Next, a reference system RSYS(i) whose system parameter is the center of the SPΦ(i) is generated for each subspace pair, and all the reference systems are automatically generated by a special system evolution method. The system with the lowest system specifications (easiest to be designed) is optimized first. During evolution, if the system whose system parameter value is φ* has been generated at a certain step, the weighted distances between φ* and the system parameter values of all the systems that have not been optimized are calculated. Here, the distance is $D = \| W \odot (\varphi^{*} - \varphi^{**}) \|_2$, where $\| \cdot \|_2$ represents the L2-norm, $\odot$ represents elementwise multiplication, and φ** represents the system parameter value of one system that has not been optimized. $W = [W_1, W_2, \ldots, W_m, \ldots, W_M]$ is the weight vector, which balances the influence of the different parameters. Among all the remaining systems, the system corresponding to the smallest D is determined to be the next system to be optimized, and it will be evolved from the system with φ*. The above process is repeated until all the reference systems have been generated. During sequential system evolution, if an abnormal system occurs (fatal ray tracing error, obstruction, or poor imaging performance), the current system is evolved from the second-nearest existing system. During optimization, the image quality, distortion, light obstruction, system structure, and some other constraints should be controlled. When RSYS(i) is generated, it is necessary to obtain its structure parameters ψRSYS(i). The length of the interval for each structure parameter value range in this subspace pair is given and represented by the vector $R^{(i)} = [R_1^{(i)}, R_2^{(i)}, \ldots, R_t^{(i)}, \ldots, R_T^{(i)}]$. Consider ψRSYS(i) as the structure parameter center of the SPΨ(i). Then, the value range of ψt(i) is $[\psi_{RSYS,t}^{(i)} - 0.5 R_t^{(i)},\ \psi_{RSYS,t}^{(i)} + 0.5 R_t^{(i)}]$; that is, SPΨ(i) is determined by ψRSYS(i) and R(i). R(i) should be determined before generating the reference system, and the structure of the reference system can be constrained to avoid the existence of structures with obstructions in the subspace, thus ensuring that the structure parameter subspace division is reasonable. Because of the difference in system size for different system parameters, R should differ for different subspace pairs. The larger the system parameters, the larger the structure parameter ranges are. The design form of the systems in the subspace is partially determined by the reference system, but the values of structure parameters vary within the subspace.
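
    The ordering rule of the sequential evolution and the construction of the structure parameter ranges can be sketched as follows. This is only an illustrative sketch: the function names, the list-based data layout, and the tie-breaking by index are assumptions, and the fallback to the second-nearest system for abnormal cases is omitted.

```python
import math

def weighted_distance(phi_a, phi_b, weights):
    """D = || W (elementwise) (phi_a - phi_b) ||_2."""
    return math.sqrt(sum((w * (a - b)) ** 2
                         for a, b, w in zip(phi_a, phi_b, weights)))

def evolution_order(centers, weights, start=0):
    """Return (child, parent) index pairs for the reference systems.

    centers[i] is the system parameter value at the center of subspace i.
    Each step picks the not-yet-optimized system nearest to the system
    generated last, and evolves it from that system.
    """
    remaining = set(range(len(centers))) - {start}
    current, order = start, []
    while remaining:
        nxt = min(remaining,
                  key=lambda j: (weighted_distance(centers[current],
                                                   centers[j], weights), j))
        order.append((nxt, current))
        remaining.remove(nxt)
        current = nxt
    return order

def structure_ranges(psi_ref, interval_lengths):
    """Range of each structure parameter, centered on the reference system."""
    return [(p - 0.5 * r, p + 0.5 * r)
            for p, r in zip(psi_ref, interval_lengths)]

# Hypothetical 2 x 2 grid of subspace centers, e.g. (XFOV, YFOV) in degrees.
centers = [(10, 10), (20, 10), (10, 20), (20, 20)]
print(evolution_order(centers, weights=(1.0, 2.0)))
print(structure_ranges([100.0, -50.0], [20.0, 10.0]))
```

    With the larger weight on the second parameter, the evolution first moves along the first parameter, which illustrates how W balances the influence of the different parameters.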

    Next, a random optimization generation method is used to generate the fundamental data set. In each SPΦ(i)-SPΨ(i), systems with random φ are generated using RSYS(i) as the initial system. During optimization, all the surface parameters of RSYS(i) are set as variables to obtain good imaging performance. To increase diversity of the structure parameter values in the data set, all the structure parameters may be changed to random values within the SPΨ(i) with probability Pf. Then, the changed structure parameters are frozen with probability Pf1 during optimization, whereas the unchanged structure parameters are frozen with probability Pf2. After optimization, the system parameters, structure parameters, and surface parameters of all the systems are obtained to form the fundamental data set. Because the system generation processes in different subspace pairs are independent, the above process can be performed in parallel to improve the efficiency of data set generation. Using the above method, systems with various system and structure parameters within large ranges, in addition to good imaging performance, are obtained and the required constraint is satisfied. The entire data set generation process is fully automatic; hence, human effort is minimal.

    C. Supervised Learning

    After the fundamental data set is obtained, the DNN can be trained using supervised training. Because the values and units differ significantly between different parameters, to improve the efficiency and convergence of training, all the input data of the same type are normalized to [−1, 1] using a linear preprocessing method similar to minimum–maximum normalization. The output data of the network need reverse processing to obtain the actual structure and surface parameters of the system. The loss function Lsuper can be defined as the mean square error (MSE) between the target and the predicted output (values of surface and structure parameters, χ and ψ), as shown in Fig. 3. After the loss is obtained using a feedforward calculation, the gradient of Lsuper with respect to all parameters in the neural network is calculated using the backpropagation algorithm. Then, the gradient is passed to the optimizer and used to update the weights and biases to minimize the loss function, which brings the predicted values closer to the target values.
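
    As a minimal sketch of the preprocessing and the supervised loss, assuming plain minimum–maximum scaling to [−1, 1] (the text only says the method is "similar to" it) and hypothetical function names:

```python
def normalize(value, lo, hi):
    """Map value from [lo, hi] linearly onto [-1, 1]."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0

def denormalize(norm, lo, hi):
    """Reverse processing: map a network output in [-1, 1] back to [lo, hi]."""
    return lo + (norm + 1.0) * (hi - lo) / 2.0

def mse(predicted, target):
    """Mean square error between predicted and target parameter vectors."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

# Hypothetical example: an ENPD range of 20 mm to 40 mm.
print(normalize(30.0, 20.0, 40.0))
print(denormalize(0.5, 20.0, 40.0))
print(mse([0.1, -0.2], [0.0, 0.0]))
```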

    At this stage, the DNN can predict the corresponding systems for specific inputs. However, for a large DNN model, a large data set is required for supervised training. In the combined sequential and random freeform system evolution method mentioned above, all the systems in each subspace pair SPΦ(i)-SPΨ(i) are evolved from a single initial system RSYS(i). Additionally, the number of systems in the data set is small, and the values of different system and structure parameters are limited. A feedback training strategy (as shown in Fig. 2) is proposed to diversify the data set and improve the performance of the DNN. Random φ and ψ are selected in different subspace pairs as the DNN input. Then the systems predicted by the current network are directly input into the optical design software (e.g., CODE V and Zemax) for quick optimization. The structure parameters are frozen with probability Ps during optimization. If the optimized system has no ray tracing error and has good image quality, its parameters are obtained and added to the training data set. Further training is conducted, and system generation feedback is executed again. The above process is repeated, and the number and diversity of the systems in the data set are extended.

    D. Unsupervised Learning

    Through supervised learning, the DNN can converge quickly using the obtained data set and be used to output systems based on the given inputs. However, the network generalization ability may still be weak. Unsupervised learning is then introduced, and supervised and unsupervised learning are combined for subsequent DNN training.

    Unsupervised learning does not require a data set with “label information.” When random system and structure parameter combinations are input into the DNN, the output systems can be obtained, and the unsupervised loss function Lunsuper, which contains two parts, can be constructed. One part, Lperformance, is related to imaging performance. The other part, Lconstraint, is related to the design constraints that need to be satisfied, which are weighted penalty functions based on, e.g., quadratic, reciprocal, logarithmic, or higher-order power functions. Unsupervised learning can be regarded as training the DNN to “optimize” the systems generated by the DNN [27]. A differential ray tracing module of the freeform imaging system is essential to connect Lunsuper with the surface and structure parameters of the freeform system as well as the parameters of the DNN [31]. This ensures that the entire prediction and computation processes are fully differentiable; thus, the loss gradient can be computed and backpropagated to update the parameters of the DNN and improve DNN performance. The overall computing efficiency can be improved using GPU computing power and parallel computing.

    The core of ray tracing is to find the point where the ray intersects with the surface and then obtain the outgoing ray direction. The expressions of common freeform surfaces (XY polynomial surface, Zernike polynomial surface, etc.) in local coordinate systems can be written as follows:

    $$z = h(x,y) = \frac{c(x^2+y^2)}{1+\sqrt{1-(1+k)c^2(x^2+y^2)}} + \sum_{i=0}^{p} A_i \varphi_i(x,y),$$

    where c represents the curvature, k is the conic constant, φi(x,y) is the freeform surface term, and Ai is its coefficient. A ray with a starting point coordinate μ = [μx, μy, μz] and a unit propagation direction vector ω = [ωx, ωy, ωz] can be represented by (μ, ω). The coordinates of the ray (μ, ω) after propagating for λ units of length can be written as μ + λω. The intersection of this ray with the surface z = h(x,y) must satisfy

    $$f(x,y,z) = h(x,y) - z = f(\mu + \lambda\omega) = 0.$$

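    λ can be solved iteratively using Newton’s method:

    $$\lambda^{(n)} = \lambda^{(n-1)} - \frac{f(\mu + \lambda^{(n-1)}\omega)}{f'(\mu + \lambda^{(n-1)}\omega)} = \lambda^{(n-1)} - \frac{f(\mu + \lambda^{(n-1)}\omega)}{\nabla f \cdot \omega}.$$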

    The above process is conducted iteratively until the change of λ is smaller than the allowable tolerance. After λ is obtained, we obtain the coordinate of the intersection μ+λω, which is also the start point μ of the ray reflected or refracted by the surface. The propagation direction ω after this surface can be calculated based on Snell’s law. Then, by repeating the above process, the ray tracing process can be conducted surface-by-surface sequentially until the ray reaches the image plane.
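
    The intersection search and the direction update can be sketched in a few lines. The sketch below makes several assumptions: it handles a single surface in its local coordinate system, it covers only the reflective case (vector form of the law of reflection), and it replaces the automatic differentiation used in a differentiable ray tracer with central finite differences. All function and parameter names are hypothetical.

```python
import math

def sag(x, y, c, k, coeffs=()):
    """z = h(x, y): conic base plus optional freeform terms (A_i, term_i)."""
    r2 = x * x + y * y
    base = c * r2 / (1.0 + math.sqrt(1.0 - (1.0 + k) * c * c * r2))
    return base + sum(a * term(x, y) for a, term in coeffs)

def intersect(mu, omega, c, k, coeffs=(), tol=1e-12, max_iter=50, eps=1e-6):
    """Newton iteration on lambda so that f(mu + lambda * omega) = 0."""
    def f(lam):
        x, y, z = (m + lam * w for m, w in zip(mu, omega))
        return sag(x, y, c, k, coeffs) - z
    lam = 0.0
    for _ in range(max_iter):
        # df/dlambda equals grad(f) . omega; approximated numerically here.
        df = (f(lam + eps) - f(lam - eps)) / (2.0 * eps)
        step = f(lam) / df
        lam -= step
        if abs(step) < tol:
            break
    return lam, tuple(m + lam * w for m, w in zip(mu, omega))

def reflect(point, omega, c, k, coeffs=(), eps=1e-6):
    """Outgoing direction after reflection at the surface point."""
    x, y, _ = point
    hx = (sag(x + eps, y, c, k, coeffs) - sag(x - eps, y, c, k, coeffs)) / (2.0 * eps)
    hy = (sag(x, y + eps, c, k, coeffs) - sag(x, y - eps, c, k, coeffs)) / (2.0 * eps)
    norm = math.sqrt(hx * hx + hy * hy + 1.0)
    n = (hx / norm, hy / norm, -1.0 / norm)  # unit normal, from grad(f)
    dot = sum(w * ni for w, ni in zip(omega, n))
    return tuple(w - 2.0 * dot * ni for w, ni in zip(omega, n))

# Paraboloid mirror with radius 100 (c = 0.01, k = -1): focal length 50.
lam, p = intersect((1.0, 0.0, 10.0), (0.0, 0.0, -1.0), c=0.01, k=-1.0)
d = reflect(p, (0.0, 0.0, -1.0), c=0.01, k=-1.0)
z_cross = p[2] + (-p[0] / d[0]) * d[2]  # where the reflected ray meets the axis
print(lam, p, z_cross)
```

    In the usage example, a ray parallel to the axis of a paraboloid is reflected through the focal point, which is a convenient sanity check for both the intersection and the reflection step.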

    The rays actually used for each field in the system are determined based on the location of the aperture stop and the chief ray of each field. Once the chief ray is found, other rays can be sampled according to the entrance pupil size and used for imaging performance analysis and optimization. As the freeform system is nonsymmetric, the traditional method of locating the chief ray by obtaining the paraxial entrance pupil position may not be suitable. Therefore, an iterative search method for the chief ray is adopted. The chief ray of each field should intersect the aperture stop at its center. Trace a ray (μh, ωh) starting from the object space of the optical system with a field point ωh and obtain the coordinates τh of its intersection with the aperture stop in its local coordinate system. Here, the point where the ray intersects with a virtual plane in the object space can be taken as μh. This process of tracing and finding the intersection point can be expressed by τh = ε(μh), and its derivative ε′(μch,(n)) can be calculated by automatic differentiation. The iteration process follows

    $$\mu_c^{h,(n+1)} = \mu_c^{h,(n)} - \frac{\tau_c^{h,(n)}}{\varepsilon'(\mu_c^{h,(n)})},$$

    where subscript c represents the chief ray, n is the number of iterations, and iteration can be stopped when τch,(n) is less than the allowable value. The x and y coordinates of the initial iteration ray’s starting point can be the same as the vertex coordinates of the first surface of the system.
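
    The chief ray search is again a Newton iteration, now on the ray starting point. The sketch below is deliberately simplified: it treats a single coordinate (one dimension) instead of the x and y coordinates together, it uses a finite-difference derivative in place of automatic differentiation, and `trace_to_stop` is a hypothetical stand-in for tracing a ray from the virtual object-space plane to the aperture stop.

```python
def find_chief_ray_start(trace_to_stop, mu0, tol=1e-10, max_iter=50, eps=1e-6):
    """Find the starting coordinate mu whose ray hits the stop center (tau = 0)."""
    mu = mu0
    for _ in range(max_iter):
        tau = trace_to_stop(mu)
        if abs(tau) < tol:
            break
        # Derivative of the stop coordinate with respect to the start coordinate.
        deriv = (trace_to_stop(mu + eps) - trace_to_stop(mu - eps)) / (2.0 * eps)
        mu -= tau / deriv
    return mu

# Toy mapping standing in for a real ray trace to the stop (assumption).
def toy_trace(mu):
    return 0.8 * mu + 0.01 * mu ** 2 - 4.0

mu_chief = find_chief_ray_start(toy_trace, mu0=0.0)
print(mu_chief, toy_trace(mu_chief))
```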

    Loss functions can be constructed based on the result of ray tracing. Lperformance includes the loss related to the root mean square (RMS) spot radius lspot and the loss related to distortion ldist. lspot is the average value of the RMS spot radius over the full FOV and the different wavelengths:

    $$l_{\mathrm{spot}} = \frac{1}{H \times W} \sum_{h,w} \left\{ \frac{1}{P} \sum_{p} \left[ (x_p^{h,w} - x_c^{h,w})^2 + (y_p^{h,w} - y_c^{h,w})^2 \right] \right\}^{1/2},$$

    where x and y represent the local coordinates of the image point of a specific ray; the subscript c represents the chief ray; h, w, and p represent the index of the sampled field point, sampled wavelength, and sampled pupil ray, respectively; and H, W, and P are the total numbers of h, w, and p, respectively.

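    The relative distortion of off-axis sampled field points is used to construct ldist:

    $$l_{\mathrm{dist}} = \frac{1}{H \times W} \sum_{h,w} \frac{\left[ (x_c^{h,w} - x_{\mathrm{ideal}}^{h})^2 + (y_c^{h,w} - y_{\mathrm{ideal}}^{h})^2 \right]^{1/2}}{\left[ (x_{\mathrm{ideal}}^{h})^2 + (y_{\mathrm{ideal}}^{h})^2 \right]^{1/2}},$$

    where the subscript “ideal” represents the ideal image point.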
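
    Given traced image-plane coordinates, both performance losses reduce to a few sums. The following sketch assumes a hypothetical dictionary layout keyed by (field index h, wavelength index w) and reads the distortion as the chief-ray deviation divided by the ideal image height; in a real training loop the same arithmetic would run on differentiable tensors.

```python
import math

def spot_loss(rays, chief):
    """Average RMS spot radius about the chief ray.

    rays[(h, w)]  : list of (x, y) image points of the sampled pupil rays
    chief[(h, w)] : (x, y) image point of the chief ray
    """
    total = 0.0
    for key, points in rays.items():
        xc, yc = chief[key]
        mean_sq = sum((x - xc) ** 2 + (y - yc) ** 2 for x, y in points) / len(points)
        total += math.sqrt(mean_sq)
    return total / len(rays)

def distortion_loss(chief, ideal):
    """Average relative distortion; ideal[h] is the ideal (x, y) image point."""
    total = 0.0
    for (h, _w), (xc, yc) in chief.items():
        xi, yi = ideal[h]
        total += math.hypot(xc - xi, yc - yi) / math.hypot(xi, yi)
    return total / len(chief)

# Hypothetical data: one off-axis field (h = 1), one wavelength (w = 0).
rays = {(1, 0): [(4.0, 4.4), (2.0, 4.4), (3.0, 5.4), (3.0, 3.4)]}
chief = {(1, 0): (3.0, 4.4)}
ideal = {1: (3.0, 4.0)}
print(spot_loss(rays, chief), distortion_loss(chief, ideal))
```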

    Lconstraint includes constraints related to, for example, the system specifications (e.g., EFL and F#), light obstruction, system volume, and the chief ray of the central field. The difference between the output and input structure parameters should also be constrained to make the DNN work normally.

    For the calculation of EFL, the traditional method used for a rotationally symmetric system is not feasible; hence, we use a method based on real ray tracing. For the focal length EFLx in the x direction, the chief ray of a field with a small angle Δθ relative to the central field in the x direction is traced, and the image height hx in the x direction relative to the central field can be obtained. Based on hx = EFLx × tan Δθ, EFLx can be calculated. The focal length EFLy in the y direction can also be calculated using the same method. The EFL loss lEFL can be calculated using the MSE loss between the actual focal lengths and the required focal length EFL*:

    $$l_{\mathrm{EFL}} = \frac{1}{2} \left[ (\mathrm{EFL}_x - \mathrm{EFL}^{*})^2 + (\mathrm{EFL}_y - \mathrm{EFL}^{*})^2 \right].$$
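
    A short sketch of this calculation, with hypothetical function names and the image heights assumed to come from a real chief-ray trace:

```python
import math

def efl_from_image_height(image_height, delta_theta):
    """Solve h = EFL * tan(delta_theta) for EFL (delta_theta in radians)."""
    return image_height / math.tan(delta_theta)

def efl_loss(efl_x, efl_y, efl_target):
    """MSE between the focal lengths in x and y and the required value."""
    return 0.5 * ((efl_x - efl_target) ** 2 + (efl_y - efl_target) ** 2)

# Hypothetical traced image heights for a field offset of 0.1 degrees.
dtheta = math.radians(0.1)
efl_x = efl_from_image_height(0.1747, dtheta)
efl_y = efl_from_image_height(0.1743, dtheta)
print(efl_x, efl_y, efl_loss(efl_x, efl_y, 100.0))
```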

    Similar methods can be used to calculate other system parameters, such as the F# and magnification.

    Light obstruction should be eliminated in freeform imaging optical systems, particularly for reflective systems. It can be controlled by constraining the distances between the edge points of surfaces and the edge rays of light beams. The distances can be calculated using real ray trace data. When the loss lobs related to light obstruction is calculated, all the J key distances dj (a negative distance means that obstructions exist) are determined. If dj is greater than the given tolerance dmin,j, the residual space is considered to be sufficient, and the contribution to the loss function should be zero; otherwise, a penalty value is added. The loss function can be written as

    $$l_{\mathrm{obs}} = \frac{1}{J} \sum_{j=1}^{J} l_{\mathrm{obs},j}, \quad \text{where} \quad l_{\mathrm{obs},j} = -\min(d_j - d_{\min,j},\ 0).$$
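
    The obstruction penalty behaves like a hinge: it is zero while the clearance exceeds the tolerance and grows with the shortfall otherwise. The sketch below assumes the penalty is the positive shortfall below the tolerance, averaged over the J key distances; the function name and the list inputs are hypothetical.

```python
def obstruction_loss(distances, tolerances):
    """Average hinge penalty over the key clearance distances.

    distances[j]  : clearance d_j (negative means the beam is obstructed)
    tolerances[j] : minimum required clearance d_min_j
    """
    penalties = [-min(d - d_min, 0.0) for d, d_min in zip(distances, tolerances)]
    return sum(penalties) / len(penalties)

# One sufficient clearance, one too small, one actual obstruction.
print(obstruction_loss([5.0, 1.0, -2.0], [2.0, 2.0, 2.0]))
```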

    Unlike with aspherical surfaces, using the off-axis section of a freeform surface will not improve the correction ability of nonsymmetric aberrations, as the on-axis section of the surface is also nonrotationally symmetric. The off-axis section of a freeform surface can be characterized by the on-axis section of another freeform surface. In addition, if the used area of the freeform surface is away from the mathematical vertex (the origin of the local coordinate system) of the surface, unnecessary trouble may arise during optomechanical design and system assembly. Therefore, for the design of freeform imaging systems, it is generally required that the chief ray of the central field intersects each freeform surface at its vertex (the local coordinates should be zero). The local coordinates (xc1,s, yc1,s) of the intersection of the chief ray of the central field with the sth surface ($1 \le s \le S$) can be obtained using ray tracing. The loss function lc-ray is defined as

    $$l_{\mathrm{c\text{-}ray}} = \left\{ \frac{1}{S} \sum_{s} \left[ (x_c^{1,s})^2 + (y_c^{1,s})^2 \right] \right\}^{1/2}.$$

    As the DNN should output systems whose structure parameters are the same as or similar to the input, the difference between the input and output structure parameters should be controlled. The loss $l_{\mathrm{str}}$ is calculated using the root mean square error: $l_{\mathrm{str}} = \left[\frac{1}{T}\sum_{t}(\psi_t^{o} - \psi_t^{i})^2\right]^{1/2}$, where $\psi_t^{i}$ and $\psi_t^{o}$ are the tth input and output structure parameters, respectively.
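
    Both the chief-ray loss and the structure-parameter loss are root-mean-square quantities, and a short Python sketch makes the two definitions concrete. The function names and sample numbers are hypothetical; the intersection coordinates would normally come from tracing the central-field chief ray through each surface.

```python
import math


def chief_ray_loss(intersections):
    """RMS distance of the central-field chief ray from each surface vertex.

    intersections: one (x, y) pair of local coordinates per surface.
    """
    count = len(intersections)
    return math.sqrt(sum(x * x + y * y for x, y in intersections) / count)


def structure_loss(psi_in, psi_out):
    """RMS difference between input and output structure parameters."""
    if len(psi_in) != len(psi_out):
        raise ValueError("input and output must have the same length")
    squared = sum((o - i) ** 2 for i, o in zip(psi_in, psi_out))
    return math.sqrt(squared / len(psi_in))


# Hypothetical values for a three-surface system.
l_c_ray = chief_ray_loss([(0.0, 0.03), (0.0, 0.0), (0.0, -0.04)])
l_str = structure_loss([1.0, 2.0, 3.0, 4.0], [1.1, 2.0, 3.0, 3.9])
```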

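    $L_{\mathrm{unsuper}}$ is constructed as the weighted sum of the above individual losses over all output systems, where $w_{\mathrm{spot}}$, $w_{\mathrm{dist}}$, $w_{\mathrm{EFL}}$, $w_{\mathrm{obs}}$, $w_{\mathrm{c\text{-}ray}}$, and $w_{\mathrm{str}}$ are the weights. Additionally, for systems with more advanced system specifications, larger aberrations should be tolerated. An adjustment factor $\rho_w$ is therefore added to balance the contribution of different systems to $L_{\mathrm{performance}}$; $\rho_w$ decreases as the system specifications increase. Summing the losses of the different systems gives $L_{\mathrm{unsuper}} = \sum\left(\rho_w L_{\mathrm{performance}} + L_{\mathrm{constraint}}\right) = \sum\left[\rho_w\left(w_{\mathrm{spot}} l_{\mathrm{spot}} + w_{\mathrm{dist}} l_{\mathrm{dist}}\right) + \left(w_{\mathrm{EFL}} l_{\mathrm{EFL}} + w_{\mathrm{obs}} l_{\mathrm{obs}} + w_{\mathrm{c\text{-}ray}} l_{\mathrm{c\text{-}ray}} + w_{\mathrm{str}} l_{\mathrm{str}}\right)\right]$.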

    The total loss when supervised and unsupervised training are combined is $L_{\mathrm{total}} = L_{\mathrm{super}} + w_{\mathrm{unsuper}} L_{\mathrm{unsuper}}$.
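
    The way the individual losses combine can be illustrated with a small Python sketch. The weight values are those reported for the three-mirror example in Section 3 (with the obstruction term unused, i.e., a weight of zero); the per-system loss values, the dictionary layout, and the function names are hypothetical.

```python
def unsupervised_loss(systems, weights):
    """Sum over systems of rho_w * L_performance + L_constraint."""
    total = 0.0
    for s in systems:
        performance = weights["spot"] * s["spot"] + weights["dist"] * s["dist"]
        constraint = (
            weights["EFL"] * s["EFL"]
            + weights["obs"] * s["obs"]
            + weights["c_ray"] * s["c_ray"]
            + weights["str"] * s["str"]
        )
        total += s["rho_w"] * performance + constraint
    return total


def total_loss(l_super, l_unsuper, w_unsuper):
    """Combined supervised and unsupervised training loss."""
    return l_super + w_unsuper * l_unsuper


weights = {"spot": 4.0, "dist": 0.05, "EFL": 0.05, "obs": 0.0, "c_ray": 0.1, "str": 5.0}

# One hypothetical output system with made-up individual losses.
batch = [{"spot": 0.002, "dist": 0.01, "EFL": 0.0001, "obs": 0.0,
          "c_ray": 0.001, "str": 0.01, "rho_w": 0.7}]

l_unsuper = unsupervised_loss(batch, weights)
l_total = total_loss(l_super=0.001, l_unsuper=l_unsuper, w_unsuper=5.0)
```

    Only the performance terms are scaled by the adjustment factor; the constraint terms are applied with the same strength to every system regardless of its specifications.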

    By regulating the weight $w_{\mathrm{unsuper}}$, the contribution of unsupervised learning to the overall network training can be modulated. Network training is conducted based on $L_{\mathrm{total}}$ to obtain the final DNN, called FreeformNet, with good performance. The overall framework of combined supervised and unsupervised training is shown in Fig. 5.

    Figure 5. Illustration of the training mode of combined supervised and unsupervised training.

    E. System Generation

    After FreeformNet is obtained, it can be applied to a specific design task in two ways, as shown in Fig. 6. In the first case, the values of all input system parameters and structure parameters are provided according to the design requirements, and FreeformNet immediately outputs a single design tailored to these inputs. In the second case, only some of the system and structure parameters are specified based on actual design needs. For each parameter without a designated value, a value is randomly selected within the corresponding parameter range. Combined with the specified parameters, this yields a number of different parameter value combinations, which are input into FreeformNet. A series of systems that meet the basic design requirements is then generated, realizing a fast multisolution output with different structure and system parameters. In traditional or existing DNN-based freeform system generation methods, the design input is limited to specific combinations of system parameters, and only one system fulfilling the design requirements can be generated. In contrast, for given system parameters (or partial system and structure parameters), our framework generates multiple solutions that fulfill the fixed input parameters (the basic design requirements) but have different values for the other parameters. The proposed method can therefore be regarded as a “multisolution” method. Based on preset evaluation indicators (e.g., various imaging performance metrics and the system volume) and other constraints, the solutions can be filtered or sorted and presented for user selection. The systems generated by FreeformNet can be used directly as good starting points for further optimization. The systems output by the network can also be quickly optimized in parallel to substantially improve imaging performance, thus greatly improving the efficiency of freeform optical design. For systems with small or moderate FOV, the optimization of a single system generally takes several seconds, so obtaining a freeform system with good performance takes only several seconds in total after the design requirements are input into FreeformNet. For systems with large FOV, the time cost of generating the starting point with FreeformNet is the same, but optimization is much more difficult if high imaging performance across the full FOV is required. Some trial and error and minor adjustments may be needed during optimization to balance the aberrations, so the optimization time may be longer.
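
    The second input mode, in which unspecified parameters are drawn at random, might be sketched as below. This is a simplified illustration, not the authors' program: it draws each free parameter uniformly from a single global range, whereas in the paper the structure parameter ranges are tied to subspace pairs. The `M1y` range and the function name are hypothetical.

```python
import random


def sample_inputs(fixed, ranges, n, seed=0):
    """Build n complete input dicts for the network.

    Parameters listed in `fixed` keep their given value; every other
    parameter in `ranges` is drawn uniformly from its (low, high) interval.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        sample = {}
        for name, (low, high) in ranges.items():
            if name in fixed:
                sample[name] = fixed[name]
            else:
                sample[name] = rng.uniform(low, high)
        samples.append(sample)
    return samples


ranges = {
    "XFOV": (4.0, 40.0),      # degrees, range used in the paper's example
    "YFOV": (4.0, 40.0),      # degrees
    "ENPD": (1 / 6, 2 / 3),   # mm
    "M1y": (2.0, 3.0),        # mm, hypothetical structure parameter range
}
fixed = {"XFOV": 24.0, "YFOV": 16.0, "ENPD": 1 / 3}
inputs = sample_inputs(fixed, ranges, n=5)
```

    Each dictionary in `inputs` would then be passed to the trained network, giving one candidate system per sample.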

    Figure 6. Fast generation process of multiple-solution freeform imaging systems using FreeformNet.

    3. RESULTS

    The freeform off-axis three-mirror imaging system design was used to verify the feasibility and effect of the proposed design framework and FreeformNet. The selected freeform off-axis three-mirror imaging system had a central field of (0°, 0°) and symmetry about the YOZ plane, with traditional zig-zag folding geometry, as shown in Fig. 7(a). M1, M2, M3, and IMG denote the primary mirror, secondary mirror, tertiary mirror, and image plane, respectively. The aperture stop was located at M2. For this system, the FOV in the x direction (XFOV), the FOV in the y direction (YFOV), and ENPD were chosen to describe the system parameters; that is, $\Phi = [\mathrm{XFOV}, \mathrm{YFOV}, \mathrm{ENPD}]$. The focal length was set to a fixed value of 1 mm, and systems with other focal lengths were obtained by scaling. System structure parameters included the tilt angles and the y and z vertex coordinates of all the surfaces in the system. Because the origin of the global coordinate system was set to the vertex of M2, the y and z coordinates of the vertex of M2 were not included in Ψ. Therefore, $\Psi = [\mathrm{M1}_y, \mathrm{M1}_z, \mathrm{M3}_y, \mathrm{M3}_z, \mathrm{IMG}_y, \mathrm{IMG}_z, \mathrm{M1}_{\mathrm{tilt}}, \mathrm{M2}_{\mathrm{tilt}}, \mathrm{M3}_{\mathrm{tilt}}, \mathrm{IMG}_{\mathrm{tilt}}]$ (note that the input structure parameters did not contain the tilt angles). The freeform surface type was an XY polynomial freeform surface up to the sixth order with no base conic for simplicity, and the odd-order terms of x were not used. Thus, X had 42 individual parameters. Overall, the system parameters and partial structure parameters were taken as network inputs, nine in total. All structure and surface parameters were taken as network outputs, 52 in total.

    Figure 7. (a) The selected folding geometry of freeform off-axis three-mirror imaging system and its structure constraints. (b) Sketch of the concept of structure parameters range.

    The next step was to determine the appropriate $SP_{\Phi}$. XFOV and YFOV were specified in the range of 4°–40°, and ENPD in the range 1/6–2/3 mm (F# in the range 1.5–6). However, because it was difficult to achieve a large etendue (for example, a large FOV and a small F# simultaneously) while satisfying the design requirements and achieving good imaging performance, only system parameters that satisfied the following condition were selected: $0.5 \times \mathrm{XFOV} + 0.5 \times \mathrm{YFOV} + 20 \times \mathrm{ENPD} \le 33.34$.
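
    A short Python sketch of this selection rule is given below, reading the condition as an upper bound of 33.34 on the weighted sum. The helper names are hypothetical. Solving the bound for YFOV at XFOV = 30° and ENPD = 0.2 mm gives 28.68°, which agrees with the YFOV upper limit used in the third test case later in this section.

```python
def f_number(enpd_mm, efl_mm=1.0):
    """F-number for a given entrance pupil diameter (focal length fixed at 1 mm)."""
    return efl_mm / enpd_mm


def satisfies_etendue_limit(xfov_deg, yfov_deg, enpd_mm, limit=33.34):
    """True if the system parameters lie inside the allowed region."""
    return 0.5 * xfov_deg + 0.5 * yfov_deg + 20.0 * enpd_mm <= limit


def max_yfov(xfov_deg, enpd_mm, limit=33.34):
    """Largest YFOV allowed by the limit for a given XFOV and ENPD."""
    return 2.0 * (limit - 0.5 * xfov_deg - 20.0 * enpd_mm)


ok_moderate = satisfies_etendue_limit(24.0, 16.0, 1 / 3)   # inside the region
ok_extreme = satisfies_etendue_limit(40.0, 40.0, 2 / 3)    # outside the region
yfov_limit = max_yfov(30.0, 0.2)
```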

    In this way, the etendue is effectively limited. During DNN training, the results for systems at the edge of the system parameter space may be poor, because the data set contains fewer systems with nearby system parameters there than in the interior of the parameter space. Therefore, to achieve a good overall training effect for system parameters within the specified range, the system parameter space used for network training was made moderately larger than that used for system generation. This was done by extending the upper and lower limits of the mth system parameter $\Phi_m$ by $0.5L_m$ (here, $L_m$ is the length of $\Phi_m$ for each subspace, as defined in Section 2.B). For example, if the range of $\Phi_m$ for system generation is $a \le \varphi_m \le b$, then the range during DNN training is enlarged to $a - 0.5L_m \le \varphi_m \le b + 0.5L_m$. In this way, good imaging performance of the systems generated by the DNN can be obtained. Then, $SP_{\Phi}$ was divided into smaller subspaces. The ranges of XFOV, YFOV, and ENPD were segmented according to the lengths $L_1 = 2^\circ$, $L_2 = 2^\circ$, and $L_3 = 0.05$ mm, respectively. Therefore, $I_1 = 19$, $I_2 = 19$, and $I_3 = 11$, and the total number of subspace pairs was $I = \prod_{m=1}^{3} I_m = 19 \times 19 \times 11 = 3971$. After the subspace pairs whose central system parameters did not meet Eq. (13) were removed, 2598 subspace pairs remained.
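
    The subspace counts can be checked with a few lines of Python. The sketch assumes that each range is extended by half a cell on both sides, so that the cell centers fall on the original lower limit plus integer multiples of the cell length, and that the etendue condition is an upper bound of 33.34 on 0.5 × XFOV + 0.5 × YFOV + 20 × ENPD. The function names are hypothetical.

```python
def count_cells(low, high, length):
    """Number of cells after extending [low, high] by half a cell on each side."""
    return round(((high + 0.5 * length) - (low - 0.5 * length)) / length)


def cell_centers(low, length, count):
    """Cell centers; the first center coincides with the original lower limit."""
    return [low + k * length for k in range(count)]


# (lower limit, upper limit, cell length) for XFOV (deg), YFOV (deg), ENPD (mm).
specs = [(4.0, 40.0, 2.0), (4.0, 40.0, 2.0), (1 / 6, 2 / 3, 0.05)]

counts = [count_cells(low, high, length) for low, high, length in specs]
total = counts[0] * counts[1] * counts[2]

xs, ys, es = (cell_centers(low, length, n)
              for (low, _, length), n in zip(specs, counts))

# Keep only the cells whose central parameters satisfy the etendue condition.
kept = sum(
    1
    for x in xs
    for y in ys
    for e in es
    if 0.5 * x + 0.5 * y + 20.0 * e <= 33.34
)
```

    Under these assumptions the enumeration gives 3971 cells in total and 2598 after filtering, matching the numbers stated in the text.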

    After the subspace pairs were divided, RSYS was generated using a special system evolution method. The reference systems were optimized using the optical design software CODE V. As the system was symmetric about the YOZ plane, only half of the full FOV was considered in the design. Six sampled field points, (0, 0), (0, YFOV/2), (0, −YFOV/2), (XFOV/2, 0), (XFOV/2, YFOV/2), and (XFOV/2, −YFOV/2), were selected for each system. During system generation, the focal length, light obstruction, distortion, and intersection coordinates of the chief ray of the central field with the freeform surfaces were controlled, and larger aberrations were allowed for systems with larger ENPD and FOV. Obstructions were eliminated by controlling the five distances d shown in Fig. 7(a). The maximum acceptable relative distortion of the off-axis field in the x and y directions for each system is $\mathrm{MaxDistortion}(\mathrm{FOV}, \mathrm{ENPD}) = \frac{0.15 \times \mathrm{FOV} + 10 \times \mathrm{ENPD}}{100}$.
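
    The field sampling and the distortion tolerance can be illustrated with the following Python sketch. Two assumptions are made and should be treated as such: the six field points are taken to cover the half field x ≥ 0 with both signs of y, and the tolerance is read as (0.15 × FOV + 10 × ENPD)/100, with FOV being XFOV or YFOV depending on the direction considered. The function names are hypothetical.

```python
def sampled_fields(xfov_deg, yfov_deg):
    """Six field points (degrees) covering the half field x >= 0."""
    hx, hy = xfov_deg / 2.0, yfov_deg / 2.0
    return [(0.0, 0.0), (0.0, hy), (0.0, -hy), (hx, 0.0), (hx, hy), (hx, -hy)]


def max_relative_distortion(fov_deg, enpd_mm):
    """Assumed tolerance on relative distortion, as a fraction."""
    return (0.15 * fov_deg + 10.0 * enpd_mm) / 100.0


fields = sampled_fields(24.0, 16.0)
tolerance_x = max_relative_distortion(24.0, 1 / 3)
tolerance_y = max_relative_distortion(16.0, 1 / 3)
```

    With this reading, the tolerance loosens as either the FOV or the entrance pupil grows, in line with the statement that larger aberrations were allowed for more demanding systems.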

    Let $R(i) = A \times \mathrm{Hdim}_{RSYS(i)}$, where $\mathrm{Hdim}_{RSYS(i)}$ is the horizontal dimension (along the z direction) of RSYS(i), and $A = [A_1, A_2, \ldots, A_t, \ldots, A_T]$ corresponds to the input Ψ; the value of A was the same for all subspace pairs. Before the reference systems were generated, the values of $A_t$ were set to $A_1 = A_2 = \cdots = A_6 = 1/3$. As a reference system was optimized, Hdim changed continuously, so R also changed until the optimization finished and its final value was determined. To minimize the number of structures with obstructions in a subspace pair, the distances between surfaces were controlled according to the value of R when generating the reference system, so that, as far as possible, the color blocks in Fig. 7(b) did not overlap (circles indicate the surfaces of the reference system; color blocks indicate the position range of each surface). The evolution direction of the reference systems was determined by the weight W, whose value was W=[1,1,12]. The reference systems were generated automatically, which took 0.83 h. During the evolution, the structure parameter values of all the reference systems were obtained, and all the subspace pairs were determined.

    Then, the full fundamental data set was generated using a random optimization generation method. The values of $P_f$, $P_{f1}$, and $P_{f2}$ were 0.6, 0.5, and 0.2, respectively. Twelve systems were generated in each subspace pair, and a total of 31,176 systems were obtained for the 2598 subspace pairs, which took 10.2 h. The system parameters, structure parameters, and surface parameters of these systems formed the fundamental data set.

    Next, the DNN was pretrained with the fundamental data set. The DNN used in this example had 20 hidden layers, and the largest number of nodes in one layer was 300. The activation function, optimizer, loss function, learning rate, and batch number were the tanh function, Adam, MSE, $10^{-4}$, and 50, respectively. All the data in the training data set were fed into the network. A total of 15,000 epochs were performed in pretraining, which took 2 h. The final loss function value was $2.67 \times 10^{-3}$. Training was performed on a computer with an Intel Core i9-12900K CPU at 3.2 GHz, 64 GB of memory, and an NVIDIA GeForce RTX 3090 Ti GPU. After pretraining, the feedback training strategy was applied. Each time the system generation feedback was executed, 1000 random parameter combinations were selected from different subspace pairs as the DNN input. The output systems were optimized, and the $P_s$ value was 0.5. The good systems were then added to the data set for further training. After feedback training was completed, supervised training was continued for a period of time, with the learning rate decreasing gradually as training progressed. A total of 147,000 epochs were trained, which took 73.3 h; the number of systems in the data set grew to 109,703, and the final loss function value was $4.68 \times 10^{-4}$. The performance of the model was tested using 2000 random inputs. A total of 115 systems had ray tracing errors, 26 systems had obstructions, and the average RMS spot diameter of the remaining 1859 normal systems was 0.0044 mm.

    Next, unsupervised learning was introduced. As structures with obstructions were avoided as much as possible when determining the subspace pairs, the loss function related to obstruction was not used during unsupervised training in this example. In other design cases, if the structure parameter range is relatively large and many system structures with obstructions are unavoidable, adding this loss function can be considered. The weights of the training loss terms were set as follows: $w_{\mathrm{unsuper}} = 5$, $w_{\mathrm{spot}} = 4$, $w_{\mathrm{dist}} = 0.05$, $w_{\mathrm{EFL}} = 0.05$, $w_{\mathrm{c\text{-}ray}} = 0.1$, $w_{\mathrm{str}} = 5$, and $\rho_w = -0.01 \times (\mathrm{XFOV} + \mathrm{YFOV}) - 0.01 \times \mathrm{ENPD} + 1.1$; the batch number was adjusted to 5. The input data used for unsupervised training were not fixed. In each epoch, we randomly selected 500 subspace pairs, randomly generated one combination of input parameter values in each subspace pair, and used these 500 inputs for unsupervised training. A total of 1210 epochs were performed in the combined supervised and unsupervised training, which took 81.5 h. The performance of the model was then tested again on the same 2000 inputs. The number of systems with ray tracing errors decreased to 61, and the average RMS spot diameter of the normal systems decreased to 0.0025 mm. Thus, unsupervised training significantly improved the performance of the DNN.
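
    The adjustment factor can be evaluated with a one-line function. The sketch below assumes that both specification terms enter with a negative sign, which is the reading consistent with the statement that the factor decreases as the system specifications increase; the function name is hypothetical.

```python
def rho_w(xfov_deg, yfov_deg, enpd_mm):
    """Assumed adjustment factor applied to the imaging performance losses."""
    return -0.01 * (xfov_deg + yfov_deg) - 0.01 * enpd_mm + 1.1


rho_small = rho_w(4.0, 4.0, 1 / 6)      # modest specifications
rho_moderate = rho_w(24.0, 16.0, 1 / 3)  # moderate specifications
```

    Systems with a wider field therefore contribute less to the performance part of the loss, so the network is not forced to trade constraint satisfaction for aberration levels that such systems cannot reach.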

    The obtained network FreeformNet can quickly generate multiple-solution freeform imaging systems based on the design requirements (DNN input). To help the designer complete the above design task using FreeformNet, a corresponding program has been written and used. Designers can input the system’s entire or partial system and structure parameters based on the design requirements, and single or multiple systems can be generated. It is possible to choose whether to evaluate the system quality and whether to perform further software optimization. The output systems can be filtered or sorted based on a selected metric, such as the average RMS spot diameter, the maximum relative distortion of the sampled fields, volume of the system, and modulation transfer function, and are then ready for user selection.
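
    Filtering and sorting of the generated solutions might look like the following Python sketch. It is not the authors' program: the function name, the dictionary layout, and the numeric values are hypothetical, and the keys SPO, DST, and VOL simply borrow the abbreviations used in this section for average RMS spot diameter, maximum relative distortion, and system volume.

```python
def rank_solutions(systems, metric, max_values=None):
    """Drop systems exceeding any limit in max_values, then sort by metric.

    Smaller metric values are treated as better (ascending order).
    """
    limits = max_values or {}
    kept = [s for s in systems
            if all(s[key] <= limit for key, limit in limits.items())]
    return sorted(kept, key=lambda s: s[metric])


# Hypothetical evaluation results for three generated systems.
candidates = [
    {"name": "A", "SPO": 0.0031, "DST": 0.020, "VOL": 4.1},
    {"name": "B", "SPO": 0.0012, "DST": 0.065, "VOL": 3.2},
    {"name": "C", "SPO": 0.0018, "DST": 0.015, "VOL": 5.0},
]

by_spot = rank_solutions(candidates, metric="SPO", max_values={"DST": 0.05})
by_volume = rank_solutions(candidates, metric="VOL")
```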

    We used three different kinds of parameter inputs to evaluate the effect of the FreeformNet obtained above. In each case, 5000 inputs were tested. The predicted system focal length was expected to be 1 mm, and the predicted systems were not further optimized.

    In the first case, all system parameters and structure parameters were provided (all the parameter combinations were random values in the parameter space). A single system would be output corresponding to one input. Among the 5000 predicted systems, 138 systems had ray tracing errors, 43 systems had obstructions, and the average RMS spot diameter of the other 4819 normal systems was 0.0026 mm. Except for a few systems with relatively larger aberrations, the output systems can be taken as good starting points for further optimization. Figure 8 shows nine typical predicted systems, where SPO, DST, and VOL represent the average RMS spot diameter, the maximum relative distortion of the sampled fields, and system volume, respectively. The average RMS spot diameter and the maximum relative distortion of all the normal systems are shown in Fig. 9. If further optimization is conducted, it takes about several seconds in general to obtain good imaging performance for a system with narrow or moderate FOV.

    Figure 8. Typical predicted systems when system parameters and structure parameters were all provided.

    Figure 9. Average RMS spot diameter and maximum relative distortion of normal systems in case one (system numbers were arranged in ascending order according to the average RMS spot diameter).

    The second case was to provide all system parameters and partial structure parameters. The provided parameters were XFOV = 24°, YFOV = 16°, ENPD = 1/3 mm (F# = 3.00), and $\mathrm{M1}_y = 2.38$ mm (note that $\mathrm{M2}_y = 0$). Other structure parameters were set as random values within the corresponding subspaces. Among the predicted systems, three systems had ray tracing errors, and no system had obstructions. The average RMS spot diameter of the other 4997 normal systems was 0.0010 mm. Among them, nine typical systems with various structure parameters are shown in Fig. 10. The systems with the smallest RMS spot diameter, smallest distortion, and smallest volume are shown in Figs. 10(a), 10(b), and 10(c), respectively. The average RMS spot diameter, the maximum relative distortion of the sampled fields, and the system volume of all 4997 normal predicted systems are given in Fig. 11.

    Figure 10. Typical predicted systems when all system parameters and partial structure parameters were provided.

    Figure 11. Average RMS spot diameter, maximum relative distortion, and system volume of normal systems in the second case (the system numbers were arranged in ascending order according to the average RMS spot diameter).

    The third case was to provide partial system parameters. The provided parameters were XFOV = 30° and ENPD = 0.2 mm (F# = 5.00). YFOV was a random value within 4°–28.68°, and all the structure parameters were set as random values within the corresponding subspaces. Among the predicted systems, 113 systems had ray tracing errors, and seven systems had obstructions. The average RMS spot diameter of the other 4880 normal systems was 0.0015 mm. Among them, nine typical systems with various structure parameters are shown in Fig. 12. For all the design examples given in Section 3, the power distribution among the mirrors of the output systems depends on the training data set and unsupervised learning. In this paper, the system optimization during data set generation and the unsupervised learning process focus mainly on achieving high imaging performance, and the power distribution is not considered. For a freeform off-axis three-mirror design with a traditional zig-zag folding geometry and no intermediate image, with M2 taken as the aperture stop, the optical power of M1 is generally smaller than that of M3 (see Refs. [17,19,20,22,23,25]). If a specific power distribution is required for the systems generated by the DNN, systems with various or specific power distribution types should be included in the training data set and in unsupervised learning by adding specific design constraints during optimization. Furthermore, a specific power distribution can be realized during final optimization of the generated starting points.

    Figure 12. Typical predicted systems when partial system parameters were provided.

    The system and structure parameter combinations used for testing (created designs) were generated randomly within the parameter space, and all differ from the system and structure parameters used in the data set for network training. As the above design examples show, good starting points (corresponding to random test system and structure parameters) can be generated by the network in most cases, which shows that the network generalizes. New systems are “learned” from the data set and unsupervised learning under the design framework rather than “memorized,” as they exist neither in the training data set nor among the inputs used during unsupervised learning.

    We also compared the proposed method with some typical design methods for freeform imaging systems, focusing on the design of freeform off-axis three-mirror systems. The comparison results are listed in Table 1. With the method proposed in this paper, each system (starting point) can be generated in less than 0.003 s. The time cost per system can be further reduced if more systems are predicted simultaneously. For example, predicting 1000 systems simultaneously takes 0.015 s, i.e., $1.5 \times 10^{-5}$ s per system. Note that our method works for systems with both narrow and wide FOV (the range of the system parameters can be wide). The network can readily be integrated into current optical design software and act as a powerful design tool or “database” for ultrafast generation of system starting points. The other methods in the table are generally not applicable to systems with wide FOV (or such results are not reported). In addition, with these design methods, the knowledge obtained in one design task can barely be transferred to other design tasks; thus, it is necessary to start from scratch when approaching a new design task. If the starting points are not designed directly but are searched for in the literature, tens of minutes or several hours may be needed, and there is a large probability that a feasible starting point cannot be found. Other reported methods based on deep learning are either not applicable to freeform system design or not applicable to multiple-solution design.

    Table 1. Comparison of Time Cost Using Different Design Methods

    | Method | Starting point (narrow or moderate FOV) | Good imaging performance (narrow or moderate FOV) | Wide FOV |
    |---|---|---|---|
    | [19] | Several minutes for system: FOV = 3° × 3°, EFL = 60 mm, F# = 2, LWIR | Not reported | Not applicable or not reported |
    | [20] | Several minutes for system: FOV = 8° × 8°, EFL = 95 mm, F# = 1.8, LWIR | 2.44 h for system: FOV = 8° × 8°, EFL = 95 mm, F# = 1.8, LWIR | Not applicable or not reported |
    | [23] | Not reported. No individual starting point design process | Several minutes for system: FOV = 4° × 4°, EFL = 600 mm, F# = 3, VIS | Not applicable or not reported |
    | [25] | Not reported. No individual starting point design process | 5.9 min for system: FOV = 3° × 3°, EFL = 60 mm, F# = 1.5, LWIR | Not applicable or not reported |
    | [22] | Not reported. No individual starting point design process | About 30 min for system: FOV = 8° × 6°, EFL = 50 mm, F# = 1.8, LWIR | Not applicable or not reported |
    | [17] | Not reported. A step-by-step design method based on nodal aberration theory: FOV = 4° × 4°, EFL = 600 mm, F# = 3, VIS | Not reported | Not applicable or not reported |

    Finding systems from the literature: Maybe tens of minutes or several hours. A large probability that a feasible starting point cannot be found.
    [26–30]: Not applicable for freeform system design or not applicable for multiple-solution design.
    Our method: Less than 0.003 s per system for starting point generation for the above systems or systems with much wider FOV. If 1000 systems are simultaneously predicted, $1.5 \times 10^{-5}$ s per system. If further optimization is conducted, it generally takes only several seconds to obtain good imaging performance for a system with narrow or moderate FOV (similar to the cases shown in this table). For example, optimizing 20 different systems can be done in between 32 and 37 s, less than 2 s for each system.

    If further optimization is conducted for the systems generated by the FreeformNet, it takes several seconds in general to obtain good imaging performance for a system with narrow or moderate FOV. Considering the ultrafast generation of the starting point, it only takes several seconds in total to generate a high-performance system after the design requirements are input into the FreeformNet. For example, for each design task corresponding to the first six reference design specifications in Table 1, 20 systems with different structure parameters are generated and optimized. The optimization of each group of 20 systems can be done in between 32 and 37 s, less than 2 s for each system. In conclusion, compared with existing methods, the method proposed in this paper realizes an ultrafast multiple-solution freeform imaging system generation, which works for a wide range of system and structure parameters. Further optimization can be done in several seconds for a system with narrow or moderate FOV. The efficiency of the freeform optical design improved significantly and human effort was minimized.

    4. CONCLUSIONS AND DISCUSSION

    In this paper, we proposed a design framework enabling fast generation of freeform imaging systems based on deep learning. The training data set is generated automatically using a combined sequential and random freeform system evolution method, and its diversity is further improved by a special feedback strategy. Supervised learning and unsupervised learning based on freeform surface differential ray tracing are combined to obtain a neural network with high performance. Given the required system and structure parameters as input, one or multiple solutions can be output by the trained model FreeformNet almost immediately and can be taken as good starting points for further optimization. If further optimization is conducted for the systems generated by the FreeformNet, it takes several seconds in general to obtain good imaging performance for a system with narrow or moderate FOV. Considering the ultrafast generation of the starting point, it only takes several seconds in total to generate a system with good performance after the design requirements are input into the FreeformNet. The design framework works for generalized freeform systems with advanced system specifications. Human effort for the complicated freeform system design task can be reduced to a minimum and the design efficiency is dramatically improved. This revolutionary design framework opens up a new pathway for the complicated lens design task and will promote the development of the next-generation optical design software. Future work will focus on the generalized one-to-multidesign framework generating systems with various and different kinds of structures and folding geometries as well as a different number of optical elements.

    Currently, the freeform surface type of the systems generated by the DNN is the same as the fixed surface type of the systems in the data set. However, the proposed design framework can be further extended to generate systems with different freeform surface types. To achieve this goal, systems using different surface types should be designed during the data set generation process. The surface type can be taken as one of the “parameters” of the system, and different surface types correspond to different values. This value is also taken as one of the inputs to the DNN. The loss function of systems using different surface types during supervised learning should be calculated individually and then summed. Another way is to train different DNNs corresponding to different kinds of surface types. During system generation, systems with different surface types can be generated by changing the surface type input to the DNN or using different DNNs. Then, the effects of different freeform surface types on imaging performance or other metrics can be compared, and the preferred surface type can be selected. Similarly, the proposed design framework can be further extended to the design of transmission optical systems. The refractive indices and Abbe numbers of lens materials can be taken as parameters of the systems and the input of the DNN. Systems using different lens materials (or fictitious glass model) should be designed during the data set generation process. Combining supervised and unsupervised learning, the DNN will be able to output different systems using different inputs of lens materials (or refractive indices and Abbe numbers). Then, for these systems, the imaging performance or other metrics can be compared, and better solutions can be selected. In this way, the replacement of lens materials can be realized to some extent.

    In addition, in the current design framework, different DNNs are needed for system generation tasks with different folding geometries. For one folding geometry, the design examples given in Section 3 show that the obtained DNN has good generalization ability for a wide range of system and structure parameter values. This shows the effect and feasibility of the proposed design framework, including automatic data set generation and combined supervised and unsupervised learning. For other folding geometries (or other kinds of systems with different structures, such as systems with a real exit pupil), the same data set generation method can be used to automatically obtain a corresponding data set with the same basic system structure type and various system and structure parameter values, and the DNN training process can be carried out in the same way. The corresponding DNNs can then be generated easily and used to generate freeform systems. The data set generation and network training for different kinds of systems can run automatically and continuously on computers, workstations, or cloud servers to produce DNNs for these various kinds of systems. Future work will focus on a design framework that enables the exploration and generation of systems with different or optimal folding geometries by combining more complex training data sets with generative adversarial networks.

    Boyu Mao, Tong Yang, Huiming Xu, Wenchen Chen, Dewen Cheng, Yongtian Wang. FreeformNet: fast and automatic generation of multiple-solution freeform imaging systems enabled by deep learning[J]. Photonics Research, 2023, 11(8): 1408

    Paper Information

    Category: Imaging Systems, Microscopy, and Displays

    Received: Apr. 12, 2023

    Accepted: Jun. 7, 2023

    Published Online: Jul. 31, 2023

    The Author Email: Tong Yang (yangtong@bit.edu.cn)

    DOI:10.1364/PRJ.492938
