the first boundary condition B1,

In the first boundary condition B1, as shown in Figure 8A, left tip nodes 1 and 3 are pin-supported and bottom-right nodes 7 and 10 are subjected to downward unit load of 1 kN separately as different loading cases. In Equation (10), the action value is updated so as to minimize the difference between the sum of observed reward and estimated action value at the next state r(s)+maxaQ(s,a) and estimated action value at the previous state Q(s, a).

arXiv:1702.05532. Optim.

57, 219225. 10, 111124. Methods Eng. Figure 12. The removal sequence of members is illustrated in Figure 6B. Although nload! = 2 different removal sequences can be obtained in this way, only the better result with less total structural volume is provided in the RL+GE column in Table 3. (2013) explained that solving the quasi-convex symmetric optimization problem may yield highly asymmetric solution. Boundary condition B1 of Example 2; (A) initial GS, (B) removal sequence of members. Therefore, the topology just before the terminal state shall be a sub-optimal topology, which is a truss with 12 members as shown in Figure 5B. An Introduction to Genetic Algorithms. Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. From this result, it is confirmed that the agent is capable of eliminating unnecessary members properly for a different-scale truss. Indiana Univ. (2018). J. Mecan.

Each grid is a square whose side length is 1 m. The intersection of bracing members is not connected. As shown in Figure 8B, the agent utilizes an reasonable policy to eliminate obviously unnecessary members connecting to supports at first, non-load-bearing members around the supports next, and members in the load path at last. Build. In this paper, a machine-learning based method combining graph embedding and Q-learning is proposed for binary truss topology optimization to minimize total structural volume under stress and displacement constraints. The proposed method is also comparable to GA with np = 200 in terms of proximity to the global optimum; RL+GE generally reached the feasible solutions with less total structural volume compared with the solutions obtained by GA.

In this study, = 0.99 is adopted, because cumulative reward indicating the amount of reduction of structural volume as a result of the action is much more important than the instant reward. Figure 9. Load and support conditions are randomly provided according to a rule so that the agent can be trained to have good performance for various boundary conditions. Multidiscip. (1989). 1, 419430. The proposed method for training agent is expected to become a supporting tool to instantly feedback the sub-optimal topology and enhance our design exploration.

Rev. To investigate performance of the agent for another loading condition, the structure with the loads as shown in Figure 6A, denoted as loading condition L2, is also optimized using the same agent.

embeddings

Q-learning. (2014).

Optim. https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf (accessed April 23, 2020). doi: 10.1512/iumj.1957.6.56038, Bellman, R. (1961).

Blast-induced ground vibration prediction using support vector machine. Ohsaki, M. (1995). Comput. Learning representations by back-propagating errors. The size of embedded member feature nf is 100. A branch and bound algorithm for topology optimization of truss structures. Two pin-supports are randomly chosen; one from nodes 1 and 2 and the other from nodes 4 and 5. The one just before the terminal state is a sub-optimal truss; however, instability exists at the loaded node. [0, 1] is a discount factor; i.e., the action value becomes closer to expected cumulative reward as is larger, and conversely, the action value becomes closer to expected instant reward as is smaller. Comput. Optimal topologies of truss structures. The agent is trained and its performance is tested for a simple planar truss in section 4.1. The upper-bound displacement for each boundary condition is computed by multiplying 100 to the maximum absolute value of displacement among the all DOFs of the initial GS with the same loading and boundary conditions; hence, varies depending on the structure and the loading and boundary conditions. Multidiscip. doi: 10.1038/nature24270, Sutton, R. S., and Barto, A. G. (1998). doi: 10.1080/03052158608902532, Rumelhart, D. E., Hinton, G. E., and Williams, R. J. Yu, Y., Hur, T., and Jung, J. Copyright 2020 Hayashi and Ohsaki. Training workflow utilizing RL and graph embedding. doi: 10.1016/0045-7825(89)90119-9. It is notable that the agent was able to optimize the structure with the unforeseen boundary conditions which the agent has never experienced during the training. Machine learning for combinatorial optimization of brace placement of steel frames. Once in 10-episode training, the performance of is tested for prescribed loading and boundary conditions. Trivial members close to pin-supports or placed upper-right or bottom-right that do not bear stress are first removed, and important members along the load path are removed afterwards. Achtziger, W., and Stolpe, M. (2009). doi: 10.1016/0020-7683(94)00306-H, Hayashi, K., and Ohsaki, M. (2019). 6:59. doi: 10.3389/fbuil.2020.00059. (1996). Eng. Cambridge, MA: MIT Press. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. These results imply that the proposed method is robust against randomness of boundary conditions and actions during the training. In the same manner as neural networks, a back-propagation method (Rumelhart et al., 1986), which is a gradient based method to minimize the loss function, can be used for solving Equation (11). doi: 10.1007/s00158-019-02214-w. Mitchell, M. (1998). Multidiscip. Genetic algorithm for topology optimization of trusses. In GA, a set of solutions are repeatedly modified using the operations such as selection, where superior solutions at current generation are selected for new generation, crossover, where the selected solutions are combined to breed child solutions sharing the same characteristics as the parents, and mutation, where the selected solutions randomly change their values with low probability. Similarly to loading condition L1 in Figure 5, several symmetric topologies are observed during the removal process, and the sub-optimal topology is a well-converged solution that does not contain unnecessary members. Comput. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). J. However, in order to create a more reliable agent, it is necessary to implement the training with various topology, geometry, and loading and boundary conditions. FDMopt: force density method for optimal geometry and topology of trusses. The datasets generated for this study are available on request to the corresponding author. It forms a very simple truss composed of six pairs of members connecting linearly.

30, 16161637. doi: 10.1109/TNN.1998.712192, Tamura, T., Ohsaki, M., and Takagi, J. Another approach may be to incorporate a rule-based method to create a hybrid optimization agent. If the cumulative reward is larger than the previous best score, at that step is saved. Machine learning prediction errors better than DFT accuracy. Optim. Following this scheme, the parameters are trained by solving the following optimization problem (Mnih et al., 2015): In Equation (11), the training can be stabilized by using parameters ~ at the previous state for estimation of the action value at the next state s (Mnih et al., 2015). Deep residual learning for image recognition, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (Las Vegas, NV), 770778. doi: 10.1007/s00158-008-0237-4, Hajela, P., and Lee, E. (1995). doi: 10.1007/BF01197454, Chou, J.-S., and Pham, A.-D. (2013). Although it takes a long time for the training, the trained agent requires very low computational cost compared with GA at the application stage. Front. To reduce the required capacity of a storage device, 1,000 sets of observed transitions (s, a, s, r) are stored at the maximum.

Topology is optimized for two boundary conditions. (1998). This algorithm is terminated if the best cost function value fb is not updated for ns = 10 consecutive generations. Background information of deep learning for structural engineering. Comput. The same material property and constraints as the examples of RL are applied to the following problems. doi: 10.1109/CVPR.2016.90, Khandelwal, M. (2011). Loading condition L2 of Example 3; (A) initial GS, (B) removal sequence of members. Gradient-based learning applied to document recognition, in Proceedings of the IEEE, 22782324. The statistical data with respect to the maximum test score for each training are as follows; the average is 43.38, the standard deviation is 0.16, and the coefficient of variation is only 3.80 103. Struct. Optim. Softw. 3, 2552. The truss is optimized for two loading conditions. (2018). *Correspondence: Kazuki Hayashi, hayashi.kazuki.55a@st.kyoto-u.ac.jp, View all In addition, the accuracy of the trained agent is less dependent on the size of the problem; the trained agent reached presumable global optimum for 10 10-grid truss with L1 loading condition, although the agent was caught at the local optimum for 8 8-grid truss with the same loading condition, and even for 3 2-grid truss with B1 boundary condition. Example 2: 3 2-grid truss (V = 0.0340 [m3]). Struct. The training method for tuning the parameters is described below. Math. The initial GS is illustrated in Figure 3. In other words, should be closer to 1 if future and instant rewards are equivalently important, and 0 if only instant reward is important. Nature 323, 533536. Nature 518, 529533. 133, 1219. doi: 10.1007/s00158-017-1710-8, Ohsaki, M., and Katoh, N. (2005). 1, NIPS'12 (Tahoe, CA: Curran Associates Inc.), 10971105. Introduction to Reinforcement Learning. Eng. A., Veness, J., Bellemare, M. G., et al. Eng. KH designed the study, implemented the program, and wrote the initial draft of the manuscript. Since ^ is also computed using {1, , 6}, the action value Q(^,i) is dependent on = {1, , 9}. Topping, B., Khan, A., and Leite, J. arXiv:1704.01212. Although stress and displacement bounds have the same value and , respectively, for each member and DOF in this study, it should be noted that each member could have a different stress bound and each DOF could have a different displacement bound for each load case, which provides a versatility to the proposed method. When a function approximator is not utilized, the action value is updated using state s, chosen action a, observed next state s and reward r as: where > 0 is a learning rate that has an effect on convergence of the training. doi: 10.1038/nature14236. Genetic algorithms in truss topological optimization. After the 18th step, the topology became asymmetric despite the symmetry of problem definition; however, the asymmetric solutions should be accepted, because Guo et al. Optimising the load path of compression-only thrust networks through independent sets. Cambridge, MA: MIT Press. Keywords: topology optimization, binary-type approach, machine learning, reinforcement learning, graph embedding, truss, stress and displacement constraints, Citation: Hayashi K and Ohsaki M (2020) Reinforcement Learning and Graph Embedding for Binary Truss Topology Optimization Under Stress and Displacement Constraints. Neural message passing for quantum chemistry. If stress and displacement constraints are satisfied, penalty terms become zero and the cost function becomes equivalent to the total structural volume V(A). MO contributed to problem formulation and interpretation of data, and assisted in the preparation of the manuscript. Arxiv:1801.05463.

Mater. J. doi: 10.1016/0045-7949(94)00617-C, Ohsaki, M., and Hayashi, K. (2017). Deep learning for topology optimization design. (1986). (Princeton, NJ: Princeton University Press). Data Eng. Figure 8. Construct. (1997). Note that the connection between the members in each pair is an unstable node, and must be fixed to generate a single long member. doi: 10.1016/S0045-7825(97)00215-6, Perozzi, B., Al-Rfou', R., and Skiena, S. (2014). Multidiscip. It is also advantageous that the agent is easily replicated and available in other computers by importing the trained parameters. saved after the 5,000-episode training is regarded as the best parameters. 60, 231244. Topological design of truss structures using simulated annealing. Architect. Symmetry properties in structural optimization: Some extensions. At the 18th step, a tower-like symmetric topology is created with extending members from upper tips to loaded nodes. 10, 155162. COURSERA: Neural Netw. Mech. The program is implemented within Python 3.7 environment. Moreover, during the removal process, there is almost no isolated member apart from load-bearing ones and existing members efficiently transmit forces to the supports. Eng. 32, 33413357. Deepwalk: online learning of social representations. GA algorithm is run for 10 times with different initial solutions that are generated randomly, and only the best result that yields a solution with the least total structural volume is provided in the GA column in Table 3. Int. doi: 10.1007/s00158-012-0877-2, Hagishita, T., and Ohsaki, M. (2009). Figure 11. GA is one of the most prevalent metaheuristic approach for binary optimization problems, which is inspired by the process of natural selection (Mitchell, 1998). Arch. The inputs are the initial GS, the bounds for stress and displacement, and the graph embedding class that contains trainable parameters initialized by the vectors with the sizes defined by nL and nf. Struct.

Figure 6. Figure 2. Figure 3. Figure 5. Learn. 56, 11571168. The agent trained in Example 1 is reused for a smaller 3 -grid truss without re-training. Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. (2017). Adv. Figure 10. Dorn, W. S. (1964). Mach Learn. 72, 1528. Watkins, C. J. C. H., and Dayan, P. (1992). JP18K18898. RMSprop (Tieleman and Hinton, 2012) is adopted as the optimization method in this study. Multidiscip. Eng. No use, distribution or reproduction is permitted which does not comply with these terms. Example 3: 6 6-grid truss (V = 0.1858 [m3]). Struct. Faber, F. A., Hutchison, L., Huang, B., Gilmer, J., Schoenholz, S. S., Dahl, G. E., et al. The agent is trained using a 72-member truss with 4 4 grids. doi: 10.1007/s00366-010-0190-x, Kirsch, U. doi: 10.1007/s00366-019-00753-w. [Epub ahead of print]. Moreover, all the trained RL agents with the best parameters led to the same 12-member sub-optimal solution as Figure 5B for loading condition L1. This work was kindly supported by Grant-in-Aid for JSPS Research Fellow No.JP18J21456 and JSPS KAKENHI No. One of the loads applied at node 4 is an irregular case where pin-supports and the loaded node aligns on the same straight line. This applicability was demonstrated through both smaller-scale and larger-scale trusses and sparse sub-optimal topologies were obtained for both cases. Note that it is possible that the two load cases are identical, or applied to different nodes but in the same direction. doi: 10.1007/s10589-007-9152-7, Bellman, R. (1957). The removal sequence of members when the maximum score is recorded is illustrated in Figure 5. ArXiv:1403.6652. doi: 10.1145/2623330.2623732. Loading condition L1 of Example 3; (A) initial GS, (B) removal sequence of members. 47, 783794. Since removal of any remaining member will cause violation of the displacement constraint, there is no unnecessary member in the sub-optimal topology. Mastering the game of go without human knowledge. Guo, X., Du, Z., Cheng, G., and Ni, C. (2013). (2015). Nakamura, S., and Suzuki, T. (2018). (2017). Nodes 1 and 5 are pin-supported and nodes 22 and 24 are subjected to 1 kN load in positive x direction separately as two load cases. Adaptive Control Processes.

Prayogo, D., Cheng, M.-Y., Wu, Y.-W., and Tran, D.-H. (2019). IEEE Trans. different removal sequences can be obtained for different order of the same set of load cases in node state data vk; for example, exchanging the values at indices 2 and 4 and those at indices 3 and 5 in vk maintains the original loading condition but may lead to different action to be taken during each member removal process, because the neural network outputs different Q values due to the exchange. Although the use of CNN-based convolution method is difficult to apply to trusses as they cannot be handled as pixel-wise data, the convolution is successfully implemented for trusses by introducing graph embedding, which has been extended in this paper from the standard node-based formulation to a member(edge)-based formulation. Loading condition L2 of Example 1; (A) initial GS, (B) removal sequence of members. Generative adversarial networks. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. The whole training workflow is described in Figure 2. The left two corners 1 and 7 are pin-supported and rightward and downward unit loads are separately applied at the bottom-right corner 43, as shown in Figure 11A in the loading condition L1. Eng. Struct. Mach. Algorithm 1. As Example 3, the agent is applied to a larger-scale truss, as shown in Figure 10, without re-training. The initial cross-sectional area is 1,000 mm2, and the elastic modulus is 2.0 105 N/mm2 for all members of all examples. Imagenet classification with deep convolutional neural networks, in Proceedings of the 25th International Conference on Neural Information Processing Systems - Vol. Table 3. Optim.

4, 2631. doi: 10.1007/s11831-017-9237-0, Liew, A., Avelino, R., Moosavi, V., Van Mele, T., and Block, P. (2019). KH and MO approved the final version of the manuscript, and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. Using the features, the method to estimate the action value with respect to removal of the member is further formulated. An episode is defined as a sequence of member removal process from the initial GS to the terminal state violating constraints. Comput. This is an evidence that the agent is capable of detecting the load path among members, and we estimate that this capability is mainly due to graph embedding because it extracts member features considering truss connectivity. 8, 301304. Received: 22 November 2019; Accepted: 09 April 2020; Published: 30 April 2020. The parameters are tuned using a method based on 1-step Q-learning method, which is a frequently used RL method. Appl. The GS consists of 6 6 grids and the number of members is more than twice of the 4 4-grid truss. Figure 4. Because the optimization problem (Equation 3) contains constraint functions, the cost function F used in GA is defined using the penalty term as: where 1 and 2 are penalty coefficients for stress and displacement constraints; both are set to be 1000 in this study. Human-level control through deep reinforcement learning. Boundary condition B2 of Example 2; (A) initial GS, (B) removal sequence of members. Knowl. Combining machine learning models via adaptive ensemble weighting for prediction of shear capacity of reinforced-concrete deep beams. doi: 10.1016/j.conbuildmat.2013.08.078. Genetic algorithm used in this study. Ringertz, U. T. (1986). It is confirmed from this history that the agent successfully improves its policy to eliminate unnecessary members as the training proceeds. It took about 3.9 h for training through about 235,000 linear structural analyses. 27, 193200. Optim. doi: 10.1109/5.726791, Lee, S., Ha, J., Zokhirova, M., Moon, H., and Lee, J. The number of training episodes is set as 5,000. 156, 309333. In the optimization with RMSprop, 32 datasets out of the stored transitions are randomly chosen to create a minibatch and the set of trainable parameters is updated based on the mean squared error of the loss function of each dataset computed by the right-hand side of Equation (11).

In utilizing the trained agent in Example 1, nload! 13, 258266. (2017). The edge length of each grid is 1 m also for this Example 2. Minimum weight design of elastic redundant trusses under multiple static loading conditions.