{"id":1725,"date":"2024-07-29T22:12:57","date_gmt":"2024-07-29T14:12:57","guid":{"rendered":"https:\/\/www.gnn.club\/?p=1725"},"modified":"2024-10-10T14:43:41","modified_gmt":"2024-10-10T06:43:41","slug":"%e7%a5%9e%e7%bb%8f%e7%bd%91%e7%bb%9c","status":"publish","type":"post","link":"http:\/\/www.gnn.club\/?p=1725","title":{"rendered":"\u795e\u7ecf\u7f51\u7edc\uff08NN\uff09"},"content":{"rendered":"<h1><img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729221434927.png\" style=\"height:50px;display:inline\"> Deep Learning<\/h1>\n<hr \/>\n<p>create by Arwin Yu<\/p>\n<h2>Tutorial 01 -  Neural Networks<\/h2>\n<h3><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/bubbles\/50\/000000\/checklist.png\" style=\"height:50px;display:inline\"> Agenda<\/h3>\n<hr \/>\n<ul>\n<li>\u611f\u77e5\u673a\u6a21\u578b\uff08Perceptron\uff09<\/li>\n<li>\u591a\u5c42\u611f\u77e5\u673a\uff08Multi-Layer Preceptron\uff09<\/li>\n<li>\u524d\u5411\u8ba1\u7b97\uff08Forward calulation\uff09<\/li>\n<li>\u53cd\u5411\u4f20\u64ad\uff08Backproagation\uff09<\/li>\n<li>\u57fa\u4e8e\u795e\u7ecf\u7f51\u7edc\u7684\u623f\u4ef7\u56de\u5f52\u6a21\u578b\uff08Housing price regression\uff09<\/li>\n<li>\u6743\u91cd\u521d\u59cb\u5316\uff08Initialization of weights\uff09<\/li>\n<li>\u6df1\u5ea6\u53cc\u91cd\u4e0b\u964d(Deep Double Descent) <\/li>\n<\/ul>\n<h3><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/plasticine\/100\/000000\/mind-map.png\" style=\"height:50px;display:inline\"> The Perceptron<\/h3>\n<hr \/>\n<ul>\n<li>\n<p>\u7b2c\u4e00\u4e2a\u4e5f\u662f\u6700\u7b80\u5355\u7684\u7ebf\u6027\u6a21\u578b\u4e4b\u4e00\u3002<\/p>\n<\/li>\n<li>\n<p>\u57fa\u4e8e <em>\u7ebf\u6027\u9608\u503c\u5355\u5143<\/em> (LTU)\uff1a\u8f93\u5165\u548c\u8f93\u51fa\u662f\u6570\u5b57\uff0c\u6bcf\u4e2a\u8fde\u63a5\u90fd\u4e0e\u4e00\u4e2a\u6743\u91cd\u76f8\u5173\u8054\u3002<\/p>\n<\/li>\n<li>\n<p>LTU 
\u8ba1\u7b97\u5176\u8f93\u5165\u7684\u52a0\u6743\u548c\uff1a$z = w_1x_1 + w_2x_2 +....+w_nx_n = w^Tx$\uff0c\u7136\u540e\u5bf9\u8be5\u548c\u5e94\u7528 <strong>\u9636\u8dc3\u51fd\u6570<\/strong> \u5e76\u8f93\u51fa\u7ed3\u679c\uff1a$$ h_w(x) = step(z) = step(w^Tx) $$<\/p>\n<\/li>\n<li>\n<p>Illustration:<\/p>\n<p align=\"center\">\n<img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729221535924.png\" style=\"height:300px\">\n<\/p>\n<\/li>\n<li>\n<p><strong>Pseudocode<\/strong>:<\/p>\n<ul>\n<li><strong>Require<\/strong>: Learning rate $\\eta$<\/li>\n<li><strong>Require<\/strong>: Initial parameter $w$<\/li>\n<li><strong>While<\/strong> stopping criterion not met <strong>do<\/strong>\n<ul>\n<li>For $i=1,...,m$:\n<ul>\n<li>$ w_{t+1} \\leftarrow w_t +\\eta(y_i -sign(w_t^Tx_i))x_i $<\/li>\n<\/ul>\n<\/li>\n<li>$t \\leftarrow t + 1$<\/li>\n<\/ul>\n<\/li>\n<li><strong>end while<\/strong><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/dusk\/64\/000000\/layers.png\" style=\"height:50px;display:inline\"> Multi-Layer Perceptron (MLP)<\/h3>\n<hr \/>\n<ul>\n<li>\n<p>MLP \u7531\u4e00\u4e2a\u8f93\u5165\u5c42\u3001\u4e00\u4e2a\u6216\u591a\u4e2a\u9690\u85cf\u5c42\u548c\u4e00\u4e2a\u6700\u7ec8\u8f93\u51fa\u5c42\u7ec4\u6210\u3002<\/p>\n<\/li>\n<li>\n<p>\u5f53\u9690\u85cf\u5c42\u7684\u6570\u91cf\u5927\u4e8e 2 \u65f6\uff0c\u7f51\u7edc\u901a\u5e38\u79f0\u4e3a\u6df1\u5ea6\u795e\u7ecf\u7f51\u7edc (DNN)\uff0c\u5c0f\u4e8e2\u6210\u4e3aMLP\uff08\u4e00\u822c\u60c5\u51b5\u4e0b\u7684\u4e00\u79cd\u4e60\u60ef\uff0c\u4e0d\u662f\u5b9a\u4e49\uff09\u3002<\/p>\n<\/li>\n<\/ul>\n<p align=\"center\">\n  <img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729221633250.png\" 
style=\"height:300px\">\n<\/p>\n<p><strong>\u5c42\u7ea7\u7ed3\u6784\u6709\u4ec0\u4e48\u7528\uff1f<\/strong><\/p>\n<p>\u7b54\u6848\u5f88\u7b80\u5355\uff0c\u8fd9\u662f\u7b97\u6cd5\u5206\u6790\u6570\u636e\u7684\u65b9\u5f0f\u3002\u5148\u7c7b\u6bd4\u4e00\u4e2a\u751f\u6d3b\u4e2d\u7684\u4f8b\u5b50\u4ee5\u4fbf\u7406\u89e3\uff1a\u5f53\u6211\u4eec\u770b\u5230\u4e00\u5f20\u56fe\u7247\u65f6\uff0c\u662f\u5426\u53ef\u4ee5\u77ac\u95f4\u5c31\u83b7\u5f97\u5176\u4e2d\u7684\u4fe1\u606f\uff1f\u5176\u5b9e\u4e0d\u662f\uff0c\u6211\u4eec\u9700\u8981\u4e00\u5b9a\u7684\u601d\u8003\u65f6\u95f4\uff0c\u4ece\u591a\u4e2a\u89d2\u5ea6\u53bb\u5206\u6790\u7406\u89e3\u56fe\u7247\u6570\u636e\u4e2d\u8868\u8fbe\u7684\u4fe1\u606f\uff1b\u8fd9\u5c31\u5982\u540c\u795e\u7ecf\u7f51\u7edc\u4e2d\u7684\u591a\u4e2a\u5c42\u7ea7\u7ed3\u6784\u4e00\u6837\uff0c\u795e\u7ecf\u7f51\u7edc\u6a21\u578b\u5c31\u662f\u4f9d\u9760\u8fd9\u4e9b\u5c42\u7ea7\u7ed3\u6784\u4ece\u4e0d\u540c\u89d2\u5ea6\u4e0b\u63d0\u53d6\u539f\u59cb\u6570\u636e\u4fe1\u606f\u7684\u3002<\/p>\n<p>\u4ece\u6570\u5b66\u89d2\u5ea6\u6765\u8bb2\uff0c\u6df1\u5ea6\u5b66\u4e60\u6a21\u578b\u6bcf\u5c42\u7684\u611f\u77e5\u673a\u6570\u91cf\u90fd\u4e0d\u540c\uff0c\u8fd9\u76f8\u5f53\u4e8e\u5bf9\u539f\u59cb\u6570\u636e\u8fdb\u884c\u5347\u3001\u964d\u7ef4\uff0c\u5728\u4e0d\u540c\u7684\u7ef4\u5ea6\u7a7a\u95f4\u4e0b\u63d0\u53d6\u539f\u59cb\u6570\u636e\u7684\u7279\u5f81\u3002\u4e0d\u540c\u7ef4\u5ea6\u7a7a\u95f4\u53c8\u662f\u4ec0\u4e48\u610f\u601d\uff1f\u4e3e\u4e2a\u4f8b\u5b50\uff0c\u73b0\u5728\u4f7f\u7528\u4e00\u4e2a\u7b80\u5355\u7684\u7ebf\u6027\u5206\u7c7b\u5668\uff0c\u8bd5\u56fe\u5b8c\u7f8e\u5730\u5bf9\u732b\u548c\u72d7\u8fdb\u884c\u5206\u7c7b\u3002\u9996\u5148\u53ef\u4ee5\u4ece\u4e00\u4e2a\u7279\u5f81\u5f00\u59cb\uff0c\u5982\u201c\u5706\u773c\u201d\u7279\u5f81\uff0c\u5206\u7c7b\u7ed3\u679c\u5982\u4e0b\u56fe\u6240\u793a\u3002<\/p>\n<p align=\"center\">\n  <img decoding=\"async\" 
src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729221722727.png\" style=\"height:200px\">\n<\/p>\n<p>\u7531\u4e8e\u732b\u548c\u72d7\u90fd\u662f\u5706\u773c\u775b\uff0c\u6b64\u65f6\u65e0\u6cd5\u83b7\u5f97\u5b8c\u7f8e\u7684\u5206\u7c7b\u7ed3\u679c\u3002\u56e0\u6b64\uff0c\u53ef\u80fd\u4f1a\u51b3\u5b9a\u589e\u52a0\u5176\u5b83\u7279\u5f81\uff0c\u5982\u201c\u5c16\u8033\u6735\u201d\u7279\u5f81\uff0c\u5206\u7c7b\u7ed3\u679c\u5982\u4e0b\u56fe\u3002<\/p>\n<p align=\"center\">\n  <img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729221812199.png\" style=\"height:300px\">\n<\/p>\n<p>\u6b64\u65f6\u53d1\u73b0\uff0c\u732b\u548c\u72d7\u4e24\u4e2a\u7c7b\u578b\u7684\u6570\u636e\u5206\u5e03\u6e10\u6e10\u79bb\u6563\uff0c\u6700\u540e\uff0c\u589e\u52a0\u7b2c\u4e09\u4e2a\u7279\u5f81\uff0c\u4f8b\u5982\u201c\u957f\u9f3b\u5b50\u201d\u7279\u5f81\uff0c\u5f97\u5230\u4e00\u4e2a\u4e09\u7ef4\u7279\u5f81\u7a7a\u95f4\uff0c\u5982\u4e0b\u56fe\u3002<\/p>\n<p align=\"center\">\n  <img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729221851138.png\" style=\"height:300px\">\n<\/p>\n<p>\u6b64\u65f6\uff0c\u6a21\u578b\u5df2\u7ecf\u53ef\u4ee5\u5f88\u597d\u5730\u62df\u5408\u51fa\u4e00\u4e2a\u5206\u7c7b\u51b3\u7b56\u9762\u5bf9\u732b\u548c\u72d7\u4e24\u4e2a\u7c7b\u578b\u8fdb\u884c\u5206\u7c7b\u4e86\u3002\u90a3\u4e48\u5f88\u81ea\u7136\u5730\u8054\u60f3\u4e00\u4e0b\uff1a\u5982\u679c\u7ee7\u7eed\u589e\u52a0\u7279\u5f81\u6570\u91cf\uff0c\u5c06\u539f\u59cb\u6570\u636e\u6620\u5c04\u5230\u66f4\u9ad8\u7ef4\u5ea6\u7684\u7a7a\u95f4\u4e0b\u662f\u4e0d\u662f\u66f4\u6709\u5229\u4e8e\u5206\u7c7b\u5462\uff1f<\/p>\n<p>\u4e8b\u5b9e\u5e76\u975e\u5982\u6b64\u3002\u6ce8\u610f\u5f53\u589e\u52a0\u95ee\u9898\u7ef4\u6570\u7684\u65f6\u5019, 
\u8bad\u7ec3\u6837\u672c\u7684\u5bc6\u5ea6\u662f\u5448\u6307\u6570\u4e0b\u964d\u7684\u3002\u5047\u8bbe 10 \u4e2a\u8bad\u7ec3\u5b9e\u4f8b\u6db5\u76d6\u4e86\u5b8c\u6574\u7684\u4e00\u7ef4\u7279\u5f81\u7a7a\u95f4\uff0c\u5176\u5bbd\u5ea6\u4e3a 5 \u4e2a\u5355\u5143\u95f4\u9694\u3002\u56e0\u6b64\uff0c\u5728\u4e00\u7ef4\u60c5\u51b5\u4e0b\uff0c\u6837\u672c\u5bc6\u5ea6\u4e3a 10\/5=2 (\u6837\u672c\/\u95f4\u9694)\u3002<\/p>\n<p>\u5728\u4e8c\u7ef4\u60c5\u51b5\u4e0b\uff0c\u4ecd\u7136\u6709 10 \u4e2a\u8bad\u7ec3\u5b9e\u4f8b\uff0c\u73b0\u5728\u5b83\u7528 5\u00d75=25 \u4e2a\u5355\u4f4d\u6b63\u65b9\u5f62\u9762\u79ef\u6db5\u76d6\u4e86\u4e8c\u7ef4\u7684\u7279\u5f81\u7a7a\u95f4\u3002\u56e0\u6b64\uff0c\u5728\u4e8c\u7ef4\u60c5\u51b5\u4e0b\uff0c\u6837\u672c\u5bc6\u5ea6\u4e3a 10\/25=0.4 (\u6837\u672c\/\u95f4\u9694)\u3002<\/p>\n<p>\u6700\u540e, \u5728\u4e09\u7ef4\u7684\u60c5\u51b5\u4e0b, 10 \u4e2a\u6837\u672c\u8986\u76d6\u4e86 5\u00d75\u00d75=125 \u4e2a\u5355\u4f4d\u7acb\u65b9\u4f53\u7279\u5f81\u7a7a\u95f4\u4f53\u79ef\u3002\u56e0\u6b64\uff0c\u5728\u4e09\u7ef4\u7684\u60c5\u51b5\u4e0b\uff0c\u6837\u672c\u5bc6\u5ea6\u4e3a 10\/125=0.08 
(\u6837\u672c\/\u95f4\u9694)\u3002<\/p>\n<p>\u5982\u679c\u4e0d\u65ad\u589e\u52a0\u7279\u5f81\uff0c\u5219\u7279\u5f81\u7a7a\u95f4\u7684\u7ef4\u6570\u4e5f\u5728\u589e\u957f\uff0c\u5e76\u53d8\u5f97\u8d8a\u6765\u8d8a\u7a00\u758f\u3002\u7531\u4e8e\u8fd9\u79cd\u7a00\u758f\u6027\uff0c\u627e\u5230\u4e00\u4e2a\u53ef\u5206\u79bb\u7684\u8d85\u5e73\u9762\u4f1a\u53d8\u5f97\u975e\u5e38\u5bb9\u6613\u3002\u5982\u679c\u5c06\u9ad8\u7ef4\u7684\u5206\u7c7b\u7ed3\u679c\u6620\u5c04\u5230\u4f4e\u7ef4\u7a7a\u95f4\uff0c\u4e0e\u6b64\u65b9\u6cd5\u76f8\u5173\u8054\u7684\u4e25\u91cd\u95ee\u9898\u5c31\u51f8\u663e\u51fa\u6765\u3002\u732b\u548c\u72d7\u5728\u9ad8\u7eac\u5ea6\u7279\u5f81\u7a7a\u95f4\u4e0b\u7684\u5206\u7c7b\u7ed3\u679c\u5982\u4e0b\u56fe\u6240\u793a\u3002\u6ce8\u610f\uff0c\u56e0\u4e3a\u9ad8\u7ef4\u7279\u5f81\u7a7a\u95f4\u96be\u4ee5\u5728\u7eb8\u5f20\u4e0a\u8868\u793a\uff0c\u4e0b\u56fe\u662f\u5c06\u9ad8\u7ef4\u7a7a\u95f4\u7684\u5206\u7c7b\u7ed3\u679c\u6620\u5c04\u5230\u4e8c\u7ef4\u7a7a\u95f4\u4e0b\u7684\u5c55\u793a\u3002\u5728\u8fd9\u79cd\u60c5\u51b5\u4e0b\uff0c\u6a21\u578b\u8bad\u7ec3\u7684\u5206\u7c7b\u51b3\u7b56\u9762\u53ef\u4ee5\u975e\u5e38\u8f7b\u6613\u4e14\u5b8c\u7f8e\u5730\u533a\u5206\u6240\u6709\u4e2a\u4f53\u3002<\/p>\n<p align=\"center\">\n  <img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729221930485.png\" 
style=\"height:300px\">\n<\/p>\n<p>\u95ee\u9898\uff1a\u5bf9\u4e8e\u8bad\u7ec3\u6570\u636e\u505a\u5b8c\u7f8e\u7684\u533a\u5206\uff0c\u8fd9\u5c82\u4e0d\u662f\u5f88\u597d\u5417\uff1f<\/p>\n<p>\u5176\u5b9e\u4e0d\u7136\uff0c\u56e0\u4e3a\u8bad\u7ec3\u6570\u636e\u662f\u53d6\u81ea\u771f\u5b9e\u4e16\u754c\u7684\uff0c\u4e14\u4efb\u4f55\u4e00\u4e2a\u8bad\u7ec3\u96c6\u90fd\u4e0d\u53ef\u80fd\u5305\u542b\u5927\u5343\u4e16\u754c\u4e2d\u7684\u5168\u90e8\u60c5\u51b5\u3002\u5c31\u597d\u6bd4\u91c7\u96c6\u732b\u72d7\u6570\u636e\u96c6\u65f6\u4e0d\u53ef\u80fd\u62cd\u6444\u5230\u5168\u4e16\u754c\u7684\u6240\u6709\u732b\u72d7\u4e00\u6837\u3002\u6b64\u65f6\u5bf9\u4e8e\u8fd9\u4e2a\u8bad\u7ec3\u6570\u636e\u96c6\u505a\u5b8c\u7f8e\u7684\u533a\u5206\u5b9e\u9645\u4e0a\u4f1a\u56fa\u5316\u6a21\u578b\u7684\u601d\u7ef4\uff0c\u4f7f\u5176\u5728\u771f\u5b9e\u4e16\u754c\u4e2d\u7684\u6cdb\u5316\u80fd\u529b\u5f88\u5dee\u3002\u8fd9\u4e2a\u73b0\u8c61\u5728\u751f\u6d3b\u4e2d\u5176\u5b9e\u5c31\u662f\u201c\u94bb\u725b\u89d2\u5c16\u201d\u3002\u4e3e\u4e2a\u4f8b\u5b50\uff1a\u5047\u8bbe\u6211\u4eec\u8d39\u5c3d\u5fc3\u601d\u60f3\u51fa\u4e86\u4e00\u767e\u79cd\u7279\u5f81\u6765\u5b9a\u4e49\u4e2d\u56fd\u7684\u725b\uff0c\u8fd9\u79cd\u4e25\u683c\u7684\u5b9a\u4e49\u53ef\u4ee5\u5f88\u5bb9\u6613\u5730\u5c06\u725b\u4e0e\u5176\u4ed6\u7269\u79cd\u533a\u5206\u5f00\u6765\u3002\u4f46\u662f\u6709\u4e00\u5929\uff0c\u4e00\u53ea\u82f1\u56fd\u7684\u5976\u725b\u6f02\u6d0b\u8fc7\u6d77\u6e38\u5230\u4e86\u4e2d\u56fd\u3002\u7531\u4e8e\u8fd9\u53ea\u5916\u56fd\u725b\u53ea\u670990\u79cd\u7279\u5f81\u7b26\u5408\u4e2d\u56fd\u5bf9\u725b\u7684\u5b9a\u4e49\uff0c\u5c31\u4e0d\u628a\u5b83\u5b9a\u4e49\u4e3a\u725b\u4e86\u3002\u8fd9\u79cd\u505a\u6cd5\u663e\u7136\u662f\u4e0d\u5408\u7406\u7684\uff0c\u539f\u56e0\u662f\u7279\u5f81\u7a7a\u95f4\u7684\u7ef4\u5ea6\u592a\u9ad8\uff0c\u628a\u8fd9\u79cd\u73b0\u8c61\u79f0\u4e3a\u201c\u7ef4\u5ea6\u8bc5\u5492\u201d\uff0c\u5f53\u95ee\u9898\u7684\u7ef4\u6570\u53d8\u5f97\u6bd4\u8f83\u5927\u65f6\uff0c\u5206\u7c7b\u5668\u7
684\u6027\u80fd\u964d\u4f4e\u3002<\/p>\n<h2><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/dusk\/64\/000000\/lego-head.png\" style=\"height:50px;display:inline\"> Forward calculation<\/h2>\n<hr \/>\n<ul>\n<li>\n<p>\u5728 <em>\u524d\u5411\u4f20\u9012<\/em> \u4e2d\uff0c\u5bf9\u4e8e\u6bcf\u4e2a\u8bad\u7ec3\u5b9e\u4f8b\uff0c\u7b97\u6cd5\u5c06\u5176\u9988\u9001\u5230\u7f51\u7edc\u5e76\u8ba1\u7b97\u6bcf\u4e2a\u8fde\u7eed\u5c42\u4e2d\u6bcf\u4e2a\u795e\u7ecf\u5143\u7684\u8f93\u51fa<\/p>\n<\/li>\n<li>\n<p>\u4f7f\u7528\u7f51\u7edc\u8fdb\u884c\u9884\u6d4b\u53ea\u662f\u8fdb\u884c\u524d\u5411\u4f20\u9012\u3002<\/p>\n<\/li>\n<\/ul>\n<p>\u793a\u4f8b\u5982\u4e0b\uff1a<\/p>\n<p align=\"center\">\n  <img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729222116112.gif\" style=\"height:400px\">\n<\/p>\n<p><a href=\"https:\/\/medium.com\/the-feynman-journal\/the-linear-and-nonlinear-nature-of-feedforward-84199eb3edea\">Image Source<\/a><\/p>\n<h3><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/plasticine\/100\/000000\/serial-tasks.png\" style=\"height:50px;display:inline\"> Backpropagation<\/h3>\n<hr 
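\/>\n<p>As a quick sanity check on the worked example in this section, the same gradients can be computed with PyTorch's autograd. This sketch is an addition to the tutorial; it simply reuses the example's values ($x_1=0.5, x_2=1.0, y=0.8$ and the initial $w_1,\\dots,w_6$):<\/p>\n<pre><code class=\"language-python\">import torch\n\n# values from the worked example in this section\nx1, x2, y = 0.5, 1.0, 0.8\nw = torch.tensor([1.0, 0.5, 0.5, 0.7, 1.0, 2.0], requires_grad=True)\n\n# forward pass: two hidden units, one linear output\nh1 = w[0] * x1 + w[1] * x2      # 1.0\nh2 = w[2] * x1 + w[3] * x2      # 0.95\ny_pred = w[4] * h1 + w[5] * h2  # 2.9\n\n# squared-error loss, then backpropagation via autograd\nloss = 0.5 * (y - y_pred) ** 2  # 2.205\nloss.backward()\n\nprint(w.grad)  # gradients for w1..w6: [1.05, 2.1, 2.1, 4.2, 2.1, 1.995]\n<\/code><\/pre>\n<p>The printed gradients match the chain-rule results derived by hand in this section, e.g. 2.1 for $w_5$ and 1.05 for $w_1$.<\/p>\n<hr 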
\/>\n<p>\u53cd\u5411\u4f20\u64ad\u662f\u4e00\u79cd\u6709\u6548\u7684\u8ba1\u7b97\u68af\u5ea6\u7684\u65b9\u6cd5\uff0c\u5b83\u53ef\u4ee5\u5feb\u901f\u8ba1\u7b97\u7f51\u7edc\u4e2d\u6bcf\u4e2a\u795e\u7ecf\u5143\u7684\u504f\u5bfc\u6570\u3002\u53cd\u5411\u4f20\u64ad\u901a\u8fc7\u5148\u6b63\u5411\u4f20\u64ad\u8ba1\u7b97\u7f51\u7edc\u7684\u8f93\u51fa\uff0c\u7136\u540e\u4ece\u8f93\u51fa\u5c42\u5230\u8f93\u5165\u5c42\u53cd\u5411\u4f20\u64ad\u8bef\u5dee\uff0c\u6700\u540e\u6839\u636e\u8bef\u5dee\u8ba1\u7b97\u6bcf\u4e2a\u795e\u7ecf\u5143\u7684\u504f\u5bfc\u6570\u3002\u53cd\u5411\u4f20\u64ad\u7b97\u6cd5\u7684\u6838\u5fc3\u601d\u60f3\u662f\u901a\u8fc7\u94fe\u5f0f\u6cd5\u5219\u5c06\u8bef\u5dee\u5411\u540e\u4f20\u9012\uff0c\u8ba1\u7b97\u6bcf\u4e2a\u795e\u7ecf\u5143\u5bf9\u8bef\u5dee\u7684\u8d21\u732e\u3002<\/p>\n<p>\u793a\u4f8b\u5982\u4e0b\uff1a<\/p>\n<p>\u521d\u59cb\u5316\u7f51\u7edc\uff0c\u6784\u5efa\u4e00\u4e2a\u53ea\u6709\u4e00\u5c42\u7684\u795e\u7ecf\u7f51\u7edc<\/p>\n<p align=\"center\">\n  <img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729222340577.png\" style=\"height:200px\">\n<\/p>\n<p>\uff081\uff09\u521d\u59cb\u5316\u7f51\u7edc\u53c2\u6570\uff1a<\/p>\n<p>\u5047\u8bbe\u795e\u7ecf\u7f51\u7edc\u7684\u8f93\u5165\u548c\u8f93\u51fa\u7684\u521d\u59cb\u5316\u4e3a: $x_1=0.5, x_2=1.0, y=0.8$ \u3002<\/p>\n<p>\u53c2\u6570\u7684\u521d\u59cb\u5316\u4e3a: $w_1=1.0, w_2=0.5, w_3=0.5, w_4=0.7, w_5=1.0, w_6=2.0$ \u3002<\/p>\n<p>\uff082\uff09\u524d\u5411\u8ba1\u7b97, \u5982\u4e0b\u56fe<\/p>\n<p align=\"center\">\n  <img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729222426797.png\" style=\"height:200px\">\n<\/p>\n<p>\u540c\u7406, \u8ba1\u7b97 $h_2$ \u7b49\u4e8e 0.95 \u3002\u5c06 $h_1$ \u548c $h_2$ \u76f8\u4e58\u6c42\u548c\u5230\u524d\u5411\u4f20\u64ad\u7684\u8ba1\u7b97\u7ed3\u679c, \u5982\u4e0b\u56fe<\/p>\n<p align=\"center\">\n  <img 
decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729222505497.png\" style=\"height:200px\">\n<\/p>\n<p>$$<br \/>\n\\begin{aligned}<br \/>\ny^{\\prime} &amp; =w_5 \\cdot h_1^{(1)}+w_6 \\cdot h_2^{(1)} \\\\<br \/>\n&amp; =1.0 \\cdot 1.0+2.0 \\cdot 0.95 \\\\<br \/>\n&amp; =2.9<br \/>\n\\end{aligned}<br \/>\n$$<\/p>\n<p>\uff083\uff09\u8ba1\u7b97\u635f\u5931: \u6839\u636e\u6570\u636e\u771f\u5b9e\u503c $y=0.8$ \u548c\u5e73\u65b9\u5dee\u635f\u5931\u51fd\u6570\u6765\u8ba1\u7b97\u635f\u5931<\/p>\n<p>$$<br \/>\n\\begin{aligned}<br \/>\n\\delta &amp; =\\frac{1}{2}\\left(y-y^{\\prime}\\right)^2 \\\\<br \/>\n&amp; =0.5(0.8-2.9)^2 \\\\<br \/>\n&amp; =2.205<br \/>\n\\end{aligned}<br \/>\n$$<\/p>\n<p>\uff084\uff09\u8ba1\u7b97\u68af\u5ea6: \u6b64\u8fc7\u7a0b\u5b9e\u9645\u4e0a\u5c31\u662f\u8ba1\u7b97\u504f\u5fae\u5206\u7684\u8fc7\u7a0b, \u4ee5\u53c2\u6570 $w_5$ \u7684\u504f\u5fae\u5206\u8ba1\u7b97\u4e3a\u4f8b, \u5982\u4e0b\u56fe<\/p>\n<p align=\"center\">\n  <img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729222554509.png\" style=\"height:200px\">\n<\/p>\n<p>\u6839\u636e\u94fe\u5f0f\u6cd5\u5219:<br \/>\n$$<br \/>\n\\frac{\\partial \\delta}{\\partial w_5}=\\frac{\\partial \\delta}{\\partial y^{\\prime}} \\cdot \\frac{\\partial y^{\\prime}}{\\partial w_5}<br \/>\n$$<\/p>\n<p>\u5176\u4e2d:<br \/>\n$$<br \/>\n\\begin{aligned}<br \/>\n\\frac{\\partial \\delta}{\\partial y^{\\prime}} &amp; =2 \\cdot \\frac{1}{2} \\cdot\\left(y-y^{\\prime}\\right)(-1) \\\\<br \/>\n&amp; =y^{\\prime}-y \\\\<br \/>\n&amp; =2.9-0.8 \\\\<br \/>\n&amp; =2.1 \\\\<br \/>\ny^{\\prime} &amp; =w_5 \\cdot h_1^{(1)}+w_6 \\cdot h_2^{(1)} \\\\<br \/>\n\\frac{\\partial y^{\\prime}}{\\partial w_5} &amp; =h_1^{(1)}+0 \\\\<br \/>\n&amp; =1.0<br \/>\n\\end{aligned}<br \/>\n$$<\/p>\n<p>\u6240\u4ee5:<br \/>\n$$<br \/>\n\\frac{\\partial \\delta}{\\partial w_5}=\\frac{\\partial 
\\delta}{\\partial y^{\\prime}} \\cdot \\frac{\\partial y^{\\prime}}{\\partial w_5}=2.1 \\times 1.0=2.1<br \/>\n$$<\/p>\n<p>\u7c7b\u4f3c\u7684\uff0c\u5982\u679c\u4ee5\u53c2\u6570 $w_1$ \u4e3a\u4f8b\u5b50, \u5b83\u7684\u504f\u5fae\u5206\u8ba1\u7b97\u5c31\u4e5f\u7528\u5230\u94fe\u5f0f\u6cd5\u5219, \u8fc7\u7a0b\u5982\u4e0b\u6240\u793a\u3002<\/p>\n<p>$$<br \/>\n\\begin{gathered}<br \/>\n\\frac{\\partial \\delta}{\\partial w_1}=\\frac{\\partial \\delta}{\\partial y^{\\prime}} \\cdot \\frac{\\partial y^{\\prime}}{\\partial h_1^{(1)}} \\cdot \\frac{\\partial h_1^{(1)}}{\\partial w_1} \\\\<br \/>\ny^{\\prime}=w_5 \\cdot h_1^{(1)}+w_6 \\cdot h_2^{(1)} \\\\<br \/>\n\\frac{\\partial y^{\\prime}}{\\partial h_1^{(1)}}=w_5+0 \\\\<br \/>\n=1.0 \\\\<br \/>\nh_1^{(1)}=w_1 \\cdot x_1+w_2 \\cdot x_2 \\\\<br \/>\n\\frac{\\partial h_1^{(1)}}{\\partial w_1}=x_1+0 \\\\<br \/>\n\\frac{\\partial \\delta}{\\partial w_1}=\\frac{\\partial \\delta}{\\partial y^{\\prime}} \\cdot \\frac{\\partial y^{\\prime}}{\\partial h_1^{(1)}} \\cdot \\frac{\\partial h_1^{(1)}}{\\partial w_1}=2.1 \\times 1.0 \\times 0.5=1.05<br \/>\n\\end{gathered}<br \/>\n$$<\/p>\n<p>\uff085\uff09\u68af\u5ea6\u4e0b\u964d\u66f4\u65b0\u7f51\u7edc\u53c2\u6570\uff1a\u5047\u8bbe\u8fd9\u91cc\u7684\u8d85\u53c2\u6570 \u201c\u5b66\u4e60\u901f\u7387\u201d \u7684\u521d\u59cb\u503c\u4e3a 0.1 , \u6839\u636e\u68af\u5ea6\u4e0b\u964d\u7684\u66f4\u65b0\u516c\u5f0f, $w_1$ \u53c2\u6570\u7684\u66f4\u65b0\u8ba1\u7b97\u5982\u4e0b\u6240\u793a:<br \/>\n$$<br \/>\nw_1^{\\text {(update) }}=w_1-\\eta \\cdot \\frac{\\partial \\delta}{\\partial w_1}=1.0-0.1 \\times 1.05=0.895<br \/>\n$$<\/p>\n<p>\u540c\u7406, \u53ef\u4ee5\u8ba1\u7b97\u5f97\u5230\u5176\u4ed6\u7684\u66f4\u65b0\u540e\u7684\u53c2\u6570:<br \/>\n$$<br \/>\nw_1=0.895, w_2=0.895, w_3=0.29, w_4=0.28, w_5=0.79, w_6=1.8005<br \/>\n$$<\/p>\n<p>\u5230\u6b64\u4e3a\u6b62, 
\u6211\u4eec\u5c31\u5b8c\u6210\u4e86\u53c2\u6570\u8fed\u4ee3\u7684\u5168\u90e8\u8fc7\u7a0b\u3002\u53ef\u4ee5\u8ba1\u7b97\u4e00\u4e0b\u635f\u5931\u770b\u770b\u662f\u5426\u6709\u51cf\u5c0f, \u8ba1\u7b97\u5982\u4e0b:<br \/>\n$$<br \/>\n\\begin{aligned}<br \/>\n\\delta &amp; =\\frac{1}{2}\\left(y-y^{\\prime}\\right)^2 \\\\<br \/>\n&amp; =0.5(0.8-1.3478)^2 \\\\<br \/>\n&amp; =0.15<br \/>\n\\end{aligned}<br \/>\n$$<\/p>\n<p>\u6b64\u7ed3\u679c\u76f8\u6bd4\u8f83\u4e8e\u4e4b\u95f4\u8ba1\u7b97\u7684\u524d\u5411\u4f20\u64ad\u7684\u7ed3\u679c 2.205 , \u662f\u6709\u660e\u663e\u7684\u51cf\u5c0f\u7684\u3002<\/p>\n<h3><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/dusk\/64\/000000\/popular-topic.png\" style=\"height:50px;display:inline\"> \u5e38\u7528\u5c42<\/h3>\n<hr \/>\n<ul>\n<li>\u7ebf\u6027\u5c42\uff08\u8f93\u5165\u7684\u7ebf\u6027\u7ec4\u5408\uff09\u3002<\/li>\n<li>\u6fc0\u6d3b\u5c42\uff08\u901a\u5e38\u4e0e\u7ebf\u6027\u5c42\u4e00\u8d77\u4f7f\u7528\uff0c\u5bf9\u52a0\u6743\u8f93\u5165\u7684\u7ebf\u6027\u7ec4\u5408\u5e94\u7528\u51fd\u6570\uff09\uff1aReLU\u3001Binary Step\u3001Sigmoid\u3001TanH \u7b49...<\/li>\n<li>Softmax \u5c42\uff08\u8d85\u8fc7 2 \u4e2a\u7c7b\u7684 Sigmoid\uff0c\u8f93\u51fa\u6bcf\u4e2a\u7c7b\u7684\u6982\u7387\uff09\u7528\u4e8e\u5206\u7c7b\u4efb\u52a1\u3002<\/li>\n<li>\u635f\u5931\u51fd\u6570\u5c42\uff08\u4f8b\u5982 MSE \u548c\u4ea4\u53c9\u71b5\uff09\u3002<\/li>\n<\/ul>\n<h2><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/dusk\/64\/000000\/home.png\" style=\"height:50px;display:inline\"> \u793a\u4f8b - \u56de\u5f52\u795e\u7ecf\u7f51\u7edc - \u623f\u4ef7<\/h2>\n<hr \/>\n<ul>\n<li>\u623f\u4ef7\u6570\u636e\u96c6\uff1a<\/li>\n<li>\u4e24\u4e2a\u8f93\u5165\u7279\u5f81\uff1a<em>Size<\/em> \u548c <em>Floor<\/em><\/li>\n<li>\u4e00\u4e2a\u8f93\u51fa\uff1a<em>\u623f\u4ef7<\/em><\/li>\n<li><strong>\u635f\u5931\u51fd\u6570<\/strong>\uff1aMSE<\/li>\n<li><strong>\u7f51\u7edc\u67b6\u6784<\/strong>\uff1a2 
\u4e2a\u9690\u85cf\u5c42\uff0c\u4e00\u4e2a\u8f93\u51fa\u5c42<\/li>\n<\/ul>\n<p>Layout: <\/p>\n<p align=\"center\">\n<p><img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729222811653.png\" style=\"height:400px\"><\/p>\n<p>$$ F(X,W) = W_3^T \\phi_2(W_2^T\\phi_1(W_1^TX + b_1) + b_2) + b_3 $$<\/p>\n<p>Where: $$ X \\in \\mathbb{R}^2 $$ $$ W_1 \\in \\mathbb{R}^{2 \\times 4} $$ $$ W_2 \\in \\mathbb{R}^{4 \\times 3} $$ $$ W_3 \\in \\mathbb{R}^{3 \\times 1} $$ $$ b_1 \\in \\mathbb{R}^4 $$ $$ b_2 \\in \\mathbb{R}^3 $$ $$ b_3 \\in \\mathbb{R} $$<\/p>\n<h3><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/office\/80\/000000\/baby-footprints-path.png\" style=\"height:50px;display:inline\"> \u5206\u6b65\u89e3\u51b3\u65b9\u6848<\/h3>\n<hr \/>\n<ul>\n<li>\n<p>\u6240\u6709\u8bad\u7ec3\u793a\u4f8b  $x_i$  \u7684 MSE \u635f\u5931\u51fd\u6570\u548c\u76f8\u5e94\u7684\u8bad\u7ec3\u76ee\u6807\uff1a $$ Error = \\frac{1}{N} \\sum_{i=1}^N (F(x_i, W) - y_i)^2 = \\frac{1}{N} ||F(X, W) - Y||_2^2 $$<\/p>\n<\/li>\n<li>\n<p>\u7ebf\u6027\u5c42\uff1a $$ u_{out} = W^Tu_{in} + b $$<\/p>\n<\/li>\n<li>\n<p>\u6fc0\u6d3b\u5c42\uff1a<\/p>\n<\/li>\n<li>\n<p>$\\phi_1$ \u548c $\\phi_2$ \u662f\u591a\u5143\u5411\u91cf <em>\u975e\u7ebf\u6027<\/em> \u51fd\u6570\uff0c\u56e0\u6b64\uff1a $$ \\phi(U) = \\phi\\left(\\begin{bmatrix} u_1 \\\\ \\vdots \\\\ u_n \\end{bmatrix}\\right) = \\begin{bmatrix} \\phi(u_1) \\\\ \\vdots \\\\ \\phi(u_n) \\end{bmatrix} $$<\/p>\n<\/li>\n<li>\n<p>\u5bf9\u4e8e <strong>ReLU<\/strong>\uff1a $$ \\begin{bmatrix} \\phi(u_1) \\\\ \\vdots \\\\ \\phi(u_n) \\end{bmatrix} = \\begin{bmatrix} \\max(0, u_1) \\\\ \\vdots \\\\ \\max(0, u_n) \\end{bmatrix} $$<\/p>\n<\/li>\n<\/ul>\n<h3><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/dusk\/64\/000000\/fast-forward.png\" style=\"height:50px;display:inline\"> Forward Pass<\/h3>\n<hr \/>\n<p>$$ F(X,W) = W_3^T \\phi_2(W_2^T\\phi_1(W_1^TX + b_1) + b_2) + b_3 $$<\/p>\n<p 
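align=\"center\"><\/p>\n<p>The composition above can be sketched directly as matrix operations, using the dimensions listed earlier ($W_1 \\in \\mathbb{R}^{2 \\times 4}$, $W_2 \\in \\mathbb{R}^{4 \\times 3}$, $W_3 \\in \\mathbb{R}^{3 \\times 1}$). This is an illustrative addition with randomly chosen weights, and ReLU is assumed for $\\phi_1$ and $\\phi_2$:<\/p>\n<pre><code class=\"language-python\">import torch\n\ntorch.manual_seed(0)\n\n# weights and biases with the shapes from the text\nW1, b1 = torch.randn(2, 4), torch.randn(4)\nW2, b2 = torch.randn(4, 3), torch.randn(3)\nW3, b3 = torch.randn(3, 1), torch.randn(1)\n\nX = torch.tensor([[75.0, 2.0]])  # one example: (Size, Floor)\n\nh1 = torch.relu(X @ W1 + b1)   # phi_1(W_1^T X + b_1), shape (1, 4)\nh2 = torch.relu(h1 @ W2 + b2)  # phi_2(W_2^T h1 + b_2), shape (1, 3)\nF = h2 @ W3 + b3               # W_3^T h2 + b_3, shape (1, 1)\n\nprint(F.shape)\n<\/code><\/pre>\n<p 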
align=\"center\">\n  <img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729225110480.png\" style=\"height:500px\">\n<\/p>\n<h3><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/dusk\/64\/000000\/rewind.png\" style=\"height:50px;display:inline\"> Backward Pass<\/h3>\n<hr \/>\n<p>The following illustration depicts the backpropagation process:<\/p>\n<p align=\"center\">\n  <img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729225214237.png\" style=\"height:500px\">\n<\/p>\n<h3><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/cotton\/64\/000000\/olympic-torch.png\" style=\"height:50px;display:inline\"> \u4f7f\u7528 PyTorch \u6784\u5efa\u795e\u7ecf\u7f51\u7edc<\/h3>\n<hr \/>\n<p>\u73b0\u5728\u6211\u4eec\u5c06\u4f7f\u7528 PyTorch \u5b9e\u73b0\u4e00\u4e2a\u7528\u4e8e\u56de\u5f52\u7684\u795e\u7ecf\u7f51\u7edc\u3002\u6211\u4eec\u5c06\u4f7f\u7528\u201c\u6ce2\u58eb\u987f\u623f\u4ef7\u201d\u6570\u636e\u96c6\u5e76\u4f7f\u7528\u4e0a\u9762\u63cf\u8ff0\u7684\u67b6\u6784\u3002<\/p>\n<pre><code class=\"language-python\">import torch\nimport torch.nn as nn\n# define our neural network model\n# this approach provides easier access to weights (e.g., &#039;model.fc1&#039; will return the parameters of the first layer)\nclass HousePricesMLP(nn.Module):\n    # notice that we inherit from nn.Module\n    def __init__(self, input_dim, output_dim):\n        super(HousePricesMLP, self).__init__()\n        # here we initialize the building blocks of our network\n        # single neuron is just one linear (fully-connected) layer\n        self.fc_1 = nn.Linear(input_dim, 4) \n        self.fc_2 = nn.Linear(4, 3)\n        self.output_layer = nn.Linear(3, output_dim)\n\n    def forward(self, x):\n        # here we define what happens to the input x in the forward pass\n        # that is, the order in which x goes through the building blocks\n  
      x = torch.relu(self.fc_1(x))\n        x = torch.relu(self.fc_2(x))\n        return self.output_layer(x)<\/code><\/pre>\n<pre><code class=\"language-python\"># alternative method - more readdable, easier to code, less convenient access to weights\n# e.g., to access the first layer weights -- `model.hidden[0]`\nclass HousePricesMLP(nn.Module):\n    # notice that we inherit from nn.Module\n    def __init__(self, input_dim, output_dim):\n        super(HousePricesMLP, self).__init__()\n        # here we initialize the building blocks of our network\n        # single neuron is just one linear (fully-connected) layer\n        self.hidden = nn.Sequential(nn.Linear(input_dim, 4),\n                                    nn.ReLU(),\n                                    nn.Linear(4, 3),\n                                    nn.ReLU())\n        self.output_layer = nn.Linear(3, output_dim)\n\n    def forward(self, x):\n        # here we define what happens to the input x in the forward pass\n        # that is, the order in which x goes through the building blocks\n        return self.output_layer(self.hidden(x))<\/code><\/pre>\n<pre><code class=\"language-python\"># NOTE: in this example we are using a very simple NN model\n# We usually wider and deeper networks such as this one:\nclass HousePricesMLP(nn.Module):\n    # notice that we inherit from nn.Module\n    def __init__(self, input_dim, output_dim, hidden_dim=256):\n        super(HousePricesMLP, self).__init__()\n        # here we initialize the building blocks of our network\n        # single neuron is just one linear (fully-connected) layer\n        self.hidden = nn.Sequential(nn.Linear(input_dim, hidden_dim),\n                                    nn.ReLU(),\n                                    nn.Linear(hidden_dim, hidden_dim),\n                                    nn.ReLU(),\n                                    nn.Linear(hidden_dim, hidden_dim),\n                                    nn.ReLU(),)\n        self.output_layer 
= nn.Linear(hidden_dim, output_dim)\n\n    def forward(self, x):\n        # here we define what happens to the input x in the forward pass\n        # that is, the order in which x goes through the building blocks\n        return self.output_layer(self.hidden(x))<\/code><\/pre>\n<pre><code class=\"language-python\">from sklearn.datasets import fetch_california_housing\nimport pandas as pd\n\n# Load data\ncalifornia_housing = fetch_california_housing()\n\n# Convert to DataFrame\ndata = pd.DataFrame(california_housing.data, columns=california_housing.feature_names)\ndata[&#039;target&#039;] = california_housing.target\n\n# Print description of the features\nprint(california_housing.DESCR)\n<\/code><\/pre>\n<pre><code>.. _california_housing_dataset:\n\nCalifornia Housing dataset\n--------------------------\n\n**Data Set Characteristics:**\n\n    :Number of Instances: 20640\n\n    :Number of Attributes: 8 numeric, predictive attributes and the target\n\n    :Attribute Information:\n        - MedInc        median income in block group\n        - HouseAge      median house age in block group\n        - AveRooms      average number of rooms per household\n        - AveBedrms     average number of bedrooms per household\n        - Population    block group population\n        - AveOccup      average number of household members\n        - Latitude      block group latitude\n        - Longitude     block group longitude\n\n    :Missing Attribute Values: None\n\nThis dataset was obtained from the StatLib repository.\nhttps:\/\/www.dcc.fc.up.pt\/~ltorgo\/Regression\/cal_housing.html\n\nThe target variable is the median house value for California districts,\nexpressed in hundreds of thousands of dollars ($100,000).\n\nThis dataset was derived from the 1990 U.S. census, using one row per census\nblock group. 
A block group is the smallest geographical unit for which the U.S.\nCensus Bureau publishes sample data (a block group typically has a population\nof 600 to 3,000 people).\n\nA household is a group of people residing within a home. Since the average\nnumber of rooms and bedrooms in this dataset are provided per household, these\ncolumns may take surprisingly large values for block groups with few households\nand many empty houses, such as vacation resorts.\n\nIt can be downloaded\/loaded using the\n:func:<code>sklearn.datasets.fetch_california_housing<\/code> function.\n\n.. topic:: References\n\n    - Pace, R. Kelley and Ronald Barry, Sparse Spatial Autoregressions,\n      Statistics and Probability Letters, 33 (1997) 291-297<\/code><\/pre>\n<pre><code class=\"language-python\"># Convert to DataFrame\nboston = pd.DataFrame(california_housing.data, columns=california_housing.feature_names)\nboston[&#039;MEDV&#039;] = california_housing.target\n\n# Sample 10 rows\nsampled_data = boston.sample(10)\nprint(sampled_data)\n<\/code><\/pre>\n<pre><code>       MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  \\\n11143  3.1884      25.0  5.188630   1.073643      2166.0  2.798450     33.84   \n9961   5.8150      34.0  7.670412   1.183521       780.0  2.921348     38.33   \n12213  6.9930      13.0  6.428571   1.000000       120.0  2.857143     33.51   \n4354   8.9440      30.0  7.170455   1.087500      1776.0  2.018182     34.10   \n4629   2.2708      18.0  2.571135   1.108755      3296.0  2.254446     34.07   \n11026  5.8622      30.0  6.456164   1.038356      2271.0  3.110959     33.80   \n20185  5.9181      24.0  5.700000   1.034375      1049.0  3.278125     34.27   \n17427  2.3333      32.0  5.816976   1.140584      1074.0  2.848806     34.65   \n4080   3.1373      23.0  3.752241   1.074980      2391.0  1.948655     34.15   \n13890  2.2612      12.0  5.235714   1.024405     11139.0  6.630357     34.45   \n\n       Longitude     MEDV  \n11143    
-117.94  1.35400  \n9961     -122.26  3.39200  \n12213    -117.18  5.00001  \n4354     -118.39  5.00001  \n4629     -118.30  1.75000  \n11026    -117.83  2.21000  \n20185    -119.16  2.21100  \n17427    -120.47  1.30200  \n4080     -118.37  2.63100  \n13890    -116.14  1.37500  <\/code><\/pre>\n<pre><code class=\"language-python\">from sklearn.model_selection import train_test_split\nfrom sklearn.preprocessing import StandardScaler\n# Use 2 features\nx = boston[[&#039;AveRooms&#039;, &#039;AveOccup&#039;]].values  # AveRooms - average number of rooms, AveOccup - average number of household members\ny = boston[&#039;MEDV&#039;].values\n\n# Split the data\nx_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=5)\n\n# Scaling\nx_scaler = StandardScaler()\nx_scaler.fit(x_train)\nx_train = x_scaler.transform(x_train)\nx_test = x_scaler.transform(x_test)\n\nprint(&quot;total training samples: {}, total test samples: {}&quot;.format(len(x_train), len(x_test)))<\/code><\/pre>\n<pre><code>total training samples: 16512, total test samples: 4128<\/code><\/pre>\n<pre><code class=\"language-python\">import torch\nfrom torch.utils.data import TensorDataset, DataLoader\nimport torch.nn as nn\n\n# Convert to tensor dataset for PyTorch\nboston_tensor_train_ds = TensorDataset(torch.tensor(x_train, dtype=torch.float), torch.tensor(y_train, dtype=torch.float))\nboston_tensor_test_ds = TensorDataset(torch.tensor(x_test, dtype=torch.float), torch.tensor(y_test, dtype=torch.float))\n\n# Check\nprint(f&#039;sample 0: features: {boston_tensor_train_ds[0][0]}, target: {boston_tensor_train_ds[0][1]}&#039;)\n\n# Define hyper-parameters and create our model\nnum_features = 2\noutput_dim = 1\nbatch_size = 128\nlearning_rate = 0.01\nnum_epochs = 200\n\n# Device\ndevice = torch.device(&quot;cuda:0&quot; if torch.cuda.is_available() else &quot;cpu&quot;)\n\n# Loss criterion\ncriterion = nn.MSELoss()\n\n# Model\nmodel = HousePricesMLP(num_features, 
output_dim).to(device)\n\n# Optimizer\noptimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)\n\n# DataLoader\ntrain_loader = DataLoader(boston_tensor_train_ds, batch_size=batch_size, shuffle=True)\ntest_loader = DataLoader(boston_tensor_test_ds, batch_size=batch_size, shuffle=False)\n<\/code><\/pre>\n<pre><code>sample 0: features: tensor([-0.7866, -0.0164]), target: 2.8459999561309814<\/code><\/pre>\n<pre><code class=\"language-python\"># Training loop\nfor epoch in range(num_epochs):\n    model.train()\n    for inputs, targets in train_loader:\n        inputs, targets = inputs.to(device), targets.to(device)\n\n        # Zero the parameter gradients\n        optimizer.zero_grad()\n\n        # Forward pass\n        outputs = model(inputs).view(-1)\n        loss = criterion(outputs, targets)\n\n        # Backward pass and optimization\n        loss.backward()\n        optimizer.step()\n\n    # Print loss for every 10 epochs\n    if (epoch+1) % 10 == 0:\n        print(f&#039;Epoch [{epoch+1}\/{num_epochs}], Loss: {loss.item():.4f}&#039;)\n\n# Evaluation\nmodel.eval()\nwith torch.no_grad():\n    test_loss = 0\n    for inputs, targets in test_loader:\n        inputs, targets = inputs.to(device), targets.to(device)\n        outputs = model(inputs).view(-1)\n        loss = criterion(outputs, targets)\n        test_loss += loss.item()\n\n    test_loss \/= len(test_loader)\n    print(f&#039;Test Loss: {test_loss:.4f}&#039;)<\/code><\/pre>\n<pre><code>Epoch [10\/200], Loss: 1.0621\nEpoch [20\/200], Loss: 1.0323\nEpoch [30\/200], Loss: 0.8559\nEpoch [40\/200], Loss: 1.3087\nEpoch [50\/200], Loss: 1.1804\nEpoch [60\/200], Loss: 1.0741\nEpoch [70\/200], Loss: 1.0675\nEpoch [80\/200], Loss: 0.9341\nEpoch [90\/200], Loss: 0.6055\nEpoch [100\/200], Loss: 1.0619\nEpoch [110\/200], Loss: 1.0063\nEpoch [120\/200], Loss: 0.9453\nEpoch [130\/200], Loss: 0.9202\nEpoch [140\/200], Loss: 0.9076\nEpoch [150\/200], Loss: 0.8739\nEpoch [160\/200], Loss: 1.0152\nEpoch [170\/200], 
Loss: 0.7826\nEpoch [180\/200], Loss: 0.8854\nEpoch [190\/200], Loss: 1.0572\nEpoch [200\/200], Loss: 0.9545\nTest Loss: 1.0264<\/code><\/pre>\n<h3><img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729221434927.png\" style=\"height:50px;display:inline\"> Weight Initialization<\/h3>\n<hr \/>\n<ul>\n<li>As we have seen, neural networks are trained with stochastic optimization algorithms such as gradient descent, RMSprop, Adam, etc.<\/li>\n<li>Recall that these algorithms require the parameters to be initialized to some values. That is, they rely on randomness to find a sufficiently good set of weights for the specific input-to-output mapping function being learned from the data.<\/li>\n<li>These algorithms require the network weights to be initialized to small random values (random, but close to zero).<\/li>\n<li>Randomness is also used in the search process, by <strong>shuffling the training dataset<\/strong> before each epoch, which in turn causes the gradient estimate to differ from batch to batch.<\/li>\n<li>Training deep models is a sufficiently difficult task that most algorithms are strongly affected by the choice of initialization (p. 301, <a href=\"https:\/\/amzn.to\/2H5wjfg\">Deep Learning<\/a>, 2016).<\/li>\n<\/ul>\n<h4><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/?size=100&id=91CnU00i6HLv&format=png&color=000000\" style=\"height:50px;display:inline\"> Why not just initialize with zeros?<\/h4>\n<hr \/>\n<p>If all weights start at zero (or at any identical value), every neuron in a layer computes the same output and receives the same gradient update, so the neurons remain identical throughout training: the network cannot break this symmetry and effectively learns only one feature per layer. Small random values break the symmetry and let different neurons learn different features.<\/p>\n<h4>Recent Trend: Non-Random Initializations<\/h4>\n<ul>\n<li>\n<p><a href=\"https:\/\/arxiv.org\/abs\/2007.01038\">Beyond Signal Propagation: Is Feature Diversity Necessary in Deep Neural Network Initialization?<\/a> In this paper, a deep network with identical features is constructed by initializing almost all weights to zero. The architecture not only achieves perfect signal propagation and stable gradients, but also reaches high accuracy on standard benchmarks, showing that randomly diverse initialization is not required to train neural networks.<\/p>\n<\/li>\n<li>\n<p><a href=\"https:\/\/openreview.net\/forum?id=1AxQpKmiTc\">ZerO Initialization: Initializing Neural Networks with only Zeros and Ones<\/a> In this paper, random weight initialization is replaced by a fully deterministic scheme that initializes the network weights with zeros and ones (up to normalization), based on identity and Hadamard transforms. The authors show encouraging results on a variety of benchmarks, paving the way for simple initialization schemes that work as well as random ones.<\/p>\n<\/li>\n<\/ul>\n<p>These studies show that neural networks do not necessarily need randomly initialized weights to train well.<\/p>\n<h3><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/emoji\/96\/000000\/on-arrow-emoji.png\" style=\"height:50px;display:inline\"> Types of Weight Initialization<\/h3>\n<hr \/>\n<ul>\n<li>\n<p>Initialization of neural network weights is an active area of research, since carefully initializing the network can speed up the learning process.<\/p>\n<\/li>\n<li>\n<p>There is no single best way to initialize the weights of a neural network.<\/p>\n<\/li>\n<li>\n<p>We will review some popular initialization methods.<\/p>\n<\/li>\n<li>\n<p><strong>Uniform<\/strong> - initialize with values drawn from the uniform distribution $\\mathcal{U}(a, b)$<\/p>\n<\/li>\n<li>\n<p>In PyTorch - <code>torch.nn.init.uniform_(tensor, a=0.0, 
b=1.0)<\/code><\/p>\n<\/li>\n<li>\n<p><strong>Normal<\/strong> - initialize with values drawn from the normal distribution $\\mathcal{N}(\\text{mean}, \\text{std}^2)$<\/p>\n<\/li>\n<li>\n<p>In PyTorch - <code>torch.nn.init.normal_(tensor, mean=0.0, std=1.0)<\/code><\/p>\n<\/li>\n<li>\n<p><strong>Constant<\/strong> - initialize with the value $val$.<\/p>\n<\/li>\n<li>\n<p>In PyTorch - <code>torch.nn.init.constant_(tensor, val)<\/code><\/p>\n<\/li>\n<li>\n<p><strong>Ones<\/strong> - initialize with the scalar value 1.<\/p>\n<\/li>\n<li>\n<p>In PyTorch - <code>torch.nn.init.ones_(tensor)<\/code><\/p>\n<\/li>\n<li>\n<p><strong>Zeros<\/strong> - initialize with the scalar value 0.<\/p>\n<\/li>\n<li>\n<p>In PyTorch - <code>torch.nn.init.zeros_(tensor)<\/code><\/p>\n<\/li>\n<li>\n<p><strong>Xavier Uniform<\/strong> - initialize with values from a uniform distribution, following the method described in <em>Understanding the difficulty of training deep feedforward neural networks - Glorot, X. &amp; Bengio, Y. (2010)<\/em>. The resulting tensor has values sampled from $\\mathcal{U}(-a, a)$, where $$ a = \\text{gain} \\times \\sqrt{\\frac{6}{\\text{fan}_{in} + \\text{fan}_{out}}} $$<\/p>\n<\/li>\n<li>\n<p><code>fan_in<\/code> is the number of input units in the weight tensor and <code>fan_out<\/code> is the number of output units in the weight tensor; the main role of <code>gain<\/code> is to rescale the initial weights so that the signal neither vanishes nor explodes as it propagates through the network.<\/p>\n<\/li>\n<li>\n<p>In PyTorch - <code>torch.nn.init.xavier_uniform_(tensor, gain=1.0)<\/code><\/p>\n<\/li>\n<li>\n<p><strong>Xavier Normal<\/strong> - initialize with values from a normal distribution, following the method described in <em>Understanding the difficulty of training deep feedforward neural networks - Glorot, X. &amp; Bengio, Y. 
(2010)<\/em>. The resulting tensor has values sampled from $\\mathcal{N}(0,\\text{std}^2)$, where $$ \\text{std} = \\text{gain} \\times \\sqrt{\\frac{2}{\\text{fan}_{in} + \\text{fan}_{out}}} $$<\/p>\n<\/li>\n<li>\n<p><code>fan_in<\/code> is the number of input units in the weight tensor, <code>fan_out<\/code> is the number of output units in the weight tensor<\/p>\n<\/li>\n<li>\n<p>In PyTorch - <code>torch.nn.init.xavier_normal_(tensor, gain=1.0)<\/code><\/p>\n<\/li>\n<li>\n<p><strong>Kaiming (He) Uniform<\/strong> - initialize with values from a uniform distribution, following the method described in <em>Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015)<\/em>. The resulting tensor has values sampled from $\\mathcal{U}(-\\text{bound}, \\text{bound})$, where $$ \\text{bound} = \\text{gain} \\times \\sqrt{\\frac{3}{\\text{fan-mode}}} $$<\/p>\n<\/li>\n<li>\n<p>In PyTorch - <code>torch.nn.init.kaiming_uniform_(tensor, a=0, mode=&#039;fan_in&#039;, nonlinearity=&#039;leaky_relu&#039;)<\/code><\/p>\n<\/li>\n<li>\n<p><code>a<\/code> - the negative slope of leaky_relu (only used with <code>leaky_relu<\/code>)<\/p>\n<\/li>\n<li>\n<p><code>gain<\/code>: a scaling factor, typically $\\sqrt{2}$ for ReLU and Leaky ReLU.<\/p>\n<\/li>\n<li>\n<p><code>fan_mode<\/code>: either <code>fan_in<\/code> or <code>fan_out<\/code>.<\/p>\n<\/li>\n<li>\n<p><code>fan_in<\/code>: the number of input units in the weight tensor (the number of neurons in the previous layer). Initializing with <code>fan_in<\/code> accounts for how many inputs each neuron receives in the forward pass, ensuring that the signal does not grow or shrink excessively as it propagates forward.<\/p>\n<\/li>\n<li>\n<p><code>fan_out<\/code>: the number of output units in the weight tensor (the number of neurons in the next layer). Initializing with <code>fan_out<\/code> accounts for how many outputs each neuron feeds during backpropagation, ensuring that the gradients do not grow or shrink excessively as they propagate backward.<\/p>\n<\/li>\n<\/ul>\n<p>At initialization time, <code>fan_in<\/code> and <code>fan_out<\/code> are both computed from the shape of the weight tensor, which is fixed when the network architecture is defined. For example, the weight tensor of a fully connected layer has shape $[\\text{fan}_{out}, \\text{fan}_{in}]$.<\/p>\n<ul>\n<li><strong>Kaiming (He) Normal<\/strong> - initialize with values from a normal distribution, following the method described in <em>Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. 
(2015)<\/em>. The resulting tensor has values sampled from $\\mathcal{N}(0,\\text{std}^2)$, where $$ \\text{std} = \\frac{\\text{gain}}{\\sqrt{\\text{fan-mode}}} $$<\/li>\n<li>In PyTorch - <code>torch.nn.init.kaiming_normal_(tensor, a=0, mode=&#039;fan_in&#039;, nonlinearity=&#039;leaky_relu&#039;)<\/code><\/li>\n<\/ul>\n<p>PyTorch ships with default initialization schemes that usually work well. For example, <code>kaiming_uniform<\/code> is <a href=\"https:\/\/pytorch.org\/docs\/stable\/_modules\/torch\/nn\/modules\/linear.html#Linear\">the default initialization for <code>Linear<\/code> layers in PyTorch<\/a>.<\/p>\n<h4>Interactive Demo<\/h4>\n<hr \/>\n<p><a href=\"https:\/\/www.deeplearning.ai\/ai-notes\/initialization\/\">Different Initializations Demo<\/a><\/p>\n<h3><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/cotton\/64\/000000\/olympic-torch.png\" style=\"height:50px;display:inline\"> Initializing Neural Network Weights with PyTorch<\/h3>\n<hr \/>\n<ul>\n<li>Since PyTorch 1.0, <strong>most layers are initialized with the Kaiming Uniform method by default<\/strong>.<\/li>\n<li>Let&#039;s see how to change the initialization of a model.<\/li>\n<li><a href=\"https:\/\/pytorch.org\/docs\/stable\/nn.init.html\">Official PyTorch initialization documentation<\/a>.<\/li>\n<\/ul>\n<pre><code class=\"language-python\"># define hyper-parameters and create our model\nnum_features = 2\noutput_dim = 1\nbatch_size = 128\nlearning_rate = 0.01\nnum_epochs = 500\n# device\ndevice = torch.device(&quot;cuda:0&quot; if torch.cuda.is_available() else &quot;cpu&quot;)\n# loss criterion\ncriterion = nn.MSELoss()\n# 
model\nmodel = HousePricesMLP(num_features, output_dim).to(device)\n# optimizer\noptimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)<\/code><\/pre>\n<pre><code class=\"language-python\"># use a different initialization for the model\ndef weights_init(m):\n    classname = m.__class__.__name__\n    if classname.find(&#039;Linear&#039;) != -1:\n        torch.nn.init.xavier_normal_(m.weight, gain=1.0)\nmodel.apply(weights_init)<\/code><\/pre>\n<pre><code>HousePricesMLP(\n  (hidden): Sequential(\n    (0): Linear(in_features=2, out_features=256, bias=True)\n    (1): ReLU()\n    (2): Linear(in_features=256, out_features=256, bias=True)\n    (3): ReLU()\n    (4): Linear(in_features=256, out_features=256, bias=True)\n    (5): ReLU()\n  )\n  (output_layer): Linear(in_features=256, out_features=1, bias=True)\n)<\/code><\/pre>\n<pre><code class=\"language-python\"># another way to do that\nclass HousePricesMLP(nn.Module):\n    def __init__(self, input_dim, output_dim):\n        super(HousePricesMLP, self).__init__()\n        self.hidden = nn.Sequential(nn.Linear(input_dim, 4),\n                                    nn.ReLU(),\n                                    nn.Linear(4, 3),\n                                    nn.ReLU())\n        self.output_layer = nn.Linear(3, output_dim)\n        # NEW: init weights here\n        self.init_weights()\n\n    def forward(self, x):\n        return self.output_layer(self.hidden(x))\n\n    def init_weights(self):\n        for m in self.modules():\n            if isinstance(m, nn.Linear):\n                torch.nn.init.xavier_normal_(m.weight, gain=1.0)\n                if m.bias is not None:\n                    torch.nn.init.constant_(m.bias, 0)<\/code><\/pre>\n<pre><code class=\"language-python\">import numpy as np\nboston_tensor_train_dataloader = DataLoader(boston_tensor_train_ds, batch_size=batch_size, shuffle=True)\n\n# training loop for the model\nfor epoch in range(num_epochs):\n    epoch_losses = []\n    for features, 
targets in boston_tensor_train_dataloader:\n        # send data to device\n        features = features.to(device)\n        targets = targets.to(device)\n        # forward pass\n        output = model(features)\n        # loss\n        loss = criterion(output.view(-1), targets)\n        # backward pass\n        optimizer.zero_grad()  # clean the gradients from previous iteration\n        loss.backward()  # autograd backward to calculate gradients\n        optimizer.step()  # apply update to the weights\n        epoch_losses.append(loss.item())\n    if epoch % 50 == 0:\n        print(f&#039;epoch: {epoch} loss: {np.mean(epoch_losses)}&#039;)\n\n# test error\nmodel.eval()\nwith torch.no_grad():\n    test_outputs = model(torch.tensor(x_test, dtype=torch.float, device=device))\n    test_error = criterion(test_outputs.view(-1), torch.tensor(y_test, dtype=torch.float, device=device))\nprint(f&#039;test MSE error: {test_error.item()}&#039;)<\/code><\/pre>\n<pre><code>epoch: 0 loss: 1.3302481183710024\nepoch: 50 loss: 0.9436245888702629\nepoch: 100 loss: 0.9447847960531249\nepoch: 150 loss: 0.9419918522354245\nepoch: 200 loss: 0.9397756309472314\nepoch: 250 loss: 0.9362258513768514\nepoch: 300 loss: 0.9383858072665311\nepoch: 350 loss: 0.9368446367655614\nepoch: 400 loss: 0.936691609926002\nepoch: 450 loss: 0.9354286706724833\ntest MSE error: 0.9552490711212158<\/code><\/pre>\n<h2><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/bubbles\/50\/000000\/alps.png\" style=\"height:50px;display:inline\"> Deep Double Descent<\/h2>\n<hr 
\/>\n<ul>\n<li>Double descent in machine learning training: as model size, data size, or training time increases, performance first improves, then gets worse, and then improves again.<\/li>\n<li>This effect can often be avoided through careful <strong>regularization<\/strong> or <strong>early stopping<\/strong>.<\/li>\n<li>Although this behavior appears to be quite universal, <em>we do not yet fully understand why it happens<\/em>.<\/li>\n<\/ul>\n<p align=\"center\">\n  <img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729225716791.png\" style=\"height:300px\">\n<\/p>\n<p>Deep double descent challenges the traditional view of the bias-variance trade-off, in which increasing model complexity is expected to lead to overfitting and higher test error.<\/p>\n<p>In modern deep learning models, especially those with large-scale datasets and architectures, this phenomenon highlights the non-trivial behavior of the test error:<\/p>\n<ul>\n<li>Initial descent: as model complexity increases, the model fits the data better, reducing bias.<\/li>\n<li>Intermediate ascent: further increases in complexity lead to overfitting; the model starts capturing noise in the training data, which increases variance and test error.<\/li>\n<li>Second descent: beyond a certain complexity threshold, the model becomes powerful enough to generalize better by effectively exploiting large datasets and regularization techniques, reducing the test error once again.<\/li>\n<\/ul>\n<p>This insight has practical implications for model training and architecture design: in some cases, increasing model complexity and data can ultimately yield better generalization, contrary to traditional expectations. Regularization techniques and careful monitoring during training are crucial for steering this behavior effectively.<\/p>\n<h4>Double Descent from the Model Perspective<\/h4>\n<hr \/>\n<ul>\n<li>\n<p>There are cases where <strong>bigger models are worse<\/strong>.<\/p>\n<\/li>\n<li>\n<p>The model-wise double descent phenomenon can also mean that training with more data hurts performance.<\/p>\n<p align=\"center\">\n<img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729225811323.png\" style=\"height:400px\">\n<\/p>\n<\/li>\n<li>\n<p>Classical Regime:<\/p>\n<\/li>\n<\/ul>\n<p>This regime illustrates the traditional bias-variance trade-off theory.<br 
\/>\nIn this regime, as model complexity increases, the error first decreases and then rises again due to overfitting.<\/p>\n<ul>\n<li>Critical Regime:<\/li>\n<\/ul>\n<p>In this regime, the model error fluctuates sharply.<br \/>\nThe peak in test error occurs near the interpolation threshold, where the model is just barely large enough to fit the training set.<\/p>\n<ul>\n<li>Modern Regime:<\/li>\n<\/ul>\n<p>In this regime, as model size increases further, the error decreases again and the model keeps improving.<\/p>\n<ul>\n<li>Reality (blue curve):<\/li>\n<\/ul>\n<p>Shows the actually observed behavior: the test error first decreases, then rises in the critical regime, and finally decreases again in the modern regime.<\/p>\n<ul>\n<li>Train error (green curve):<\/li>\n<\/ul>\n<p>As model complexity grows, the training error keeps decreasing, indicating that the model fits the training set ever more closely.<\/p>\n<p>Summary:<\/p>\n<p>Critical regime: the double descent phenomenon appears mainly in the critical regime, where the model error fluctuates markedly.<\/p>\n<p>Modern regime: with sufficiently large models the error descends once more, showing that more complex models perform better on large-scale data.<\/p>\n<p>The relationship between data and model complexity: double descent reminds us that when designing and training models we cannot simply rely on adding data and model capacity to improve performance; additional factors and strategies, such as regularization and appropriate early stopping, need to be considered.<\/p>\n<h4>Sample-wise Non-monotonicity<\/h4>\n<hr \/>\n<ul>\n<li>Sample-wise Non-monotonicity:<\/li>\n<\/ul>\n<p>This phenomenon refers to cases where increasing the number of training samples actually hurts model performance, contrary to the usual expectation that more data improves accuracy.<\/p>\n<p align=\"center\">\n  <img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729225857427.png\" style=\"height:300px\">\n<\/p>\n<h4>Training Epochs vs. Model Size<\/h4>\n<hr \/>\n<p align=\"center\">\n  <img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729230000779.png\" style=\"height:300px\">\n<\/p>\n<p align=\"center\">\n  <img decoding=\"async\" src=\"https:\/\/gnnclub-1311496010.cos.ap-beijing.myqcloud.com\/wp-content\/uploads\/2024\/07\/20240729230029494.png\" style=\"height:300px\">\n<\/p>\n<h2><img decoding=\"async\" src=\"https:\/\/img.icons8.com\/dusk\/64\/000000\/prize.png\" style=\"height:50px;display:inline\"> Credits<\/h2>\n<hr 
\/>\n<ul>\n<li>Icons made by <a href=\"https:\/\/www.flaticon.com\/authors\/becris\" title=\"Becris\">Becris<\/a> from <a href=\"https:\/\/www.flaticon.com\/\" title=\"Flaticon\">www.flaticon.com<\/a><\/li>\n<li>Icons from <a href=\"https:\/\/icons8.com\/\">Icons8.com<\/a> - <a href=\"https:\/\/icons8.com\">https:\/\/icons8.com<\/a><\/li>\n<li>Datasets from <a href=\"https:\/\/www.kaggle.com\/\">Kaggle<\/a> - <a href=\"https:\/\/www.kaggle.com\/\">https:\/\/www.kaggle.com\/<\/a><\/li>\n<li><a href=\"https:\/\/machinelearningmastery.com\/why-initialize-a-neural-network-with-random-weights\/\">Jason Brownlee - Why Initialize a Neural Network with Random Weights?<\/a><\/li>\n<li><a href=\"https:\/\/openai.com\/blog\/deep-double-descent\/\">OpenAI - Deep Double Descent<\/a><\/li>\n<li><a href=\"https:\/\/taldatech.github.io\">Tal Daniel<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Deep Learning create by Arwin Yu Tutorial 01 &#8211; Neural N [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1741,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18,24],"tags":[],"class_list":["post-1725","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-18","category-24"],"_links":{"self":[{"href":"http:\/\/www.gnn.club\/index.php?rest_route=\/wp\/v2\/posts\/1725","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.gnn.club\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.gnn.club\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.gnn.club\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.gnn.club\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1725"}],"version-history":[{"count":19,"href":"http:\/\/www.gnn.club\/index.php?rest_route=\/wp\/v2\/posts\/1725\/revisions"}],"predecessor-version":[{"id
":1974,"href":"http:\/\/www.gnn.club\/index.php?rest_route=\/wp\/v2\/posts\/1725\/revisions\/1974"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/www.gnn.club\/index.php?rest_route=\/wp\/v2\/media\/1741"}],"wp:attachment":[{"href":"http:\/\/www.gnn.club\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1725"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.gnn.club\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1725"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.gnn.club\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1725"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}