Seeing Through the Black Box of Machine Learning (the W-GAN Model)

黑馬_御風 2017-02-12


Figure a. Principle of GAN.

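Two days ago a blizzard swept New York and left the world a blank expanse of white. Today is the Lantern Festival, yet Long Island remains cold and desolate, and the noisy bustle of lantern displays on the fifteenth day of the first lunar month is only a distant memory. This semester Lao Gu is teaching a graduate-level course on digital geometry, which has now reached a geometric theorem completed in 2016 jointly with Shing-Tung Yau and Professor Feng Luo [3]. That work gives a constructive proof of the classical Alexandrov theorem, and also a geometric interpretation of optimal mass transportation theory. In recent days the Wasserstein GAN has suddenly become a hot topic in machine learning. Its key concepts can be given a complete geometric interpretation with our theory, which lets us, to some extent, "see through" the "black box" of conventional machine learning with our own eyes. What follows is the lecture Lao Gu will give next Monday.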

Generative Adversarial Networks (GAN)

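Training model. A Generative Adversarial Network (GAN) is a "self-contradictory" system: it pits its own spear against its own shield and develops through the contradiction, so that the spear becomes sharper and the shield tougher. Here the spear is called the discriminator, and the shield is called the generator.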

Figure b. Generative Model.

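The generator G typically takes a random variable (for example, Gaussian or uniformly distributed) and passes it through a parameterized probabilistic generative model (usually parameterized by a deep neural network), which performs inverse transform sampling of a probability distribution and thereby produces a generated probability distribution. The discriminator D is usually a deep convolutional neural network as well.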
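Inverse transform sampling is easiest to see in one dimension, where the map can be written in closed form instead of being learned. The sketch below is only an illustration under that assumption: the exponential target and the name `sample_exponential` are hypothetical choices, not part of any GAN.

```python
import numpy as np

def sample_exponential(n, rate=1.0, seed=0):
    """Push uniform samples through the inverse CDF of Exp(rate)."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 1.0, size=n)      # input random variable, uniform on [0, 1)
    # Inverse CDF of the exponential law: F^{-1}(u) = -log(1 - u) / rate
    return -np.log1p(-u) / rate

samples = sample_exponential(100_000, rate=2.0)
print(samples.mean())  # should be close to 1 / rate
```

A generator network plays the role of the closed-form inverse CDF here: it is a parameterized map that reshapes a simple input distribution into the desired one.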

Figure 1. Flowchart of the GAN algorithm.

The clash between spear and shield proceeds as follows. Given real data, its internal statistical regularity is represented by a probability distribution $P_r(x)$, and our goal is to find $P_r$. To this end we build a random variable generator G that produces random variables with probability distribution $P_g(x)$, and we want $P_g$ to be as close to $P_r$ as possible. To tell the real distribution $P_r$ from the generated distribution $P_g$, we also build a discriminator D: given a sample $x$, D is responsible for deciding whether the sample comes from the real data or from the forged data. Goodfellow designed the following loss function for the GAN discriminator, which classifies real samples as positive and generated samples as negative as far as possible:

$$-\mathbb{E}_{x\sim P_r}[\log D(x)]-\mathbb{E}_{x\sim P_g}[\log(1-D(x))].$$

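The first term does not depend on the generator G, so the same expression, with the sign reversed, can also serve as the loss function of the generator in a GAN.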

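During training, the discriminator D and the generator G learn alternately and finally reach a Nash equilibrium (of a zero-sum game), at which the discriminator can no longer tell real samples from generated ones.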

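Advantages. GAN has very important advantages. When the probability distribution of the real data cannot be computed, traditional generative models that rely on an intrinsic explanation of the data cannot be applied directly. GAN can still be used, because its internal adversarial training mechanism can approximate some probability distributions that are hard to compute. More importantly, Yann LeCun has been an active advocate of GAN, because GAN provides a powerful algorithmic framework for unsupervised learning, and unsupervised learning is widely regarded as an important link on the road to artificial intelligence.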

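Drawbacks. The original form of GAN has a fatal flaw: the better the discriminator, the more severely the generator's gradient vanishes. Fix the generator G and optimize the discriminator D. Consider an arbitrary sample $x$. Its contribution to the discriminator's loss function is

$$-P_r(x)\log D(x)-P_g(x)\log(1-D(x)).$$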

Differentiating with respect to $D(x)$ and setting the derivative to zero gives the optimal discriminator

$$D^*(x)=\frac{P_r(x)}{P_r(x)+P_g(x)}.$$

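Substituting this into the generator's loss function, we obtain the so-called Jensen-Shannon (JS) divergence:

$$\mathbb{E}_{x\sim P_r}\log\frac{P_r(x)}{\frac{1}{2}\left(P_r(x)+P_g(x)\right)}+\mathbb{E}_{x\sim P_g}\log\frac{P_g(x)}{\frac{1}{2}\left(P_r(x)+P_g(x)\right)}-2\log 2=2\,JS(P_r\|P_g)-2\log 2.$$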

In this situation (an optimal discriminator), if the supports of $P_r$ and $P_g$ intersect in a set of measure zero, then the JS divergence equals the constant $\log 2$, the generator's loss function is identically 0, and the gradient vanishes.
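The vanishing loss can be checked numerically on small discrete distributions. This is a minimal sketch, assuming the generator's loss is $\mathbb{E}_{P_r}[\log D^*]+\mathbb{E}_{P_g}[\log(1-D^*)]$ with the optimal discriminator $D^*=P_r/(P_r+P_g)$; the function names and the example histograms are hypothetical.

```python
import numpy as np

def generator_loss_at_optimal_d(p_r, p_g):
    """E_{P_r}[log D*] + E_{P_g}[log(1 - D*)] with D* = P_r / (P_r + P_g)."""
    p_r, p_g = np.asarray(p_r, float), np.asarray(p_g, float)
    total = p_r + p_g
    loss = 0.0
    for p in (p_r, p_g):
        mask = p > 0                      # zero-probability bins contribute nothing
        loss += np.sum(p[mask] * np.log(p[mask] / total[mask]))
    return loss

def js_divergence(p_r, p_g):
    """Jensen-Shannon divergence between two histograms on the same bins."""
    p_r, p_g = np.asarray(p_r, float), np.asarray(p_g, float)
    m = 0.5 * (p_r + p_g)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))
    return 0.5 * kl(p_r, m) + 0.5 * kl(p_g, m)

overlapping = ([0.5, 0.5, 0.0, 0.0], [0.0, 0.5, 0.5, 0.0])
disjoint = ([0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5])

for name, (p_r, p_g) in (("overlapping", overlapping), ("disjoint", disjoint)):
    print(name, generator_loss_at_optimal_d(p_r, p_g),
          2 * js_divergence(p_r, p_g) - 2 * np.log(2))
```

For the disjoint pair both printed numbers are zero no matter where the two supports sit, which is exactly why the generator receives no gradient.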

Improvement. In essence, the JS divergence measures how different the distributions $P_r$ and $P_g$ are, that is, it acts as a metric between probability distributions. We can replace the JS divergence with another metric. The Wasserstein distance is a good choice: even when the supports of $P_r$ and $P_g$ intersect in a set of measure zero, so that the two distributions do not overlap, the Wasserstein distance between them is still nonzero and still measures how far apart they are. This gives the Wasserstein GAN model [1][2].
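The contrast between the two metrics shows up already for one-dimensional histograms. The sketch below is an illustration only: it uses the 1-D Wasserstein-1 (earth mover) distance computed from cumulative sums, which is a simplifying assumption, and the helper names are hypothetical.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence between two histograms on the same bins."""
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def w1_distance(p, q, dx=1.0):
    """1-D Wasserstein-1 distance: integral of |CDF_p - CDF_q| on a uniform grid."""
    return np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * dx

p = np.zeros(40)
p[:3] = [0.25, 0.5, 0.25]             # a small bump near the left end

results = []
for shift in (5, 10, 20):
    q = np.roll(p, shift)             # same bump, moved to the right; supports are disjoint
    results.append((shift, js_divergence(p, q), w1_distance(p, q)))

for shift, js, w1 in results:
    print(shift, js, w1)
```

The JS column stays at $\log 2$ for every shift, while the Wasserstein column grows with the shift, so only the latter tells the generator which way to move.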

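To this end we introduce the geometric theory of optimal mass transportation. This theory visualizes the key concepts of W-GAN, such as the probability distribution, the probabilistic generative model (the generator), and the Wasserstein distance. More importantly, every concept and principle in this theory is transparent. For the probabilistic generative model, for example, we can in theory use the optimal transport framework in place of the deep neural network to construct the generator, which makes the black box transparent.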

An Outline of Optimal Transport Theory

Consider a region $\Omega\subset\mathbb{R}^n$ in Euclidean space, on which two probability measures $\mu$ and $\nu$ are defined, satisfying

$$\int_\Omega d\mu=\int_\Omega d\nu.$$

We seek a diffeomorphism of the region onto itself, $T:(\Omega,\mu)\to(\Omega,\nu)$, that satisfies two conditions: it preserves measure and it minimizes the transportation cost.

Measure preservation. For every Borel set $B\subset\Omega$,

$$\int_{T^{-1}(B)}d\mu=\int_Bd\nu.$$

In other words, the map $T$ sends the probability distribution $\mu$ to the probability distribution $\nu$, written $T_*\mu=\nu$. Intuitively, the self-map $T:x\mapsto y$ changes the volume element and therefore changes the probability distribution. Let $f$ and $g$ denote the probability density functions, $d\mu=f(x)\,dx$ and $d\nu=g(y)\,dy$, and let $DT$ denote the Jacobian matrix of the map. The differential equation for measure preservation is then

$$\det DT(x)=\frac{f(x)}{g\circ T(x)}.$$

This is called the Jacobian equation.
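A one-dimensional example makes the Jacobian equation concrete. The densities and the map below are hypothetical choices made for illustration: a uniform source density $f=1$ on $[0,1]$, a target density $g(y)=2y$ on $[0,1]$, and the map $T(x)=\sqrt{x}$. In one dimension $\det DT$ is just $T'(x)$.

```python
import numpy as np

f = lambda x: np.ones_like(x)      # source density on [0, 1] (assumed for the example)
g = lambda y: 2.0 * y              # target density on [0, 1] (assumed for the example)
T = lambda x: np.sqrt(x)           # candidate measure-preserving map

# Check T'(x) = f(x) / g(T(x)) with a central finite difference.
x = np.linspace(0.05, 0.95, 19)
eps = 1e-6
dT = (T(x + eps) - T(x - eps)) / (2 * eps)
residual = np.max(np.abs(dT - f(x) / g(T(x))))
print(residual)

# Check the push-forward directly: T(uniform samples) should follow density 2y,
# whose mean is the integral of y * 2y over [0, 1], i.e. 2/3.
rng = np.random.default_rng(0)
pushed = T(rng.uniform(0.0, 1.0, size=200_000))
print(pushed.mean())
```

Both checks express the same fact: where $T$ compresses the line ($T'<1$), the density of the image goes up by the same factor.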

Optimal transport map. The transportation cost of a self-map $T:\Omega\to\Omega$ is defined as

$$\mathcal{C}(T):=\int_\Omega|x-T(x)|^2\,d\mu(x).$$

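Among all measure-preserving self-maps, the one with the smallest transportation cost is called the optimal mass transportation map, that is,

$$T^*=\arg\min_{T_*\mu=\nu}\int_\Omega|x-T(x)|^2\,d\mu(x).$$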

The transportation cost of the optimal transport map is called the Wasserstein distance between the probability measures $\mu$ and $\nu$, written $W(\mu,\nu)$. (With the quadratic cost used here, this quantity is the square of the 2-Wasserstein distance $W_2$.)
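For two equal-weight point clouds on the real line, the measure-preserving maps are exactly the permutations, and the optimal one for the quadratic cost is the monotone pairing (smallest to smallest, and so on). The sketch below checks this against brute force on a tiny example; the sample sizes and names are assumptions made for illustration.

```python
import itertools
import numpy as np

def transport_cost(x, y, perm):
    """Mean squared displacement when x[i] is sent to y[perm[i]]."""
    return np.mean((x - y[list(perm)]) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=6)                 # source samples
y = rng.normal(loc=2.0, size=6)        # target samples

# Monotone pairing: the i-th smallest x is sent to the i-th smallest y.
order_x, order_y = np.argsort(x), np.argsort(y)
monotone = np.empty(6, dtype=int)
monotone[order_x] = order_y
monotone_cost = transport_cost(x, y, monotone)

# Brute force over all 720 measure-preserving maps (permutations).
brute_cost = min(transport_cost(x, y, p) for p in itertools.permutations(range(6)))
print(monotone_cost, brute_cost)
```

The minimum found by brute force coincides with the cost of the monotone pairing, which is the discrete counterpart of the optimal transport cost defined above.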

In this setting, Brenier proved that there exists a convex function $u:\Omega\to\mathbb{R}$ whose gradient map

$$T:x\mapsto\nabla u(x)$$

is the unique optimal transport map. This convex function is called the Brenier potential.

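From the Jacobian equation, the Brenier potential satisfies the Monge-Ampère equation. Since the Jacobian matrix of the gradient map is the Hessian matrix of the Brenier potential, the equation reads

$$\det\left(\frac{\partial^2u}{\partial x_i\partial x_j}\right)=\frac{f(x)}{g\circ\nabla u(x)},$$

where $f$ and $g$ are the densities of $\mu$ and $\nu$.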

The existence and uniqueness of solutions of the Monge-Ampère equation is equivalent to the classical Alexandrov theorem in convex geometry.

Figure 2. The Alexandrov theorem.

Alexandrov theorem. As shown in Figure 2, given a convex planar region $\Omega$, consider an open convex polyhedron. Pick a face $F_i$, denote its normal vector by $\mathbf{n}_i$, and denote by $A_i$ the area of the intersection of the projection of $F_i$ with $\Omega$. The total projected area then satisfies

$$\sum_iA_i=\mathrm{Area}(\Omega).$$

The convex polyhedron is determined by the pairs $\{(\mathbf{n}_i,A_i)\}$. The Alexandrov theorem holds for convex polyhedra of any dimension.

As we will see later, this convex polyhedron is the Brenier potential, whose gradient map sends one probability distribution $\mu$ to another probability distribution $\nu$, and the Wasserstein distance between the two distributions is dual to the volume determined by the polyhedron. In theory, this convex polyhedron can serve as the generator G in the W-GAN model.

Visualizing the Key Concepts of W-GAN

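In the Wasserstein-GAN model, the key concepts are the probability distribution (probability measure), the optimal transport map between probability measures (the generator), and the Wasserstein distance between probability measures. Below we explain in detail how each concept is constructed and what it means geometrically.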

Probability distribution. A GAN model involves two crucial probability distributions (probability measures): the distribution of the real data, $P_r$, and the distribution of the generated data, $P_g$. In addition, the input random variable of the generator follows a standard distribution (Gaussian or uniform).

Figure 3. A probability measure on the disk induced by a conformal mapping.

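A probability measure can be viewed as a generalized area (or volume). We can construct a probability measure at will by a geometric transformation. As shown in Figure 3, we capture a human face surface with a 3D scanner, and the area on the face surface is then a probability measure. We scale the face surface so that its total area equals $\pi$, the area of the unit disk. We then map the face surface to the planar disk by a conformal map. As Figure 3 shows, the conformal map sends infinitesimal circles on the face surface to infinitesimal circles in the plane, but the areas of the small circles change. The area ratio of each pair of small circles defines the probability density function on the planar disk.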

We can make the above description rigorous. Denote the face surface by $(S,\mathbf{g})$, where $\mathbf{g}$ is its Riemannian metric. Denote the planar disk by $\mathbb{D}$, with planar coordinates $(x,y)$ and Euclidean metric $dx^2+dy^2$. Denote the conformal map by

$$\phi:(S,\mathbf{g})\to(\mathbb{D},dx^2+dy^2).$$

Then $\mathbf{g}=e^{2\lambda(x,y)}(dx^2+dy^2)$, where the area distortion function $e^{2\lambda}$ gives the probability density function. The map $\phi$ thus induces a probability measure $\mu=e^{2\lambda(x,y)}\,dx\,dy$ on the disk $\mathbb{D}$.

Figure 4. The optimal transport map between two probability measures.

Optimal transport map. The disk originally carries the uniform distribution, and it also carries the probability distribution $\mu$ induced by the conformal map, so there exists a unique optimal transport map $T$ between the two measures. Figure 4 shows this map: the map from the middle frame to the right frame is the optimal transport map. We can see that the region around the tip of the nose is compressed and its probability density increases.

Figure 5. Discrete optimal transport.

Discrete optimal transport map. The numerical computation of the optimal transport map is highly geometric, so it can be visualized directly. We discretize the target probability measure and represent it as a family of discrete points $\{y_1,y_2,\dots,y_k\}$. Each point $y_i$ is assigned a Dirac measure $\nu_i\delta(y-y_i)$, with $\sum_{i=1}^k\nu_i=\mu(\mathbb{D})$. We then find a cell decomposition of the unit disk, $\mathbb{D}=\bigcup_{i=1}^kW_i$, in which each cell is mapped to its target point, $T:W_i\to y_i$. The map preserves the probability measure, so the area of each cell equals the target measure,

$$\mu(W_i)=\nu_i,\quad i=1,\dots,k,$$

and at the same time it minimizes the transportation cost,

$$\sum_{i=1}^k\int_{W_i}|x-y_i|^2\,d\mu(x).$$

Figure 6. The discrete Brenier potential and the discrete optimal transport map.

Discrete Brenier potential. The discrete optimal transport map is the gradient map of the discrete Brenier potential. For each discrete target point $y_i$ we construct a plane $\pi_i(x)=\langle x,y_i\rangle+h_i$, whose intercept $h_i$ is an unknown variable. The upper envelope of these planes forms an open convex polyhedron, which is exactly the graph of the discrete Brenier potential $u_h$:

$$u_h(x)=\max_{1\le i\le k}\{\langle x,y_i\rangle+h_i\}.$$

The left side of Figure 6 shows the discrete Brenier potential. The projection of the convex polyhedron onto the plane forms a cell decomposition of the plane, in which each face $F_i$ of the polyhedron is mapped to a cell $W_i$. The gradient on each face is $y_i$, so the gradient map of the Brenier potential is $\nabla u_h:W_i\to y_i$.
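The upper envelope and its cell decomposition can be sketched on a pixel grid: each pixel of the unit disk is labeled by the plane that attains the maximum there. This is a rough illustration, assuming the uniform (area) measure on the disk; the grid resolution, the four target points and the names `disk_grid`, `brenier_potential` and `cell_areas` are all hypothetical.

```python
import numpy as np

def disk_grid(n=200):
    """Pixel centers of an n x n grid that fall inside the unit disk, plus the pixel area."""
    c = (np.arange(n) + 0.5) / n * 2.0 - 1.0
    xx, yy = np.meshgrid(c, c)
    pts = np.stack([xx.ravel(), yy.ravel()], axis=1)
    return pts[np.sum(pts**2, axis=1) <= 1.0], (2.0 / n) ** 2

def brenier_potential(pts, y, h):
    """u_h(x) = max_i (<x, y_i> + h_i), and the index of the plane attaining the max."""
    planes = pts @ y.T + h                 # shape (num_pixels, k)
    return planes.max(axis=1), planes.argmax(axis=1)

def cell_areas(labels, k, pixel_area):
    """Area of each cell W_i, counted in pixels."""
    return np.bincount(labels, minlength=k) * pixel_area

pts, pixel_area = disk_grid()
y = np.array([[0.5, 0.5], [-0.5, 0.5], [-0.5, -0.5], [0.5, -0.5]])  # target points
h = np.zeros(4)                                                      # all intercepts equal

u, labels = brenier_potential(pts, y, h)
areas = cell_areas(labels, len(y), pixel_area)
print(areas, areas.sum())
```

With equal intercepts and these symmetric target points, the four cells are the four quadrants of the disk, each of area about $\pi/4$. Raising one intercept $h_i$ lifts its plane and enlarges its cell at the expense of its neighbors.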

By the measure-preserving property, the area of each cell $W_i$ must equal the prescribed area $\nu_i$. We therefore adjust the intercepts $h_i$ of the planes to satisfy this constraint. By the Alexandrov theorem, such intercepts exist and are essentially unique.

Discrete Wasserstein distance. Together with Shing-Tung Yau we established a variational method for finding the intercepts $h_i$ of the planes. Given the intercept vector $h=(h_1,\dots,h_k)$, the family of planes is $\pi_i(x)=\langle x,y_i\rangle+h_i$, the Brenier potential formed by their upper envelope is $u_h(x)=\max_i\{\langle x,y_i\rangle+h_i\}$, the projection of the upper envelope produces a cell decomposition of the plane into cells $W_i(h)$, and the area of the cell $W_i(h)$ is denoted $w_i(h)$. We define the energy of $h$ as

$$E(h)=\sum_{i=1}^k\nu_ih_i-\int^h\sum_{i=1}^kw_i(\eta)\,d\eta_i.$$

This energy is strictly concave on the subspace $\{h:\sum_ih_i=0\}$, and its unique global maximum gives the intercepts that satisfy the measure-preserving condition. The nonlinear term of this energy is in fact the volume of the cylinder cut out by the upper envelope,

$$V(h)=\int^h\sum_{i=1}^kw_i(\eta)\,d\eta_i.$$
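Since the gradient of the energy is $\partial E/\partial h_i=\nu_i-w_i(h)$, the simplest way to look for the maximum is plain gradient ascent. The sketch below does this on a pixel grid with the uniform measure on the disk. It is not the solver used in [3]; the step size, iteration count, target points and prescribed areas are assumptions chosen for illustration, and the pixel grid limits the accuracy of the result.

```python
import numpy as np

def disk_grid(n=200):
    """Pixel centers inside the unit disk and the area of one pixel."""
    c = (np.arange(n) + 0.5) / n * 2.0 - 1.0
    xx, yy = np.meshgrid(c, c)
    pts = np.stack([xx.ravel(), yy.ravel()], axis=1)
    return pts[np.sum(pts**2, axis=1) <= 1.0], (2.0 / n) ** 2

def cell_areas(pts, pixel_area, y, h):
    """w_i(h): area of the cell where the plane <x, y_i> + h_i is the highest."""
    labels = np.argmax(pts @ y.T + h, axis=1)
    return np.bincount(labels, minlength=len(y)) * pixel_area

def solve_heights(pts, pixel_area, y, nu, step=0.05, iters=2000):
    """Gradient ascent on E(h); the gradient is nu - w(h)."""
    h = np.zeros(len(y))
    for _ in range(iters):
        h += step * (nu - cell_areas(pts, pixel_area, y, h))
        h -= h.mean()                      # stay in the subspace sum(h) = 0
    return h

pts, pixel_area = disk_grid()
y = np.array([[0.5, 0.1], [-0.3, 0.6], [-0.4, -0.5], [0.3, -0.4]])   # target points
nu = np.array([0.1, 0.2, 0.3, 0.4]) * len(pts) * pixel_area           # prescribed areas

h = solve_heights(pts, pixel_area, y, nu)
w = cell_areas(pts, pixel_area, y, h)
print(np.round(h, 3))
print(np.round(w - nu, 4))   # residual of the measure-preserving condition
```

Because the energy is concave, any ascent method heads toward the same maximum; a Newton-type method would converge in far fewer iterations, at the price of computing the Hessian from the cell boundaries.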

Figure 7 visualizes the cylinder volume. The cylinder volume $V(h)$ is a convex function.

Figure 7. The cylinder volume $V(h)$ cut out by the graph of the discrete Brenier potential.

The volume function $V(h)$ and the Wasserstein distance differ by a Legendre transformation. The Legendre transformation is highly geometric and can be visualized. Take a $C^2$ convex function $u(x)$ defined on the real line. Its graph is a convex curve, and this curve is the envelope of all its tangent lines. If at a point $x$ the tangent line of the function has slope $y=u'(x)$, then the intercept of this tangent line is $-u^*(y)$, where

$$u^*(y)=xy-u(x).$$

The function $u^*$ is called the Legendre transform of the function $u$. It takes the slope of a tangent line as its argument and returns the intercept of that tangent line, with the sign reversed, as its value.
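The Legendre transform can be evaluated numerically as $u^*(y)=\max_x\,(xy-u(x))$, which agrees with the tangent-line definition for a smooth convex function. The sketch below compares this with a closed form for the hypothetical example $u(x)=x^4/4$, where $y=x^3$ and therefore $u^*(y)=\tfrac{3}{4}y^{4/3}$ for $y>0$.

```python
import numpy as np

def legendre_transform(u, xs, ys):
    """u*(y) = max over the sample grid xs of (x * y - u(x)), for each y in ys."""
    return np.max(np.outer(ys, xs) - u(xs), axis=1)

u = lambda x: 0.25 * x**4
xs = np.linspace(-3.0, 3.0, 6001)      # grid wide enough to contain every maximizer
ys = np.linspace(0.5, 8.0, 16)         # slopes; the maximizer is x = y**(1/3) <= 2

numeric = legendre_transform(u, xs, ys)
closed_form = 0.75 * ys ** (4.0 / 3.0)
print(np.max(np.abs(numeric - closed_form)))
```

The maximum over $x$ picks out exactly the tangent line with slope $y$, so the numerical values match the closed form up to the grid resolution.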

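Figure 8. The graph of a convex function is the envelope of its tangent lines; the set of tangent lines is represented by the Legendre dual of the original function.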

Because $u$ is convex, the map $x\mapsto y=u'(x)$ is a diffeomorphism, written $y(x)$, with inverse $x(y)$. The original function and its Legendre transform then satisfy

$$u(x)=\int^xy(x)\,dx+c,\qquad u^*(y)=\int^yx(y)\,dy+d.$$

Here $c$ and $d$ are constants. Figure 9 gives an intuitive picture of the original function and its Legendre transform. We draw the curve $y=u'(x)$ in the xy-plane: the area under the curve is $u(x)$, and the area above the curve is the Legendre transform $u^*(y)$.

Figure 9. The Legendre transformation illustrated.

The geometric picture of the Legendre transformation is valid in any dimension. We now examine the Legendre transform of the volume function $V(h)$. By definition, with $w_i=\partial V/\partial h_i$ the cell areas,

$$V^*(w)=\sum_iw_ih_i-V(h)=\int^w\sum_ih_i\,dw_i,$$

where the last equality holds up to an additive constant.

Suppose we vary the intercepts $h$, or equivalently vary the cell areas $w$, and look at the common boundary of two cells,

$$p\in W_i(h)\cap W_j(h),\qquad\langle p,y_i\rangle+h_i=\langle p,y_j\rangle+h_j.$$

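A point $p$ originally belongs to $W_i$ and after the variation belongs to $W_j$; let the total area of all such points be $dw_{ij}$. The resulting change in the Wasserstein distance is

$$\left(|p-y_j|^2-|p-y_i|^2\right)dw_{ij}=\left[\left(|y_j|^2+2h_j\right)-\left(|y_i|^2+2h_i\right)\right]dw_{ij},$$

where the equality uses the boundary condition $\langle p,y_i\rangle+h_i=\langle p,y_j\rangle+h_j$.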

Hence the total change in the Wasserstein distance is

$$dW=\sum_i\left(|y_i|^2+2h_i\right)dw_i.$$

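From this we see that, up to an additive constant, the Wasserstein distance equals

$$W(\mu,\nu)=\sum_i|y_i|^2w_i+2\int^w\sum_ih_i\,dw_i.$$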

Its nonlinear part is the Legendre transform of the cylinder volume.
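This relation can be checked on a pixel grid, where it holds exactly for the pixel sums. The sketch below assumes the uniform measure on the unit disk, takes $V(h)=\int u_h\,dx$ as the cylinder volume and $V^*(w)=\langle h,w\rangle-V(h)$ as its Legendre transform, and identifies the additive constant as $\int|x|^2dx$. The target points and intercepts are arbitrary hypothetical values.

```python
import numpy as np

# Pixel centers inside the unit disk (uniform measure, assumed for this check).
n = 200
c = (np.arange(n) + 0.5) / n * 2.0 - 1.0
xx, yy = np.meshgrid(c, c)
pts = np.stack([xx.ravel(), yy.ravel()], axis=1)
pts = pts[np.sum(pts**2, axis=1) <= 1.0]
pixel_area = (2.0 / n) ** 2

y = np.array([[0.5, 0.1], [-0.3, 0.6], [-0.4, -0.5], [0.3, -0.4]])  # target points
h = np.array([0.1, -0.2, 0.05, 0.05])                                # arbitrary intercepts

planes = pts @ y.T + h
labels = planes.argmax(axis=1)                                # cell index of each pixel
w = np.bincount(labels, minlength=len(y)) * pixel_area        # cell areas w_i(h)

cost = np.sum((pts - y[labels]) ** 2) * pixel_area            # transport cost of grad u_h
volume = np.sum(planes.max(axis=1)) * pixel_area              # V(h): volume under u_h
legendre = h @ w - volume                                     # V*(w) = <h, w> - V(h)
second_moment = np.sum(pts**2) * pixel_area                   # integral of |x|^2
rhs = np.sum(y**2, axis=1) @ w + 2.0 * legendre + second_moment
print(cost, rhs)
```

The two printed numbers agree to rounding error for any choice of intercepts, which confirms that the only nonlinear dependence of the transport cost on the cell areas comes through the Legendre transform of the volume.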

Summary

From the discussion above, we see that given two probability distributions $\mu$ and $\nu$, there exists a unique convex function (the Brenier potential) whose gradient map $T=\nabla u$ sends one probability distribution to the other. The transportation cost of this optimal transport map gives the Wasserstein distance between the two distributions. Both the Brenier potential and the Wasserstein distance have clear geometric interpretations.

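In the Wasserstein-GAN model, the generator and the discriminator are usually implemented with deep neural networks. According to optimal transport theory, we can replace the deep neural network, the black box, with the Brenier potential, which makes the whole system transparent. At another level, a deep neural network is in essence learning a transport map between probability distributions, so it may be implicitly learning the optimal transport map or, equivalently, the Brenier potential. A deeper understanding of these questions will help us see through the black box.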

Figure 10. Area-preserving parameterization of a surface computed from a two-dimensional optimal transport map, by Zhengyu Su (蘇政宇).

Figure 11. Volume-preserving parameterization computed from a three-dimensional optimal transport map, by Kehua Su (蘇科華).

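(In 2016 Lao Gu wrote a number of blog posts on optimal transport maps, and is very pleased to see that these articles have inspired some attentive scholars, who have published SIGGRAPH papers and applied for NSF grants. Thank you all for following Lao Gu Talks Geometry (老顧談幾何); I hope it continues to bring you inspiration.)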

References

[1] Arjovsky, M. & Bottou, L. (2017). Towards Principled Methods for Training Generative Adversarial Networks.

[2] Arjovsky, M., Chintala, S. & Bottou, L. (2017). Wasserstein GAN.

[3] Gu, X., Luo, F., Sun, J. & Yau, S.-T. (2016). Variational Principles for Minkowski Type Problems, Discrete Optimal Transport, and Discrete Monge-Ampere Equations. Asian Journal of Mathematics (AJM), Vol. 20, No. 2, pp. 383-398, April 2016.