| Lecture | Topic | Reading | Spatial Assignment |
|---|---|---|---|
| 1 | Introduction, role of hardware accelerators in the post-Dennard and post-Moore era | Is Dark Silicon Useful?; Hennessy & Patterson, Chapter 7.1-7.2 | |
| 2 | Classical ML algorithms: Regression, SVMs (What is the building block?) | TABLA | |
| 3 | Linear algebra fundamentals and accelerating linear algebra BLAS operations. 20th-century techniques: systolic arrays, MIMDs, CGRAs (a blocked GEMM sketch follows the table) | Why Systolic Architectures?; Anatomy of High-Performance GEMM | Linear Algebra Accelerators |
| 4 | Evaluating performance, energy efficiency, parallelism, locality, memory hierarchy, roofline model (a worked roofline example follows the table) | Dark Memory | |
| 5 | Real-world architectures, putting it into practice. Accelerating GEMM: custom, GPU, and TPU1 architectures and their GEMM performance | Google TPU; Codesign Tradeoffs; NVIDIA Tesla V100 | |
| 6 | Neural networks: MLP and CNN inference | Vivienne Sze IEEE Proceedings survey; Brooks's book (selected chapters) | CNN Inference Accelerators |
| 7 | Accelerating inference for CNNs: blocking and parallelism in practice. DianNao, Eyeriss, TPU1 | Systematic Approach to Blocking; Eyeriss; Google TPU (see lecture 5) | |
| 8 | Modeling neural networks with Spatial; analyzing performance and energy with Spatial | Spatial; one related work | |
| 9 | Training: SGD, backpropagation, statistical efficiency, batch size | NIPS workshop last year; Graphcore | Training Accelerators |
| 10 | Resilience of DNNs: sparsity and low-precision networks | Some theory paper; EIE; Flexpoint of Nervana; Boris Ginsburg: paper, presentation; LSTM Block Compression by Baidu? | |
| 11 | Low-precision training (an int8 quantization sketch follows the table) | HALP; ternary or binary networks; see Boris Ginsburg's work (lecture 10) | |
| 12 | Training in distributed and parallel systems: Hogwild!, asynchrony and hardware efficiency | Deep Gradient Compression; Hogwild!; Large Scale Distributed Deep Networks; Obstinate cache? | |
| 13 | FPGAs and CGRAs: Catapult, Brainwave, Plasticine | Catapult; Brainwave; Plasticine | |
| 14 | ML benchmarks: DAWNBench, MLPerf | DAWNBench; some other benchmark paper | |
| 15 | Project presentations | | |
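
The blocking theme of lectures 3, 5, and 7 can be summarized in a few lines of NumPy. This is only an illustrative sketch of loop tiling: the block size and matrix shapes are arbitrary assumptions, and it models the locality idea rather than any particular accelerator's dataflow.

```python
import numpy as np

def blocked_gemm(A, B, block=64):
    """C = A @ B computed tile by tile.

    Blocking keeps one (block x block) tile of A, B, and C in fast
    memory at a time, which is the locality idea behind systolic-array
    and TPU-style GEMM engines. block=64 is an illustrative choice,
    not a tuned value.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, block):
        for j in range(0, N, block):
            for k in range(0, K, block):
                # Accumulate one output tile from a pair of input tiles.
                C[i:i+block, j:j+block] += (
                    A[i:i+block, k:k+block] @ B[k:k+block, j:j+block]
                )
    return C

# Quick check against NumPy's own GEMM.
A = np.random.rand(256, 128)
B = np.random.rand(128, 192)
assert np.allclose(blocked_gemm(A, B), A @ B)
```

The three-level loop nest over tiles is the software analogue of the blocking analysis in the "Systematic Approach to Blocking" reading; only the traversal order and tile size change across the architectures compared in lecture 5.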
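For the lecture 4 roofline model, a small worked example helps connect arithmetic intensity to the attainable compute bound. The peak FLOP/s and bandwidth figures below are made-up illustrative values, not measurements of any machine in the readings.

```python
def roofline_bound(peak_flops, mem_bw, arithmetic_intensity):
    """Attainable FLOP/s = min(peak compute, bandwidth * intensity)."""
    return min(peak_flops, mem_bw * arithmetic_intensity)

def gemm_intensity(n, bytes_per_elem=4):
    """Square n x n GEMM: 2*n^3 FLOPs over 3*n^2 matrices of traffic.

    Intensity grows linearly with n, which is why large GEMMs can be
    compute-bound while small ones stay memory-bound.
    """
    flops = 2 * n**3
    bytes_moved = 3 * n**2 * bytes_per_elem
    return flops / bytes_moved

# Illustrative machine: 10 TFLOP/s peak, 300 GB/s memory bandwidth.
PEAK, BW = 10e12, 300e9
for n in (64, 512, 4096):
    ai = gemm_intensity(n)
    print(f"n={n:5d}  intensity={ai:8.1f} FLOP/byte  "
          f"bound={roofline_bound(PEAK, BW, ai)/1e12:.2f} TFLOP/s")
```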
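For the low-precision material in lectures 10 and 11, the sketch below shows the most basic symmetric int8 round-to-nearest quantization of a weight tensor. It is not the EIE, Flexpoint, or HALP scheme from the readings, only the underlying idea that DNN weights tolerate reduced precision.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: x is approximated by scale * q."""
    scale = max(np.max(np.abs(x)) / 127.0, 1e-8)  # guard against all-zero tensors
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Illustrative check on a random weight matrix.
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
err = np.max(np.abs(dequantize(q, s) - w))
print(f"max abs quantization error: {err:.4f}")
```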