Temporal difference learning: Difference between revisions

From Wikipedia, the free encyclopedia

Revision as of 13:51, 4 April 2023 (Tuesday)

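Temporal difference learning (TD learning) is a class of model-free reinforcement learning methods that learn by bootstrapping from the current estimate of the value function. These methods sample from the environment, as Monte Carlo methods do, and update the value function based on current estimates, as dynamic programming methods do.^[1]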
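
As an illustration, the simplest method of this class, TD(0) as presented in Sutton & Barto (2018), can be written as the update below. The notation is assumed here rather than taken from the text above: V is the current value estimate, S_t and S_{t+1} are successive sampled states, R_{t+1} is the observed reward, α is a step size, and γ is a discount factor.

```latex
V(S_t) \leftarrow V(S_t) + \alpha \bigl[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \bigr]
```

The sampled transition plays the Monte Carlo role, while the use of V(S_{t+1}) inside the target is the bootstrapping step shared with dynamic programming.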

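Unlike Monte Carlo methods, which can adjust their estimates only after the final outcome is known, temporal difference learning can adjust its parameters before the final outcome is available, making its predictions more accurate.^[2]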
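
A minimal Python sketch of this per-step adjustment follows. It is an illustration under stated assumptions, not an implementation taken from the cited sources: the helper names `td0_update` and `run_random_walk`, the five-state random walk environment (modelled on the example in Sutton & Barto), the step size, and the episode count are all chosen here for demonstration. The point to notice is that `td0_update` is called after every transition, before the episode has ended.

```python
import random


def td0_update(values, state, reward, next_state, alpha=0.1, gamma=1.0):
    """Apply one TD(0) update in place and return the TD error.

    All names and defaults here are illustrative assumptions.
    """
    # Bootstrapped target: observed reward plus the discounted
    # current estimate for the next state.
    target = reward + gamma * values[next_state]
    td_error = target - values[state]
    values[state] += alpha * td_error
    return td_error


def run_random_walk(episodes=2000, alpha=0.05, seed=0):
    """Estimate state values of a 5-state random walk with TD(0).

    States 1..5 are non-terminal; 0 and 6 are terminal. Stepping into
    state 6 yields reward 1, every other transition yields reward 0.
    """
    rng = random.Random(seed)
    n = 5
    # Terminal states keep value 0; non-terminal states start at 0.5.
    values = [0.0] + [0.5] * n + [0.0]
    for _ in range(episodes):
        state = 3  # each episode starts in the middle state
        while state not in (0, n + 1):
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == n + 1 else 0.0
            # The estimate is adjusted here, before the episode is over.
            td0_update(values, state, reward, next_state, alpha)
            state = next_state
    return values[1:n + 1]


if __name__ == "__main__":
    print(run_random_walk())
```

For this environment the true values are 1/6, 2/6, ..., 5/6, so the printed estimates should settle near those numbers. A Monte Carlo variant would instead have to store each episode's visited states and update them only once the terminal reward is known.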

References

^ Sutton & Barto (2018)
^ Richard Sutton. Learning to predict by the methods of temporal differences. Machine Learning. 1988, 3 (1): 9–44. doi:10.1007/BF00115009. (A revised version is available on Richard Sutton's publication page, archived 2017-03-30 at the Internet Archive.)

Works cited

Sutton, Richard S.; Barto, Andrew G. Reinforcement Learning: An Introduction (2nd ed.). Cambridge, MA: MIT Press. 2018.
Tesauro, Gerald. Temporal Difference Learning and TD-Gammon. Communications of the ACM. March 1995, 38 (3): 58–68. S2CID 6023746. doi:10.1145/203330.203343.

Retrieved from "https://zh.wikipedia.org/w/index.php?title=时序差分学习&oldid=76658323"

@@ Line 1: / Line 1: @@
 {{机器学习导航栏}}
 '''Temporal difference learning''' ('''TD learning''') is a class of model-free [[强化学习|reinforcement learning]] methods that learn by bootstrapping from the current estimate of the value function. These methods sample from the environment, as [[蒙特卡罗方法|Monte Carlo methods]] do, and update the value function based on current estimates, as [[动态规划|dynamic programming]] methods do.<ref name="RSutton-2018">{{harvp|Sutton|Barto|2018}}</ref>
+Unlike Monte Carlo methods, which can adjust their estimates only after the final outcome is known, temporal difference learning can adjust its parameters before the final outcome is available, making its predictions more accurate.<ref name="RSutton-1988">{{cite journal |author=Richard Sutton |title=Learning to predict by the methods of temporal differences |journal=Machine Learning |volume=3 |issue=1 |pages=9–44 |year=1988 |doi=10.1007/BF00115009|doi-access=free }} (A revised version is available on [http://incompleteideas.net/sutton/publications.html Richard Sutton's publication page] {{Webarchive|url=https://web.archive.org/web/20170330002227/http://incompleteideas.net/sutton/publications.html |date=2017-03-30 }})</ref>
 == References ==
 {{reflist}}