
Course catalog: Prediction and Control with Function Approximation Training
Course outline:

Welcome to the Course!

Welcome to the third course in the Reinforcement Learning Specialization: Prediction and Control with Function Approximation, brought to you by the University of Alberta, Onlea, and Coursera. In this pre-course module, you'll be introduced to your instructors and get a flavour of what the course has in store for you. Make sure to introduce yourself to your classmates in the "Meet and Greet" section!

On-policy Prediction with Approximation

This week you will learn how to estimate a value function for a given policy when the number of states is much larger than the memory available to the agent. You will learn how to specify a parametric form of the value function, how to specify an objective function, and how stochastic gradient descent can be used to estimate values from interaction with the world.
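As a minimal sketch of the idea, assume a linear parametric form v(s) = w · x(s) and the mean squared value error as the objective. One common way to descend that objective from sampled transitions is the semi-gradient TD(0) update shown here. The function name, the argument names, and the step-size and discount values are illustrative assumptions, not taken from the course materials.

```python
import numpy as np

def semi_gradient_td0_update(w, x, reward, x_next, alpha=0.1, gamma=1.0, terminal=False):
    """One semi-gradient TD(0) step for a linear value estimate v(s) = w @ x(s).

    w       : weight vector (the parameters of the value function)
    x       : feature vector of the current state
    x_next  : feature vector of the next state (ignored if terminal)
    """
    v = w @ x
    v_next = 0.0 if terminal else w @ x_next
    # TD error: how far the current estimate is from the bootstrapped target.
    td_error = reward + gamma * v_next - v
    # For a linear approximator, the gradient of v(s) with respect to w is x(s).
    return w + alpha * td_error * x


# Usage: a single update from one observed transition.
w = np.zeros(2)
w = semi_gradient_td0_update(
    w, x=np.array([1.0, 0.0]), reward=1.0, x_next=np.array([0.0, 1.0]), alpha=0.5
)
```

The update is called "semi-gradient" because the target itself depends on w, yet only the gradient of the current estimate is followed.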

Constructing Features for Prediction

The features used to construct the agent’s value estimates are perhaps the most crucial part of a successful learning system. In this module we discuss two basic strategies for constructing features: (1) a fixed basis that forms an exhaustive partition of the input, and (2) adapting the features while the agent interacts with the world via neural networks and backpropagation.
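The first strategy can be illustrated with state aggregation, one simple fixed basis that partitions the input exhaustively: a one-dimensional state range is cut into equal-width bins, and each state activates exactly one feature. This is a sketch under assumed names and parameters (`low`, `high`, `num_bins`), not the construction used in the course assignments.

```python
import numpy as np

def state_aggregation_features(state, low, high, num_bins):
    """One-hot feature vector marking which equal-width bin of [low, high] holds `state`."""
    # Scale the state into [0, 1], then into a fractional bin position.
    fraction = (state - low) / (high - low)
    # Clip so that out-of-range states and state == high fall into the edge bins.
    index = int(np.clip(fraction * num_bins, 0, num_bins - 1))
    features = np.zeros(num_bins)
    features[index] = 1.0
    return features


# Usage: with 4 bins over [0, 1], the state 0.55 falls in the third bin.
x = state_aggregation_features(0.55, low=0.0, high=1.0, num_bins=4)
```

Because every state maps to exactly one bin, all states in a bin share a single value estimate. The resolution of the partition therefore bounds how finely the value function can be represented.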

In this week’s graded assessment you will solve a simple but infinite-state prediction task with a neural network and TD learning.

Control with Approximation

This week, you will see that the concepts and tools introduced in modules two and three allow straightforward extension of classic TD control methods to the function approximation setting. In particular, you will learn how to find the optimal policy in infinite-state MDPs by simply combining semi-gradient TD methods with generalized policy iteration, yielding classic control methods like Q-learning and Sarsa. We conclude with a discussion of a new problem formulation for RL, average reward, which will undoubtedly be used in many applications of RL in the future.
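To make the combination concrete, here is a minimal sketch of episodic semi-gradient Sarsa with linear action values, where an epsilon-greedy policy supplies the policy-improvement half of generalized policy iteration. The weight layout (one row per action), the function names, and the default step size are assumptions chosen for illustration.

```python
import numpy as np

def epsilon_greedy(w, x, epsilon, rng):
    """Pick an action: random with probability epsilon, otherwise greedy in q(s, a) = w[a] @ x."""
    if rng.random() < epsilon:
        return int(rng.integers(w.shape[0]))
    return int(np.argmax(w @ x))

def semi_gradient_sarsa_update(w, x, action, reward, x_next, next_action,
                               alpha=0.1, gamma=1.0, terminal=False):
    """One semi-gradient Sarsa step; w has shape (num_actions, num_features)."""
    q = w[action] @ x
    q_next = 0.0 if terminal else w[next_action] @ x_next
    td_error = reward + gamma * q_next - q
    w = w.copy()  # leave the caller's array untouched
    # Only the weights of the action actually taken have a non-zero gradient.
    w[action] += alpha * td_error * x
    return w


# Usage: one update, then a greedy action choice (epsilon = 0).
rng = np.random.default_rng(0)
w = np.zeros((2, 2))
w = semi_gradient_sarsa_update(w, np.array([1.0, 0.0]), 0, 1.0,
                               np.array([0.0, 1.0]), 1, alpha=0.5)
a = epsilon_greedy(w, np.array([1.0, 0.0]), epsilon=0.0, rng=rng)
```

Replacing `w[next_action] @ x_next` with the maximum of `w @ x_next` over actions would turn the same update into a Q-learning step.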

Policy Gradient

Every algorithm you have learned about so far estimates a value function as an intermediate step towards the goal of finding an optimal policy. An alternative strategy is to directly learn the parameters of the policy. This week you will learn about these policy gradient methods and their advantages over value-function-based methods. You will also learn how policy gradient methods can be used to find the optimal policy in tasks with both continuous state and action spaces.
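For continuous actions, one standard parameterization is a Gaussian policy whose mean and standard deviation depend on the state features. The sketch below assumes a linear mean, mu = theta_mu · x, and a log-linear standard deviation, sigma = exp(theta_sigma · x), and takes one gradient-ascent step on log pi(a|s) weighted by a learning signal such as a return or a TD error. All names and the step size are illustrative assumptions rather than the course's own code.

```python
import numpy as np

def gaussian_policy_params(theta_mu, theta_sigma, x):
    """Mean and standard deviation of the action distribution for features x."""
    mu = theta_mu @ x
    sigma = np.exp(theta_sigma @ x)  # exp keeps the standard deviation positive
    return mu, sigma

def grad_log_pi(theta_mu, theta_sigma, x, action):
    """Gradients of log pi(action | x) with respect to theta_mu and theta_sigma."""
    mu, sigma = gaussian_policy_params(theta_mu, theta_sigma, x)
    grad_mu = (action - mu) / sigma**2 * x
    grad_sigma = ((action - mu) ** 2 / sigma**2 - 1.0) * x
    return grad_mu, grad_sigma

def policy_gradient_step(theta_mu, theta_sigma, x, action, signal, alpha=0.01):
    """Move the policy parameters along signal * grad log pi(action | x)."""
    grad_mu, grad_sigma = grad_log_pi(theta_mu, theta_sigma, x, action)
    return theta_mu + alpha * signal * grad_mu, theta_sigma + alpha * signal * grad_sigma


# Usage: sample an action, then update with an assumed learning signal of 2.0.
rng = np.random.default_rng(0)
theta_mu, theta_sigma = np.zeros(2), np.zeros(2)
x = np.array([1.0, 0.0])
mu, sigma = gaussian_policy_params(theta_mu, theta_sigma, x)
action = rng.normal(mu, sigma)
theta_mu, theta_sigma = policy_gradient_step(theta_mu, theta_sigma, x, action, signal=2.0)
```

A positive signal shifts the mean towards the sampled action, while a negative signal shifts it away. No maximization over the action space is required, which is one reason this approach suits continuous actions.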
