microsoft · you-n-g · Nov 10, 2022 · Oct 19, 2022 · Oct 19, 2022 · Oct 20, 2022
diff --git a/docs/component/highfreq.rst b/docs/component/highfreq.rst
@@ -8,33 +8,33 @@ Design of Nested Decision Execution Framework for High-Frequency Trading
 Introduction
 ============
 
-Daily trading (e.g. portfolio management) and intraday trading (e.g. orders execution) are two hot topics in Quant investment and usually studied separately.
+Daily trading (e.g. portfolio management) and intraday trading (e.g. orders execution) are two hot topics in Quant investment and are usually studied separately.
 
 To get the join trading performance of daily and intraday trading, they must interact with each other and run backtest jointly.
-In order to support the joint backtest strategies in multiple levels, a corresponding framework is required. None of the publicly available high-frequency trading frameworks considers multi-level joint trading, which make the backtesting aforementioned inaccurate.
+To support joint backtesting of strategies at multiple levels, a corresponding framework is required. None of the publicly available high-frequency trading frameworks considers multi-level joint trading, which makes the aforementioned backtesting inaccurate.
 
 Besides backtesting, the optimization of strategies from different levels is not standalone and can be affected by each other.
-For example, the best portfolio management strategy may change with the performance of order executions(e.g. a portfolio with higher turnover may becomes a better choice when we improve the order execution strategies).
-To achieve the overall good performance , it is necessary to consider the interaction of strategies in different level. 
+For example, the best portfolio management strategy may change with the performance of order execution (e.g. a portfolio with higher turnover may become a better choice when we improve the order execution strategies).
+To achieve good overall performance, it is necessary to consider the interaction of strategies at different levels.
 
-Therefore, building a new framework for trading in multiple levels becomes necessary to solve the various problems mentioned above, for which we designed a nested decision execution framework that consider the interaction of strategies.
+Therefore, a new framework for trading at multiple levels is necessary to solve the problems mentioned above. For this purpose, we designed a nested decision execution framework that considers the interaction of strategies.
 
 .. image:: ../_static/img/framework.svg
 
 The design of the framework is shown in the yellow part in the middle of the figure above. Each level consists of ``Trading Agent`` and ``Execution Env``. ``Trading Agent`` has its own data processing module (``Information Extractor``), forecasting module (``Forecast Model``) and decision generator (``Decision Generator``). The trading algorithm generates the decisions by the ``Decision Generator`` based on the forecast signals output by the ``Forecast Module``, and the decisions generated by the trading algorithm are passed to the ``Execution Env``, which returns the execution results.
 
-The frequency of trading algorithm, decision content and execution environment can be customized by users (e.g. intraday trading, daily-frequency trading, weekly-frequency trading), and the execution environment can be nested with finer-grained trading algorithm and execution environment inside (i.e. sub-workflow in the figure, e.g. daily-frequency orders can be turned into finer-grained decisions by splitting orders within the day). The flexibility of nested decision execution framework makes it easy for users to explore the effects of combining different levels of trading strategies and break down the optimization barriers between different levels of trading algorithm. 
+The frequency of the trading algorithm, the decision content and the execution environment can be customized by users (e.g. intraday trading, daily-frequency trading, weekly-frequency trading), and the execution environment can be nested with a finer-grained trading algorithm and execution environment inside (i.e. the sub-workflow in the figure, e.g. daily-frequency orders can be turned into finer-grained decisions by splitting orders within the day). The flexibility of the nested decision execution framework makes it easy for users to explore the effects of combining different levels of trading strategies and to break down the optimization barriers between different levels of trading algorithms.
 
-The optimization for the nested decision execution framework can be implemented with an RL-based method, which can be supported by `qlib.rl<https://github.com/microsoft/qlib/tree/main/examples/rl>`_.
+The optimization of the nested decision execution framework can be implemented with the support of QlibRL. To learn more about how to use QlibRL, see the API Reference: `RL API <../reference/api.html#rl>`_.
 
 Example
 =======
 
-An example of nested decision execution framework for high-frequency can be found `here <https://github.com/microsoft/qlib/blob/main/examples/nested_decision_execution/workflow.py>`_.
+An example of the nested decision execution framework for high-frequency trading can be found `here <https://github.com/microsoft/qlib/blob/main/examples/nested_decision_execution/workflow.py>`_.
 
 
-Besides, the above examples, here are some other related work about high-frequency trading in Qlib.
+Besides the above example, here is some other related work on high-frequency trading in Qlib.
 
 - `Prediction with high-frequency data <https://github.com/microsoft/qlib/tree/main/examples/highfreq#benchmarks-performance-predicting-the-price-trend-in-high-frequency-data>`_
-- `Examples <https://github.com/microsoft/qlib/blob/main/examples/orderbook_data/>`_ to extract features form high-frequency data without fixed frequency.
+- `Examples <https://github.com/microsoft/qlib/blob/main/examples/orderbook_data/>`_ to extract features from high-frequency data without fixed frequency.
 - `A paper <https://github.com/microsoft/qlib/tree/high-freq-execution#high-frequency-execution>`_ for high-frequency trading.
diff --git a/docs/component/rl.rst b/docs/component/rl.rst
@@ -19,7 +19,7 @@ Base Modules
 
 EnvWrapper
 ------------
-EnvWrapper is the complete capsulation of the simulated environment. It receives actions from outside (policy / strategy / agent), simulates the changes of the market, and then replies rewards and updated states, thus forming an interaction loop.
+EnvWrapper is the complete encapsulation of the simulated environment. It receives actions from outside (policy / strategy / agent), simulates the changes in the market, and then returns rewards and updated states, thus forming an interaction loop.
 
 In QlibRL, EnvWrapper is a subclass of gym.Env, so it implements all necessary interfaces of gym.Env. Any classes or pipelines that accept gym.Env should also accept EnvWrapper. Developers do not need to implement their own EnvWrapper to build their own environment. Instead, they only need to implement 4 components of the EnvWrapper:
 
@@ -32,7 +32,7 @@ In QlibRL, EnvWrapper is a subclass of gym.Env, so it implements all necessary i
 - `Reward function` 
     The reward function returns a numerical reward to the policy after each time the policy takes an action. 
 
-EnvWrapper will organically organize these components. Such decomposition allows for better flexibility in development. For example, if the developers want to train multiple types of policies in one same environment, they only need to design one simulator, and design different state interpreters / action interpreters / reward functions for a different types of policies.
+EnvWrapper will organically organize these components. Such decomposition allows for better flexibility in development. For example, if the developers want to train multiple types of policies in the same environment, they only need to design one simulator, and design different state interpreters / action interpreters / reward functions for different types of policies.
 
 QlibRL has well-defined base classes for all these 4 components. All the developers need to do is define their own components by inheriting the base classes and then implementing all interfaces required by the base classes.
 
@@ -60,15 +60,15 @@ Order Execution
 ------------
 As a fundamental problem in algorithmic trading, order execution aims at fulfilling a specific trading order, either liquidation or acquirement, for a given instrument. Essentially, the goal of order execution is twofold: it not only requires to fulfill the whole order but also targets a more economical execution with maximizing profit gain (or minimizing capital loss). The order execution with only one order of liquidation or acquirement is called single-asset order execution.
 
-Considering stock investment always aim to pursue long-term maximized profits, is usually manifests as a sequential process of continuously adjusting the asset portfolios, execution for multiple orders, including order of liquidation and acquirement, brings more constraints and making the sequence of execution for different orders should be considered, e.g. before executing an order to buy some stocks, we have to sell at least one stock. The order execution with multiple assets is called multi-asset order execution. 
+Because stock investment aims at long-term maximized profit, it usually manifests as a sequential process of continuously adjusting the asset portfolio. Executing multiple orders, including both liquidation and acquirement orders, brings more constraints, and the sequence in which different orders are executed must be considered, e.g. before executing an order to buy some stocks, we have to sell at least one stock. The order execution with multiple assets is called multi-asset order execution.
 
-According to the order execution’s trait of sequential decision making, an RL-based solution could be applied to solve the order execution. With an RL-based solution, an agent optimizes execution strategy through interacting with the market environment. 
+Because order execution is a sequential decision-making problem, an RL-based solution can be applied to it. With an RL-based solution, an agent optimizes the execution strategy by interacting with the market environment.
 
 With QlibRL, the RL algorithm in the above scenarios can be easily implemented.
 
 Nested Portfolio Construction and Order Executor
 ------------
-QlibRL make it possible to jointly optimize different levels of strategies/models/agents. Take `Nested Decision Execution Framework <https://github.com/microsoft/qlib/blob/main/examples/nested_decision_execution>`_ an example of, optimization of order execution strategy and portfolio management strategy can interact with each other to maximize returns.
+QlibRL makes it possible to jointly optimize different levels of strategies/models/agents. Take the `Nested Decision Execution Framework <https://github.com/microsoft/qlib/blob/main/examples/nested_decision_execution>`_ as an example: the optimization of the order execution strategy and the portfolio management strategy can interact with each other to maximize returns.
 
 Base Class & Interface 
 ============
@@ -99,7 +99,7 @@ If developers have already defined their simulator / interpreters / reward funct
         policy=policy,  
         reward=PAPenaltyReward(),  
         vessel_kwargs={
-            "episode_per_iter": 100, 6
+            "episode_per_iter": 100,
             "update_kwargs": {
                 "batch_size": 64, 
                 "repeat": 5,