SAC Boeing 747 Pitch Control (ImprovedB747Env)
This model is a Soft Actor-Critic (SAC) agent trained to control the pitch channel of a Boeing 747 in the tensoraerospace.envs.b747.ImprovedB747Env environment. The agent tracks a reference pitch profile while minimizing control effort and promoting smoothness.
Model Details
- Developed by: TensorAeroSpace
- Shared by: TensorAeroSpace
- Model type: Reinforcement Learning — Soft Actor-Critic (continuous control)
- Environment: tensoraerospace.envs.b747.ImprovedB747Env
- Action space: normalized [-1, 1], mapped to stabilizer angle ±25 deg (see the sketch after this list)
- Observation: [norm_pitch_error, norm_q, norm_theta, norm_prev_action]
- License: MIT
- Finetuned from: Trained from scratch
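The action and observation conventions above can be illustrated with a small sketch. The constant and helper names below (MAX_STABILIZER_DEG, denormalize_action) are illustrative assumptions, not part of the library API; ImprovedB747Env performs this mapping internally.

import numpy as np

MAX_STABILIZER_DEG = 25.0  # assumed symmetric actuator limit, per the list above

def denormalize_action(action):
    # Map a policy output in [-1, 1] to a stabilizer deflection in degrees.
    return float(np.clip(action, -1.0, 1.0)) * MAX_STABILIZER_DEG

# Observation layout consumed by the policy (all components normalized):
# [norm_pitch_error, norm_q, norm_theta, norm_prev_action]
obs = np.array([0.10, -0.02, 0.05, 0.00], dtype=np.float32)
print(denormalize_action(0.4))  # -> 10.0 deg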
Sources
- Repository: https://github.com/tensoraerospace/tensoraerospace
- Docs: https://tensoraerospace.readthedocs.io/
Uses
Direct Use
Use the pretrained policy for simulation of pitch tracking tasks in the provided environment. Suitable for research and demonstration of RL-based flight control.
Out-of-Scope Use
- Real aircraft control or safety-critical deployment without rigorous certification.
- Environments and state/action definitions that differ from ImprovedB747Env.
How to Get Started
Install
pip install tensoraerospace
Load the Agent Locally
from tensoraerospace.agent.sac import SAC

agent = SAC.from_pretrained(
    "./example/reinforcement_learning/best_episode_200k_episodes_0008_mae/Oct02_11-52-57_SAC/",
    load_gradients=False,  # set True to resume training with optimizer states
)

# Evaluate
obs, info = agent.env.reset()
done = False
while not done:
    action = agent.select_action(obs, evaluate=True)
    obs, reward, terminated, truncated, info = agent.env.step(action)
    done = terminated or truncated
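A quick way to sanity-check a rollout is to accumulate the per-step reward. This is a minimal sketch that reuses the agent loaded above and only the calls already shown in the snippet.

total_reward = 0.0
obs, info = agent.env.reset()
done = False
while not done:
    action = agent.select_action(obs, evaluate=True)
    obs, reward, terminated, truncated, info = agent.env.step(action)
    total_reward += float(reward)  # sum the shaped reward over the episode
    done = terminated or truncated
print(f"Episode return: {total_reward:.3f}")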
Continue Training from Checkpoint
from tensoraerospace.agent.sac import SAC

agent = SAC.from_pretrained(
    "./example/reinforcement_learning/best_episode_200k_episodes_0008_mae/Oct02_11-52-57_SAC/",
    load_gradients=True,
)
agent.train(num_episodes=10)
agent.save("./runs", save_gradients=True)
Training Details
The saved config.json contains the exact environment and policy parameters used for training. Key entries:
- env.name: tensoraerospace.envs.b747.ImprovedB747Env
- env.params:
  - initial_state: [0, 0, 0, 0]
  - reference_signal: shape (1, 201), sinusoidal-like target for pitch
  - number_time_steps: 201
- policy.params:
  - gamma: 0.99
  - tau: 0.02
  - alpha: auto (via automatic entropy tuning)
  - batch_size: 256
  - updates_per_step: 2
  - target_update_interval: 1
  - lr: 3e-4
  - policy_type: Gaussian
  - device: cpu
Note: With automatic_entropy_tuning=True, log_alpha and alpha_optim state are saved and can be restored.
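To inspect a downloaded checkpoint, you can read config.json directly. The key layout below mirrors the entries listed above but is an assumption rather than a documented schema, so defensive .get access is used.

import json
from pathlib import Path

ckpt = Path("./example/reinforcement_learning/best_episode_200k_episodes_0008_mae/Oct02_11-52-57_SAC/")
with open(ckpt / "config.json") as f:
    config = json.load(f)

# Assumed key layout; adjust to the actual schema in your checkpoint.
print(config.get("env", {}).get("name"))
print(config.get("policy", {}).get("params", {}).get("gamma"))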
Evaluation
The agent was validated in simulation on the same environment by tracking the provided reference pitch signal over 201 steps. The reward corresponds to negative quadratic penalties on tracking error, pitch rate, control magnitude, control smoothness, and jerk.
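For intuition, a schematic version of such a quadratic-cost reward is sketched below. The weights and the exact terms used by ImprovedB747Env are not documented here; the values shown are placeholders.

def shaped_reward(pitch_error, q, u, u_prev, u_prev2,
                  w_err=1.0, w_q=0.1, w_u=0.01, w_smooth=0.01, w_jerk=0.001):
    # Illustrative quadratic costs; weights are placeholders, not the
    # values used by the environment.
    du = u - u_prev                    # control rate (smoothness term)
    d2u = u - 2.0 * u_prev + u_prev2   # control acceleration (jerk term)
    return -(w_err * pitch_error ** 2 + w_q * q ** 2 + w_u * u ** 2
             + w_smooth * du ** 2 + w_jerk * d2u ** 2)

print(shaped_reward(0.05, 0.01, 0.2, 0.18, 0.15))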
Bias, Risks, and Limitations
- Simulation fidelity limits real-world applicability.
- Trained on a specific reference and time horizon; generalization requires retraining.
- Safety constraints are implicit via reward shaping and bounds; not certified for real flight.
Environmental Impact
Training performed on CPU for this checkpoint. For large-scale training, estimate CO2eq with the ML CO2 Impact calculator.
Technical Specs
- Algorithm: Soft Actor-Critic
- Networks: MLP policy and twin Q-networks (hidden size: 256 by default)
- Frameworks: PyTorch, Gymnasium
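For reference, the sketch below shows a generic SAC actor/critic layout consistent with these specs (4-dimensional observation, 1-dimensional action, 256-unit MLPs, Gaussian policy, twin Q-networks). It is an illustration, not the exact classes used in tensoraerospace.agent.sac.

import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    def __init__(self, obs_dim=4, act_dim=1, hidden=256):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.mean = nn.Linear(hidden, act_dim)      # mean of the Gaussian action
        self.log_std = nn.Linear(hidden, act_dim)   # log std of the Gaussian action

    def forward(self, obs):
        h = self.body(obs)
        return self.mean(h), self.log_std(h).clamp(-20, 2)

class TwinQ(nn.Module):
    def __init__(self, obs_dim=4, act_dim=1, hidden=256):
        super().__init__()
        def q_net():
            return nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
        self.q1, self.q2 = q_net(), q_net()  # twin Q-networks for clipped double-Q

    def forward(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        return self.q1(x), self.q2(x)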
Citation
If you use this model, please cite the TensorAeroSpace repository.
@misc{tensoraerospace,
  title        = {TensorAeroSpace: Aerospace Simulation and RL Framework},
  author       = {TensorAeroSpace contributors},
  year         = {2023},
  howpublished = {\url{https://github.com/tensoraerospace/tensoraerospace}},
}
Model Card Authors
TensorAeroSpace Team
Contact
For questions, please open an issue at the repository or email [email protected].