SAC Boeing 747 Pitch Control (ImprovedB747Env)

This model is a Soft Actor-Critic (SAC) agent trained to control the pitch channel of a Boeing 747 in the tensoraerospace.envs.b747.ImprovedB747Env environment. The agent tracks a reference pitch profile while minimizing control effort and promoting smoothness.

Model Details

  • Developed by: TensorAeroSpace
  • Shared by: TensorAeroSpace
  • Model type: Reinforcement Learning — Soft Actor-Critic (continuous control)
  • Environment: tensoraerospace.envs.b747.ImprovedB747Env
  • Action space: normalized [-1, 1] (mapped to stabilizer angle ±25 deg; see the sketch after this list)
  • Observation: [norm_pitch_error, norm_q, norm_theta, norm_prev_action]
  • License: MIT
  • Finetuned from: none (trained from scratch)
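
As a quick illustration of the action and observation conventions above, the sketch below maps a normalized policy action to a stabilizer deflection in degrees. The linear scaling and the helper name denormalize_action are illustrative assumptions; the environment performs its own scaling internally.

import numpy as np

STAB_LIMIT_DEG = 25.0  # stabilizer deflection limit stated above

# Observation layout (as listed above):
# [norm_pitch_error, norm_q, norm_theta, norm_prev_action]

def denormalize_action(action):
    # Illustrative only: assume a linear map from [-1, 1] to +/-25 deg.
    return np.clip(action, -1.0, 1.0) * STAB_LIMIT_DEG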

Sources

  • Repository: https://github.com/tensoraerospace/tensoraerospace

Uses

Direct Use

Use the pretrained policy to simulate pitch-tracking tasks in the provided environment. It is suitable for research and for demonstrating RL-based flight control.

Out-of-Scope Use

  • Real aircraft control or safety-critical deployment without rigorous certification.
  • Environments and state/action definitions that differ from ImprovedB747Env.

How to Get Started

Install

pip install tensoraerospace

Load the Agent Locally

from tensoraerospace.agent.sac import SAC

agent = SAC.from_pretrained(
    "./example/reinforcement_learning/best_episode_200k_episodes_0008_mae/Oct02_11-52-57_SAC/",
    load_gradients=False,  # set True to resume training with optimizer states
)

# Evaluate
obs, info = agent.env.reset()
done = False
while not done:
    action = agent.select_action(obs, evaluate=True)
    obs, reward, terminated, truncated, info = agent.env.step(action)
    done = terminated or truncated

Continue Training from Checkpoint

from tensoraerospace.agent.sac import SAC

agent = SAC.from_pretrained(
    "./example/reinforcement_learning/best_episode_200k_episodes_0008_mae/Oct02_11-52-57_SAC/",
    load_gradients=True,
)

agent.train(num_episodes=10)
agent.save("./runs", save_gradients=True)

Training Details

The saved config.json contains the exact environment and policy parameters used for training. Key entries:

  • env.name: tensoraerospace.envs.b747.ImprovedB747Env
  • env.params:
    • initial_state: [0, 0, 0, 0]
    • reference_signal: sinusoidal-like pitch target of shape (1, 201)
    • number_time_steps: 201
  • policy.params:
    • gamma: 0.99
    • tau: 0.02
    • alpha: auto via automatic entropy tuning
    • batch_size: 256
    • updates_per_step: 2
    • target_update_interval: 1
    • lr: 3e-4
    • policy_type: Gaussian
    • device: cpu

Note: With automatic_entropy_tuning=True, log_alpha and alpha_optim state are saved and can be restored.
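
To inspect these parameters from a downloaded checkpoint, the saved config.json can be read directly. The nested key layout below (config["env"]["params"], config["policy"]["params"]) is an assumption based on the dotted entries listed above.

import json
import os

checkpoint_dir = "./example/reinforcement_learning/best_episode_200k_episodes_0008_mae/Oct02_11-52-57_SAC/"

with open(os.path.join(checkpoint_dir, "config.json")) as f:
    config = json.load(f)

# Print the environment name and a few SAC hyperparameters used for training.
print(config["env"]["name"])
print(config["env"]["params"]["number_time_steps"])
print(config["policy"]["params"]["gamma"], config["policy"]["params"]["tau"])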

Evaluation

The agent was validated in simulation on the same environment by tracking the provided reference pitch signal over 201 steps. The reward aligns with negative quadratic costs on tracking error, pitch rate, control magnitude, smoothness, and jerk.
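
For intuition, a reward of this shape can be written as a weighted sum of negative quadratic terms. The weights and function name below are illustrative placeholders, not the environment's actual coefficients.

def illustrative_reward(pitch_error, q, action, d_action, dd_action,
                        weights=(1.0, 0.1, 0.05, 0.05, 0.01)):
    # Illustrative quadratic penalties on tracking error, pitch rate,
    # control magnitude, smoothness (action change), and jerk.
    # Placeholder weights for demonstration only.
    w_err, w_q, w_u, w_du, w_ddu = weights
    return -(w_err * pitch_error ** 2
             + w_q * q ** 2
             + w_u * action ** 2
             + w_du * d_action ** 2
             + w_ddu * dd_action ** 2)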

Bias, Risks, and Limitations

  • Simulation fidelity limits real-world applicability.
  • Trained on a specific reference and time horizon; generalization requires retraining.
  • Safety constraints are implicit via reward shaping and bounds; not certified for real flight.

Environmental Impact

Training performed on CPU for this checkpoint. For large-scale training, estimate CO2eq with the ML CO2 Impact calculator.

Technical Specs

  • Algorithm: Soft Actor-Critic
  • Networks: MLP policy and twin Q-networks (hidden size: 256 by default)
  • Frameworks: PyTorch, Gymnasium
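
For reference, the critic architecture described above (twin Q-networks as MLPs with hidden size 256) roughly corresponds to the sketch below. This is an assumed standard SAC critic layout for illustration, not the exact class used in tensoraerospace.

import torch
import torch.nn as nn

class TwinQ(nn.Module):
    # Two independent Q-networks over concatenated (state, action) inputs,
    # each an MLP with hidden size 256, as is standard for SAC critics.
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        def make_q():
            return nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )
        self.q1 = make_q()
        self.q2 = make_q()

    def forward(self, state, action):
        sa = torch.cat([state, action], dim=-1)
        return self.q1(sa), self.q2(sa)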

Citation

If you use this model, please cite the TensorAeroSpace repository.

@misc{tensoraerospace,
  title        = {TensorAeroSpace: Aerospace Simulation and RL Framework},
  author       = {TensorAeroSpace contributors},
  year         = {2023},
  howpublished = {\url{https://github.com/tensoraerospace/tensoraerospace}},
}

Model Card Authors

TensorAeroSpace Team

Contact

For questions, please open an issue at the repository or email [email protected].
