SAC Boeing 747 Pitch Control (ImprovedB747Env)

This model is a Soft Actor-Critic (SAC) agent trained to control the pitch channel of a Boeing 747 in the tensoraerospace.envs.b747.ImprovedB747Env environment. The agent tracks a reference pitch profile while minimizing control effort and promoting smoothness.

Model Details

  • Developed by: TensorAeroSpace
  • Shared by: TensorAeroSpace
  • Model type: Reinforcement Learning — Soft Actor-Critic (continuous control)
  • Environment: tensoraerospace.envs.b747.ImprovedB747Env
  • Action space: normalized [-1, 1] (mapped to stabilizer angle ±25 deg; see the sketch after this list)
  • Observation: [norm_pitch_error, norm_q, norm_theta, norm_prev_action]
  • License: MIT
  • Finetuned from: none (trained from scratch)
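
As a quick illustration of the action and observation conventions above, the sketch below maps a normalized policy action to a stabilizer deflection in degrees. The linear scaling and the helper name denormalize_action are illustrative assumptions; the environment performs its own scaling internally.

import numpy as np

STAB_LIMIT_DEG = 25.0  # stabilizer deflection limit stated above

# Observation layout (as listed above):
# [norm_pitch_error, norm_q, norm_theta, norm_prev_action]

def denormalize_action(action):
    # Illustrative only: assume a linear map from [-1, 1] to +/-25 deg.
    return np.clip(action, -1.0, 1.0) * STAB_LIMIT_DEG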

Sources

  • Repository: https://github.com/tensoraerospace/tensoraerospace

Uses

Direct Use

Use the pretrained policy to simulate pitch-tracking tasks in the provided environment. It is suitable for research and for demonstrating RL-based flight control.

Out-of-Scope Use

  • Real aircraft control or safety-critical deployment without rigorous certification.
  • Environments and state/action definitions that differ from ImprovedB747Env.

How to Get Started

Install

pip install tensoraerospace

Load the Agent Locally

from tensoraerospace.agent.sac import SAC

agent = SAC.from_pretrained(
    "./example/reinforcement_learning/best_episode_200k_episodes_0008_mae/Oct02_11-52-57_SAC/",
    load_gradients=False,  # set True to resume training with optimizer states
)

# Evaluate
obs, info = agent.env.reset()
done = False
while not done:
    action = agent.select_action(obs, evaluate=True)
    obs, reward, terminated, truncated, info = agent.env.step(action)
    done = terminated or truncated

Continue Training from Checkpoint

from tensoraerospace.agent.sac import SAC

agent = SAC.from_pretrained(
    "./example/reinforcement_learning/best_episode_200k_episodes_0008_mae/Oct02_11-52-57_SAC/",
    load_gradients=True,
)

agent.train(num_episodes=10)
agent.save("./runs", save_gradients=True)

Training Details

The saved config.json contains the exact environment and policy parameters used for training. Key entries:

  • env.name: tensoraerospace.envs.b747.ImprovedB747Env
  • env.params:
    • initial_state: [0, 0, 0, 0]
    • reference_signal: sinusoidal-like pitch target of shape (1, 201)
    • number_time_steps: 201
  • policy.params:
    • gamma: 0.99
    • tau: 0.02
    • alpha: auto via automatic entropy tuning
    • batch_size: 256
    • updates_per_step: 2
    • target_update_interval: 1
    • lr: 3e-4
    • policy_type: Gaussian
    • device: cpu

Note: With automatic_entropy_tuning=True, log_alpha and alpha_optim state are saved and can be restored.
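
To inspect these parameters from a downloaded checkpoint, the saved config.json can be read directly. The nested key layout below (config["env"]["params"], config["policy"]["params"]) is an assumption based on the dotted entries listed above.

import json
import os

checkpoint_dir = "./example/reinforcement_learning/best_episode_200k_episodes_0008_mae/Oct02_11-52-57_SAC/"

with open(os.path.join(checkpoint_dir, "config.json")) as f:
    config = json.load(f)

# Print the environment name and a few SAC hyperparameters used for training.
print(config["env"]["name"])
print(config["env"]["params"]["number_time_steps"])
print(config["policy"]["params"]["gamma"], config["policy"]["params"]["tau"])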

Evaluation

The agent was validated in simulation on the same environment by tracking the provided reference pitch signal over 201 steps. The reward aligns with negative quadratic costs on tracking error, pitch rate, control magnitude, smoothness, and jerk.
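
For intuition, a reward of this shape can be written as a weighted sum of negative quadratic terms. The weights and function name below are illustrative placeholders, not the environment's actual coefficients.

def illustrative_reward(pitch_error, q, action, d_action, dd_action,
                        weights=(1.0, 0.1, 0.05, 0.05, 0.01)):
    # Illustrative quadratic penalties on tracking error, pitch rate,
    # control magnitude, smoothness (action change), and jerk.
    # Placeholder weights for demonstration only.
    w_err, w_q, w_u, w_du, w_ddu = weights
    return -(w_err * pitch_error ** 2
             + w_q * q ** 2
             + w_u * action ** 2
             + w_du * d_action ** 2
             + w_ddu * dd_action ** 2)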

Bias, Risks, and Limitations

  • Simulation fidelity limits real-world applicability.
  • Trained on a specific reference and time horizon; generalization requires retraining.
  • Safety constraints are implicit via reward shaping and bounds; not certified for real flight.

Environmental Impact

Training performed on CPU for this checkpoint. For large-scale training, estimate CO2eq with the ML CO2 Impact calculator.

Technical Specs

  • Algorithm: Soft Actor-Critic
  • Networks: MLP policy and twin Q-networks (hidden size: 256 by default)
  • Frameworks: PyTorch, Gymnasium
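
For reference, the critic architecture described above (twin Q-networks as MLPs with hidden size 256) roughly corresponds to the sketch below. This is an assumed standard SAC critic layout for illustration, not the exact class used in tensoraerospace.

import torch
import torch.nn as nn

class TwinQ(nn.Module):
    # Two independent Q-networks over concatenated (state, action) inputs,
    # each an MLP with hidden size 256, as is standard for SAC critics.
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        def make_q():
            return nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )
        self.q1 = make_q()
        self.q2 = make_q()

    def forward(self, state, action):
        sa = torch.cat([state, action], dim=-1)
        return self.q1(sa), self.q2(sa)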

Citation

If you use this model, please cite the TensorAeroSpace repository.

@misc{tensoraerospace,
  title        = {TensorAeroSpace: Aerospace Simulation and RL Framework},
  author       = {TensorAeroSpace contributors},
  year         = {2023},
  howpublished = {\url{https://github.com/tensoraerospace/tensoraerospace}},
}

Model Card Authors

TensorAeroSpace Team

Contact

For questions, please open an issue at the repository or email [email protected].
