Welcome
Hi, welcome to my blog😊
This note is based on MIT 18.06 📒
Reinforcement Learning (RL) algorithms for aligning LLMs with human preferences: RL from Human Feedback (RLHF) and Direct Preference Optimization (DPO).
RL foundations and Proximal Policy Optimization (PPO) Algorithm
Based on PPO by RethinkFun📒
Python supports multiple inheritance for classes, and super() is used to reach the different parent classes.
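As a rough illustration of that point, here is a minimal sketch of how super() moves through several parent classes. The class names (Base, Left, Right, Child) and the describe method are hypothetical, chosen only for this example. The key detail is that super() follows the method resolution order (MRO) of the instance's class, so inside Left it can dispatch to Right rather than straight to Base.

```python
class Base:
    def describe(self):
        # End of the chain: Base does not call super().describe().
        return ["Base"]


class Left(Base):
    def describe(self):
        # super() picks the next class in the MRO of type(self), not simply Base.
        return ["Left"] + super().describe()


class Right(Base):
    def describe(self):
        return ["Right"] + super().describe()


class Child(Left, Right):
    def describe(self):
        return ["Child"] + super().describe()


# The MRO decides which parent each super() call reaches.
mro_names = [cls.__name__ for cls in Child.__mro__]

# Calling through the cooperative chain visits every parent once.
call_order = Child().describe()

# The two-argument form starts the lookup after a chosen class,
# which is one way to reach a specific parent explicitly.
after_left = super(Left, Child()).describe()

print(mro_names)
print(call_order)
print(after_left)
```

Each class calls super() once, so every parent runs exactly one time even though Base is inherited along two paths. The two-argument form super(Left, obj) is shown only as one option for targeting a particular point in the MRO.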