not bad
It does better than most at staying coherent, but it sometimes gets confused on basic things like "right" and "left", male and female body parts, orientation, etc. And while its prose can be nice, it heavily shows ChatGPT-like style: validating the user's question while sounding profound, using fancy hype words, and excessive agreement, to name a few.
The model reminded me of a story my professor once told the class. If I remember correctly, it was about spatial or visual learning and how significant knowing "right" from "left" is. They would put a rat in a square room, disorient it, and see if it could hit a target, and they did the same with children of different ages and with adults. The key takeaway was that both rats and young children, when disoriented, fall back on a more primitive, geometry-based navigation strategy. It was only when a child was old enough to know its right from its left that it was able to do better than the rat. So if I were to apply this to training large language models, I would go back to the basics: the model needs to understand the simple stuff before it can make any profound leap forward with the difficult stuff.
thanks! this might be something we got wrong with the data in the SFT process. stay tuned for future releases.