I wonder if whoever encoded them boosted the volume or something (without using normalisation). The editor I use lets you set a percentage of maximum normalisation, which I use a lot for video. Most audio apps only offer maximum normalisation.
Alternatively, the app they used to extract the tracks may have been some cheap, nasty software, or, as you suggested, they recorded them instead of extracting them. Your description strongly hints at audio clipping.
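To illustrate the difference, here is a rough Python sketch of what I mean. It assumes the samples are floats between -1.0 and 1.0 (full scale), and the function names are made up for the example, not taken from any particular editor. Normalising to a percentage scales everything so the loudest peak lands at that level, whereas blindly adding gain can push peaks past full scale, where they get chopped flat. That flat-topping is the clipping.

```python
import math

def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)

def normalise(samples, percent=100.0):
    """Scale so the loudest sample sits at `percent` of full scale (1.0)."""
    p = peak(samples)
    if p == 0:
        return list(samples)  # silence: nothing to scale
    target = percent / 100.0
    return [s * target / p for s in samples]

def boost(samples, gain):
    """Apply a fixed gain with no regard for peaks; hard-clip at full scale."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]

def clipped_fraction(samples, threshold=0.999):
    """Fraction of samples sitting at (or very near) full scale."""
    return sum(1 for s in samples if abs(s) >= threshold) / len(samples)

# A quiet test tone peaking at half of full scale (hypothetical data).
tone = [0.5 * math.sin(2 * math.pi * i / 100) for i in range(1000)]

safe = normalise(tone, 90)   # peak lands at 90% of full scale, nothing clipped
harsh = boost(tone, 4.0)     # peaks would reach 2.0, so they get flattened

print("normalised clipped fraction:", clipped_fraction(safe))
print("boosted clipped fraction:", clipped_fraction(harsh))
```

If a large share of the samples in a track sit right at full scale, that is a fair sign it was boosted or recorded too hot rather than normalised.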