Effectively deterring humans is a bit different from effectively deterring an idealized game-theoretic agent. If you're a skilled elephant jockey, you can remove a good amount of this discrepancy, but you need to know how it works so that you can correct for it.
If we look at what actually motivates our actions, we find that rewards and punishments (including those from simulations) are the driving factors. Basically, if it's not near mode, it's not motivating. And since people use near mode more for near-term things, this gives us hyperbolic discounting - people act as if they value immediate things hyperbolically more than future things. The effect is strongest for instant feedback, so you can get effective deterrence with minimal utility loss if you do it right away.
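To make the discounting point concrete, here is a minimal sketch using the common textbook form of hyperbolic discounting, where an outcome's felt weight is 1 / (1 + k * delay). The function names, the discount rate k = 1 per hour, and the punishment magnitudes are all made up for illustration, not measured values.

```python
def hyperbolic_weight(delay_hours, k=1.0):
    """Felt weight of an outcome `delay_hours` away: 1 / (1 + k * delay).

    k is a hypothetical discount rate per hour, chosen only for illustration.
    """
    if delay_hours < 0:
        raise ValueError("delay_hours must be non-negative")
    return 1.0 / (1.0 + k * delay_hours)


def felt_aversion(magnitude, delay_hours, k=1.0):
    """How aversive a punishment of a given size feels right now."""
    return magnitude * hyperbolic_weight(delay_hours, k)


# A mild punishment delivered instantly...
instant_mild = felt_aversion(magnitude=1.0, delay_hours=0.0)

# ...versus one ten times harsher, delivered a day later.
delayed_harsh = felt_aversion(magnitude=10.0, delay_hours=24.0)

print(instant_mild, delayed_harsh)
```

Under these assumed numbers the instant, mild punishment carries more felt weight than the one that costs ten times as much a day later, which is the sense in which immediacy buys deterrence cheaply.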
Aversion is strongly associated with fear and dread. Think of something that is hard to think about, or that surge of terror from the approaching lion - or the dread of waking up at 5 am to go to work - where your brain is just screaming "No! No! No no no!" - that is the feeling of aversion.
Not all 'negative' emotions deter. Interestingly enough, sadness doesn't seem to do it. Dreading sadness might, but sadness itself doesn't seem to. Empirically, people who are bummed out for extended periods of time don't act extraordinarily motivated to stop it. However, people who are absorbed in fear will do whatever they can to make sure that they stop feeling it - even if they have to develop a phobia to do it.
We want to maximize the feeling of aversion while minimizing actual loss of utility. This means you want things that demand near mode thought. Encourage vivid imagination by giving vivid descriptions. Make it instant. Make it scary. If you want it to stick in the absence of punishment, make it intermittent. But we don't want it to actually destroy value. So make it brief. Generate scary without harm. Even make a game about it - deterrence works even when you're having fun. One of my preferred methods of deterrence (from both sides) is to playfully say "No!" as if you're talking to a misbehaving puppy. It's mild and playful enough to not burn any utility or derail the conversation, but still effective enough to stop bad conversational habits!
Electric shocks are perfect on all fronts. Instant, scary, and harmless. Makes me want to put on a shock collar and hand the remote to someone who is good at spotting self-deception...
The main failure of attempts to deter behavior is that the wrong thing gets conditioned against. If you're thinking "Oh god, why am I shocking myself?!" then you're conditioning against shocking yourself, and your incentive scheme itself won't stick. Remember, classical conditioning is simple. Fire together, wire together. It's the salient cues that will be associated with the punishment; if the focus is on the punisher, then the conditioned aversion will be tied to the punisher, not the bad deed - even if it's an intrapersonal issue.
If you're punishing people (again, including yourself) with disapproval, you run the risk of them finding it unfair and associating you with the unpleasantness. It's safer and more effective to ask them in a neutral tone if this is something they should be doing - and let them feel that counterfactual dread - and deter themselves. If they won't do this even after the costs are explained, they're declaring war anyway.
A safer option is to deter people by removing positive attention, but it is more limited. This is a form of "negative punishment", and is safer because it is less likely to be framed as a manipulative attack, and more likely to be framed as simply not giving rewards that weren't earned.
Deterrence is tricky, but not necessarily destructive. Just make sure you're aware of what you're doing.