"What the field needed, he argued, was what he called inverse reinforcement learning. Rather than asking, as regular reinforcement learning does, “Given a reward signal, what behavior will optimize it?,” inverse reinforcement learning (or “IRL”) asks the reverse: “Given the observed behaviour, what reward signal, if any, is being optimized?” This is, of course, in more informal terms, one of the foundational questions of human life. What exactly do they think they’re doing? We spend a good fraction of our life’s brainpower answering questions like this. We watch the behavior of others around us—friend and foe, superior and subordinate, collaborator and competitor—and try to read through their visible actions to their invisible intentions and goals. It is in some ways the cornerstone of human cognition. It also turns out to be one of the seminal and critical projects in twenty-first-century AI."
― Brian Christian, The Alignment Problem: Machine Learning and Human Values
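The IRL question the quote poses — given observed behavior, what reward is being optimized? — can be illustrated with a brute-force sketch. Everything below (the three-state chain, the candidate rewards, the observed trajectory) is invented for illustration; this is the shape of the question, not Ng and Russell's actual algorithm.

```python
# Toy inverse reinforcement learning on a 3-state chain (states 0-1-2).
# Enumerate candidate reward functions, compute each candidate's optimal
# policy by value iteration, and keep only the rewards whose optimal
# behavior matches what we observed the "expert" doing.
STATES, ACTIONS = [0, 1, 2], ["left", "right"]
GAMMA = 0.9  # discount factor (illustrative choice)

def step(s, a):
    """Deterministic transitions on the chain, clamped at the ends."""
    return max(s - 1, 0) if a == "left" else min(s + 1, 2)

def optimal_policy(reward):
    V = {s: 0.0 for s in STATES}
    for _ in range(200):  # value iteration to (near) convergence
        V = {s: reward[s] + GAMMA * max(V[step(s, a)] for a in ACTIONS)
             for s in STATES}
    return {s: max(ACTIONS, key=lambda a: V[step(s, a)]) for s in STATES}

# Observed behavior: the expert always moves right.
observed = {0: "right", 1: "right"}

# Candidate explanations: reward concentrated on state 0, 1, or 2.
candidates = [{0: 1.0, 1: 0.0, 2: 0.0},
              {0: 0.0, 1: 1.0, 2: 0.0},
              {0: 0.0, 1: 0.0, 2: 1.0}]

consistent = [r for r in candidates
              if all(optimal_policy(r)[s] == a for s, a in observed.items())]
print(consistent)  # only the reward on state 2 explains always moving right
```

Regular RL would run the inner loop only: given one reward, find the policy. IRL inverts it, searching over rewards for one whose optimal policy reproduces the observed actions.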
"In fact, money in general is a domain full of power laws. Power-law distributions characterize both people’s wealth and people’s incomes. The mean income in America, for instance, is $55,688—but because income is roughly power-law distributed, we know, again, that many more people will be below this mean than above it, while those who are above might be practically off the charts. So it is: two-thirds of the US population make less than the mean income, but the top 1% make almost ten times the mean. And the top 1% of the 1% make ten times more than that."
― Brian Christian, Algorithms to Live By: The Computer Science of Human Decisions
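The claim that most people fall below the mean of a power-law distribution is easy to check with a quick simulation. The Pareto shape and scale below are hypothetical, picked for illustration rather than fitted to US income data:

```python
import random

random.seed(0)

# Sample 100,000 "incomes" from a Pareto (power-law) distribution.
# alpha and scale are illustrative choices, not estimates from real data.
alpha, scale = 1.5, 20_000
incomes = [scale * random.paretovariate(alpha) for _ in range(100_000)]

mean = sum(incomes) / len(incomes)
median = sorted(incomes)[len(incomes) // 2]
below_mean = sum(x < mean for x in incomes) / len(incomes)

print(f"mean:   {mean:,.0f}")
print(f"median: {median:,.0f}")
print(f"fraction earning below the mean: {below_mean:.0%}")
```

With a heavy right tail the mean sits well above the median, so well over half of the samples land below it — the same asymmetry the quote describes for US incomes.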
"There are many ways to relax a problem, and we’ve seen three of the most important. The first, Constraint Relaxation, simply removes some constraints altogether and makes progress on a looser form of the problem before coming back to reality. The second, Continuous Relaxation, turns discrete or binary choices into continua: when deciding between iced tea and lemonade, first imagine a 50–50 “Arnold Palmer” blend and then round it up or down. The third, Lagrangian Relaxation, turns impossibilities into mere penalties, teaching the art of bending the rules (or breaking them and accepting the consequences)."
― Brian Christian, Algorithms to Live By: The Computer Science of Human Decisions
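Continuous Relaxation, the second technique above, can be sketched on a toy 0/1 knapsack: relax each take-it-or-leave-it choice into a fraction, solve the easy fractional problem greedily, then round back to binary. The items and capacity are made up for illustration:

```python
# Toy 0/1 knapsack via Continuous Relaxation: allow fractional "takes",
# solve greedily by value-per-weight (optimal for the relaxed problem),
# then round the fractions back down to binary choices.
# Item (name, value, weight) tuples and the capacity are hypothetical.
items = [("tent", 11, 9), ("stove", 7, 6), ("book", 3, 2), ("rope", 5, 4)]
capacity = 12

remaining = capacity
fractional = {}
for name, value, weight in sorted(items, key=lambda it: it[1] / it[2],
                                  reverse=True):
    take = min(1.0, remaining / weight) if remaining > 0 else 0.0
    fractional[name] = take
    remaining -= take * weight

# Round down: dropping the one partially-taken item guarantees the
# rounded solution still fits within capacity.
chosen = {name for name, frac in fractional.items() if frac >= 1.0}
used = sum(w for name, v, w in items if name in chosen)
print(sorted(chosen), "weight used:", used)
```

Rounding down is the conservative direction; rounding the fractional item up would overshoot the capacity, which is exactly the "round it up or down" trade-off the quote gestures at.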