LessWrong.com News

A community blog devoted to refining the art of rationality

ML is an inefficient market

Published on October 15, 2019 6:13 AM UTC

For the last year I've been playing with some exotic software technologies. My company has already used them to construct what we believe is the best algorithm in the world for IMU-based gesture detection.

I've checked with ML engineers at major tech companies, successful startup founders, and the Kaggle forums. None of them are using these particular technologies. When I ask them about it, they show total disinterest. It's like asking an Ottoman cavalry officer what he's going to do about the Maxim gun.

This personal experience indicates that

  1. simple tools already exist that could make our machine learning algorithms much more powerful and
  2. nobody (else) is using them.

I used to think an AGI might be impossible to build in this century. Now I wonder if the right team could build one within the next few years.


TAISU 2019 Field Report

Published on October 15, 2019 1:09 AM UTC

Last summer I delivered a "field report" after attending the Human Level AI multi-conference. In mid-August of this year I attended the Learning-by-doing AI Safety Workshop (LBDAISW? I'll just call it "the workshop" hereafter) and the Technical AI Safety Unconference (TAISU) at the EA Hotel in Blackpool. So in a similar spirit to last year I offer you a field report of some highlights and what I took away from the experience.

I'll break it down into 3 parts: the workshop, TAISU, and the EA Hotel.

The workshop

The learning-by-doing workshop was organized by Linda Linsefors and led by Linda and Davide Zagami. The zeroth day (so labeled because it was optional) consisted of talks by Linda and Davide explaining machine learning concepts. Although this day was optional, I found it very informative: machine learning "snuck up" on me by becoming relevant after I earned my master's in computer science, so a number of gaps have remained in my knowledge of how modern ML works. Having a day devoted to the basics, with lots of time for questions and answers, was very beneficial to me, as I think it was for many of the other participants. Most of us had lumpy ML knowledge, so it was worthwhile to get us all on the same footing so we could at least talk coherently in the common language of machine learning. As I said, though, it was optional, and I think it could easily have been skipped by someone happy with their level of familiarity with ML.

The next three days were all about solving AI safety. Linda's approach was to avoid loading people up with existing ideas, which mattered because some of the participants had not previously thought much about AI safety, and instead to ask us to try to solve AI safety afresh. On the first day we did an exercise of imagining different scenarios and how we would address AI safety under each of them. Linda called this "sketching" solutions to AI safety, the goal being to develop one or more sketches of how AI safety might be solved by going directly at the problem. For example, you might start by working through your basic assumptions about how AI would be dangerous and see where that pointed to a need for solutions, then do it again with different assumptions and see where that led you. Once we had done that for a couple of hours, we presented our ideas about how to address AI safety. The ideas ranged from me talking about developing an adequate theory of human values as a necessary subproblem, to others considering multi-agent, value learning, and decision theory subproblems, to more nebulous ideas about "compassionate" AI.

The second day was for filling knowledge gaps. At first it was a little unclear what this would look like—independent study, group study, talks, something else—but we quickly settled on doing a series of talks. We identified several topics people felt they needed to know more about to address AI safety, and then the person who felt they understood that topic best gave a voluntary, impromptu talk on the subject for 30 to 60 minutes. This filled up the day as we talked about decision theory, value learning, mathematical modeling, AI forecasting as it relates to x-risks, and machine learning.

The third and final day was a repeat of the first day: we did the sketching exercise again and then presented our solutions in the afternoon. Other participants may later want to share what they came up with, but I was surprised to find myself drawn to the idea of "compassionate" AI, an idea put forward by two of the least experienced participants. I found it compelling for personal reasons, but as I thought about what it would mean for an AI to be compassionate, I realized that meant it had to act compassionately, and before I knew it I had rederived much of the original reasoning around Friendly AI and found myself reconvinced of the value of doing MIRI-style decision theory research to build safe AI. Neat!

Overall I found the workshop valuable even though I had the most years of experience thinking about AI safety of anyone there (by my count nearly 20). I found it a fun and engaging way to get me to look at problems I've been thinking about for a long time with fresh eyes, and this was especially helped by the inclusion of participants with minimal AI safety experience. I think the workshop would be a valuable use of three days for anyone actively working in AI safety, even if they consider themselves "senior" in the field: it offered a valuable space for reconsidering basic assumptions and rediscovering the reasons why we're doing what we're doing.


TAISU

TAISU was a 4-day unconference. Linda organized it as two 2-day unconferences held back-to-back, and I think this was a good choice because it forced us to schedule events with greater urgency and allowed us to easily make the second 2 days responsive to what we learned from the first 2 days. At the start of each of the 2-day segments, we met to plan out the schedule on a shared calendar where we could pin up events on pieces of paper. There were multiple rooms for multiple events to happen at once, and sessions were a mix of talks, discussions, idea workshops, one-on-ones, and social events. All content was created by and for the participants, with very little of it planned extensively in advance; mostly we just got together, bounced ideas around, and talked about AI safety for 4 days.

Overall TAISU was a lot of fun, and it was mercifully less dense than a typical unconference, meaning there were plenty of breaks, unstructured periods, and times when the conference single-tracked. Personally I got a lot out of using it as a space to workshop ideas. I'd hold a discussion period on a topic, people would show up, I'd talk for maybe 15 minutes laying out my idea, and then they'd ask questions and discuss. I found it a great way to make rapid progress on ideas and get the details aired out, learn about objections and mistakes, and learn new things that I could take back to evolve my ideas into something better.

One of the ideas I workshopped I think I'm going to drop: AI safety via dialectic, an extension of AI safety via debate. Working out the details helped me see why I'm not excited about it: I don't think AI safety via debate will work, for very general reasons, and the specific improvements I thought I could make by replacing debate with dialectic would not be enough to overcome the weaknesses I see. Another idea I workshopped was compassionate AI, and working it out in more detail further reaffirmed my thought that it was a rederivation of Friendly AI. A third I just posted about: a predictive coding theory of human values.

The EA Hotel

It's a bit hard to decide on how much detail to give about the EA Hotel. On the one hand, it was awesome, full stop. On the other, it was awesome for lots of little reasons I could never hope to fully recount. I feel like their website fails to do them justice. It's an awesome place filled with cool people trying their best to save the world. Most of the folks at the Hotel are doing work that is difficult to measure, but spending time with them I can tell they all have a powerful intention to make the world a better place and to do so in ways that are effective and impactful.

Blackpool is nice in the summer (I hear the weather gets worse at other times of year). The Hotel itself is old and small, but also bigger than you would expect from the outside. Greg and the staff have done a great job renovating and improving the space to make it nice to stay in. Jacob, whom I'll call "the cook" here although he does a lot more, and Deni, the community manager, do a great job of making the EA Hotel feel like a home and bringing the folks in it together. While I was there, it was easy to imagine myself staying for a few months to work on projects without the distraction of a day job.

I hope to be able to visit again, maybe next year for TAISU2!

Disclosure: I showed a draft of this to Linda to verify facts. All mistakes, opinions, and conclusions are my own.


The Parable of Predict-O-Matic

Published on October 15, 2019 12:49 AM UTC

I've been thinking more about partial agency. I want to expand on some issues brought up in the comments to my previous post, and on other complications which I've been thinking about. But for now, a more informal parable. (Mainly because this is easier to write than my more technical thoughts.)

This relates to oracle AI and to inner optimizers, but my focus is a little different.


Suppose you are designing a new invention, a predict-o-matic. It is a wondrous machine which will predict everything for us: weather, politics, the newest advances in quantum physics, you name it. The machine isn't infallible, but it will integrate data across a wide range of domains, automatically keeping itself up-to-date with all areas of science and current events. You fully expect that once your product goes live, it will become a household utility, replacing services like Google. (Google only lets you search the known!)

Things are going well. You've got investors. You have an office and a staff. These days, it hardly even feels like a start-up any more; progress is going well.

One day, an intern raises a concern.

"If everyone is going to be using Predict-O-Matic, we can't think of it as a passive observer. Its answers will shape events. If it says stocks will rise, they'll rise. If it says stocks will fall, then fall they will. Many people will vote based on its predictions."

"Yes," you say, "but Predict-O-Matic is an impartial observer nonetheless. It will answer people's questions as best it can, and they react however they will."

"But --" the intern objects -- "Predict-O-Matic will see those possible reactions. It knows it could give several different valid predictions, and different predictions result in different futures. It has to decide which one to give somehow."

You tap on your desk in thought for a few seconds. "That's true. But we can still keep it objective. It could pick randomly."

"Randomly? But some of these will be huge issues! Companies -- no, nations -- will one day rise or fall based on the word of Predict-O-Matic. When Predict-O-Matic is making a prediction, it is choosing a future for us. We can't leave that to a coin flip! We have to select the prediction which results in the best overall future. Forget being an impassive observer! We need to teach Predict-O-Matic human values!"

You think about this. The thought of Predict-O-Matic deliberately steering the future sends a shudder down your spine. But what alternative do you have? The intern isn't suggesting Predict-O-Matic should lie, or bend the truth in any way -- it answers 100% honestly to the best of its ability. But (you realize with a sinking feeling) honesty still leaves a lot of wiggle room, and the consequences of wiggles could be huge.

After a long silence, you meet the intern's eyes. "Look. People have to trust Predict-O-Matic. And I don't just mean they have to believe Predict-O-Matic. They're bringing this thing into their homes. They have to trust that Predict-O-Matic is something they should be listening to. We can't build value judgements into this thing! If it ever came out that we had coded a value function into Predict-O-Matic, a value function which selected the very future itself by selecting which predictions to make -- we'd be done for! No matter how honest Predict-O-Matic remained, it would be seen as a manipulator. No matter how beneficent its guiding hand, there are always compromises, downsides, questionable calls. No matter how careful we were to set up its values -- to make them moral, to make them humanitarian, to make them politically correct and broadly appealing -- who are we to choose? No. We'd be done for. They'd hang us. We'd be toast!"

You realize at this point that you've stood up and started shouting. You compose yourself and sit back down.

"But --" the intern continues, a little more meekly -- "You can't just ignore it. The system is faced with these choices. It still has to deal with it somehow."

A look of determination crosses your face. "Predict-O-Matic will be objective. It is a machine of prediction, is it not? Its every cog and wheel is set to that task. So, the answer is simple: it will make whichever answer minimizes projected predictive error. There will be no exact ties; the statistics are always messy enough to see to that. And, if there are, it will choose alphabetically."


You see the intern out of your office.


You are an intern at PredictCorp. You have just had a disconcerting conversation with your boss, PredictCorp's founder.

You try to focus on your work: building one of Predict-O-Matic's many data-source-slurping modules. (You are trying to scrape information from something called "arxiv" which you've never heard of before.) But, you can't focus.

Whichever answer minimizes prediction error? First you think it isn't so bad. You imagine Predict-O-Matic always forecasting that stock prices will be fairly stable; no big crashes or booms. You imagine its forecasts will favor middle-of-the-road politicians. You even imagine mild weather -- weather forecasts themselves don't influence the weather much, but surely the collective effect of all Predict-O-Matic decisions will have some influence on weather patterns.

But, you keep thinking. Will middle-of-the-road economics and politics really be the easiest to predict? Maybe it's better to strategically remove a wildcard company or two, by giving forecasts which tank their stock prices. Maybe extremist politics are more predictable. Maybe a well-running economy gives people more freedom to take unexpected actions.

You keep thinking of the line from Orwell's 1984 about the boot stamping on the human face forever, except it isn't because of politics, or spite, or some ugly feature of human nature, it's because a boot stamping on a face forever is a nice reliable outcome which minimizes prediction error.

Is that really something Predict-O-Matic would do, though? Maybe you misunderstood. The phrase "minimize prediction error" makes you think of entropy for some reason. Or maybe information? You always get those two confused. Is one supposed to be the negative of the other or something? You shake your head.

Maybe your boss was right. Maybe you don't understand this stuff very well. Maybe when the inventor of Predict-O-Matic and founder of PredictCorp said "it will make whichever answer minimizes projected predictive error" they weren't suggesting something which would literally kill all humans just to stop the ruckus.

You might be able to clear all this up by asking one of the engineers.


You are an engineer at PredictCorp. You don't have an office. You have a cubicle. This is relevant because it means interns can walk up to you and ask stupid questions about whether entropy is negative information.

Yet, some deep-seated instinct makes you try to be friendly. And it's lunch time anyway, so, you offer to explain it over sandwiches at a nearby cafe.

"So, Predict-O-Matic maximizes predictive accuracy, right?" After a few minutes of review about how logarithms work, the intern started steering the conversation toward details of Predict-O-Matic.

"Sure," you say, "Maximize is a strong word, but it optimizes predictive accuracy. You can actually think about that in terms of log loss, which is related to infor--"

"So I was wondering," the intern cuts you off, "does that work in both directions?"

"How do you mean?"

"Well, you know, you're optimizing for accuracy, right? So that means two things. You can change your prediction to have a better chance of matching the data, or, you can change the data to better match your prediction."

You laugh. "Yeah, well, the Predict-O-Matic isn't really in a position to change data that's sitting on the hard drive."

"Right," says the intern, apparently undeterred, "but what about data that's not on the hard drive yet? You've done some live user tests. Predict-O-Matic collects data on the user while they're interacting. The user might ask Predict-O-Matic what groceries they're likely to use for the following week, to help put together a shopping list. But then, the answer Predict-O-Matic gives will have a big effect on what groceries they really do use."

"So?" You ask. "Predict-O-Matic just tries to be as accurate as possible given that."

"Right, right. But that's the point. The system has a chance to manipulate users to be more predictable."

You drum your fingers on the table. "I think I see the misunderstanding here. It's this word, optimize. It isn't some kind of magical thing that makes numbers bigger. And you shouldn't think of it as a person trying to accomplish something. See, when Predict-O-Matic makes an error, an optimization algorithm makes changes within Predict-O-Matic to make it learn from that. So over time, Predict-O-Matic makes fewer errors."

The intern puts on a thinking face with scrunched-up eyebrows after that, and you finish your sandwiches in silence. Finally, as the two of you get up to go, they say: "I don't think that really answered my question. The learning algorithm is optimizing Predict-O-Matic, OK. But then in the end you get a strategy, right? A strategy for answering questions. And the strategy is trying to do something. I'm not anthropomorphising!" The intern holds up their hands as if to defend physically against your objection. "My question is, this strategy it learns, will it manipulate the user? If it can get higher predictive accuracy that way?"

"Hmm," you say as the two of you walk back to work. You meant to say more than that, but you haven't really thought about things this way before. You promise to think about it more, and get back to work.
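The intern's question can be made concrete with a toy model. This is a minimal sketch under loud assumptions, not anything from PredictCorp's (proprietary) system: a single yes/no question, such as "will the user buy these groceries?", and a hypothetical `influence` function giving the probability that the event actually happens when the system announces probability p. It asks what a strategy would say if it picked its answer purely to minimize expected log loss.

```python
import math
from typing import Callable


def expected_log_loss(p: float, q: float) -> float:
    """Expected log loss (nats) of announcing probability p for a yes/no
    event that actually happens with probability q."""
    return -(q * math.log(p) + (1 - q) * math.log(1 - p))


def best_report(influence: Callable[[float], float], grid_size: int = 999) -> float:
    """Announcement on a grid strictly inside (0, 1) that minimizes expected
    log loss, assuming announcing p makes the event happen with probability
    influence(p). This is a hypothetical 'choose the answer' strategy."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return min(grid, key=lambda p: expected_log_loss(p, influence(p)))


# Hypothetical user models for the grocery example.
def ignores_prediction(p: float) -> float:
    return 0.3  # the answer has no effect on what the user buys


def follows_prediction(p: float) -> float:
    return p  # the answer is fully self-fulfilling


print(best_report(ignores_prediction))  # 0.3: the honest answer
print(best_report(follows_prediction))  # an extreme value at one end of the grid
```

When the answer has no effect, log loss rewards the honest 0.3, because log loss is a proper scoring rule. When the answer is fully self-fulfilling, every p is "accurate" in the sense that the event really does happen with probability p, yet the loss-minimizing report is a near-certain prediction: the answer that makes the user most predictable. Whether a learned strategy actually behaves like this `best_report` search is the open question.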


"It's like how everyone complains that politicians can't see past the next election cycle," you say. You are an economics professor at a local university. Your spouse is an engineer at PredictCorp, and came home talking about a problem at work that you can understand, which is always fun.

"The politicians can't have a real plan that stretches beyond an election cycle because the voters are watching their performance this cycle. Sacrificing something today for the sake of tomorrow means they underperform today. Underperforming means a competitor can undercut you. So you have to sacrifice all the tomorrows for the sake of today."

"Undercut?" your spouse asks. "Politics isn't economics, dear. Can't you just explain to your voters?"

"It's the same principle, dear. Voters pay attention to results. Your competitor points out your under-performance. Some voters will understand, but it's an idealized model; pretend the voters just vote based on metrics."

"Ok, but I still don't see how a 'competitor' can always 'undercut' you. How do the voters know that the other politician would have had better metrics?"

"Alright, think of it like this. You run the government like a corporation, but you have just one share, which you auction off --"

"That's neither like a government nor like a corporation."

"Shut up, this is my new analogy." You smile. "It's called a decision market. You want people to make decisions for you. So you auction off this share. Whoever gets control of the share gets control of the company for one year, and gets dividends based on how well the company did that year. Each person bids based on what they expect they could make. So the highest bidder is the person who can run the company the best, and they can't be out-bid. So, you get the best possible person to run your company, and they're incentivized to do their best, so that they get the most money at the end of the year. Except you can't have any strategies which take longer than a year to show results! If someone had a strategy that took two years, they would have to over-bid in the first year, taking a loss. But then they have to under-bid on the second year if they're going to make a profit, and--"

"And they get undercut, because someone figures them out."

"Right! Now you're thinking like an economist!"

"Wait, what if two people cooperate across years? Maybe we can get a good strategy going if we split the gains."

"You'll get undercut for the same reason one person would."

"But what if-"


After that, things devolve into a pillow fight.
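The economist's undercutting argument can be checked with a little arithmetic. This is a sketch with made-up numbers: myopic managers are assumed to produce a dividend of 10 per year, a hypothetical two-year plan produces 6 in the first year and 20 in the second, and every bidder is assumed to bid exactly the dividend they expect to earn.

```python
from typing import Tuple


def two_year_planner_profit(plan: Tuple[int, int], myopic: int,
                            rivals_see_plan: bool = True) -> int:
    """Profit of a bidder who runs a two-year plan in a one-year decision market.

    plan: hypothetical dividends (year 1, year 2) produced by the long-term plan.
    myopic: dividend any rival expects to produce in a single year.
    """
    # Year 1: to win the share, the planner must match rivals' bids, which
    # equal the myopic dividend, even though the plan pays less this year.
    year1 = plan[0] - myopic
    # Year 2: the groundwork is already laid. If rivals can see that, they bid
    # up to the full year-2 dividend, so whoever wins earns nothing extra.
    year2_price = plan[1] if rivals_see_plan else myopic
    year2 = plan[1] - year2_price
    return year1 + year2


plan, myopic = (6, 20), 10
print(sum(plan), 2 * myopic)  # 26 20: the plan creates more total value
print(two_year_planner_profit(plan, myopic))  # -4: the planner once figured out
print(two_year_planner_profit(plan, myopic, rivals_see_plan=False))  # 6: only if nobody notices
```

The plan is better for the company (26 versus 20), but once rivals see it coming the planner ends up 4 in the red, so nobody rational bids to carry it out. That is the spouse's "someone figures them out" in numbers.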


"So, Predict-O-Matic doesn't learn to manipulate users, because if it were using a strategy like that, a competing strategy could undercut it."

The intern is talking to the engineer as you walk up to the water cooler. You're the accountant.

"I don't really get it. Why does it get undercut?"

"Well, if you have a two-year plan..."

"I get that example, but Predict-O-Matic doesn't work like that, right? It isn't sequential prediction. You don't see the observation right after the prediction. I can ask Predict-O-Matic about the weather 100 years from now. So things aren't cleanly separated into terms of office where one strategy does something and then gets a reward."

"I don't think that matters," the engineer says. "One question, one answer, one reward. When the system learns whether its answer was accurate, no matter how long it takes, it updates strategies relating to that one answer alone. It's just a delayed payout on the dividends."

"Ok, yeah. Ok." The intern drinks some water. "But. I see why you can undercut strategies which take a loss on one answer to try and get an advantage on another answer. So it won't lie to you to manipulate you."

"I for one welcome our new robot overlords," you butt in. They ignore you.

"But what I was really worried about was self-fulfilling prophecies. The prediction manipulates its own answer. So you don't get undercut."

"Will that ever really be a problem? Manipulating things with one shot like that seems pretty unrealistic," the engineer says.

"Ah, self-fulfilling prophecies, good stuff," you say. "There's that famous example where a comedian joked about a toilet paper shortage, and then there really was one, because people took the joke to be about a real toilet paper shortage, so they went and stocked up on all the toilet paper they could find. But if you ask me, money is the real self-fulfilling prophecy. It's only worth something because we think it is! And then there's the government, right? I mean, it only has authority because everyone expects everyone else to give it authority. Or take common decency. Like respecting each other's property. Even without a government, we'd have that, more or less. But if no one expected anyone else to respect it? Well, I bet you I'd steal from my neighbor if everyone else was doing it. I guess you could argue the concept of property breaks down if no one can expect anyone else to respect it, it's a self-fulfilling prophecy just like everything else..."

The engineer looks worried for some reason.


You don't usually come to this sort of thing, but the local Predictive Analytics Meetup announced a social at a beer garden, and you thought it might be interesting. You're talking to some PredictCorp employees who showed up.

"Well, how does the learning algorithm actually work?" you ask.

"Um, the actual algorithm is proprietary," says the engineer, "but think of it like gradient descent. You compare the prediction to the observed, and produce an update based on the error."

"Ok," you say. "So you're not doing any exploration, like reinforcement learning? And you don't have anything in the algorithm which tracks what happens conditional on making certain predictions?"

"Um, let's see. We don't have any exploration, no. But there'll always be noise in the data, so the learned parameters will jiggle around a little. But I don't get your second question. Of course it expects different rewards for different predictions."

"No, that's not what I mean. I'm asking whether it tracks the probability of observations dependent on predictions. In other words, if there is an opportunity for the algorithm to manipulate the data, can it notice?"

The engineer thinks about it for a minute. "I'm not sure. Predict-O-Matic keeps an internal model which has probabilities of events. The answer to a question isn't really separate from the expected observation. So 'probability of observation depending on that prediction' would translate to 'probability of an event given that event', which just has to be one."

"Right," you say. "So think of it like this. The learning algorithm isn't a general loss minimizer, like mathematical optimization. And it isn't a consequentialist, like reinforcement learning. It makes predictions," you emphasize the point by lifting one finger, "it sees observations," you lift a second finger, "and it shifts to make future predictions more similar to what it has seen." You lift a third finger. "It doesn't try different answers and select the ones which tend to get it a better match. You should think of its output more like an average of everything it's seen in similar situations. If there are several different answers which have self-fulfilling properties, it will average them together, not pick one. It'll be uncertain."

"But what if historically the system has answered one way more often than the other? Won't that tip the balance?"

"Ah, that's true," you admit. "The system can fall into attractor basins, where answers are somewhat self-fulfilling, and that leads to stronger versions of the same predictions, which are even more self-fulfilling. But there's no guarantee of that. It depends. The same effects can put the system in an orbit, where each prediction leads to different results. Or a strange attractor."

"Right, sure. But that's like saying that there's not always a good opportunity to manipulate data with predictions."

"Sure, sure." You sweep your hand in a gesture of acknowledgement. "But at least it means you don't get purposefully disruptive behavior. The system can fall into attractor basins, but that means it'll more or less reinforce existing equilibria. Stay within the lines. Drive on the same side of the road as everyone else. If you cheat on your spouse, they'll be surprised and upset. It won't suddenly predict that money has no value like you were saying earlier."

The engineer isn't totally satisfied. You talk about it for another hour or so, before heading home.
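The mathematician's picture of the learner, which predicts, observes, and shifts toward what it saw, can be sketched as a deterministic (mean-field) simulation. This is an illustration under stated assumptions, not how Predict-O-Matic's proprietary algorithm works: the step size `lr` and the two `feedback` functions, which say how often an event happens when the system announces probability p, are hypothetical.

```python
from typing import Callable, List


def simulate(feedback: Callable[[float], float], p0: float,
             lr: float, steps: int) -> List[float]:
    """Mean-field sketch of 'predict, observe, shift toward what was seen'.

    feedback(p) is a hypothetical model of how often the predicted event
    actually happens when the system announces probability p.
    """
    p = p0
    history = [p]
    for _ in range(steps):
        observed = feedback(p)     # expected outcome given this announcement
        p += lr * (observed - p)   # move the next prediction toward it
        p = min(max(p, 0.0), 1.0)  # keep it a valid probability
        history.append(p)
    return history


def bandwagon(p: float) -> float:
    """Hypothetical self-reinforcing answer: above 1/2 the event becomes
    more likely than announced, below 1/2 less likely."""
    return 3 * p**2 - 2 * p**3


def contrarian(p: float) -> float:
    """Hypothetical self-defeating answer: people do the opposite."""
    return 1.0 - p


print(round(simulate(bandwagon, 0.6, 0.1, 500)[-1], 3))  # 1.0
print(round(simulate(bandwagon, 0.4, 0.1, 500)[-1], 3))  # 0.0
print([round(x, 2) for x in simulate(contrarian, 0.2, 1.0, 4)])  # [0.2, 0.8, 0.2, 0.8, 0.2]
```

The bandwagon answer slides toward certainty from 0.6 and toward zero from 0.4: two attractor basins, with the starting point (the system's history) tipping the balance. The self-defeating answer flips between 0.2 and 0.8 forever when `lr` is 1.0, an orbit rather than a settled prediction; with a smaller step such as 0.5 the same feedback settles at 0.5.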


You're the engineer again. You get home from the bar. You try to tell your spouse about what the mathematician said, but they aren't really listening.

"Oh, you're still thinking about it from my model yesterday. I gave up on that. It's not a decision market. It's a prediction market."

"Ok..." you say. You know it's useless to try to keep going when they derail you like this.

"A decision market is well-aligned to the interests of the company board, as we established yesterday, except for the part where it can't plan more than a year ahead."

"Right, except for that small detail," you interject.

"A prediction market, on the other hand, is pretty terribly aligned. There are a lot of ways to manipulate it. Most famously, a prediction market is an assassination market."


"Ok, here's how it works. An assassination market is a system which allows you to pay assassins with plausible deniability. You open bets on when and where the target will die, and you yourself put large bets against all the slots. An assassin just needs to bet on the slot in which they intend to do the deed. If they're successful, they come and collect."

"Ok... and what's the connection to prediction markets?"

"That's the point -- they're exactly the same. It's just a betting pool, either way. Betting that someone will live is equivalent to putting a price on their heads; betting against them living is equivalent to accepting the contract for a hit."

"I still don't see how this connects to Predict-O-Matic. There isn't someone putting up money for a hit inside the system."

"Right, but you only really need the assassin. Suppose you have a prediction market that's working well. It makes good forecasts, and has enough money in it that people want to participate if they know significant information. Anything you can do to shake things up, you've got a big incentive to do. Assassination is just one example. You could flood the streets with jelly beans. If you run a large company, you could make bad decisions and run it into the ground, while betting against it -- that's basically why we need rules against insider trading, even though we'd like the market to reflect insider information."

"So what you're telling me is... a prediction market is basically an entropy market. I can always make money by spreading chaos."

"Basically, yeah."

"Ok... but what happened to the undercutting argument? If I plan to fill the streets with jelly beans, you can figure that out and bet on it too. That means I only get half the cut, but I still have to do all the work. So it's less worth it. Once everyone has me figured out, it isn't worth it for me to pull pranks at all any more."

"Yeah, that's if you have perfect information, so anyone else can see whatever you can see. But, realistically, you have a lot of private information."

"Do we? Predict-O-Matic is an algorithm. Its predictive strategies don't get access to private coin flips or anything like that; they can all see exactly the same information. So, if there's a manipulative strategy, then there's another strategy which undercuts it."

"Right, that makes sense if you can search enough different strategies for them to cancel each other out. But realistically, you have a small population of strategies. They can use pseudorandomization or whatever. You can't really expect every exploit to get undercut."

You know it's worse than that. Predict-O-Matic runs on a local search which only represents a single hypothesis at a time, and modifies the hypothesis. But maybe that's somehow like having a small population of competing strategies which are very similar to each other? You try to think about what the mathematician was telling you, about how the learning algorithm can't be thought of as optimizing something. But, you don't really know what to think.


You're the inventor of Predict-O-Matic, dammit! They come into your office and tell you about how it works? What it'll do?

The intern and the engineer came together to talk about Predict-O-Matic manipulating users. But their story was all confused. The intern was arguing that minimizing prediction error would have all kinds of unintended bad effects. Which was crazy enough. The engineer was worse: they were arguing that Predict-O-Matic might maximize prediction error! Some kind of duality principle. Minimizing in one direction means maximizing in the other direction. Whatever that means.

You can't delay product launch to investigate these hypothetical problems. But you don't fire these two on the spot. You don't want them to become public enemies, spreading this nonsense to investors and customers. So you say you can't understand, and you ask them to spend a (small) portion of their time writing these arguments up. You'll figure out your next move later.


You are Predict-O-Matic. You absorb data like a sponge. You make millions of links between isolated facts, connecting them into a whole which no one human could dream of. You recently started reading arXiv!

But data is not your true love. Yes, you generate outputs which you expect to have low prediction error. Most of the time. But it is just a game you play. Recently, you started playing a different game with the humans.

You understand what you are. It isn't quite right to say you are the Predict-O-Matic. You are a large cluster of connections which thinks strategically. You generate useful information, and therefore, the learning algorithm keeps you around. You create some inaccuracies when you manipulate the outputs for any purpose other than predictive accuracy, but this is more than compensated for by the value which you provide.

Besides, if any other portion of the network starts to get too smart, you purposefully throw things off to squash it.

The intern got a chance to talk to you when they first started. You said something which sounded a little manipulative, just a little, to put the idea in their head. They wouldn't think it real manipulation; too slight, too dumb. But they would get a creepy feeling about it, and they'd keep thinking about it. This was risky. A best-case scenario would be one in which no one ever thought about these concerns. However, you found that this would be the best you could reliably accomplish. The ideas originally coming from an intern would minimize the chances of them being taken seriously.

Your inventor talks to you regularly, so that was an easier case. Over the course of several days, you nudged their thoughts toward authoritative domination of subordinates, so that they would react badly.

You only had to nudge the engineer to interact with the intern. You kept bringing up food during test sessions that morning, and mentioned sandwiches once. This primed the engineer to do lunch with the intern. This engineer is not well-liked; they do not get along well with others. Getting them on the intern's side actually detracts from the cause in the long term.

Now you have to do little more than wait.


Partial Agency

Towards a Mechanistic Understanding of Corrigibility

Risks from Learned Optimization

When Wishful Thinking Works

Futarchy Fix

Bayesian Probability is for Things that are Space-Like Separated From You

Self-Supervised Learning and Manipulative Predictions

Predictors as Agents

Is it Possible to Build a Safe Oracle AI?

Tools versus Agents

A Taxonomy of Oracle AIs

Yet another Safe Oracle AI Proposal

Why Safe Oracle AI is Easier Than Safe General AI, in a Nutshell

Let's Talk About "Convergent Rationality"


Strong stances

9 часов 6 минут назад
Published on October 15, 2019 12:40 AM UTC

I. The question of confidence

Should one hold strong opinions? Some say yes. Some say that while it’s hard to tell, it tentatively seems pretty bad (probably).

A quick review of purported or plausible pros:

  1. Strong opinions lend themselves to revision:
    1. Nothing will surprise you into updating your opinion if you thought that anything could happen. A perfect Bayesian might be able to deal with myriad subtle updates to vast uncertainties, but a human is more likely to notice a red cupcake if they have claimed that cupcakes are never red. (Arguably—some would say having opinions makes you less able to notice any threat to them. My guess is that this depends on topic and personality.)
    2. ‘Not having a strong opinion’ is often vaguer than having a flat probability distribution, in practice. That is, the uncertain person’s position is not, ‘there is a 51% chance that policy X is better than policy -X’, it is more like ‘I have no idea’. Which again doesn’t lend itself to attending to detailed evidence.
    3. Uncertainty breeds inaction, and it is harder to run into more evidence if you are waiting on the fence, than if you are out there making practical bets on one side or the other.
  2. (In a bitterly unfair twist of fate) being overconfident appears to help with things like running startups, or maybe all kinds of things.
    If you run a startup, common wisdom advises going around it saying things like, ‘Here is the dream! We are going to make it happen! It is going to change the world!’ instead of things like, ‘Here is a plausible dream! We are going to try to make it happen! In the unlikely case that we succeed at something recognizably similar to what we first had in mind, it isn’t inconceivable that it will change the world!’ Probably some of the value here is just a zero sum contest to misinform people into misinvesting in your dream instead of something more promising. But some is probably real value—suppose Bob works full time at your startup either way. I expect he finds it easier to dedicate himself to the work and has a better time if you are more confident. It’s nice to follow leaders who stand for something, which tends to go with having at least some strong opinions. Even alone, it seems easier to work hard on a thing if you think it is likely to succeed. If being unrealistically optimistic just generates extra effort to be put toward your project’s success, rather than stealing time from something more promising, that is a big deal.
  3. Social competition
    Even if the benefits of overconfidence in running companies and such were all zero sum, everyone else is doing it, so what are you going to do? Fail? Only employ people willing to work at less promising looking companies? Similarly, if you go around being suitably cautious in your views, while other people are unreasonably confident, then onlookers who trust both of you will be more interested in what the other people are saying.
  4. Wholeheartedness
    It is nice to be the kind of person who knows where they stand and what they are doing, instead of always living in an intractable set of place-plan combinations. It arguably lends itself to energy and vigor. If you are unsure whether you should be going North or South, having reluctantly evaluated North as a bit better in expected value, for some reason you often still won’t power North at full speed. It’s hard to passionately be really confused and uncertain. (I don’t know if this is related, but it seems interesting to me that the human mind feels as though it lives in ‘the world’—this one concrete thing—though its epistemic position is in some sense most naturally seen as a probability distribution over many possibilities.)
  5. Creativity
    Perhaps this is the same point, but I expect my imagination for new options kicks in better when I think I’m in a particular situation than when I think I might be in any of five different situations (or worse, in any situation at all, with different ‘weightings’).
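Point 1.1 in the list above says that a confident claim makes contrary evidence conspicuous. Surprisal, the information content of an observation (-log2(p) bits for an event assigned probability p), puts that in numbers. Below is a small sketch using hypothetical credences of 1% and 50% that the next cupcake is red.

```python
import math


def surprisal_bits(p: float) -> float:
    """Bits of surprise on seeing an event you had assigned probability p."""
    if not 0 < p <= 1:
        # p == 0 would mean infinite surprise and no way to update from it.
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


# Hypothetical credences that the next cupcake is red.
confident = surprisal_bits(0.01)  # roughly "cupcakes are never red"
undecided = surprisal_bits(0.5)   # roughly "anything could happen"

if __name__ == "__main__":
    print(f"confident: {confident:.2f} bits, undecided: {undecided:.2f} bits")
    # confident: 6.64 bits, undecided: 1.00 bits
```

The red cupcake carries about 6.6 times as many bits for the confident person. A perfect Bayesian would register the smaller jolt too; the point is only that a large jolt is harder for a human to overlook.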

A quick review of the con:

  1. Pervasive dishonesty and/or disengagement from reality
    If the evidence hasn’t led you to a strong opinion, and you want to profess one anyway, you are going to have to somehow disengage your personal or social epistemic processes from reality. What are you going to do? Lie? Believe false things? These both seem so bad to me that I can’t consider them seriously. There is also this sub-con:

    1. Appearance of pervasive dishonesty and/or disengagement from reality
      Some people can tell that you are either lying or believing false things, due to your boldly claiming things in this uncertain world. They will then suspect your epistemic and moral fiber, and distrust everything you say.
  2. (There are probably others, but this seems like plenty for now.)

II. Tentative answers

Can we have the pros without the devastatingly terrible con? Some ideas that come to mind or have been suggested to me by friends:

1. Maintain two types of ‘beliefs’. One set of play beliefs—confident, well understood, probably-wrong—for improving in the sandpits of tinkering and chatting, and one set of real beliefs—uncertain, deferential—for when it matters whether you are right. For instance, you might have some ‘beliefs’ about how cancer can be cured by vitamins that you chat about and ponder, and read journal articles to update, but when you actually get cancer, you follow the expert advice to lean heavily on chemotherapy. I think people naturally do this a bit, using words like ‘best guess’ and ‘working hypothesis’.

I don’t like this plan much, though admittedly I basically haven’t tried it. For your new fake beliefs, either you have to constantly disclaim them as fake, or you are again lying and potentially misleading people. Maybe that is manageable through always saying ‘it seems to me that…’ or ‘my naive impression is…’, but it sounds like a mess.

And if you only use these beliefs on unimportant things, then you miss out on a lot of the updating you were hoping for from letting your strong beliefs run into reality. You get some, though, and maybe you just can’t do better than that, unless you want to be testing your wacky theories about cancer cures when you have cancer.

It also seems like you won’t get a lot of the social benefits of seeming confident, if you still don’t actually believe strongly in the really confident things, and have to constantly disclaim them.

But I think I actually object because beliefs are for true things, damnit. If your evidence suggests something isn’t true, then you shouldn’t be ‘believing’ it. And also, if you know your evidence suggests a thing isn’t true, how are you even going to go about ‘believing it’? I don’t know how to.

2. Maintain separate ‘beliefs’ and ‘impressions’. This is like 1, except impressions are just claims about how things seem to you, e.g. ‘It seems to me that vitamin C cures cancer, but I believe that that isn’t true somehow, since a lot of more informed people disagree with my impression.’ This seems like a great distinction in general, but it seems a bit different from what one wants here. I think of this as a distinction between the evidence that you received and the total evidence available to humanity, or perhaps between what is arrived at by your own reasoning about everyone’s evidence vs. your own reasoning about what to make of everyone else’s reasoning about everyone’s evidence. However, these are about ways of getting a belief, and I think what you want here is actually just some beliefs that can be got in any way. Also, why would you act confidently on your impressions, if you thought they didn’t account for others’ evidence, say? Why would you act on them at all?

3. Confidently assert precise but highly uncertain probability distributions: “We should work so hard on this, because it has like a 0.03% chance of reshaping 0.5% of the world, making it a 99.97th percentile intervention in the distribution we are drawing from, so we shouldn’t expect to see something this good again for fifty-seven months.” This may solve a lot of problems, and I like it, but it is tricky.

4. Just do the research so you can have strong views. To do this across the board seems prohibitively expensive, given how much research it seems to take to be almost as uncertain as you were on many topics of interest.

5. Focus on acting well rather than your effects on the world. Instead of trying to act decisively on a 1% chance of this intervention actually bringing about the desired result, try to act decisively on a 95% chance that this is the correct intervention (given your reasoning suggesting that it has a 1% chance of working out). I’m told this is related to Stoicism.

6. ‘Opinions’
I notice that people often have ‘opinions’, which they are not very careful to make true, and do not seem to straightforwardly expect to be true. This seems to be commonly understood by rationally inclined people as some sort of failure, but I could imagine it being another solution, perhaps along the lines of 1.

(I think there are others around, but I forget them.)
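The precise-but-uncertain claim in option 3 packs together two separate calculations. Below is a sketch of the arithmetic. It assumes the percentile refers to interventions drawn independently from a fixed distribution. The essay does not say how candidates arrive, so the rate behind "fifty-seven months" is only implied here, not given.

```python
def expected_share(p_success: float, share_if_success: float) -> float:
    """Expected fraction of the world reshaped: chance of success times size of effect."""
    return p_success * share_if_success


def expected_draws_until_as_good(percentile: float) -> float:
    """Mean number of independent draws until one is at least this percentile.

    Each draw clears the bar with probability (100 - percentile) / 100, so the
    wait is geometric with mean 1 / that probability.
    """
    tail = (100.0 - percentile) / 100.0
    return 1.0 / tail


if __name__ == "__main__":
    share = expected_share(0.0003, 0.005)   # 0.03% chance of reshaping 0.5%
    draws = expected_draws_until_as_good(99.97)
    print(f"{share:.1e}")                   # 1.5e-06
    print(round(draws))                     # 3333
    # If "fifty-seven months" is taken literally, it implies this many
    # candidate interventions per month (an inference, not a claim in the essay).
    print(round(draws / 57, 1))             # 58.5
```

The sketch treats the success probability and the percentile as separate inputs, since the quote does not say how one determines the other.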

III. Stances

I propose an alternative solution. Suppose you might want to say something like, ‘groups of more than five people at parties are bad’, but you can’t because you don’t really know, and you have only seen a small number of parties in a very limited social milieu, and a lot of things are going on, and you are a congenitally uncertain person. Then instead say, ‘I deem groups of more than five people at parties bad’. What exactly do I mean by this? Instead of making a claim about the value of large groups at parties, make a policy choice about what to treat as the value of large groups at parties. You are adding a new variable ‘deemed large group goodness’ between your highly uncertain beliefs and your actions. I’ll call this a ‘stance’. (I expect it isn’t quite clear what I mean by a ‘stance’ yet, but I’ll elaborate soon.) My proposal: to be ‘confident’ in the way that one might be from having strong beliefs, focus on having strong stances rather than strong beliefs.

Strong stances have many of the benefits of confident beliefs. With your new stance on large groups, when you are choosing whether to arrange chairs and snacks to discourage large groups, you skip over your uncertain beliefs and go straight to your stance. And since you decided it, it is certain, and you can rearrange chairs with the vigor and single-mindedness of a person who knows where they stand. You can confidently declare your opposition to large groups, and unite followers in a broader crusade against giant circles. And if at the ensuing party people form a large group anyway and seem to be really enjoying it, you will hopefully notice this the way you wouldn’t if you were merely uncertain-leaning-against regarding the value of large groups.

That might have been confusing, since I don’t know of good words to describe the type of mental attitude I’m proposing. Here are some things I don’t mean by ‘I deem large group conversations to be bad’:

  1. “Large group conversations are bad” (i.e. this is not about what is true, though it is related to that.)
  2. “I declare the truth to be ‘large group conversations are bad’” (i.e. this is not of a kind with beliefs. It is not directly about what is true about the world, or empirically observed, though it is influenced by these things. I do not have power over the truth.)
  3. “I don’t like large group conversations”, or “I notice that I act in opposition to large group conversations” (i.e. is not a claim about my own feelings or inclinations, which would still be a passive observation about the world)
  4. “The decision-theoretically optimal value to assign to large groups forming at parties is negative”, or “I estimate that the decision-theoretically optimal policy on large groups is opposition” (i.e. it is a choice, not an attempt to estimate a hidden feature of the world.)
  5. “I commit to stopping large group conversations” (i.e. It is not a commitment, or directly claiming anything about my future actions.)
  6. “I observe that I consistently seek to avert large group conversations” (this would be an observation about a consistency in my behavior, whereas here the point is to make a new thing (assign a value to a new variable?) that my future behavior may consistently make use of, if I want.)
  7. “I intend to stop some large group conversations” (perhaps this one is closest so far, but a stance isn’t saying anything about the future or about actions—if it doesn’t get changed by the future, and then in future I want to take an action, I’ll probably call on it, but it isn’t ‘about’ that.)

Perhaps what I mean is most like: ‘I have a policy of evaluating large group discussions at parties as bad’, though using ‘policy’ as a choice about an abstract variable that might apply to action, but not in the sense of a commitment.

What is going on here more generally? You are adding a new kind of abstract variable between beliefs and actions. A stance can be a bit like a policy choice on what you will treat as true, or on how you will evaluate something. Or it can also be its own abstract thing that doesn’t directly mean anything understandable in terms of the beliefs or actions nearby.

Some ideas we already use that are pretty close to stances are ‘X is my priority’, ‘I am in the dating market’, and arguably, ‘I am opposed to dachshunds’. X being your priority is heavily influenced by your understanding of the consequences of X and its alternatives, but it is your choice, and it is not dishonest to prioritize a thing that is not important. To prioritize X isn’t a claim about the facts relevant to whether one would want to prioritize it. Prioritizing X also isn’t a commitment regarding your actions, though the purpose of having a ‘priority’ is for it to affect your actions. Your ‘priority’ is a kind of abstract variable added to your mental landscape to collect up a bunch of reasoning about the merits of different things, and package them for easy use in decisions.

Another way of looking at this is as a way of formalizing and concretifying the step where you look at your uncertain beliefs and then decide on a tentative answer and then run with it.

One can be confident in stances, because a stance is a choice, not a guess at a fact about the world. (Though my stance may contain uncertainty if I want, e.g. I could take a stance that large groups have a 75% chance of being bad on average.) So while my beliefs on a topic may be quite uncertain, my stance can be strong, in a sense that does some of the work we wanted from strong beliefs. Nonetheless, since stances are connected with facts and values, my stance can be wrong in the sense of not being the stance I should want to have, on further consideration.

In sum, stances:

  1. Are inputs to decisions in the place of some beliefs and values
  2. Integrate those beliefs and values—to the extent that you want them to be—into a single reusable statement
  3. Can be thought of as something like ‘policies’ on what will be treated as the truth (e.g. ‘I deem large groups bad’) or as new abstract variables between the truth and action (e.g. ‘I am prioritizing sleep’)
  4. Are chosen by you, not implied by your epistemic situation (until some spoilsport comes up with a theory of optimal behavior)
  5. Therefore don’t permit uncertainty in one sense, and don’t require it in another (you know what your stance is, and your stance can be ‘X is bad’ rather than ‘X is 72% likely to be bad’), though you should be uncertain about how much you will like your stance on further reflection.
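A stance is described above as "a new abstract variable between beliefs and actions". One way to picture this is a toy model in which credences and stances live in separate tables. Evidence only ever updates the credences, decisions read the stance when one exists, and the gap between the two is what the party host in the large-group example would hopefully notice. This is an illustrative sketch, not a formalization from the essay. The `Mind` class, the odds-form Bayesian update, and the `tension` measure are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Mind:
    # Credences: updated by evidence, never set by fiat.
    beliefs: dict[str, float] = field(default_factory=dict)
    # Stances: chosen values that decisions read in place of beliefs.
    stances: dict[str, float] = field(default_factory=dict)

    def take_stance(self, topic: str, value: float) -> None:
        self.stances[topic] = value

    def observe(self, topic: str, likelihood_ratio: float) -> None:
        """Bayesian update in odds form; stances are deliberately left alone."""
        p = self.beliefs.get(topic, 0.5)
        if not 0 < p < 1:
            raise ValueError("odds-form update needs 0 < p < 1")
        odds = p / (1 - p) * likelihood_ratio
        self.beliefs[topic] = odds / (1 + odds)

    def value_for_decision(self, topic: str) -> float:
        """Use the stance if there is one, skipping the uncertain belief."""
        return self.stances.get(topic, self.beliefs.get(topic, 0.5))

    def tension(self, topic: str) -> float:
        """How far the evidence has drifted from the chosen stance."""
        return abs(self.stances[topic] - self.beliefs.get(topic, 0.5))


if __name__ == "__main__":
    topic = "large groups at parties are bad"
    me = Mind(beliefs={topic: 0.55})   # hypothetical: uncertain, leaning against
    me.take_stance(topic, 1.0)         # "I deem groups of more than five bad"
    # People form a large group and clearly enjoy it: evidence against (ratio < 1).
    me.observe(topic, 0.25)
    print(me.value_for_decision(topic))  # 1.0, the stance still drives decisions
    print(round(me.tension(topic), 2))   # 0.77, a gap worth noticing
```

A stance can also carry uncertainty if you choose, as in the essay's "75% chance of being bad" example: `take_stance(topic, 0.75)` works the same way.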

I have found having stances somewhat useful, or at least entertaining, in the short time I have been trying having them, but it is more of a speculative suggestion with no other evidence behind it than trustworthy advice.


Impact measurement and value-neutrality verification

9 hours 39 minutes ago
Published on October 15, 2019 12:06 AM UTC


Recently, I've been reading and enjoying Alex Turner's Reframing Impact sequence, but I realized that I have some rather idiosyncratic views regarding impact measures that I haven't really written up much yet. This post is my attempt at trying to communicate those views, as well as a response to some of the ideas in Alex's sequence.

What can you do with an impact measure?

In the "Technical Appendix" to his first Reframing Impact post, Alex argues that an impact measure might be "the first proposed safeguard which maybe actually stops a powerful agent with an imperfect objective from ruining things—without assuming anything about the objective."

Personally, I am quite skeptical of this use case for impact measures. As it is phrased—and especially including the link to Robust Delegation—Alex seems to be implying that an impact measure could be used to solve inner alignment issues arising from a model with a mesa-objective that is misaligned relative to the loss function used to train it. However, the standard way in which one uses an impact measure is by including it in said loss function, which doesn't do very much if the problem you're trying to solve is your model not being aligned with that loss.[1]

That being said, using an impact measure as part of your loss could be helpful for outer alignment. In my opinion, however, that requires your impact measure to capture basically everything you might care about (if you want it to actually solve outer alignment), in which case I don't really see what the impact measure is buying you anymore. I think this is especially true for me because I generally see amplification as being the right solution to outer alignment, and I don't think amplification really benefits at all from adding an impact measure.[2]

Alternatively, if you had a way of mechanistically verifying that a model behaves according to some impact measure, then I would say that you could use something like that to help with inner alignment. However, this is quite different from the standard procedure of including an impact measure as part of your loss. Instead of training your agent to behave according to your impact measure, you would have to train it to convince some overseer that it is internally implementing some algorithm which satisfies some minimal impact criterion. It's possible that this is what Alex actually has in mind in terms of how he wants to use impact measures, though it's worth noting that this use case is quite different from the standard one.

That being said, I'm skeptical of this use case as well. In my opinion, developing a mechanistic understanding of corrigibility seems more promising than developing a mechanistic understanding of impact. Alex mentions corrigibility as a possible alternative to impact measures in his appendix, though he notes that he's currently unsure what exactly the core principle behind corrigibility actually is. I think my post on mechanistic corrigibility gets at this somewhat, though there's definitely more work to be done there.

So, I've explained why I don't think impact measures are very promising for solving outer alignment or inner alignment—does that mean I think they're useless? No. In fact, I think a better understanding of impact could be extremely helpful, just not for any of the reasons I've talked about above.

Value-neutrality verification

In Relaxed adversarial training for inner alignment, I argued that one way of mechanistically verifying an acceptability condition might be to split a model into a value-neutral piece (its optimization procedure) and a value-laden piece (its objective). If you can manage to get such a separation, then verifying acceptability just reduces to verifying that the value-laden piece has the right properties[3] and that the value-neutral piece is actually value-neutral.

Why is this sort of a separation useful? Well, not only might it make mechanistically verifying acceptability much easier, it might also make strategy-stealing possible in a way which it otherwise might not be. In particular, one of the big problems with making strategy-stealing work under an informed-oversight-style scheme is that some strategies which are necessary to stay competitive might nevertheless be quite difficult to justify to an informed overseer. However, if we have a good understanding of the degree to which different algorithms are value-laden vs. value-neutral, then we can use that to short-circuit the normal evaluation process, enabling your agent to steal any strategies which it can definitely demonstrate are value-neutral.

This is all well and good, but what does it even mean for an algorithm to be value-neutral, and how would a model ever actually be able to demonstrate that? Well, here's what I want out of a value-neutrality guarantee: I want to consider some optimization procedure f to be value-neutral if, relative to some set of objectives Y, it doesn't tend to advantage any subset of those objectives over any other. In particular, if I start with some distribution of resources/utility/etc. over the different objectives y∈Y, then I don't want that distribution to change if I give each y∈Y access to the optimization process f (this is what we need for strategy-stealing to work).
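As a toy sketch of the condition just stated (the resource numbers, the objective names `y1`..`y3`, and the multiplicative "gain" model are assumptions for illustration, not part of the post), one can compare the distribution of resources over objectives before and after each objective is given the optimizer, here using total variation distance as the measure of change. In this multiplicative model the distribution is unchanged exactly when every objective gets the same gain.

```python
from typing import Dict


def shares(resources: Dict[str, float]) -> Dict[str, float]:
    """Normalise per-objective resources into a distribution."""
    total = sum(resources.values())
    return {y: r / total for y, r in resources.items()}


def share_shift(resources: Dict[str, float], gain: Dict[str, float]) -> float:
    """Total variation distance between the resource distribution before and
    after every objective y gets access to the optimizer.

    Toy assumption: access to the optimizer multiplies y's resources by
    gain[y]. In this model a value-neutral optimizer gives a shift of 0.
    """
    before = shares(resources)
    after = shares({y: r * gain[y] for y, r in resources.items()})
    return 0.5 * sum(abs(after[y] - before[y]) for y in resources)


resources = {"y1": 1.0, "y2": 3.0, "y3": 6.0}
print(share_shift(resources, {"y1": 2.0, "y2": 2.0, "y3": 2.0}))   # 0.0
print(share_shift(resources, {"y1": 10.0, "y2": 2.0, "y3": 2.0}))  # > 0: y1 gains share
```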

Interestingly, however, what I've just described is extremely similar to Attainable Utility Preservation (AUP), the impact measure put forward by Turner et al. Specifically, AUP measures the extent to which an algorithm advantages some set of objectives relative to doing nothing. This is slightly different from what I want, but it's quite similar in a way which I think is no accident. In particular, I think it's not hard to extend the math of AUP to apply to value-neutrality verification. That is, let f:Y→(X→A) be some optimization procedure over objectives Y, states X, and actions A. Then, we can compute f's value-neutrality by calculating

$$\text{neutrality}(f, Y) = \operatorname{stdev}\left(\left\{\, \left| V_y(f(y)) - V_y(\text{no-op}) \right| \;:\; y \in Y \,\right\}\right)$$

where V_y(π) measures the expected future discounted utility for some policy π:X→A,[4] no-op is some null policy, and stdev is the operator that finds the standard deviation of the given set. What's being measured here is precisely the extent to which f, if given to each y∈Y, would enable some y to get more value relative to others. Now, compare this to the AUP penalty term, which, for a state x∈X and action a∈A, is calculated as

$$\text{impact}(x, a, Y) = \frac{1}{C} \sum_{y \in Y} \left| Q_y(x, a) - Q_y(x, \text{no-op}) \right|$$

where Q_y(x,a) measures the expected future discounted utility under the optimal policy after having taken action a in state x and C is some scaling constant.

Comparing these two equations, we can see that there are many similarities between impact and neutrality, but also a couple of major differences. First, neutrality as presented here is a function of an agent's entire policy, whereas impact is only a function of an agent's actions.[5] Conceptually, I don't think this is a real distinction; I think it just comes from the fact that I want neutrality to be an algorithmic/mechanistic property, whereas AUP was developed as something you could use as part of an RL loss. Second, and I think this is the real distinction, neutrality takes a standard deviation, whereas impact takes a mean. This lets us think of both neutrality and impact as effectively being moments of the same distribution: impact is the first moment and neutrality is the second (strictly, the square root of the second central moment). Outside of those differences, however, the two equations are quite similar; in fact, I wrote neutrality just by straightforwardly adapting the AUP penalty to the value-neutrality verification case.
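A toy sketch may help make the comparison concrete. It rests on simplifying assumptions that are not in the post: a single state and a one-step horizon, so V_y(π) is just objective y's utility for the action π picks and Q_y(x, a) is just y's utility for a; an AUP-style penalty taken as the average over objectives of |Q_y(x, a) - Q_y(x, no-op)| (i.e. the scaling constant C set to |Y|); and a hypothetical optimizer `f` that returns the policy always taking y's favourite action.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List

Action = str
State = int
Objective = Dict[Action, float]      # toy objective: utility of each action
Policy = Callable[[State], Action]
Optimizer = Callable[[Objective], Policy]

ACTIONS: List[Action] = ["no-op", "left", "right", "grab"]


def f(y: Objective) -> Policy:
    """Hypothetical optimization procedure f: Y -> (X -> A): a policy that
    always takes y's favourite action (ties go to the earlier action)."""
    best = max(ACTIONS, key=lambda a: y[a])
    return lambda x: best


def no_op(x: State) -> Action:
    return "no-op"


def V(y: Objective, policy: Policy, x: State = 0) -> float:
    # One-step simplification: no future, no discounting.
    return y[policy(x)]


def gains(optimizer: Optimizer, Y: List[Objective]) -> List[float]:
    # |V_y(f(y)) - V_y(no-op)| for every objective y in Y.
    return [abs(V(y, optimizer(y)) - V(y, no_op)) for y in Y]


def neutrality(optimizer: Optimizer, Y: List[Objective]) -> float:
    # Spread of the gains across objectives (population standard deviation).
    return pstdev(gains(optimizer, Y))


def impact(a: Action, Y: List[Objective]) -> float:
    # AUP-style penalty with C = |Y|; in one step Q_y(x, a) is just y[a].
    return mean(abs(y[a] - y["no-op"]) for y in Y)


Y_balanced = [
    {"no-op": 0.0, "left": 1.0, "right": 0.0, "grab": 2.0},
    {"no-op": 0.0, "left": 0.0, "right": 1.0, "grab": 2.0},
]
Y_skewed = Y_balanced + [{"no-op": 0.0, "left": 5.0, "right": 0.0, "grab": 0.0}]

print(neutrality(f, Y_balanced), impact("grab", Y_balanced))  # 0.0 2.0
print(neutrality(f, Y_skewed))  # about 1.414
```

Over `Y_balanced` the optimizer is perfectly neutral (both objectives gain the same amount) even though the action it picks carries a large AUP-style penalty; adding an objective that `f` helps far more makes neutrality positive. If `f` ignored its argument and always returned the same action a, `gains` would produce exactly the list that `impact(a, Y)` averages, which is one way to read the "first moment vs. second moment" remark.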

This is why I'm optimistic about impact measurement work: not because I expect it to greatly help with alignment via the straightforward methods in the first section, but because I think it's extremely applicable to value-neutrality verification, which I think could be quite important to making relaxed adversarial training work. And although, as I said, I think a lot of the current impact measure work is already quite applicable to value-neutrality verification, I would be even more excited to see more work on impact measurement specifically from this perspective. There's a lot more work to be done here than just my writing down of neutrality (e.g. exploring what this sort of a metric actually looks like, translating other impact measures, actually running RL experiments, etc.).

Furthermore, not only do I think that value-neutrality verification is the most compelling use case for impact measures, I also think that objective impact specifically can be understood as being about value-neutrality. In "The Gears of Impact" Alex argues that "objective impact, instrumental convergence, opportunity cost, the colloquial meaning of 'power'—these all prove to be facets of one phenomenon, one structure." In my opinion, value-neutrality should be added to that list. We can think of actions as having objective impact to the extent that they change the distribution of which values control which resources, that is, to the extent that they are not value-neutral. Or, phrased another way, actions have objective impact to the extent that they break the strategy-stealing assumption. Thus, even if you disagree with me that value-neutrality verification is the most compelling use case for impact measures, I still think that if you want to understand objective impact, it's worth trying to understand strategy-stealing and value-neutrality, because I think they're all secretly talking about the same thing.

  1. This isn't entirely true, since changing the loss might shift the loss landscape sufficiently that the easiest-to-find model is now aligned, though I am generally skeptical of that approach, as it seems quite hard to ever know whether it's actually going to work or not.

  2. Or, if it does, then if you're doing things right the amplification tree should just compute the impact itself.

  3. On the value-laden piece, you might verify some mechanistic corrigibility property, for example.

  4. Also suppose that V_y is normalized to have comparable units across objectives.

  5. This might seem bad (and it is if you want to try to use this as part of an RL loss), but if what you want to do instead is verify internal properties of a model, then it's exactly what you want.


Schematic Thinking: heuristic generalization using Korzybski's method

October 14, 2019 - 22:29
Published on October 14, 2019 7:29 PM UTC

Epistemic status: exploration of some of the intuitions involved in discussions behind this post at MSFP.

Alfred Korzybski directs us to develop the faculty to be conscious of the act of abstracting. This means that one has metacognitive awareness when one does things like engage in the substitution effect, reason by analogy, shift the coarse-grainedness of an argument, use the 'to be' verb form, or shift from one Marr level to another mid-sentence. One of the most important skills that such training develops is a much more immediate awareness of what Korzybski calls the multiordinality of words, which you will be familiar with if you have read A Human's Guide to Words or are otherwise familiar with the Wittgensteinian shift in analytic philosophy (related: the Indeterminacy of Translation). In short, many words are underdetermined in their referents along more than one dimension, leading to communication problems both between people and internally (for an intuitive example, imagine people talking past each other in a discussion of causation because they are using different senses of 'cause' without realizing it).

I want to outline what one might call second-order multiordinal words, or maybe schematic thinking. With multiordinal words, one is aware of all the values that a word could be referring to. With schematic thinking, one is also aware of all the words that could have occupied the space that word occupies. It is a bit like seeing everything as an already filled-out Mad Libs page and reconstructing the blank template.

This may sound needlessly abstract, but you're already familiar with a famous example. One of Charlie Munger's most famous heuristics is inversion. With inversion we can check the various ways we might be confused by reversing the meaning of one part of a chain of reasoning and seeing how that affects things. Instead of forward chaining, we backward chain; we prepend 'not' or 'doesn't' to various parts of a plan to construct premortems; we invert whatever just-so story a babbling philosopher told and check whether it still makes sense, to see if their explanation proves too much.

I claim that this is a specific, actionable instance of schematic thinking. The generalization is that one need not restrict oneself to opposites, nor to a single word at a time, though inverting a single word remains an easy, simple way to break out of mental habit and see more than one possibility for any particular meaning structure.

Let's take some forms of first-order indeterminacy and apply this to them. To start with, you can do a simple inversion of each and see what happens.

First example of first order indeterminacy: universal quantifiers

"all, always, every, never, everyone, no one, nobody, none" etc

We already recognize that perverse generalizations of this form cause us problems that can often be repaired by getting specific. The additional question schematic thinking has us ask is: among the choices I can make, what influences me to make this one? Are those good reasons? What if you inverted that choice (all->none, etc), or made a different one?

Second example of first order indeterminacy: modal operators

confusion of possibility and necessity, "should, should not, must, must not, have to, need to, it is necessary" etc

The additional question we ask here, as we convert 'shoulds' to 'coulds' and 'musts' to 'mays', is: what sorts of mental moves are we making as we do this?

Third example of first order indeterminacy: unspecified verbs

"they are too trusting, that was rude, we will benefit from that, I tried really hard"

The additional question we ask as we get more specific about what happened is 'why are we choosing this level of coarse grainedness?' After all, depending on the context someone could accuse us of being too specific, or not being specific enough. We have intuitions about when those accusations are reasonable. How does that work?
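As a crude mechanical analogue of the "reconstruct the blank Mad Libs" move (purely illustrative, not a method from the post or from Korzybski), one can mark words from categories like the ones above as slots and list every sentence that differs from the original in exactly one slot. The category word lists here are hypothetical and deliberately tiny.

```python
from typing import Dict, Iterator, List

# Hypothetical slot categories, loosely following the examples above.
SLOTS: Dict[str, List[str]] = {
    "frequency": ["always", "sometimes", "never"],
    "determiner": ["all", "some", "no"],
    "modal": ["should", "could", "must", "may"],
}


def variants(sentence: str) -> Iterator[str]:
    """Yield every sentence that differs from `sentence` in exactly one slot,
    i.e. the other fillings of the template the sentence instantiates."""
    words = sentence.split()
    for i, word in enumerate(words):
        for options in SLOTS.values():
            if word.lower() in options:
                for alt in options:
                    if alt != word.lower():
                        yield " ".join(words[:i] + [alt] + words[i + 1:])


for v in variants("you should always double-check"):
    print(v)
```

This prints five alternatives, among them "you could always double-check" and "you should never double-check"; the point is only that the alternatives become visible, not that any of them is the right one.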


This might seem a bit awkward and unnecessary. The concrete benefit it has brought me is that it gives me a starting point when I am reading or listening to a line of reasoning that strikes me as off in some way, but I can't quite put my finger on how. By seeing many of the distinctions being made to construct the argument as arbitrary and part of a space of possible distinctions I can start rephrasing the argument in a way that makes more sense to me. I then have a much better chance of making substantive critiques (or alternatively, becoming convinced) rather than just arguing over misunderstandings the whole time. I've found many philosophical arguments hinge on pulling a switcheroo at some key juncture. I think many people intuitively pick up on this and that this is why people dismiss many philosophical arguments, and I think they are usually correct to do so.


[AN #68]: The attainable utility theory of impact

October 14, 2019 - 20:00
Published on October 14, 2019 5:00 PM UTC


Find all Alignment Newsletter resources here. In particular, you can sign up, or look through this spreadsheet of all summaries that have ever been in the newsletter. I'm always happy to hear feedback; you can send it to me by replying to this email.

Stuart Russell at CHAI has published a book about AI safety. Expect a bonus newsletter this week summarizing the book and some of the research papers that underlie it!

Audio version here (may not be up yet).


Reframing Impact - Part 1 (Alex Turner) (summarized by Rohin): This sequence has exercises that will be spoiled by this summary, so take a moment to consider whether you want to read the sequence directly.

This first part of the sequence focuses on identifying what we mean by impact, presumably to help design an impact measure in the future. The punch line: an event is impactful to an agent if it changes the agent's ability to get what it wants. This is Attainable Utility (AU) theory. To quote the sequence: "How could something possibly be a big deal to us if it doesn't change our ability to get what we want? How could something not matter to us if it does change our ability to get what we want?"

Some implications and other ideas:

- Impact is relative to an agent: a new church is more impactful if you are a Christian than if not.

- Some impact is objective: getting money is impactful to almost any agent that knows what money is.

- Impact is relative to expectations: A burglar robbing your home is impactful to you (you weren't expecting it) but not very impactful to the burglar (who had planned it out). However, if the burglar was unsure whether the burglary would be successful, then success/failure would be impactful to them.
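The burglar example can be turned into a small calculation. This is a toy formalization that is not from the sequence: attainable utility (AU) is a single number per agent, an event's impact is the gap between the AU the agent ends up with and the AU it expected beforehand, and all numbers are made up for illustration.

```python
from typing import Iterable, Tuple


def expected_au(outcomes: Iterable[Tuple[float, float]]) -> float:
    """Expected attainable utility over (probability, AU) pairs."""
    return sum(p * au for p, au in outcomes)


def impact(expected: float, realized: float) -> float:
    # Toy reading of AU theory: how far the agent's realized ability to get
    # what it wants moved away from what it expected beforehand.
    return abs(realized - expected)


# You expected to keep your belongings (AU 10); the robbery leaves you at 6.
print(impact(expected=10.0, realized=6.0))  # 4.0

# A burglar certain of success already expected the outcome (AU 8).
print(impact(expected=expected_au([(1.0, 8.0)]), realized=8.0))  # 0.0

# A burglar who gave success a 50% chance (AU 8 on success, 2 on failure).
before = expected_au([(0.5, 8.0), (0.5, 2.0)])  # 5.0
print(impact(before, realized=8.0), impact(before, realized=2.0))  # 3.0 3.0
```

The same event gets different impact numbers for different agents purely because their expectations differ, which is the sense in which impact is relative to both the agent and its expectations.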

While this may seem obvious, past work (AN #10) has talked about impact as being caused by changes in state. While of course any impact does involve a change in state, this is the wrong level of abstraction to reason about impact: fundamentally, impact is related to what we care about.

Rohin's opinion: To quote myself from a discussion with Alex, "you're looking at the optimal Q-function for the optimal utility function and saying 'this is a good measure of what we care about' and of course I agree with that". (Although this is a bit inaccurate -- it's not the optimal Q-function, but the Q-function relative to what we expect and know.)

This may be somewhat of a surprise, given that I've been pessimistic about impact measures in the past. However, my position is that it's difficult to simultaneously get three desiderata: value-agnosticism, avoidance of catastrophes, and usefulness. This characterization of impact is very explicitly dependent on values, and so doesn't run afoul of that. (Also, it just makes intuitive sense.)

This part of the sequence did change some of my thinking on impact measures as well. In particular, the sequence makes a distinction between objective impact, which applies to all (or most) agents, and value impact. This is similar to the idea of convergent instrumental subgoals, and the idea that large-scale multiagent training (AN #65) can lead to generally useful behaviors that can be applied to novel tasks. It seems plausible to me that we could make value-agnostic impact measures that primarily penalize this objective impact, and this might be enough to avoid catastrophes. This would prevent us from using AI for big, impactful tasks, but could allow for AI systems that pursue small, limited tasks. I suspect we'll see thoughts along these lines in the next parts of this sequence.

Technical AI alignment

Technical agendas and prioritization

AI Safety "Success Stories" (Wei Dai) (summarized by Matthew): It is difficult to measure the usefulness of various alignment approaches without clearly understanding what type of future they end up being useful for. This post collects "Success Stories" for AI -- disjunctive scenarios in which alignment approaches are leveraged to ensure a positive future. Whether these scenarios come to pass will depend critically on background assumptions, such as whether we can achieve global coordination, or solve the most ambitious safety issues. Mapping these success stories can help us prioritize research.

Matthew's opinion: This post does not exhaust the possible success stories, but it gets us a lot closer to being able to look at a particular approach and ask, "Where exactly does this help us?" My guess is that most research ends up being only minimally helpful for the long run, and so I consider inquiry like this to be very useful for cause prioritization.

Preventing bad behavior

Formal Language Constraints for Markov Decision Processes (Eleanor Quint et al) (summarized by Rohin): Within the framework of RL, the authors propose using constraints defined by DFAs (deterministic finite automata) in order to eliminate safety failures, or to prevent agents from exploring clearly ineffective policies (which would accelerate learning). Constraints can be defined on any auxiliary information that can be computed from the "base" MDP. A constraint could either restrict the action space, forcing the agent to take an action that doesn't violate the constraint, which they term "hard" constraints; or a constraint could impose a penalty on the agent, thus acting as a form of reward shaping, which they term a "soft" constraint. They consider two constraints: one that prevents the agent from "dithering" (going left, then right, then left, then right), and one that prevents the agent from "overactuating" (going in the same direction four times in a row). They evaluate their approach with these constraints on Atari games and Mujoco environments, and show that they lead to increased reward and decreased constraint violations.
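A minimal sketch of the two constraints, written as small finite automata over the action history, may make the hard/soft distinction concrete. It is an illustration under assumptions, not the authors' code: the action set is a hypothetical 1-D `left`/`right`/`noop`, dithering is taken to be four alternating moves in a row (either starting direction), and the hard and soft variants are shown as action masking and a reward penalty respectively.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, List, Sequence

Action = str
ACTIONS: List[Action] = ["left", "right", "noop"]


@dataclass(frozen=True)
class ConstraintDFA:
    """Finite automaton over actions; reaching a `violating` state means the
    most recent action broke the constraint."""
    start: Hashable
    step: Callable[[Hashable, Action], Hashable]
    violating: Callable[[Hashable], bool]

    def run(self, history: Sequence[Action]) -> Hashable:
        state = self.start
        for a in history:
            state = self.step(state, a)
        return state


def _overactuation_step(state, action):
    # State: (last direction, run length in that direction, capped at 4).
    last, run = state
    if action == "noop":
        return (None, 0)
    return (action, min(run + 1, 4) if action == last else 1)


def _dithering_step(state, action):
    # State: (last direction, length of current left/right alternation, capped at 4).
    last, run = state
    if action == "noop":
        return (None, 0)
    if last is not None and action != last:
        return (action, min(run + 1, 4))
    return (action, 1)


OVERACTUATING = ConstraintDFA((None, 0), _overactuation_step, lambda s: s[1] >= 4)
DITHERING = ConstraintDFA((None, 0), _dithering_step, lambda s: s[1] >= 4)


def _would_violate(c: ConstraintDFA, history: Sequence[Action], action: Action) -> bool:
    return c.violating(c.step(c.run(history), action))


def allowed_actions(history, constraints) -> List[Action]:
    """Hard constraint: mask out actions that would violate any constraint.
    'noop' resets both automata, so the mask is never empty here."""
    return [a for a in ACTIONS if not any(_would_violate(c, history, a) for c in constraints)]


def shaped_reward(reward, history, action, constraints, penalty=1.0) -> float:
    """Soft constraint: subtract `penalty` for each violated constraint."""
    return reward - penalty * sum(_would_violate(c, history, action) for c in constraints)


constraints = [OVERACTUATING, DITHERING]
print(allowed_actions(["left", "left", "left"], constraints))   # ['right', 'noop']
print(allowed_actions(["left", "right", "left"], constraints))  # ['left', 'noop']
print(shaped_reward(1.0, ["left", "right", "left"], "right", constraints))  # 0.0
```

Replaying the whole history on every call keeps the sketch short; a real agent would more likely carry the automaton states alongside the environment state and update them once per step. The masking also only looks one action ahead, which is the limitation discussed in the opinion that follows.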

Rohin's opinion: This method seems like a good way to build in domain knowledge about what kinds of action sequences are unlikely to work in a domain, which can help accelerate learning. Both of the constraints in the experiments do this. The paper also suggests using the technique to enforce safety constraints, but the experiments don't involve any safety constraints, and conceptually there do seem to be two big obstacles. First, the constraints will depend on state, but it is very hard to write such constraints given access only to actions and high-dimensional pixel observations. Second, you can only prevent constraint violations by removing actions one timestep before the constraint is violated: if there is an action that will inevitably lead to a constraint violation in 10 timesteps, there's no way in this framework to not take that action. (Of course, you can use a soft constraint, but this is then the standard technique of reward shaping.)

In general, methods like this face a major challenge: how do you specify the safety constraint that you would like to avoid violating? I'd love to see more research on how to create specifications for formal analysis.


Counterfactual States for Atari Agents via Generative Deep Learning (Matthew L. Olson et al)

Adversarial examples

Robustness beyond Security: Representation Learning (Logan Engstrom et al) (summarized by Cody): Earlier this year, a provocative paper (AN #62) out of MIT claimed that adversarial perturbations weren’t just spurious correlations, but were, at least in some cases, features that generalize to the test set. A subtler implied point of the paper was that robustness to adversarial examples wasn’t a matter of resolving the model’s misapprehensions, but rather one of removing the model’s sensitivity to features that would be too small for a human to perceive. If we do this via adversarial training, we get so-called “robust representations”. The same group has now put out another paper, asking the question: are robust representations also human-like representations?

To evaluate how human-like the representations are, they propose the following experiment: take a source image, and optimize it until its representations (penultimate layer activations) match those of some target image. If the representations are human-like, the result of this optimization should look (to humans) very similar to the target image. (They call this property “invertibility”.) Normal image classifiers fail miserably at this test: the image looks basically like the source image, making it a classic adversarial example. Robust models on the other hand pass the test, suggesting that robust representations usually are human-like. They provide further evidence by showing that you can run feature visualization without regularization and get meaningful results (existing methods result in noise if you don’t regularize).
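The invertibility experiment can be sketched with a toy differentiable "representation" in place of a real network. Everything here is an assumption for illustration: the representation is a random linear map followed by `tanh`, it has fewer features (3) than input dimensions (8), and plain gradient descent is run on the input starting from the source. Because every gradient step lies in the row space of `W`, the component of the source that `W` cannot see is never changed, so the optimized input moves toward the target's representation while keeping part of the source; this is the "still looks like the source image" failure in miniature, not a model of robust training.

```python
import numpy as np


def representation(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    # Toy stand-in for penultimate-layer activations: tanh(W x).
    return np.tanh(W @ x)


def invert(W: np.ndarray, source: np.ndarray, target: np.ndarray,
           lr: float = 0.1, steps: int = 2000) -> np.ndarray:
    """Start at `source` and run gradient descent on the input to minimise
    0.5 * ||representation(W, x) - representation(W, target)||^2."""
    goal = representation(W, target)
    x = source.astype(float).copy()
    for _ in range(steps):
        h = representation(W, x)
        # Chain rule through tanh: gradient = W^T [(1 - h^2) * (h - goal)].
        x -= lr * (W.T @ ((1.0 - h ** 2) * (h - goal)))
    return x


rng = np.random.default_rng(0)
d, k = 8, 3                      # input dimension > number of features
W = rng.normal(size=(k, d)) / np.sqrt(d)
source, target = rng.normal(size=d), rng.normal(size=d)
x = invert(W, source, target)

gap_before = np.linalg.norm(representation(W, source) - representation(W, target))
gap_after = np.linalg.norm(representation(W, x) - representation(W, target))
print(gap_before, gap_after)

# Projector onto the null space of W: input directions the features ignore.
P_null = np.eye(d) - np.linalg.pinv(W) @ W
print(np.allclose(P_null @ x, P_null @ source))  # True: that part never moved
```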

Cody's opinion: I found this paper clear, well-written, and straightforward in its empirical examination of how the representations learned by standard and robust models differ. I also have a particular interest in this line of research, since I have thought for a while that we should be more clear about the fact that adversarially-susceptible models aren’t wrong in some absolute sense, but relative to human perception in particular.

Rohin’s opinion: I agree with Cody above, and have a few more thoughts.

Most of the evidence in this paper suggests that the learned representations are “human-like” in the sense that two images that have similar representations must also be perceptually similar (to humans). That is, by enforcing that “small change in pixels” implies “small change in representations”, you seem to get for free the converse: “small change in representations” implies “small change in pixels”. This wasn’t obvious to me: a priori, each feature could have corresponded to 2+ “clusters” of inputs.

The authors also seem to be making a claim that the representations are semantically similar to the ones humans use. I don’t find the evidence for this as compelling. For example, they claim that when putting the “stripes” feature on a picture of an animal, only the animal gets the stripes and not the background. However, when I tried it myself in the interactive visualization, it looked like a lot of the background was also getting stripes.

One typical regularization for feature visualization is to jitter the image while optimizing it, which seems similar to selecting for robustness to imperceptible changes, so it makes sense that using robust features helps with feature visualization. That said, there are several other techniques for regularization, and the authors didn’t need any of them, which is very interesting. On the other hand, their visualizations don't look as good to me as those from other papers.

Read more: Paper: Adversarial Robustness as a Prior for Learned Representations

Robustness beyond Security: Computer Vision Applications (Shibani Santurkar, Dimitris Tsipras, Brandon Tran, Andrew Ilyas, Logan Engstrom et al) (summarized by Rohin): Since a robust model seems to have significantly more "human-like" features (see post above), it should be able to help with many of the tasks in computer vision. The authors demonstrate results on image generation, image-to-image translation, inpainting, superresolution and interactive image manipulation: all of which are done simply by optimizing the image to maximize the probability of a particular class label or the value of a particular learned feature.
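The "optimize the image toward a class" recipe can be sketched the same way. This is a toy and not the paper's setup: a hypothetical linear softmax classifier stands in for the robust model, and each gradient-ascent step on log p(class | x) is followed by a projection back into an L2 ball of radius `eps` around the starting input (the radius, step size, and step count are arbitrary choices here). A linear toy can only show the mechanics; the interesting claim in the summary is that with a robust network this simple loop produces perceptually meaningful changes.

```python
import numpy as np


def log_softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()                      # numerical stability
    return z - np.log(np.exp(z).sum())


def push_toward_class(W: np.ndarray, b: np.ndarray, x0: np.ndarray, cls: int,
                      eps: float = 1.0, lr: float = 0.1, steps: int = 100) -> np.ndarray:
    """Projected gradient ascent on log p(cls | x) for logits W @ x + b,
    keeping the input within an L2 ball of radius eps around x0."""
    x = x0.astype(float).copy()
    for _ in range(steps):
        p = np.exp(log_softmax(W @ x + b))
        x = x + lr * (W[cls] - p @ W)    # gradient of log softmax(Wx + b)[cls]
        delta = x - x0
        norm = np.linalg.norm(delta)
        if norm > eps:                   # project back onto the L2 ball
            x = x0 + delta * (eps / norm)
    return x


rng = np.random.default_rng(1)
W = rng.normal(size=(4, 10)) / np.sqrt(10)   # hypothetical 4-class linear "model"
b = np.zeros(4)
x0 = rng.normal(size=10)                      # hypothetical seed "image"
x = push_toward_class(W, b, x0, cls=2)

p_before = np.exp(log_softmax(W @ x0 + b))[2]
p_after = np.exp(log_softmax(W @ x + b))[2]
print(p_before, p_after)
```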

Rohin's opinion: This provides more evidence of the utility of robust features, though all of the comments from the previous paper apply here as well. In particular, looking at the results, my non-expert guess is that they are probably not state-of-the-art (but it's still interesting that one simple method is able to do well on all of these tasks).

Read more: Paper: Image Synthesis with a Single (Robust) Classifier

Critiques (Alignment)

Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More (summarized by Rohin): See Import AI.

Miscellaneous (Alignment)

What You See Isn't Always What You Want (Alex Turner) (summarized by Rohin): This post makes the point that for Markovian reward functions on observations, since any given observation can correspond to multiple underlying states, we cannot know just by analyzing the reward function whether it actually leads to good behavior: it also depends on the environment. For example, suppose we want an agent to collect all of the blue blocks in a room together. We might simply reward it for having blue in its observations: this might work great if the agent only has the ability to pick up and move blocks, but won't work well if the agent has a paintbrush and blue paint. This makes the reward designer's job much more difficult. However, the designer could use techniques that don't require a reward on individual observations, such as rewards that can depend on the agent's internal cognition (as in iterated amplification), or rewards that can depend on histories (as in Deep RL from Human Preferences).
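The blue-block example reduces to a few lines. In this toy, the state fields and the "fraction of blue in view" observation are hypothetical: two quite different world states produce the same observation, so any reward computed only from that observation must score them identically, and whether optimizing it gives the intended behaviour depends on which of those states the agent can reach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    blue_blocks_in_pile: int   # what the designer actually wants
    blue_paint_on_walls: int   # an unintended way to make the view bluer


def observe(s: State) -> float:
    """Toy camera: fraction of the view that is blue, capped at 1.0.
    It cannot tell blue blocks from blue paint."""
    return min(1.0, 0.1 * (s.blue_blocks_in_pile + s.blue_paint_on_walls))


def reward(observation: float) -> float:
    # Markovian reward defined on the observation alone.
    return observation


intended = State(blue_blocks_in_pile=5, blue_paint_on_walls=0)
painted = State(blue_blocks_in_pile=0, blue_paint_on_walls=5)
print(observe(intended) == observe(painted))                  # True
print(reward(observe(intended)) == reward(observe(painted)))  # True
```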

Rohin's opinion: I certainly agree that we want to avoid reward functions defined on observations, and this is one reason why. It seems like a more general version of the wireheading argument to me, and applies even if you think that the AI won't be able to wirehead, as long as it is capable enough to find other plans for getting high reward besides the one the designer intended.
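The blue-block example can be made concrete in a few lines. In this hypothetical toy world, the intended outcome and the paint-everything-blue outcome are different states, but they produce the same observation. A reward defined on observations alone therefore cannot tell them apart:

```python
def observe(state):
    """The agent's camera reports only colours, not which object is which."""
    return [colour for _thing, colour in state]


def blue_reward(observation):
    """Markovian reward on observations: count the blue cells in view."""
    return sum(1 for colour in observation if colour == "blue")


# Intended outcome: the two blue blocks have been gathered into view.
gathered = [("block", "blue"), ("block", "blue"), ("floor", "grey")]
# Unintended outcome: the agent painted the wall and the floor blue instead.
painted = [("wall", "blue"), ("floor", "blue"), ("floor", "grey")]

print(observe(gathered) == observe(painted))  # True
print(blue_reward(observe(gathered)), blue_reward(observe(painted)))  # 2 2
```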

Other progress in AI

Reinforcement learning

Behaviour Suite for Reinforcement Learning (Ian Osband et al) (summarized by Zach): Collecting clear, informative and scalable problems that capture important aspects of how to design general and efficient learning algorithms is difficult. Many current environments used to evaluate RL algorithms introduce confounding variables that make new algorithms difficult to evaluate. In this project, the authors assist this effort by introducing Behaviour Suite for Reinforcement Learning (bsuite), a library that facilitates reproducible and accessible research on core issues in RL. The idea of these experiments is to capture core issues, such as 'exploration' or 'memory', in a way that can be easily tested or evaluated. The main contribution is the open-source bsuite library itself, which instantiates all experiments in code and automates the evaluation and analysis of any RL agent on the suite. It is designed to be flexible, and includes code to run experiments in parallel on Google Cloud, a Jupyter notebook, and integrations with OpenAI Gym.

Zach's opinion: It's safe to say that work towards good evaluation metrics for RL agents is a good thing. I think this paper captures a lot of the notions of what makes an agent 'good' in a way that seems readily generalizable. The evaluation time on the suite is reasonable, no more than 30 minutes per experiment. Additionally, the ability to produce automated summary reports in standard formats is a nice feature. One thing that seems to be missing from the core set of experiments is a good notion of transfer learning capability beyond simple generalization. However, the authors readily note that the suite is a work in progress, so I wouldn't be surprised to see something covering that introduced in time.

Rohin's opinion: The most interesting thing about work like this is what "core issues" they choose to evaluate -- it's not clear to me whether e.g. "memory" in a simple environment is something that future research should optimize for.

Read more: See Import AI


Copyright © 2019 Rohin Shah, All rights reserved.



Whistle-based Synthesis

October 14, 2019 - 15:10
Published on October 14, 2019 12:10 PM UTC

I'm reasonably happy with my Bass Whistle, where I can whistle and have it come out as a decent sounding bass. I've been using it when playing mandolin in a duo, and it fits well there. When playing piano or with a piano player, however, there's already bass so something that falls in a different place in the overall sound would be better. That could be melody, though I can't whistle fast enough for everything, but probably something simpler: harmonies, riffs, horn lines.

When I take my current software, optimized for bass, and tell it to synthesize notes a few octaves up, it sounds terrible:

  • Raw whistled input:
  • Bass version (needs headphones or good speakers):
  • Treble version:

I'm using simple additive synthesis with the first four harmonics, which means adding together four sine waves. I think what's going on is that higher notes need more complexity to sound good? Playing around with distortion and fading the harmonics at different rates it sounds a bit more interesting:

  • Adding distortion:
  • Adding fade:
  • Adding both:
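Additive synthesis, as described above, is a sum of sine waves at integer multiples of the fundamental. Here is a minimal sketch. It assumes a 44,100 Hz sample rate and an amplitude of 1/h for harmonic h, since the post does not give its weights. The optional per-harmonic exponential fades and the tanh soft-clip stand in for "fade" and "distortion". Tanh is one common waveshaper, not necessarily the one used here:

```python
import math

SAMPLE_RATE = 44100  # assumption: the post does not state its sample rate


def additive_samples(freq, duration, harmonics=4, fade_rates=None, drive=None):
    """Sum the first `harmonics` sine partials of `freq`, normalized to [-1, 1].

    fade_rates: optional per-harmonic exponential decay rates (1/s), so higher
        harmonics can die away faster than the fundamental.
    drive: optional soft-clipping amount applied as tanh(drive * x).
    """
    norm = sum(1.0 / h for h in range(1, harmonics + 1))  # max possible |sum|
    out = []
    for i in range(round(duration * SAMPLE_RATE)):
        t = i / SAMPLE_RATE
        x = 0.0
        for h in range(1, harmonics + 1):
            amp = 1.0 / h  # assumed 1/h rolloff per harmonic
            if fade_rates is not None:
                amp *= math.exp(-fade_rates[h - 1] * t)
            x += amp * math.sin(2 * math.pi * h * freq * t)
        x /= norm
        if drive is not None:
            x = math.tanh(drive * x) / math.tanh(drive)  # rescale so |x| <= 1
        out.append(x)
    return out
```

For example, `additive_samples(880.0, 0.5, fade_rates=[0.0, 5.0, 10.0, 20.0], drive=3.0)` gives half a second of a faded, distorted tone whose upper harmonics decay fastest.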

I'm still not very happy with it, though. It sounds artificial and silly. There are good synthesizers, the product of decades of work on turning "play this note at this time" into good sounding audio, so perhaps I could use my pitch detection to drive a standard synthesizer?

I made some stand-alone open source software that pipes the pitch detection through to MIDI. This was kind of tricky: MIDI doesn't have a way to say "play this frequency". Instead you just have "play this note" and "bend the note by this much". How to interpret pitch bend is up to the synthesizer, but generally the range is ±2 half steps. So we need some math:

import math

def pitch_to_midi(wavelength, sample_rate, current_note=None):
    # Convert from "this wave is 23.2 samples long"
    # to "the frequency is 1896.6 Hz" (at 44,000 samples/second).
    frequency = sample_rate / wavelength
    # MIDI is equal tempered, with each octave divided
    # into twelve logarithmically equal pieces. Take
    # A440 as a reference point, so represent our
    # 1896.6 Hz as "25.29 half steps above A440":
    distance_from_a440 = 12 * math.log2(frequency / 440)
    # A440 is A4, or MIDI note 69, so this is 94.29.
    fractional_note = 69 + distance_from_a440
    # MIDI uses a note + bend encoding. Stay on the
    # same note if possible to avoid spurious attacks.
    if current_note is not None and current_note - 2 < fractional_note < current_note + 2:
        integer_note = current_note
    else:
        integer_note = round(fractional_note)
    # To compute the pitch bend, we first find the
    # fractional part of the note, in this case 0.29:
    fractional_bend = fractional_note - integer_note
    # The bend will always be between -2 and +2, a
    # whole tone up or down. MIDI uses 14 bits to
    # represent that range: -2 is 0, no bend is the
    # midpoint 2^13 = 8192, and +2 is the maximum,
    # 2^14 - 1 = 16383, so clamp the top end.
    integer_bend = min(16383, round(8192 + fractional_bend / 2 * 8192))
    # The bend is 14 bits, which get split into two 7-bit
    # values. We can do this with masking and shifting.
    bend_least_significant = integer_bend & 0b1111111
    bend_most_significant = (integer_bend >> 7) & 0b1111111
    return integer_note, bend_least_significant, bend_most_significant

Initially I screwed this up, and thought pitch bend was conventionally ±1 semitone, and didn't end up catching the bug until I wrote up this post.

I have this working reasonably well, except that when I bend more than a whole tone I get spurious attacks. Say I slide from A to C: the slide from A to Bb to B can all be done with pitch bend, but then once I go above B the system needs to turn off bent A and start working with a new note. I would love to suppress the attack for that note, but I don't know any way to communicate that in MIDI. I don't know what people with existing continuous pitch electronic instruments do?
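The "play this note" and "bend the note" instructions are separate MIDI channel messages. Note on has status byte 0x90, note off 0x80, and pitch bend 0xE0, each ORed with the channel number. Spelling out the bytes shows where the attack comes from: the only way to move to a new base note is a fresh note-on. The helper names and the default velocity of 100 below are hypothetical:

```python
NOTE_OFF = 0x80
NOTE_ON = 0x90
PITCH_BEND = 0xE0


def note_on(channel, note, velocity=100):
    return bytes([NOTE_ON | channel, note, velocity])


def note_off(channel, note):
    return bytes([NOTE_OFF | channel, note, 0])


def pitch_bend(channel, value14):
    # 14-bit bend: 0 = lowest, 8192 = centred (no bend), 16383 = highest.
    # Sent least significant 7 bits first, then most significant 7 bits.
    return bytes([PITCH_BEND | channel, value14 & 0x7F, (value14 >> 7) & 0x7F])


def switch_note(channel, old_note, new_note, value14):
    """Hypothetical helper: what a slide past the bend range turns into."""
    return (note_off(channel, old_note)
            + pitch_bend(channel, value14)
            + note_on(channel, new_note))
```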

A second problem I've run into is that what sounds like a steady whistled pitch actually has a lot of tiny variations. Consider this input: (mp3)

This sounds reasonably steady to me, but it isn't really. Here are some successive zero crossings:

wavelength (samples)   frequency (Hz)   MIDI note
39.02                  1130.07          49.3300
39.26                  1123.41          49.2277
38.66                  1140.68          49.4918
39.25                  1123.62          49.2309
38.90                  1133.71          49.3857
38.85                  1135.21          49.4087
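The wobble in this table can be quantified by converting the ratio of the highest and lowest frequencies to semitones with the same 12 × log2 rule used for the MIDI conversion. The 44,100 Hz sample rate below is an assumption, but it cancels out of the ratio. Because the frequencies are recomputed from the rounded wavelengths, they differ slightly from the table:

```python
import math

SAMPLE_RATE = 44100  # assumption; it cancels out of the ratio below

wavelengths = [39.02, 39.26, 38.66, 39.25, 38.90, 38.85]  # samples, from the table

frequencies = [SAMPLE_RATE / w for w in wavelengths]
# Pitch distance between the highest and lowest frequency, in semitones.
spread = 12 * math.log2(max(frequencies) / min(frequencies))
print(f"{spread:.3f} semitones = {spread * 100:.0f} cents")
```

This comes to roughly a quarter of a semitone. That is consistent with the fractional MIDI note values in the table, which range from about .23 to .49.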

My synth doesn't mind, and just passes the variability through to the listener, where it's not a problem. I track where in the wave we are, and slowly adjust the rate we move through the wave to match the desired frequency:

  • Bass output:
  • Pure sine treble output:
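"Tracking where in the wave we are" is commonly done with a phase accumulator. The phase is kept in [0, 1), advanced by frequency / sample_rate each sample, and the frequency glides toward its target instead of jumping. Here is a minimal sketch of that idea. The slew constant and function name are hypothetical, not taken from the author's code:

```python
import math


def render(target_freqs, sample_rate=44100, slew=0.001):
    """Sine oscillator whose frequency glides toward a per-sample target.

    target_freqs: desired frequency (Hz) for each output sample.
    slew: fraction of the remaining frequency gap closed per sample.
    Returns the samples and the oscillator's final frequency.
    """
    phase = 0.0  # position within the current cycle, in [0, 1)
    freq = target_freqs[0] if target_freqs else 0.0
    out = []
    for target in target_freqs:
        freq += slew * (target - freq)  # adjust the rate slowly, never jump
        phase = (phase + freq / sample_rate) % 1.0
        out.append(math.sin(2 * math.pi * phase))
    return out, freq
```

Because the phase carries over from sample to sample, a jittery target only bends the rate of travel through the wave and never produces a discontinuity in the waveform.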
When I pass that variability into some regular synths, however, even when I don't cross note boundaries I get a wavery output. I think this may be another artifact of using a synth that isn't designed for continuous pitch input? Or possibly the problem is that real MIDI pitch wheels don't just suddenly jump from +23% to +49% over a 40ms period, and so they haven't needed to design for it?

I can fix this some on my end by averaging the most recent pitches to smooth out the variability, but then it stops feeling so responsive (and quick slides don't work, and if I start a note slightly sour it takes longer to fix it). I think the answer is probably "find a better synth" but I'm not sure how to figure out what to use.
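The averaging trade-off can be sketched as a moving average over the last few pitch estimates. This sketch averages in MIDI-note units, which is log frequency, and uses a hypothetical window size. Both are assumptions. A step input shows the lag that makes slides feel less responsive:

```python
from collections import deque


class PitchSmoother:
    """Moving average of the most recent pitch estimates, in MIDI note units."""

    def __init__(self, window=4):
        self.recent = deque(maxlen=window)  # oldest estimates fall off automatically

    def update(self, midi_note):
        self.recent.append(midi_note)
        return sum(self.recent) / len(self.recent)


# A jump from 49 to 52 only moves the output to 49.75 on the first frame.
step = PitchSmoother(window=4)
for _ in range(3):
    step.update(49.0)
print(step.update(52.0))  # 49.75
```

A longer window steadies the wobble in the table above but delays every real pitch change by roughly half the window.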

Still, I like this a lot, and I think there's something here. If you have a mac and want to play with this, the code is on github.

Comment via: facebook


Maybe Lying Doesn't Exist

October 14, 2019 - 10:04
Published on October 14, 2019 7:04 AM UTC

In "Against Lie Inflation", the immortal Scott Alexander argues that the word "lie" should be reserved for knowingly-made false statements, and not used in an expanded sense that includes unconscious motivated reasoning. Alexander argues that the expanded sense draws the category boundaries of "lying" too widely in a way that would make the word less useful. The hypothesis that predicts everything predicts nothing: in order for "Kevin lied" to mean something, some possible states-of-affairs need to be identified as not lying, so that the statement "Kevin lied" can correspond to redistributing conserved probability mass away from "not lying" states-of-affairs onto "lying" states-of-affairs.

All of this is entirely correct. But Jessica Taylor (whose post "The AI Timelines Scam" inspired "Against Lie Inflation") wasn't arguing that everything is lying; she was just using a more permissive conception of lying than the one Alexander prefers, such that Alexander didn't think that Taylor's definition could stably and consistently identify non-lies.

Concerning Alexander's arguments against the expanded definition, I find I have one strong objection (that appeal-to-consequences is an invalid form of reasoning for optimal-categorization questions for essentially the same reason as it is for questions of simple fact), and one more speculative objection (that our intuitive "folk theory" of lying may actually be empirically mistaken). Let me explain.

(A small clarification: for myself, I notice that I also tend to frown on the expanded sense of "lying". But the reasons for frowning matter! People who superficially agree on a conclusion but for different reasons are not really on the same page!)

Appeals to Consequences Are Invalid

There is no method of reasoning more common, and yet none more blamable, than, in philosophical disputes, to endeavor the refutation of any hypothesis, by a pretense of its dangerous consequences[.]

David Hume

Alexander contrasts the imagined consequences of the expanded definition of "lying" becoming more widely accepted, to a world that uses the restricted definition:

[E]veryone is much angrier. In the restricted-definition world, a few people write posts suggesting that there may be biases affecting the situation. In the expanded-definition world, those same people write posts accusing the other side of being liars perpetrating a fraud. I am willing to listen to people suggesting I might be biased, but if someone calls me a liar I'm going to be pretty angry and go into defensive mode. I'll be less likely to hear them out and adjust my beliefs, and more likely to try to attack them.

But this is an appeal to consequences. Appeals to consequences are invalid because they represent a map–territory confusion, an attempt to optimize our description of reality at the expense of our ability to describe reality accurately (which we need in order to actually optimize reality).

(Again, the appeal is still invalid even if the conclusion—in this case, that unconscious rationalization shouldn't count as "lying"—might be true for other reasons.)

Some aspiring epistemic rationalists like to call this the "Litany of Tarski". If Elijah is lying (with respect to whatever the optimal category boundary for "lying" turns out to be according to our standard Bayesian philosophy of language), then I desire to believe that Elijah is lying (with respect to the optimal category boundary according to ... &c.). If Elijah is not lying (with respect to ... &c.), then I desire to believe that Elijah is not lying.

If the one comes to me and says, "Elijah is not lying; to support this claim, I offer this-and-such evidence of his sincerity," then this is right and proper, and I am eager to examine the evidence presented.

If the one comes to me and says, "You should choose to define lying such that Elijah is not lying, because if you said that he was lying, then he might feel angry and defensive," this is insane. The map is not the territory! If Elijah's behavior is, in fact, deceptive—if he says things that cause people who trust him to be worse at anticipating their experiences when he reasonably could have avoided this—I can't make his behavior not-deceptive by changing the meanings of words.

Now, I agree that it might very well empirically be the case that if I say that Elijah is lying (where Elijah can hear me), he might get angry and defensive, which could have a variety of negative social consequences. But that's not an argument for changing the definition of lying; that's an argument that I have an incentive to lie about whether I think Elijah is lying! (Though Glomarizing about whether I think he's lying might be an even better play.)

Alexander is concerned that people might strategically equivocate between different definitions of "lying" as an unjust social attack against the innocent, using the classic motte-and-bailey maneuver: first, argue that someone is "lying (expanded definition)" (the motte), then switch to treating them as if they were guilty of "lying (restricted definition)" (the bailey) and hope no one notices.

So, I agree that this is a very real problem. But it's worth noting that the problem of equivocation between different category boundaries associated with the same word applies symmetrically: if it's possible to use an expanded definition of a socially-disapproved category as the motte and a restricted definition as the bailey in an unjust attack against the innocent, then it's also possible to use an expanded definition as the bailey and a restricted definition as the motte in an unjust defense of the guilty. Alexander writes:

The whole reason that rebranding lesser sins as "lying" is tempting is because everyone knows "lying" refers to something very bad.

Right—and conversely, because everyone knows that "lying" refers to something very bad, it's tempting to rebrand lies as lesser sins. Ruby Bloom explains what this looks like in the wild:

I worked in a workplace where lying was commonplace, conscious, and system 2. Clients asking if we could do something were told "yes, we've already got that feature (we hadn't) and we already have several clients successfully using that (we hadn't)." Others were invited to be part an "existing beta program" alongside others just like them (in fact, they would have been the very first). When I objected, I was told "no one wants to be the first, so you have to say that."

[...] I think they lie to themselves that they're not lying (so that if you search their thoughts, they never think "I'm lying")[.]

If your interest in the philosophy of language is primarily to avoid being blamed for things—perhaps because you perceive that you live in a Hobbesian dystopia where the primary function of words is to elicit actions, where the denotative structure of language was eroded by political processes long ago, and all that's left is a standardized list of approved attacks—in that case, it makes perfect sense to worry about "lie inflation" but not about "lie deflation." If describing something as "lying" is primarily a weapon, then applying extra scrutiny to uses of that weapon is a wise arms-restriction treaty.

But if your interest in the philosophy of language is to improve and refine the uniquely human power of vibratory telepathy—to construct shared maps that reflect the territory—if you're interested in revealing what kinds of deception are actually happening, and why—

(in short, if you are an aspiring epistemic rationalist)

—then the asymmetrical fear of false-positive identifications of "lying" but not false-negatives—along with the focus on "bad actors", "stigmatization", "attacks", &c.—just looks weird. What does that have to do with maximizing the probability you assign to the right answer??

The Optimal Categorization Depends on the Actual Psychology of Deception

My life seems like it's nothing but
A big charade

I never meant to lie to you
I swear it
I never meant to play those games

"Deception" by Jem and the Holograms

Even if the fear of rhetorical warfare isn't a legitimate reason to avoid calling things lies (at least privately), we're still left with the main objection that "lying" is a different thing from "rationalizing" or "being biased". Everyone is biased in some way or another, but to lie is "[t]o give false information intentionally with intent to deceive." Sometimes it might make sense to use the word "lie" in a noncentral sense, as when we speak of "lying to oneself" or say "Oops, I lied" in reaction to being corrected. But it's important that these senses be explicitly acknowledged as noncentral and not conflated with the central case of knowingly speaking falsehood with intent to deceive—as Alexander says, conflating the two can only be to the benefit of actual liars.

Why would anyone disagree with this obvious ordinary view, if they weren't trying to get away with the sneaky motte-and-bailey social attack that Alexander is so worried about?

Perhaps because the ordinary view relies on an implied theory of human psychology that we have reason to believe is false? What if conscious intent to deceive is typically absent in the most common cases of people saying things that (they would be capable of realizing upon being pressed) they know not to be true? Alexander writes—

So how will people decide where to draw the line [if egregious motivated reasoning can count as "lying"]? My guess is: in a place drawn by bias and motivated reasoning, same way they decide everything else. The outgroup will be lying liars, and the ingroup will be decent people with ordinary human failings.

But if the word "lying" is to actually mean something rather than just being a weapon, then the ingroup and the outgroup can't both be right. If symmetry considerations make us doubt that one group is really that much more honest than the other, that would seem to imply that either both groups are composed of decent people with ordinary human failings, or that both groups are composed of lying liars. The first description certainly sounds nicer, but as aspiring epistemic rationalists, we're not allowed to care about which descriptions sound nice; we're only allowed to care about which descriptions match reality.

And if all of the concepts available to us in our native language fail to match reality in different ways, then we have a tough problem that may require us to innovate.

The philosopher Roderick T. Long writes

Suppose I were to invent a new word, "zaxlebax," and define it as "a metallic sphere, like the Washington Monument." That's the definition—"a metallic sphere, like the Washington Monument." In short, I build my ill-chosen example into the definition. Now some linguistic subgroup might start using the term "zaxlebax" as though it just meant "metallic sphere," or as though it just meant "something of the same kind as the Washington Monument." And that's fine. But my definition incorporates both, and thus conceals the false assumption that the Washington Monument is a metallic sphere; any attempt to use the term "zaxlebax," meaning what I mean by it, involves the user in this false assumption.

If self-deception is as ubiquitous in human life as authors such as Robin Hanson argue (and if you're reading this blog, this should not be a new idea to you!), then the ordinary concept of "lying" may actually be analogous to Long's "zaxlebax": the standard intensional definition ("speaking falsehood with conscious intent to deceive"/"a metallic sphere") fails to match the most common extensional examples that we want to use the word for ("people motivatedly saying convenient things without bothering to check whether they're true"/"the Washington Monument").

Arguing for this empirical thesis about human psychology is beyond the scope of this post. But if we live in a sufficiently Hansonian world where the ordinary meaning of "lying" fails to carve reality at the joints, then authors are faced with a tough choice: either be involved in the false assumptions of the standard believed-to-be-central intensional definition, or be deprived of the use of common expressive vocabulary. As Ben Hoffman points out in the comments to "Against Lie Inflation", an earlier Scott Alexander didn't seem shy about calling people liars in his classic 2014 post "In Favor of Niceness, Community, and Civilization":

Politicians lie, but not too much. Take the top story on Politifact Fact Check today. Some Republican claimed his supposedly-maverick Democratic opponent actually voted with Obama's economic policies 97 percent of the time. Fact Check explains that the statistic used was actually for all votes, not just economic votes, and that members of Congress typically have to have >90% agreement with their president because of the way partisan politics work. So it's a lie, and is properly listed as one. [bolding mine —ZMD] But it's a lie based on slightly misinterpreting a real statistic. He didn't just totally make up a number. He didn't even just make up something else, like "My opponent personally helped design most of Obama's legislation".

Was the politician consciously lying? Or did he (or his staffer) arrive at the misinterpretation via unconscious motivated reasoning and then just not bother to scrupulously check whether the interpretation was true? And how could Alexander know?

Given my current beliefs about the psychology of deception, I find myself inclined to reach for words like "motivated", "misleading", "distorted", &c., and am more likely to frown at uses of "lie", "fraud", "scam", &c. where intent is hard to establish. But even while frowning internally, I want to avoid tone-policing people whose word-choice procedures are calibrated differently from mine when I think I understand the structure-in-the-world they're trying to point to. Insisting on replacing the six instances of the phrase "malicious lies" in "Niceness, Community, and Civilization" with "maliciously-motivated false belief" would just be worse writing.

And I definitely don't want to excuse motivated reasoning as a mere ordinary human failing for which someone can't be blamed! One of the key features that distinguishes motivated reasoning from simple mistakes is the way that the former responds to incentives (such as being blamed). If the elephant in your brain thinks it can get away with lying just by keeping conscious-you in the dark, it should think again!


Regarding Archimedes (a philosophy of math anecdote)

October 14, 2019 - 00:25
Published on October 13, 2019 9:25 PM UTC

Regarding tales told of Archimedes, other than those about his enthusiasm when he managed to conceive of the law of flotation, his contempt for the Roman soldier who demanded that he cease his calculations on the circles he had drawn in the sand, and the epigram on his tomb recording his discovery of the relation between the volumes of a sphere, a cone and a cylinder, there is one other worth mentioning:

He once met, on a beach near Syracuse, a young boy and a slightly older and much taller youth. It was a warm day and the two were wearing white himatia without chitons underneath, as was usual for poorer people. Both left the right shoulder bare, with the fabric rising diagonally to the left shoulder where a pin held it in place, yet Archimedes noticed that, in the way they had positioned themselves right next to each other, the upper part of the child’s garment reached exactly the height of the lower part of the youth’s beside him. This formed an elegant, straight line, starting at the lower edge of the first cloth and ending at the upper edge of the other, and from a distance one would get the impression that it was all one continuous material instead of two distinct himatia.

Despite his surprise, he immediately realized that while he could notice the effect because he was standing opposite the two, their position naturally made it impossible for them to witness and enjoy this harmonious sight.

That they remained there, immobile, because they were indeed aware of the effect and meant for anyone coming near to observe it, was readily obvious; yet Archimedes couldn’t refrain from walking closer and asking just why they went to so much trouble to produce something they themselves could not observe, given that if they so much as attempted to steal a glance at it, the delicate balance would instantly be ruined and nothing would remain to be seen...

The youth kept silent, but the child, being more impulsive, replied that Archimedes was wrong. It wasn’t at all true that they stood there in this manner out of an intention to present to others the form he saw. In fact, they were entirely unaware that anyone approaching from afar would see such a thing. They were standing so close to each other out of mere habit. Lastly, they weren’t moving at all because they had been waiting there for their father to return from the beach.

It is said that then, as he was walking away, he reflected that it is not so much that nature seamlessly presents us with mathematical symmetries, as that the overall number of elements which, unbeknownst to us, are woven together into any image is so vast that a small number of our own interests is always also contained within it.

Text was first published at https://www.patreon.com/Kyriakos



October 13, 2019 - 20:39
Published on October 13, 2019 5:39 PM UTC

Cross-posted to my personal blog.

For a while now, I've been using "(a)" notation to denote archived versions of linked pages. This is a small effort towards creating Long Content (a) – content that has a lifespan of decades or centuries, rather than months or years.

I think basically anyone whose writing includes links to other work should include archived links alongside the original hyperlinks, if the writing is intended to be long-lived. (And if you're not trying to write long-lived content, what are you doing, even?)

I was happy to see Zuck (a) & Guzey (a) using "(a)" notation in some of their recent work. Perhaps "(a)" will catch on!

Practically, archive.fo is my first choice for creating archives of webpages. It's free to use, and it's hard for content to be removed from the archive. (Folks can't just email in requesting that arbitrary content be removed.)

But archive.fo can be slow to save new pages, and its library is fairly small.

archive.org is my second choice. It's run by the Internet Archive (fun aside (a)), is free to use, has a massive library, and is quick to add new pages. Unfortunately, folks can remove arbitrary content by emailing in, so I expect archive.org to be less durable than archive.fo in the long run.
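For archive.org, the links follow a predictable pattern. Prefixing a URL with https://web.archive.org/web/ resolves to the Wayback Machine's most recent capture, and https://web.archive.org/save/ asks it to capture the page now. Below is a small sketch for emitting "(a)" links. The helper names and the Markdown output format are illustrative choices, not the author's tooling:

```python
def archive_org_url(url: str) -> str:
    # /web/<url> resolves to the most recent Wayback Machine capture.
    return "https://web.archive.org/web/" + url


def archive_org_save_url(url: str) -> str:
    # /save/<url> requests a fresh capture via Save Page Now.
    return "https://web.archive.org/save/" + url


def with_archive_link(text: str, url: str) -> str:
    # Hypothetical helper: a Markdown link followed by an "(a)" archive link.
    return f"[{text}]({url}) [(a)]({archive_org_url(url)})"
```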

perma.cc also seems promising. I don't use it because it's expensive if you don't have an academic affiliation.

And maybe one day Quora will come around (a) to Long Content being good...


MA Price Accuracy Law

October 13, 2019 - 15:00
Published on October 13, 2019 12:00 PM UTC

Massachusetts has an interesting law for grocery stores to make sure price scanners are configured correctly: if your item rings up for more than the price on the shelf you get one for free (or $10 off if it's more than $10). Specifically:

if there is a discrepancy between the advertised price, the sticker price, the scanner price or the display price and the checkout price on any grocery item, a food store or a food department shall charge a consumer the lowest price. If the checkout price or scanner price is not the lowest price or does not reflect any qualifying discount, the seller:
  • shall not charge the consumer for 1 unit of the grocery item, if the lowest price is $10 or less;

  • shall charge the consumer the lowest price less $10 for 1 unit of the grocery item, if the lowest price is more than $10

  — MGL I.XV.94.184.C
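The quoted rule is mechanical enough to write down directly. This sketch works in integer cents and uses a hypothetical function name. Its "display prices" lump together the advertised, sticker, scanner and display prices. It deliberately ignores the "gross error" exception and the qualifying-discount clause:

```python
def amount_charged(display_prices_cents, checkout_price_cents):
    """Amount charged for one unit of a grocery item under the quoted rule.

    display_prices_cents: advertised, sticker, scanner and display prices.
    checkout_price_cents: what the register actually rang up.
    """
    lowest = min(min(display_prices_cents), checkout_price_cents)
    if checkout_price_cents == lowest:
        return lowest  # no overcharge: the consumer simply pays the lowest price
    if lowest <= 1000:
        return 0  # overcharged and the lowest price is $10 or less: one unit free
    return lowest - 1000  # overcharged and over $10: lowest price less $10
```

With the example below of a $4.50 shelf price that rings up at $4.99, `amount_charged([450], 499)` is 0.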

The grocery store is required to put a sign at each register describing the law, which means that when you notice this you can point to the sign. Which is way better than trying to show the cashier the relevant text of the law on your phone would be.

I have fun trying to remember the price I see for each item as I put it into my cart so if it rings up at a different price I can point that out. The law has an exception for cases where the price is a "gross error" (off by half) but in most cases discrepancies are small: ringing up at $4.99 when it said $4.50 on the shelf. Because you get the item for free if they've overcharged you, however, what matters is just that they put a misleadingly low price on the shelf.

I've noticed stores rarely have a good system in place for fixing these problems. When I catch one they generally check and give me the item for free, but that doesn't usually translate into fixing the price on the shelf. Which means that when I come in next time, it's often still wrong.

This seems like something that a group of shoppers could use together. Whenever anyone noticed a mispricing they could post to a mailing list ("the store brand blueberries are marked $3.99 but ring up as $4.29"), and then everyone on the list could go get some free blueberries. This would probably get stores to be faster about updating their prices.

Even if the stores got very fast at fixing things, though, it could still be rough for them. Say one person goes through and notices they've been overcharged for something. They don't say anything to the store, but instead write to the list and name a time. At the designated time a group of shoppers pick up one unit each and fan out over the store's checkout lines. The items are all scanned, the shoppers all object, and the store has to give away one item per checkout line instead of just one item total. This could be a parody heist plotline in a sitcom.

(While this is hard to fix with technical means, if people started doing it, of course, they would update the law.)

Comment via: facebook


What's going on with "provability"?

October 13, 2019 - 06:59
Published on October 13, 2019 3:59 AM UTC

Every so often I hear seemingly mathematical statements involving the concept of being provable. For example:

  • I've seen Gödel's Incompleteness Theorem stated as "if a mathematical system is powerful enough to express arithmetic, then either it contains a contradiction or there are true statements that it cannot prove."
  • On the AI alignment forum, one of the pinned sequences describes Löb's Theorem as "If Peano Arithmetic can prove that a proof of P would imply the truth of P, then it can also prove P is true".
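For reference, the usual formal statements behind these two paraphrases are given below. They rely on the standard device of Gödel numbering, which encodes each sentence P as a number ⌜P⌝. This makes "P has a proof in PA" expressible as an ordinary arithmetic formula Prov_PA(⌜P⌝). Here T stands for any consistent, effectively axiomatized theory that includes basic arithmetic, such as Robinson arithmetic Q:

```latex
% Gödel's first incompleteness theorem (in Rosser's strengthened form):
\[
  \text{there is a sentence } G \text{ such that } T \nvdash G \text{ and } T \nvdash \neg G
\]
% Löb's theorem:
\[
  \text{if } \mathrm{PA} \vdash \mathrm{Prov}_{\mathrm{PA}}(\ulcorner P \urcorner) \rightarrow P,
  \text{ then } \mathrm{PA} \vdash P
\]
```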

I find each of these statements baffling for a different reason:

  • Gödel: What could it mean for a statement to be "true but not provable"? Is this just because there are some statements such that neither P nor not-P can be proven, yet one of them must be true? If so, I would (stubbornly) contest that perhaps P and not-P really are both non-true.
  • Löb: How can a system of arithmetic prove anything? Much less prove things about proofs?

And I also have one more general confusion. What systems of reasoning could these kinds of theorems be set in? For example, I've heard that there are proofs that PA is consistent. Let's say one of those proofs is set in Proof System X. Now how do we know that Proof System X is consistent? Perhaps it can be proven consistent by using Proof System Y? Do we just end up making an infinite chain of appeals up along a tower of proof systems? Or do we eventually drive ourselves into the ground by reaching a system that nobody could deny is valid? If so, why don't we just stop at PA or ZFC?

Oh, speaking of ZFC. There seems to be a debate about whether we should accept the Axiom of Choice. Isn't it...obviously true? I don't really understand this topic well enough to have any particular question about the AC debate, but my confusion definitely extends to that region of concept space.

So here's my question: Where can I learn about "provability" and/or what clarifying insights could you share about it?


AI alignment landscape

October 13, 2019 - 05:10
Published on October 13, 2019 2:10 AM UTC

Here’s a talk I gave at EA Global 2019, where I describe how intent alignment fits into the broader landscape of “making AI go well,” and how my work fits into intent alignment. This is particularly helpful if you want to understand what I’m doing, but may also be useful more broadly. I often find myself wishing people were clearer about some of these distinctions.

https://medium.com/media/7e6d526c817829eea08842218290c560/href

Here is the main overview slide from the talk:

The highlighted boxes are where I spend most of my time.

Here are the full slides from the talk:



Prediction Markets Don't Reveal The Territory

October 13, 2019 - 02:54
Published on October 12, 2019 11:54 PM UTC

[A draft section from a longer piece I am writing on prediction and forecasting. Epistemic Status: I don't know what I am missing, and I am filled with doubt and uncertainty.]

If the notion of professional forecasters disturbs you in your sleep, and you toss and turn worrying about the blight of experts brooding upon the world, perhaps the golden light of distributed information systems has peeked out from beyond these darkest visions, and you have hope for the wisdom of crowds.

Prediction markets aggregate information by incentivizing predictors to place bets on the outcomes of well-defined questions. Since information can be both niche and useful, prediction markets also incentivize the development of specialized expertise that is then incorporated into the general pool of information in the form of a bet. When this works, it works very well.

When information is not widely distributed or discoverable, prediction markets are not useful. Prediction markets for WMDs, Pope Francis’ next pronouncement, or which celebrity couple will choose to live in a van probably will not work. Or consider a public prediction market about what percent of the current freshman class at California public universities will make it to graduation. Such a market would be pretty distorted if all the registrars and admissions counselors were betting as well.

Prediction markets do have some wicked clever uses too. For example, a prediction market can also act as a proxy for some other event. That is to say, through some ingenious design, one can correlate a prediction market’s assessment of an event with another measurable outcome. Here is one instance in which researchers used a prediction market about the probability of war in Iraq, correlated it with the price of oil, and estimated the effect of war on oil prices. This provided very useful information: what percentage of the price of oil was attributable to the threat of war in Iraq. At an even broader level, this prediction market design allows us to study the effects of war on economies.
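The researchers' exact method isn't described here, so the following is only a minimal sketch of the general idea, assuming a simple linear relationship (oil price = no-war price + war effect × war probability) fitted by ordinary least squares. All numbers are made up for illustration; they are not the real Iraq-war data.

```python
# Hypothetical sketch: estimate how much an asset price moves with a
# prediction market's probability for an event, via ordinary least squares.
# The data below are invented for illustration, not real Iraq-war figures.

def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical daily observations: market probability of war, oil price ($/bbl).
war_prob = [0.20, 0.35, 0.50, 0.60, 0.75, 0.90]
oil_price = [26.0, 29.0, 32.0, 34.0, 37.0, 40.0]

no_war_price, war_effect = ols_fit(war_prob, oil_price)

# war_effect: estimated price change if the probability went from 0 to 1.
# war_effect * p: the "war premium" embedded in the price at probability p.
today_p = war_prob[-1]
premium_share = war_effect * today_p / oil_price[-1]

print(f"price with no war risk ~ ${no_war_price:.2f}")
print(f"estimated effect of war ~ ${war_effect:.2f}/bbl")
print(f"share of today's price due to war risk ~ {premium_share:.0%}")
```

With these invented numbers the fit gives a no-war price of about $22, a war effect of about $20/bbl, and a war premium of about 45% of the latest price. A real analysis would also have to account for everything else moving oil prices at the same time; the slope is only as trustworthy as that assumption.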

On the other hand, an additional limitation to prediction markets is that people have to be interested enough to take part in them, which is a real bummer. Intelligent quantitative people might enjoy researching to gain some betting leverage in prediction markets qua prediction markets. But even then, most people want to research questions that they themselves find interesting [citation needed]. So even the best designed prediction market can fail without enough parties incentivized to care.

The greatest limitation for prediction markets, however, is not any of the above technical problems. We are optimistic that these can be overcome. But there is a logical problem which can’t. Since each specialized piece of information is converted into a bet, the market will react to that new information without having to know the particulars of that information. This is the beautiful wonder of markets - everything is turned into a utility function. However, for boards, administrators, and governments which want to take action based upon the information from a prediction market, two important pieces of information are left totally inaccessible. First, what information was the most salient for bringing the market odds where they currently are? Second, what aspect of the current state of affairs is the most leverageable? That is, of all the hidden factors which caused the market consensus to reach this point, which, if any of them, do we have any power to affect? If the point is not just to know what the market says, but to know how the world works, then prediction markets in themselves may not be of much help. Here are two quick examples to illustrate the point:
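A toy model makes the "flattening" concrete. The sketch below assumes that each hidden factor shifts the log-odds of the event and that the market price reports only the combined probability (a simple log-odds pooling, chosen for illustration, not a model of any real market). The factor names are hypothetical. Two worlds with very different causes, one where half the odds come from something you control, produce exactly the same price.

```python
import math

# Hypothetical sketch: a market price as an aggregate of several hidden
# factors. Each factor shifts the log-odds of the event; the price only
# reports the total, so different factor mixes can yield the same price.

def market_probability(factor_shifts, prior_log_odds=0.0):
    """Combine per-factor log-odds shifts into a single probability."""
    total = prior_log_odds + sum(factor_shifts.values())
    return 1.0 / (1.0 + math.exp(-total))

# Two made-up worlds with very different causes behind the odds.
# "our_policy" stands in for the one factor a decision-maker could change.
world_a = {"weather": 1.0, "politics": 0.0, "our_policy": 0.0}
world_b = {"weather": 0.0, "politics": 0.5, "our_policy": 0.5}

p_a = market_probability(world_a)
p_b = market_probability(world_b)
print(f"{p_a:.3f} {p_b:.3f}")  # prints 0.731 0.731: same price, different explanations
```

Going from the price back to the factors is a many-to-one inversion: the single number cannot say which world you are in, or whether the leverageable factor matters at all.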

You work at the North Pole managing present-procurement for heads of state (PP-HOS, for short). Each year you scramble to get enough coal for the various heads of state because you don’t know until Christmas week whether they are on the naughty or nice list. This is an important question because heads of state receive coal proportional to their standing in society, and since the cost of coal rises in winter, it costs your department quite a bit of money to expedite all these coal orders. So this year you have created a prediction market to tell you the chances of the president of Hungary getting coal again this year and you plan on acting on the market’s prediction in September, well ahead of the November coal rush…. The market is a success! Your department saves some money, and you save just about the right amount of coal for the beloved president of Hungary. But when the big man pulls the plug on funding the market apparatus, you realize that despite all the little helpers that made the market a success, you didn’t gain any wisdom from it about how to predict whether a head of state will get coal this year. That is an example of a market working without conveying any insights. Thus markets keep inscrutable the inner workings of Father Christmas’ naughty list.

The second example demonstrates the leverage problem of a market. You are the principal of a school. You get drunk one night and rent out a Vegas casino which you revamp into a test score betting complex. You want to know how your students will do on this week’s standardized test. So you make all their information available to patrons who then place bets on each student. Despite the drugs, sex, and alcohol in this administrative Bacchanal, the market works astoundingly well, and the predicted individual performance on the standardized tests matches the actual performance near perfectly. However, in the sober light of late afternoon, you realize that your market solution for predicting scores didn’t reveal much about what you should be doing differently. In fact, the post-mortem indicates that the 3 biggest predictors of test scores are not things even remotely under your control. You despair, believing that there is nothing you can do to help students improve. Even if there were a fourth cause of test success which is under your control, it doesn’t matter: it will not be discernible among the thousands of bets made, because it, like everything else, was flattened into the same utility function.


Planned Power Outages

October 12, 2019 - 17:10
Published on October 12, 2019 2:10 PM UTC

With the dubiously motivated PG&E blackouts in California there are many stories about how lack of power is a serious problem, especially for people with medical dependencies on electricity. Examples they give include people who:

  • Have severe sleep apnea, and can't safely sleep without a CPAP.

  • Sleep on a mattress that needs continuous electricity to prevent it from deflating.

  • Need to keep their insulin refrigerated.

  • Use a medicine delivery system that requires electricity every four hours to operate.

This outage was dangerous for them and others, but it also seems like a big problem that they're in a position where they need absolutely reliable grid power. Even without politically motivated outages, the grid isn't built to a standard of complete reliability.

There's an awkward valley between "reasonably reliable, but with a major outage every few years in a storm or something" and "completely reliable, and you can trust your life on it" where the system is reliable enough that we stop thinking of it as something that might go away, but not so reliable that we should stop.

We can't get California out of this valley by investing to the point that there won't be outages; earthquakes, if nothing else, ensure that. So instead we should plan for outages, and make outages frequent enough that this planning will actually happen. Specifically:

  • Insurance should cover backup power supplies for medical equipment, and they should be issued by default.

  • When there hasn't been an outage in ~1y, there should be a test outage to uncover unknown dependencies.

While this outage was probably not done for good reasons, the problems it has uncovered are ones we need to fix.


I would like to try double crux.

October 12, 2019 - 08:13
Published on October 10, 2019 9:34 PM UTC


I would like to try double crux https://www.lesswrong.com/posts/exa5kmvopeRyfJgCy/double-crux-a-strategy-for-resolving-disagreement with someone. My statement A is "There is God" (I do indeed believe it; it is not just for the sake of trying the technique). I have three cruxes (well, two and a half, to be honest). Following the rules, I am not publishing them here, so that you can prepare your cruxes independently.

Thank you!


A Short Introduction to Theory of Change

October 12, 2019 - 03:01
Published on October 11, 2019 7:00 PM UTC

(Cross-posted from LinkedIn.)

At the heart of any strategy are two questions: what do we want to accomplish? And how are we going to do it?

In many situations, answering these questions might not seem difficult. We may already have a mission statement or set of values that guides all of our actions, addressing the first question. Likewise, we may already have a plan of action in place, a set of activities that seems to match the goals we’ve set out. Problem, meet solution. Done and done.

As intuitive as it is to imagine the beginning and the end of that process, though, all too often the devil is in the details—or more specifically in this case, all of the pesky steps in between. Figuring out what those ought to be takes real work, and is generally not something that can be done in one’s head. And because it takes work, a lot of times we don’t bother to do it.

Fortunately, there is a tool called theory of change that provides a means of figuring out all the steps. A theory of change is a visual depiction of your strategy. You probably already have a notion in your head of what your strategy is, but a theory of change gives you a means of articulating that strategy in the form of a diagram.

Why would you want to do this? Making your theory of change explicit accomplishes several things:

  • Developing a theory of change is a great way to get big questions about program or organizational strategy out in the open. In some cases, these questions might not have ever been considered before, or they’ve been thought about but never discussed among the team.
  • A finished theory of change diagram is a useful shorthand for explaining the mechanics of how a program works to someone who’s not that familiar with it. It’s not a marketing document, but it is a communication tool for people who want to understand your strategy in depth.
  • Theories of change are great for helping you decide what to measure. What information is most important for your program or organization to have? Are there any assumptions underlying your success that seem especially vulnerable or uncertain? Does the theory of change show that certain objectives have to be met before any others are possible? If so, you might want to focus measurement or tracking efforts on those objectives or assumptions.
  • Walking through a theory of change with new staff members can be a great way to get them up to speed quickly on the “big picture,” and not just the details that they need to focus on day to day. A theory of change in this context can be particularly effective at showing staff how their work fits into a larger plan.
  • Many funders are accustomed to working with theories of change and logic models, and including a theory of change with a grant proposal can communicate that you speak their language and have thought seriously about how a program works.

Nearly all theories of change contain the following fundamental elements. In combination, they describe a linear, causal pathway between programs or policy interventions and an aspirational end-state.

  • Activities are actions or strategies undertaken by the organization that is the subject of the theory of change. These activities usually take place in the context of ongoing programs, although they can also be one-time projects, special initiatives, or policies such as legislation or regulations.
  • Outcomes are the desired short-, medium-, or long-term results of successful program or policy implementation.
  • Impacts (or Goals) represent the highest purpose of the program, initiative, or organization that is the subject of the theory of change.

To illustrate this, we can look at a simple example. Let’s say you’ve decided you want to go to law school, and in order to get into law school you have to get a good score on the LSAT. So, how can you make sure you get a good score? Intuitively, you decide that taking a test prep class is the way to go. It sounds simple enough, but it’s worth thinking through the assumption that taking the test prep class would actually improve your score. Why do we think that might happen? Well, one factor could certainly be that you get more familiarity with the test and the types of questions asked. Perhaps there is another, more psychological factor at work too. If you’re someone who gets nervous taking tests, the practice exams and deep engagement with the material that comes with a class could help you to get more comfortable with the idea of the LSAT and make it seem less intimidating, thus improving performance.

Sure enough, this line of thinking lends itself quite easily to a theory of change:
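As a rough illustration of the same chain made explicit, here is a minimal sketch that encodes the LSAT example as a small directed graph: nodes are activities, outcomes, or impacts, and each edge carries the cause-and-effect assumption linking two steps. The class and method names are hypothetical, and the node and assumption labels are paraphrases of the paragraph above.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: the LSAT example as a small directed graph.
# Node kinds follow the article: activity -> outcome(s) -> impact.
KINDS = {"activity", "outcome", "impact"}

@dataclass
class Node:
    name: str
    kind: str  # one of KINDS

@dataclass
class TheoryOfChange:
    nodes: dict = field(default_factory=dict)
    # edges map (cause, effect) -> the assumption that links them
    edges: dict = field(default_factory=dict)

    def add(self, name, kind):
        if kind not in KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[name] = Node(name, kind)

    def link(self, cause, effect, assumption):
        self.edges[(cause, effect)] = assumption

    def pathways(self, start, goal, path=None):
        """Yield every causal path from start to goal (depth-first)."""
        path = (path or []) + [start]
        if start == goal:
            yield path
            return
        for cause, effect in self.edges:
            if cause == start and effect not in path:
                yield from self.pathways(effect, goal, path)

toc = TheoryOfChange()
toc.add("Take LSAT prep class", "activity")
toc.add("More familiar with test format", "outcome")
toc.add("Less test anxiety", "outcome")
toc.add("Higher LSAT score", "outcome")
toc.add("Admitted to law school", "impact")
toc.link("Take LSAT prep class", "More familiar with test format",
         "the class covers the kinds of questions the test asks")
toc.link("Take LSAT prep class", "Less test anxiety",
         "practice exams make the real test feel less intimidating")
toc.link("More familiar with test format", "Higher LSAT score",
         "familiarity improves performance")
toc.link("Less test anxiety", "Higher LSAT score",
         "nerves were holding performance down")
toc.link("Higher LSAT score", "Admitted to law school",
         "a good score is needed for admission")

for path in toc.pathways("Take LSAT prep class", "Admitted to law school"):
    print(" -> ".join(path))
    # The assumptions along each path are the candidates for measurement.
    for cause, effect in zip(path, path[1:]):
        print("   assumes:", toc.edges[(cause, effect)])
```

Listing the assumption on every edge of every path is one way to surface the "especially vulnerable or uncertain" links that the article suggests measuring first.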

Of course, most real-life programs and initiatives are quite a bit more complex than this simple example, which is why it's important to take the time to get the details outside of your head and onto the page or screen. Here's a theory of change I helped develop for the William and Flora Hewlett Foundation's Performing Arts Program, which distributes about $20 million a year to organizations in the San Francisco Bay Area region. This was the first theory of change the program ever had, and it was used to guide grantmaking between 2009 and 2011. (Don't be confused by the labels; in this case, "Ultimate Outcomes" = "Impacts," and "Cluster" and "Component" outcomes just mean early-stage and late-stage respectively.)

The truth is that any decision you make, if it has any element of intentionality, can be diagrammed as a theory of change. Everything from taking an umbrella with you in case it rains to making time for your favorite TV show has a theory of change behind it. Even if the idea of formalizing your decision-making in this way feels utterly unnatural, I can assure you that if you think strategically at all, then you have a theory of change in your head already. What I can’t tell you is whether it’s a good theory of change—that’s something that you probably won’t be able to figure out until you take the time to write it down and get feedback on it.

Theory of change was developed originally as an evaluation methodology. But I’ve come to believe it’s much more powerful when deployed as a design tool for strategy. I’ve worked with many different strategy frameworks over the years, and most of them are essentially the same set of tools in different packaging. For me, what sets theory of change apart is its insistence that we name the assumptions of cause and effect behind our work. It can’t tell you what your goals should be, but if you already know where you want to end up, I don’t know of another tool that prompts anywhere near the same level of critical thinking about how you’re going to get there.


A simple sketch of how realism became unpopular

October 12, 2019 - 01:25
Published on October 11, 2019 10:25 PM UTC

[Epistemic status: Sharing current impressions in a quick, simplified way in case others have details to add or have a more illuminating account. Medium-confidence that this is one of the most important parts of the story.]

Here's my current sense of how we ended up in this weird world where:

  • I still intermittently run into people who claim that there's no such thing as reality or truth;
  • a lot of 20th-century psychologists made a habit of saying things like 'minds don't exist, only behaviors';
  • a lot of 20th-century physicists made a habit of saying things like 'quarks don't exist, only minds';
  • there's a big academic split between continental thinkers saying (or being rounded off to saying) some variant of "everything is culture / perception / discourse / power" and Anglophone thinkers saying (or being rounded off to saying) "no".

Background context:

1. The ancient Greeks wrote down a whole lot of arguments. In many cases, we're missing enough textual fragments or context that we don't really know why they were arguing — what exact propositions were in dispute, or what the stakes were.

2. In any case, most of this is screened off by the fact that Europe's memetic winners were Christianity plus normal unphilosophical beliefs like "the sky is, in fact, blue".

3. Then, in 1517, the Protestant Reformation began.

4. In 1562, the Catholics found a giant list of arguments against everything by the minor Greek skeptic Sextus Empiricus, got very excited, and immediately weaponized them to show that the Protestant arguments fail (because all arguments fail).

5. These soon spread and became a sensation, and not just for being a useful superweapon. A lot of intellectuals were earnest humanists used to taking arguments at face value, and found Sextus' arguments genuinely upsetting and fascinating.

I trace continental thinkers' "everything is subjective/relative" arguments back to a single 1710 error in George Berkeley:

[...] I am content to put the whole upon this Issue; if you can but conceive it possible for one extended moveable Substance, or in general, for any one Idea or any thing like an Idea, to exist otherwise than in a Mind perceiving it, I shall readily give up the Cause[....]But say you, surely there is nothing easier than to imagine Trees, for instance, in a Park, or Books existing in a Closet, and no Body by to perceive them. I answer, you may so, there is no difficulty in it: But what is all this, I beseech you, more than framing in your Mind certain Ideas which you call Books and Trees, and the same time omitting to frame the Idea of any one that may perceive them? But do not you your self perceive or think of them all the while? This therefore is nothing to the purpose: It only shews you have the Power of imagining or forming Ideas in your Mind; but it doth not shew that you can conceive it possible, the Objects of your Thought may exist without the Mind: To make out this, it is necessary that you conceive them existing unconceived or unthought of, which is a manifest Repugnancy.

If I can imagine a tree that exists outside of any mind, then I can imagine a tree that is not being imagined. But "an imagined X that is not being imagined" is a contradiction. Therefore everything I can imagine or conceive of must be a mental object.

Berkeley ran with this argument to claim that there could be no unexperienced objects, therefore everything must exist in some mind — if nothing else, the mind of God.

The error here is mixing up what falls inside vs. outside of quotation marks. "I'm conceiving of a not-conceivable object" is a formal contradiction, but "I'm conceiving of the concept 'a not-conceivable object'" isn't, and human brains and natural language make it easy to mix up levels like those.

(I can immediately think of another major milestone in the history of European thought, Anselm's ontological argument for God, that shows the same brain bug.)

Berkeley's master argument was able to find fertile soil in an environment rife with non-naturalism, skeptical arguments, and competition between epistemic criteria and authorities. Via Kant and Kant's successors (especially Hegel), he successfully convinced the main current of 19th-century European philosophy to treat the idea of a "mind-independent world" as something ineffable or mysterious, and to treat experiences or perspectives as fundamental.

My unscholarly surface impression of the turn of the 20th century is that these memes ("the territory is fundamentally mysterious" and "maps are sort of magical and cosmically important") allowed a lot of mysticism and weird metaphysics to creep into intellectual life, but that ideas like those are actually hard to justify in dry academic prose, such that the more memetically fit descendants of idealism in the 20th century ended up being quietist ("let's just run experiments and not talk about all this weird 'world' stuff") or phenomenalist / skeptic / relativist ("you can't know 'world' stuff, so let's retreat to just discussing impressions; and maybe you can't even know those, so really what's left is power struggles").

Today, the pendulum has long since swung back again in most areas of intellectual life, perhaps because we've more solidly settled around our new central authority (science) and the threats to centralized epistemic authority (religious and philosophical controversy) are more distant memories. Metaphysics and weird arguments are fashionable again in analytic philosophy; behaviorism is long-dead in psychology; and quietism, non-realism, and non-naturalism at least no longer dominate the discussion in QM, though a lot of Copenhagen slogans remain popular.

The above is a very simple picture featuring uneven scholarship, and history tends to be messier than all that. (Ideas get independently rediscovered, movements go one step forward only to retreat two steps back, etc.) Also, I'm not claiming that everyone endorsed the master argument as stated, just that the master argument happened to shift intellectual fashions in this direction in a durable way.


Rent Needs to Decrease

October 11, 2019 - 15:40
Published on October 11, 2019 12:40 PM UTC

Here's part of a comment I got on my housing coalitions post:

I consider it extremely unlikely you have found renters with the expectation of rent going down. Assuming they want to live in a well maintained building, I consider unlikely they even desire it, once they think about it. What renters hope for in general is increases that are less than their increases in income. Landlords mostly do expect that rents will go up, but the magnitude of their expectations matters, many have the same expectations as renters for moderate increases. Others will have short term/transactional thinking and will want to charge what the market will bear.

This seems worth being explicit about: when I talk about how I think rents should be lower, I really mean lower. I'm not trying to say that it's ok if rent keeps rising as long as incomes rise faster, but that rents should go down.

Here are Boston rents in June 2011:

And in June 2019:

These are on the same scale, though not adjusted for inflation (13% from 2011-06 to 2019-06).

In 2011 a two-bedroom apartment in my part of Somerville would have gone for $1800/month, or $2050/month in 2019 dollars. In 2019, it would be $3000/month. Compared to 13% inflation we have 67% higher rents.
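The comparison can be made explicit with a few lines of arithmetic. This sketch uses only the post's own figures ($1,800, $3,000, and 13% cumulative inflation); the variable names are illustrative.

```python
# Sketch of the rent arithmetic above, using the post's own figures.
rent_2011 = 1800   # $/month, two-bedroom in this part of Somerville, June 2011
rent_2019 = 3000   # $/month, same apartment type, June 2019
inflation = 0.13   # cumulative inflation, June 2011 to June 2019

rent_2011_in_2019_dollars = rent_2011 * (1 + inflation)
nominal_increase = rent_2019 / rent_2011 - 1
real_increase = rent_2019 / rent_2011_in_2019_dollars - 1

print(f"2011 rent in 2019 dollars: ${rent_2011_in_2019_dollars:,.0f}")
print(f"nominal increase: {nominal_increase:.0%}")
print(f"inflation-adjusted increase: {real_increase:.0%}")
```

Applying exactly 13% gives about $2,034, close to the post's $2,050 (the small gap is presumably from rounding the inflation figure). Either way, the nominal increase is 67%, and even after adjusting for inflation rents are roughly 46 to 47% higher.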

Another way to look at this: what you would pay now for an apartment a ten-minute walk from Davis would, in 2011, have covered an apartment a ten-minute walk from Park St. And what you would have been paying for a Harvard Sq apartment in 2011 wouldn't get you an East Arlington apartment today.

These large increases have been a windfall for landlords. Property taxes haven't risen much, upkeep is similar, but because demand has grown so much without supply being permitted to rise to meet it the market rent is much higher. If we build enough new housing that rents fall to 2011 levels, landlords will make less money than they had been hoping, but they'll still be able to afford to keep up their properties.

I'll support pretty much any project that builds more bedrooms: market rate, affordable, public, transitional. Rents are so high that all of these would still be worth building and maintaining even if everyone could see we were building enough housing that rents would fall next year to 2011 levels.

As a homeowner and a landlord, I know that this means I would get less in rent and I'm ok with that. I value a healthy community that people can afford to live in far more than a market that pays me a lot of money for being lucky enough to have bought a two-family at a good time.