News
A brief list of ways AI safety efforts could be net negative - Less Wrong
2+ hour, 1+ min ago (245+ words) I'm not aware of a good list of downside risks for AI safety broadly[1], so I decided to make one. This is not intended to be fully comprehensive; these are just the ones that I personally take seriously[2][3]: (This list…
Futarchy is insecure without a proposal gatekeeper - Less Wrong
5+ hour, 51+ min ago (916+ words) Asset futarchy is attractive because it lets markets compare a proposal's expected effect on token value. That comparison is only reliable when conditional prices track the proposal's causal effect rather than strategic behavior around the decision rule. The attacks below…
Typical Minds Aren't - Less Wrong
3+ hour, 3+ min ago (308+ words) We all know the typical mind fallacy: the bias where we assume that other people's minds are much like our own. It happens because most of our evidence for what minds are like comes from experiencing what our own mind is…
The one-week sprint - Less Wrong
5+ hour, 28+ min ago (441+ words) Recently I've been working in one-week sprints, and I've really enjoyed it! TL;DR: I need to do a lot of creative knowledge work, and have recently fallen into a routine which IMO is pretty good at facilitating that. Monday…
Adversarial Proposal Design in Asset Futarchy - Less Wrong
5+ hour, 51+ min ago (652+ words) Asset futarchy is hardest to attack when conditional prices stay tightly coupled to a proposal's real causal effect on ASSET value. The proposal strategies below work by loosening that coupling. A proposer promises value-creating work, but treats delivery as the…
Research agenda: Interpretive debate - Less Wrong
18+ hour, 27+ min ago (674+ words) One sentence pitch: our goal is to develop a piece of epistemic infrastructure for iteratively and empirically answering interpretive questions about AI models, where the accumulation of empirics leads to resolution of interpretive ambiguity and/or calibration of uncertainty. This…
Does it feel any different to be reverse-chiral life? - Less Wrong
19+ hour, 17+ min ago (1637+ words) I will examine the concept of chirality (the difference between a right hand and a left hand, generalized) and its relevance to philosophy of mind. Philosophy of mind often deals with colors: colors of worldly objects and of mental representations…
Midjourney's Spa, or when sci-fi tries to become mundane - Less Wrong
19+ hour, 46+ min ago (522+ words) Midjourney has just announced their jump from being just the "makes funny images" AI company to being the "revolutionises diagnostics and human medicine forever" AI company, as a side gig. Here's the post. Basically, they've announced the creation of a…
The distillation double bind: Distilling misaligned models either transfers misalignment or it doesn't - Less Wrong
20+ hour, 53+ min ago (368+ words) Suppose we have a dangerous misaligned AI that can fool alignment audits, and distill it into a student model. Two things can happen: …