Copyright
Table of Contents
Foreword by Royal Hansen
Foreword by Michael Wildpaner
Preface
  Why We Wrote This Book
  Who This Book Is For
  A Note About Culture
  How to Read This Book
  Conventions Used in This Book
  O’Reilly Online Learning
  How to Contact Us
  Acknowledgments

Part I. Introductory Material

Chapter 1. The Intersection of Security and Reliability
  On Passwords and Power Drills
  Reliability Versus Security: Design Considerations
  Confidentiality, Integrity, Availability
  Confidentiality
  Integrity
  Availability
  Reliability and Security: Commonalities
  Invisibility
  Assessment
  Simplicity
  Evolution
  Resilience
  From Design to Production
  Investigating Systems and Logging
  Crisis Response
  Recovery
  Conclusion

Chapter 2. Understanding Adversaries
  Attacker Motivations
  Attacker Profiles
  Hobbyists
  Vulnerability Researchers
  Governments and Law Enforcement
  Activists
  Criminal Actors
  Automation and Artificial Intelligence
  Insiders
  Attacker Methods
  Threat Intelligence
  Cyber Kill Chains™
  Tactics, Techniques, and Procedures
  Risk Assessment Considerations
  Conclusion

Part II. Designing Systems

Chapter 3. Case Study: Safe Proxies
  Safe Proxies in Production Environments
  Google Tool Proxy
  Conclusion

Chapter 4. Design Tradeoffs
  Design Objectives and Requirements
  Feature Requirements
  Nonfunctional Requirements
  Features Versus Emergent Properties
  Example: Google Design Document
  Balancing Requirements
  Example: Payment Processing
  Managing Tensions and Aligning Goals
  Example: Microservices and the Google Web Application Framework
  Aligning Emergent-Property Requirements
  Initial Velocity Versus Sustained Velocity
  Conclusion

Chapter 5. Design for Least Privilege
  Concepts and Terminology
  Least Privilege
  Zero Trust Networking
  Zero Touch
  Classifying Access Based on Risk
  Best Practices
  Small Functional APIs
  Breakglass
  Auditing
  Testing and Least Privilege
  Diagnosing Access Denials
  Graceful Failure and Breakglass Mechanisms
  Worked Example: Configuration Distribution
  POSIX API via OpenSSH
  Software Update API
  Custom OpenSSH ForceCommand
  Custom HTTP Receiver (Sidecar)
  Custom HTTP Receiver (In-Process)
  Tradeoffs
  A Policy Framework for Authentication and Authorization Decisions
  Using Advanced Authorization Controls
  Investing in a Widely Used Authorization Framework
  Avoiding Potential Pitfalls
  Advanced Controls
  Multi-Party Authorization (MPA)
  Three-Factor Authorization (3FA)
  Business Justifications
  Temporary Access
  Proxies
  Tradeoffs and Tensions
  Increased Security Complexity
  Impact on Collaboration and Company Culture
  Quality Data and Systems That Impact Security
  Impact on User Productivity
  Impact on Developer Complexity
  Conclusion

Chapter 6. Design for Understandability
  Why Is Understandability Important?
  System Invariants
  Analyzing Invariants
  Mental Models
  Designing Understandable Systems
  Complexity Versus Understandability
  Breaking Down Complexity
  Centralized Responsibility for Security and Reliability Requirements
  System Architecture
  Understandable Interface Specifications
  Understandable Identities, Authentication, and Access Control
  Security Boundaries
  Software Design
  Using Application Frameworks for Service-Wide Requirements
  Understanding Complex Data Flows
  Considering API Usability
  Conclusion

Chapter 7. Design for a Changing Landscape
  Types of Security Changes
  Designing Your Change
  Architecture Decisions to Make Changes Easier
  Keep Dependencies Up to Date and Rebuild Frequently
  Release Frequently Using Automated Testing
  Use Containers
  Use Microservices
  Different Changes: Different Speeds, Different Timelines
  Short-Term Change: Zero-Day Vulnerability
  Medium-Term Change: Improvement to Security Posture
  Long-Term Change: External Demand
  Complications: When Plans Change
  Example: Growing Scope—Heartbleed
  Conclusion

Chapter 8. Design for Resilience
  Design Principles for Resilience
  Defense in Depth
  The Trojan Horse
  Google App Engine Analysis
  Controlling Degradation
  Differentiate Costs of Failures
  Deploy Response Mechanisms
  Automate Responsibly
  Controlling the Blast Radius
  Role Separation
  Location Separation
  Time Separation
  Failure Domains and Redundancies
  Failure Domains
  Component Types
  Controlling Redundancies
  Continuous Validation
  Validation Focus Areas
  Validation in Practice
  Practical Advice: Where to Begin
  Conclusion

Chapter 9. Design for Recovery
  What Are We Recovering From?
  Random Errors
  Accidental Errors
  Software Errors
  Malicious Actions
  Design Principles for Recovery
  Design to Go as Quickly as Possible (Guarded by Policy)
  Limit Your Dependencies on External Notions of Time
  Rollbacks Represent a Tradeoff Between Security and Reliability
  Use an Explicit Revocation Mechanism
  Know Your Intended State, Down to the Bytes
  Design for Testing and Continuous Validation
  Emergency Access
  Access Controls
  Communications
  Responder Habits
  Unexpected Benefits
  Conclusion

Chapter 10. Mitigating Denial-of-Service Attacks
  Strategies for Attack and Defense
  Attacker’s Strategy
  Defender’s Strategy
  Designing for Defense
  Defendable Architecture
  Defendable Services
  Mitigating Attacks
  Monitoring and Alerting
  Graceful Degradation
  A DoS Mitigation System
  Strategic Response
  Dealing with Self-Inflicted Attacks
  User Behavior
  Client Retry Behavior
  Conclusion

Part III. Implementing Systems

Chapter 11. Case Study: Designing, Implementing, and Maintaining a Publicly Trusted CA
  Background on Publicly Trusted Certificate Authorities
  Why Did We Need a Publicly Trusted CA?
  The Build or Buy Decision
  Design, Implementation, and Maintenance Considerations
  Programming Language Choice
  Complexity Versus Understandability
  Securing Third-Party and Open Source Components
  Testing
  Resiliency for the CA Key Material
  Data Validation
  Conclusion

Chapter 12. Writing Code
  Frameworks to Enforce Security and Reliability
  Benefits of Using Frameworks
  Example: Framework for RPC Backends
  Common Security Vulnerabilities
  SQL Injection Vulnerabilities: TrustedSqlString
  Preventing XSS: SafeHtml
  Lessons for Evaluating and Building Frameworks
  Simple, Safe, Reliable Libraries for Common Tasks
  Rollout Strategy
  Simplicity Leads to Secure and Reliable Code
  Avoid Multilevel Nesting
  Eliminate YAGNI Smells
  Repay Technical Debt
  Refactoring
  Security and Reliability by Default
  Choose the Right Tools
  Use Strong Types
  Sanitize Your Code
  Conclusion

Chapter 13. Testing Code
  Unit Testing
  Writing Effective Unit Tests
  When to Write Unit Tests
  How Unit Testing Affects Code
  Integration Testing
  Writing Effective Integration Tests
  Dynamic Program Analysis
  Fuzz Testing
  How Fuzz Engines Work
  Writing Effective Fuzz Drivers
  An Example Fuzzer
  Continuous Fuzzing
  Static Program Analysis
  Automated Code Inspection Tools
  Integration of Static Analysis in the Developer Workflow
  Abstract Interpretation
  Formal Methods
  Conclusion

Chapter 14. Deploying Code
  Concepts and Terminology
  Threat Model
  Best Practices
  Require Code Reviews
  Rely on Automation
  Verify Artifacts, Not Just People
  Treat Configuration as Code
  Securing Against the Threat Model
  Advanced Mitigation Strategies
  Binary Provenance
  Provenance-Based Deployment Policies
  Verifiable Builds
  Deployment Choke Points
  Post-Deployment Verification
  Practical Advice
  Take It One Step at a Time
  Provide Actionable Error Messages
  Ensure Unambiguous Provenance
  Create Unambiguous Policies
  Include a Deployment Breakglass
  Securing Against the Threat Model, Revisited
  Conclusion

Chapter 15. Investigating Systems
  From Debugging to Investigation
  Example: Temporary Files
  Debugging Techniques
  What to Do When You’re Stuck
  Collaborative Debugging: A Way to Teach
  How Security Investigations and Debugging Differ
  Collect Appropriate and Useful Logs
  Design Your Logging to Be Immutable
  Take Privacy into Consideration
  Determine Which Security Logs to Retain
  Budget for Logging
  Robust, Secure Debugging Access
  Reliability
  Security
  Conclusion

Part IV. Maintaining Systems

Chapter 16. Disaster Planning
  Defining “Disaster”
  Dynamic Disaster Response Strategies
  Disaster Risk Analysis
  Setting Up an Incident Response Team
  Identify Team Members and Roles
  Establish a Team Charter
  Establish Severity and Priority Models
  Define Operating Parameters for Engaging the IR Team
  Develop Response Plans
  Create Detailed Playbooks
  Ensure Access and Update Mechanisms Are in Place
  Prestaging Systems and People Before an Incident
  Configuring Systems
  Training
  Processes and Procedures
  Testing Systems and Response Plans
  Auditing Automated Systems
  Conducting Nonintrusive Tabletops
  Testing Response in Production Environments
  Red Team Testing
  Evaluating Responses
  Google Examples
  Test with Global Impact
  DiRT Exercise Testing Emergency Access
  Industry-Wide Vulnerabilities
  Conclusion

Chapter 17. Crisis Management
  Is It a Crisis or Not?
  Triaging the Incident
  Compromises Versus Bugs
  Taking Command of Your Incident
  The First Step: Don’t Panic!
  Beginning Your Response
  Establishing Your Incident Team
  Operational Security
  Trading Good OpSec for the Greater Good
  The Investigative Process
  Keeping Control of the Incident
  Parallelizing the Incident
  Handovers
  Morale
  Communications
  Misunderstandings
  Hedging
  Meetings
  Keeping the Right People Informed with the Right Levels of Detail
  Putting It All Together
  Triage
  Declaring an Incident
  Communications and Operational Security
  Beginning the Incident
  Handover
  Handing Back the Incident
  Preparing Communications and Remediation
  Closure
  Conclusion

Chapter 18. Recovery and Aftermath
  Recovery Logistics
  Recovery Timeline
  Planning the Recovery
  Scoping the Recovery
  Recovery Considerations
  Recovery Checklists
  Initiating the Recovery
  Isolating Assets (Quarantine)
  System Rebuilds and Software Upgrades
  Data Sanitization
  Recovery Data
  Credential and Secret Rotation
  After the Recovery
  Postmortems
  Examples
  Compromised Cloud Instances
  Large-Scale Phishing Attack
  Targeted Attack Requiring Complex Recovery
  Conclusion

Part V. Organization and Culture

Chapter 19. Case Study: Chrome Security Team
  Background and Team Evolution
  Security Is a Team Responsibility
  Help Users Safely Navigate the Web
  Speed Matters
  Design for Defense in Depth
  Be Transparent and Engage the Community
  Conclusion

Chapter 20. Understanding Roles and Responsibilities
  Who Is Responsible for Security and Reliability?
  The Roles of Specialists
  Understanding Security Expertise
  Certifications and Academia
  Integrating Security into the Organization
  Embedding Security Specialists and Security Teams
  Example: Embedding Security at Google
  Special Teams: Blue and Red Teams
  External Researchers
  Conclusion

Chapter 21. Building a Culture of Security and Reliability
  Defining a Healthy Security and Reliability Culture
  Culture of Security and Reliability by Default
  Culture of Review
  Culture of Awareness
  Culture of Yes
  Culture of Inevitably
  Culture of Sustainability
  Changing Culture Through Good Practice
  Align Project Goals and Participant Incentives
  Reduce Fear with Risk-Reduction Mechanisms
  Make Safety Nets the Norm
  Increase Productivity and Usability
  Overcommunicate and Be Transparent
  Build Empathy
  Convincing Leadership
  Understand the Decision-Making Process
  Build a Case for Change
  Pick Your Battles
  Escalations and Problem Resolution
  Conclusion

Conclusion
Appendix A. A Disaster Risk Assessment Matrix
Index
About the Editors
Colophon