AI jailbreaks: What they are and how they can be mitigated - Cyber Security Review

Posted on June 4, 2024; updated June 18, 2024. Author: Cyber Security Review

Generative AI systems are made up of multiple components that interact to provide a rich user experience between the human and the AI model(s).

As part of a responsible AI approach, AI models are protected by layers of defense mechanisms that prevent them from producing harmful content or from being used to carry out instructions that go against the intended purpose of the AI-integrated application. This blog explains what AI jailbreaks are, why generative AI is susceptible to them, and how you can mitigate the risks and harms.

Source: Microsoft
