The growing intersection of artificial intelligence (AI) and cybersecurity has opened new avenues for enhancing penetration-testing (pentesting) tools. Traditional large language models (LLMs) rely heavily on text-based datasets, which may not fully capture the nuanced, specialized knowledge required for effective pentesting. This thesis explores the use of video as a knowledge base for developing an LLM, the Cybersecurity Intelligent Penetration-testing Helper for Ethical Researchers (CIPHER). By transforming video content into structured, domain-specific datasets, the research shows how multimedia data can bridge the knowledge gaps found in current AI models. We detail the methodologies for video data collection, transcription, and annotation, leading to the creation of a comprehensive dataset tailored to pentesting tasks. The LLM is trained and evaluated on this dataset, demonstrating its effectiveness in simulating real-world scenarios. This research contributes to the advancement of AI-based penetration testing, offering a distinctive approach to dataset generation and model training that enhances the capabilities of LLMs in security-related domains.