Sprechen Sie mit Movies | Auf Information Science

(LLMs) verbessert sich in der Effizienz und können jetzt verschiedene Datenformate verstehen und Möglichkeiten für unzählige Anwendungen in verschiedenen Bereichen bieten. Anfangs konnten LLMs von Natur aus nur Textual content verarbeiten. Das Bildverständnisfunktion wurde durch Kopplung eines LLM mit einem anderen Bildcodierungsmodell integriert. Jedoch, gpt-4o wurde sowohl auf Textual content als auch auf Bilder trainiert und ist das erste echte multimodale LLM, das sowohl Textual content als auch Bilder verstehen kann. Andere Modalitäten wie Audio werden durch andere KI -Modelle, z. B. die Flüstermodelle von OpenAI, in moderne LLMs integriert.

LLMs werden jetzt mehr als Informationsprozessoren verwendet, bei denen Daten in verschiedenen Formaten verarbeitet werden können. Die Integration mehrerer Modalitäten in LLMs öffnet Bereiche zahlreicher Anwendungen in der Bildung, Geschäftund andere Sektoren. Eine solche Anwendung ist die Verarbeitung von Bildungsvideos, Dokumentarfilmen, Webinaren, Präsentationen, Geschäftstreffen, Vorträgen und anderen Inhalten unter Verwendung von LLMs und der Interaktion mit diesen Inhalten natürlicher. Die Audio -Modalität in diesen Movies enthält umfangreiche Informationen, die in einer Reihe von Anwendungen verwendet werden könnten. In Bildungsumgebungen kann es für personalisiertes Lernen verwendet werden, die Zugänglichkeit von Schülern mit besonderen Bedürfnissen, die Erstellung von Studienhilfen, die Unterstützung des Fernunterrichts, die Präsenz eines Lehrers, um Inhalte zu verstehen, und die Bewertung des Wissens der Schüler über ein Thema zu verbessern. In den Geschäftsumgebungen kann es verwendet werden, um neue Mitarbeiter mit Onboarding -Movies, das Extrahieren und Generieren von Kenntnissen aus Aufzeichnungsbesprechungen und Präsentationen, maßgeschneiderten Lernmaterialien aus Produktdemonstrationsvideos und Extrahieren von Erkenntnissen aus aufgezeichneten Branchenkonferenzen verwendet werden, ohne Stunden der Movies anzusehen, um nur einige zu nennen.

In diesem Artikel wird die Entwicklung einer Anwendung zur Interaktion mit Movies auf natürliche Weise erörtert und Lerninhalte von ihnen erstellt. Die Anwendung hat die folgenden Funktionen:

Es führt ein Eingangsvideo entweder über eine URL oder aus einem lokalen Pfad und extrahiert Audio aus dem Video
Transkribiert das Audio mithilfe des hochmodernen Modells von OpenAI gpt-4o-transcribeAnwesend Dies hat eine verbesserte Leistung der Wortfehlerrate (WER) gegenüber vorhandenen Flüstermodellen über mehrere etablierte Benchmarks gezeigt
Erstellt einen Vektorspeicher des Transkripts und entwickelt eine Abruf -Increase -Era (RAG), um ein Gespräch mit dem Video -Transkript zu führen
Beantworten Sie die Fragen der Benutzer in Textual content und Sprache mit verschiedenen Stimmen, die aus der Benutzeroberfläche der Anwendung ausgewählt werden können.
Erstellt Lerninhalte wie:
- Hierarchische Darstellung des Videoinhalts, um den Benutzern schnelle Einblicke in die Hauptkonzepte und unterstützenden Particulars zu geben
- Generieren Sie Quiz, um das passive Video -Ansehen in aktives Lernen zu verwandeln, indem Sie Benutzer dazu fordern, Informationen zu erinnern und im Video präsentiertes Informationen anzuwenden.
- Generiert Flashkarten aus den Videoinhalten, die aktiven Rückruf- und Abstands -Wiederholungslernentechniken unterstützen

Der gesamte Workflow der Anwendung ist in der folgenden Abbildung dargestellt.

Die gesamte Codebasis sowie detaillierte Anweisungen für die Set up und Verwendung finden Sie auf Github.

Hier ist die Struktur des Github -Repositorys. Die Hauptanwendung von Streamlit implementiert die GUI -Schnittstelle und ruft mehrere andere Funktionen aus anderen Funktionen und Helfermodulen auf ((.py Dateien).

GitHub -Code -Struktur (Bild des Autors)

Darüber hinaus können Sie die Codebasis visualisieren, indem Sie das öffnen.Codebasis -Visualisierung”HTML -Datei in einem Browser, der die Strukturen jedes Moduls beschreibt.

Codebasis -Visualisierung (Bild vom Autor)

Lassen Sie uns die schrittweise Entwicklung dieser Anwendung eintauchen. Ich werde nicht den gesamten Code diskutieren, sondern nur seinen Hauptteil. Der gesamte Code im Github -Repository wird angemessen kommentiert.

Videoeingabe und Verarbeitung

Videoeingabe- und Verarbeitungslogik werden in implementiert transcriber.py. Wenn die Anwendung lädt, wird überprüft, ob FFMPEG vorhanden ist (verify_ffmpeg) im Stammverzeichnis der Anwendung. FFMPEG ist zum Herunterladen eines Movies (wenn die Eingabe eine URL ist) und das Extrahieren von Audio aus dem Video erforderlich, mit dem dann ein Transkript erstellt wird.

def verify_ffmpeg():
    """Confirm that FFmpeg is offered and print its location."""
    # Add FFmpeg to PATH
    os.environ('PATH') = FFMPEG_LOCATION + os.pathsep + os.environ('PATH')
    # Examine if FFmpeg binaries exist
    ffmpeg_path = os.path.be a part of(FFMPEG_LOCATION, 'ffmpeg.exe')
    ffprobe_path = os.path.be a part of(FFMPEG_LOCATION, 'ffprobe.exe')
    if not os.path.exists(ffmpeg_path):
        increase FileNotFoundError(f"FFmpeg executable not discovered at: {ffmpeg_path}")
    if not os.path.exists(ffprobe_path):
        increase FileNotFoundError(f"FFprobe executable not discovered at: {ffprobe_path}")
    print(f"FFmpeg discovered at: {ffmpeg_path}")
    print(f"FFprobe discovered at: {ffprobe_path}")
    # Attempt to execute FFmpeg to ensure it really works
    attempt:
        # Add shell=True for Home windows and seize errors correctly
        end result = subprocess.run((ffmpeg_path, '-version'), 
                               stdout=subprocess.PIPE, 
                               stderr=subprocess.PIPE,
                               shell=True,  # This will help with permission points on Home windows
                               test=False)
        if end result.returncode == 0:
            print(f"FFmpeg model: {end result.stdout.decode().splitlines()(0)}")
        else:
            error_msg = end result.stderr.decode()
            print(f"FFmpeg error: {error_msg}")
            # Examine for particular permission errors
            if "Entry is denied" in error_msg:
                print("Permission error detected. Making an attempt different strategy...")
                # Strive an alternate strategy - simply test file existence with out execution
                if os.path.exists(ffmpeg_path) and os.path.exists(ffprobe_path):
                    print("FFmpeg recordsdata exist however execution take a look at failed as a result of permissions.")
                    print("WARNING: The app might fail when making an attempt to course of movies.")
                    # Return paths anyway and hope for the perfect when truly used
                    return ffmpeg_path, ffprobe_path
                
            increase RuntimeError(f"FFmpeg execution failed: {error_msg}")
    besides Exception as e:
        print(f"Error checking FFmpeg: {e}")
        # Fallback choice if verification fails however recordsdata exist
        if os.path.exists(ffmpeg_path) and os.path.exists(ffprobe_path):
            print("WARNING: FFmpeg recordsdata exist however verification failed.")
            print("Trying to proceed anyway, however video processing might fail.")
            return ffmpeg_path, ffprobe_path 
        increase
    return ffmpeg_path, ffprobe_path

Die Videoeingabe erfolgt in Kind einer URL (z. B. YouTube -URL) oder einem lokalen Dateipfad. Der process_video Die Funktion bestimmt den Eingangstyp und leitet ihn entsprechend weiter. Wenn die Eingabe eine URL ist, funktioniert die Helfer get_video_info Und get_video_id Video -Metadaten (Titel, Beschreibung, Dauer) extrahieren, ohne sie herunterzuladen yt_dlp Paket.

#Operate to find out the enter kind and route it appropriately
def process_video(youtube_url, output_dir, api_key, mannequin="gpt-4o-transcribe"):
    """
    Course of a YouTube video to generate a transcript
    Wrapper perform that mixes obtain and transcription
    Args:
        youtube_url: URL of the YouTube video
        output_dir: Listing to save lots of the output
        api_key: OpenAI API key
        mannequin: The mannequin to make use of for transcription (default: gpt-4o-transcribe)
    Returns:
        dict: Dictionary containing transcript and file paths
    """
    # First obtain the audio
    print("Downloading video...")
    audio_path = process_video_download(youtube_url, output_dir)
    
    print("Transcribing video...")
    # Then transcribe the audio
    transcript, transcript_path = process_video_transcribe(audio_path, output_dir, api_key, mannequin=mannequin)
    
    # Return the mixed outcomes
    return {
        'transcript': transcript,
        'transcript_path': transcript_path,
        'audio_path': audio_path
    }

def get_video_info(youtube_url):
    """Get video data with out downloading."""
    # Examine native cache first
    world _video_info_cache
    if youtube_url in _video_info_cache:
        return _video_info_cache(youtube_url)
        
    # Extract information if not cached
    with yt_dlp.YoutubeDL() as ydl:
        information = ydl.extract_info(youtube_url, obtain=False)
        # Cache the end result
        _video_info_cache(youtube_url) = information
        # Additionally cache the video ID individually
        _video_id_cache(youtube_url) = information.get('id', 'video')
        return information

def get_video_id(youtube_url):
    """Get simply the video ID with out re-extracting if already recognized."""
    world _video_id_cache
    if youtube_url in _video_id_cache:
        return _video_id_cache(youtube_url)
    
    # If not in cache, extract from URL instantly if potential
    if "v=" in youtube_url:
        video_id = youtube_url.break up("v=")(1).break up("&")(0)
        _video_id_cache(youtube_url) = video_id
        return video_id
    elif "youtu.be/" in youtube_url:
        video_id = youtube_url.break up("youtu.be/")(1).break up("?")(0)
        _video_id_cache(youtube_url) = video_id
        return video_id
    
    # If we will not extract instantly, fall again to full information extraction
    information = get_video_info(youtube_url)
    video_id = information.get('id', 'video')
    return video_id

Nachdem die Videoeingabe angegeben wurde, ist der Code in app.py Überprüft, ob ein Transkript für das Eingabevideo bereits vorhanden ist (im Fall von URL -Eingabe). Dies erfolgt durch die Aufrufen der folgenden zwei Helferfunktionen von transcriber.py.

def get_transcript_path(youtube_url, output_dir):
    """Get the anticipated transcript path for a given YouTube URL."""
    # Get video ID with caching
    video_id = get_video_id(youtube_url)
    # Return anticipated transcript path
    return os.path.be a part of(output_dir, f"{video_id}_transcript.txt")

def transcript_exists(youtube_url, output_dir):
    """Examine if a transcript already exists for this video."""
    transcript_path = get_transcript_path(youtube_url, output_dir)
    return os.path.exists(transcript_path)

Wenn transcript_exists Gibt den Pfad eines vorhandenen Transkripts zurück. Der nächste Schritt besteht darin, den Vektorspeicher für den Lappen zu erstellen. Wenn kein vorhandenes Transkript gefunden wird, besteht der nächste Schritt darin, Audio aus der URL herunterzuladen und in ein Commonplace -Audio -Format umzuwandeln. Die Funktion process_video_download Laden Sie Audio aus der URL mit der FFMPEG -Bibliothek herunter und konvertiert sie in .mp3 Format. Wenn die Eingabe eine lokale Videodatei ist, ist app.py erreibt es, um es in die Umwandlung zu .mp3 Datei.

def process_video_download(youtube_url, output_dir):
    """
    Obtain audio from a YouTube video
    Args:
        youtube_url: URL of the YouTube video
        output_dir: Listing to save lots of the output
        
    Returns:
        str: Path to the downloaded audio file
    """
    # Create output listing if it does not exist
    os.makedirs(output_dir, exist_ok=True)
    
    # Extract video ID from URL
    video_id = None
    if "v=" in youtube_url:
        video_id = youtube_url.break up("v=")(1).break up("&")(0)
    elif "youtu.be/" in youtube_url:
        video_id = youtube_url.break up("youtu.be/")(1).break up("?")(0)
    else:
        increase ValueError("Couldn't extract video ID from URL")
    # Set output paths
    audio_path = os.path.be a part of(output_dir, f"{video_id}.mp3")
    
    # Configure yt-dlp choices
    ydl_opts = {
        'format': 'bestaudio/greatest',
        'postprocessors': ({
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192',
        }),
        'outtmpl': os.path.be a part of(output_dir, f"{video_id}"),
        'quiet': True
    }
    
    # Obtain audio
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.obtain((youtube_url))
    
    # Confirm audio file exists
    if not os.path.exists(audio_path):
        # Strive with an extension that yt-dlp may need used
        potential_paths = (
            os.path.be a part of(output_dir, f"{video_id}.mp3"),
            os.path.be a part of(output_dir, f"{video_id}.m4a"),
            os.path.be a part of(output_dir, f"{video_id}.webm")
        )
        
        for path in potential_paths:
            if os.path.exists(path):
                # Convert to mp3 if it is not already
                if not path.endswith('.mp3'):
                    ffmpeg_path = verify_ffmpeg()(0)
                    output_mp3 = os.path.be a part of(output_dir, f"{video_id}.mp3")
                    subprocess.run((
                        ffmpeg_path, '-i', path, '-c:a', 'libmp3lame', 
                        '-q:a', '2', output_mp3, '-y'
                    ), test=True, capture_output=True)
                    os.take away(path)  # Take away the unique file
                    audio_path = output_mp3
                else:
                    audio_path = path
                break
        else:
            increase FileNotFoundError(f"Couldn't discover downloaded audio file for video {video_id}")
    return audio_path

Audio -Transkription mit OpenAIs gpt-4o-transcribe Modell

Nach dem Extrahieren von Audio und der Konvertierung in ein Commonplace -Audio -Format besteht der nächste Schritt darin, das Audio in Textformat zu transkribieren. Zu diesem Zweck habe ich die neu gestartete Openai verwendet gpt-4o-transcribe Sprach-zu-Textual content-Modell zugänglich über Rede-to-Textual content-API. Dieses Modell hat OpenAIs übertroffen Flüstern Modelle sowohl in Bezug auf die Transkriptionsgenauigkeit als auch in Bezug auf eine robuste Sprachabdeckung.

Die Funktion process_video_transcribe In transcriber.py empfängt die konvertierte Audiodatei und Schnittstellen mit gpt-4o-transcribe Modell mit OpenAs Rede-to-Textual content-API. Der gpt-4o-transcribe Das Modell verfügt derzeit über ein Audiodateilimit von 25 MB und 1500 Dauer. Um diese Einschränkung zu überwinden, habe ich die längeren Dateien in mehrere Stücke aufgeteilt und diese Brocken getrennt transkribieren. Der process_video_transcribe Die Funktion prüft, ob die Eingabedatei die Größen- und/oder Dauergrenze überschreitet. Wenn der eine der Schwellenwert überschritten wird, ruft sie auf split_and_transcribe Funktion, die zuerst die Anzahl der benötigten Stücke berechnet, die sowohl auf Größe als auch auf der Dauer benötigt werden, und das Most dieser beiden als endgültige Anzahl von Stücken für die Transkription nimmt. Anschließend findet es die Begin- und Endzeiten für jeden Stück und extrahiert diese Stücke aus der Audiodatei. Anschließend transkribiert es jeden Chunk mithilfe gpt-4o-transcribe Modell mit OpenAs Sprach-zu-Textual content-API und kombiniert dann Transkripte aller Teile, um das endgültige Transkript zu generieren.

def process_video_transcribe(audio_path, output_dir, api_key, progress_callback=None, mannequin="gpt-4o-transcribe"):
    """
    Transcribe an audio file utilizing OpenAI API, with automated chunking for giant recordsdata
    At all times makes use of the chosen mannequin, with no fallback
    
    Args:
        audio_path: Path to the audio file
        output_dir: Listing to save lots of the transcript
        api_key: OpenAI API key
        progress_callback: Operate to name with progress updates (0-100)
        mannequin: The mannequin to make use of for transcription (default: gpt-4o-transcribe)
        
    Returns:
        tuple: (transcript textual content, transcript path)
    """
    # Extract video ID from audio path
    video_id = os.path.basename(audio_path).break up('.')(0)
    transcript_path = os.path.be a part of(output_dir, f"{video_id}_transcript.txt")
    
    # Setup OpenAI consumer
    consumer = OpenAI(api_key=api_key)
    
    # Replace progress
    if progress_callback:
        progress_callback(10)
    
    # Get file dimension in MB
    file_size_mb = os.path.getsize(audio_path) / (1024 * 1024)
    
    # Common chunking thresholds - apply to each fashions
    max_size_mb = 25  # 25MB chunk dimension for each fashions
    max_duration_seconds = 1500  # 1500 seconds chunk period for each fashions
    
    # Load the audio file to get its period
    attempt:
        audio = AudioSegment.from_file(audio_path)
        duration_seconds = len(audio) / 1000  # pydub makes use of milliseconds
    besides Exception as e:
        print(f"Error loading audio to test period: {e}")
        audio = None
        duration_seconds = 0
    
    # Decide if chunking is required
    needs_chunking = False
    chunking_reason = ()
    
    if file_size_mb > max_size_mb:
        needs_chunking = True
        chunking_reason.append(f"dimension ({file_size_mb:.2f}MB exceeds {max_size_mb}MB)")
    
    if duration_seconds > max_duration_seconds:
        needs_chunking = True
        chunking_reason.append(f"period ({duration_seconds:.2f}s exceeds {max_duration_seconds}s)")
    
    # Log the choice
    if needs_chunking:
        reason_str = " and ".be a part of(chunking_reason)
        print(f"Audio wants chunking as a result of {reason_str}. Utilizing {mannequin} for transcription.")
    else:
        print(f"Audio file is inside limits. Utilizing {mannequin} for direct transcription.")
    
    # Examine if file wants chunking
    if needs_chunking:
        if progress_callback:
            progress_callback(15)
        
        # Break up the audio file into chunks and transcribe every chunk utilizing the chosen mannequin solely
        full_transcript = split_and_transcribe(
            audio_path, consumer, mannequin, progress_callback, 
            max_size_mb, max_duration_seconds, audio
        )
    else:
        # File is sufficiently small, transcribe instantly with the chosen mannequin
        with open(audio_path, "rb") as audio_file:
            if progress_callback:
                progress_callback(30)
                
            transcript_response = consumer.audio.transcriptions.create(
                mannequin=mannequin, 
                file=audio_file
            )
            
            if progress_callback:
                progress_callback(80)
            
            full_transcript = transcript_response.textual content
    
    # Save transcript to file
    with open(transcript_path, "w", encoding="utf-8") as f:
        f.write(full_transcript)
    
    # Replace progress
    if progress_callback:
        progress_callback(100)
    
    return full_transcript, transcript_path

def split_and_transcribe(audio_path, consumer, mannequin, progress_callback=None, 
                         max_size_mb=25, max_duration_seconds=1500, audio=None):
    """
    Break up an audio file into chunks and transcribe every chunk 
    
    Args:
        audio_path: Path to the audio file
        consumer: OpenAI consumer
        mannequin: Mannequin to make use of for transcription (won't fall again to different fashions)
        progress_callback: Operate to name with progress updates
        max_size_mb: Most file dimension in MB
        max_duration_seconds: Most period in seconds
        audio: Pre-loaded AudioSegment (elective)
        
    Returns:
        str: Mixed transcript from all chunks
    """
    # Load the audio file if not offered
    if audio is None:
        audio = AudioSegment.from_file(audio_path)
    
    # Get audio period in seconds
    duration_seconds = len(audio) / 1000
    
    # Calculate the variety of chunks wanted based mostly on each dimension and period
    file_size_mb = os.path.getsize(audio_path) / (1024 * 1024)
    
    chunks_by_size = math.ceil(file_size_mb / (max_size_mb * 0.9))  # Use 90% of max to be protected
    chunks_by_duration = math.ceil(duration_seconds / (max_duration_seconds * 0.95))  # Use 95% of max to be protected
    num_chunks = max(chunks_by_size, chunks_by_duration)
    
    print(f"Splitting audio into {num_chunks} chunks based mostly on dimension ({chunks_by_size}) and period ({chunks_by_duration})")
    
    # Calculate chunk period in milliseconds
    chunk_length_ms = len(audio) // num_chunks
    
    # Create temp listing for chunks if it does not exist
    temp_dir = os.path.be a part of(os.path.dirname(audio_path), "temp_chunks")
    os.makedirs(temp_dir, exist_ok=True)
    
    # Break up the audio into chunks and transcribe every chunk
    transcripts = ()
    
    for i in vary(num_chunks):
        if progress_callback:
            # Replace progress: 20% for splitting, 60% for transcribing
            progress_percent = 20 + int((i / num_chunks) * 60)
            progress_callback(progress_percent)
        
        # Calculate begin and finish instances for this chunk
        start_ms = i * chunk_length_ms
        end_ms = min((i + 1) * chunk_length_ms, len(audio))
        
        # Extract the chunk
        chunk = audio(start_ms:end_ms)
        
        # Save the chunk to a short lived file
        chunk_path = os.path.be a part of(temp_dir, f"chunk_{i}.mp3")
        chunk.export(chunk_path, format="mp3")
        
        # Log chunk data
        chunk_size_mb = os.path.getsize(chunk_path) / (1024 * 1024)
        chunk_duration = len(chunk) / 1000
        print(f"Chunk {i+1}/{num_chunks}: {chunk_size_mb:.2f}MB, {chunk_duration:.2f}s")
        
        # Transcribe the chunk 
        attempt:
            with open(chunk_path, "rb") as chunk_file:
                transcript_response = consumer.audio.transcriptions.create(
                    mannequin=mannequin,
                    file=chunk_file
                )
                
                # Add to our checklist of transcripts
                transcripts.append(transcript_response.textual content)
        besides Exception as e:
            print(f"Error transcribing chunk {i+1} with {mannequin}: {e}")
            # Add a placeholder for the failed chunk
            transcripts.append(f"(Transcription failed for phase {i+1})")
        
        # Clear up the momentary chunk file
        os.take away(chunk_path)
    
    # Clear up the momentary listing
    attempt:
        os.rmdir(temp_dir)
    besides:
        print(f"Be aware: Couldn't take away momentary listing {temp_dir}")
    
    # Mix all transcripts with correct spacing
    full_transcript = " ".be a part of(transcripts)
    
    return full_transcript

Der folgende Screenshot der Streamlit -App zeigt den Videoverarbeitung und den Transkriptieren des Workflows für einen meiner Webinare. “Integration von LLMs in das Geschäft„ Erhältlich auf meinem YouTube -Kanal.

Snapshot der Streamlit -App zeigt den Prozess des Extrahierens von Audio und Transkribierung (Bild des Autors)

Abrufenvergrößerungsgeneration (RAG) für interaktive Gespräche

Nach dem Generieren des Video-Transkripts entwickelt die Anwendung einen Lappen, um sowohl Textual content- als auch sprachbasierte Interaktionen zu erleichtern. Die Konversations -Intelligenz wird durch implementiert VideoRAG Klasse in rag_system.py die initialisiert die Größe und Überlappung, Openai -Einbettungen, ChatOpenAI Instanz, um Antworten mit zu generieren gpt-4o Modell und ConversationBufferMemory Chat -Historie für kontextbezogene Kontinuität aufrechtzuerhalten.

Der create_vector_store Die Methode spaltet die Dokumente in Stücke und erstellt einen Vektorspeicher mithilfe der FAISS -Vektor -Datenbank. Der handle_question_submission Methode verarbeitet Textfragen und findet jede neue Frage und ihre Antwort auf den Gesprächsgeschichte an. Die Funktion „Handle_Speech_input“ implementiert die vollständige Pipeline mit Voice-to-Textual content-zu-Voice. Es wird zunächst die Frage Audio aufgezeichnet, die Frage transkribiert, die Abfrage durch das Lappensystem verarbeitet und die Sprache für die Antwort synthetisiert.

class VideoRAG:
    def __init__(self, api_key=None, chunk_size=1000, chunk_overlap=200):
        """Initialize the RAG system with OpenAI API key."""
        # Use offered API key or attempt to get from surroundings
        self.api_key = api_key if api_key else st.secrets and techniques("OPENAI_API_KEY")
        if not self.api_key:
            increase ValueError("OpenAI API secret's required both as parameter or surroundings variable")
            
        self.embeddings = OpenAIEmbeddings(openai_api_key=self.api_key)
        self.llm = ChatOpenAI(
            openai_api_key=self.api_key,
            mannequin="gpt-4o",
            temperature=0
        )
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.vector_store = None
        self.chain = None
        self.reminiscence = ConversationBufferMemory(
            memory_key="chat_history",
            return_messages=True
        )
    
    def create_vector_store(self, transcript):
        """Create a vector retailer from the transcript."""
        # Break up the textual content into chunks
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=self.chunk_size,
            chunk_overlap=self.chunk_overlap,
            separators=("nn", "n", " ", "")
        )
        chunks = text_splitter.split_text(transcript)
        
        # Create vector retailer
        self.vector_store = FAISS.from_texts(chunks, self.embeddings)
        
        # Create immediate template for the RAG system
        system_template = """You're a specialised AI assistant that solutions questions on a selected video. 
        
        You might have entry to snippets from the video transcript, and your position is to supply correct data ONLY based mostly on these snippets.
        
        Pointers:
        1. Solely reply questions based mostly on the knowledge offered within the context from the video transcript, in any other case say that "I do not know. The video does not cowl that data."
        2. The query might ask you to summarize the video or inform what the video is about. In that case, current a abstract of the context. 
        3. Do not make up data or use information from outdoors the offered context
        4. Preserve your solutions concise and instantly associated to the query
        5. If requested about your capabilities or id, clarify that you simply're an AI assistant that makes a speciality of answering questions on this particular video
        
        Context from the video transcript:
        {context}
        
        Chat Historical past:
        {chat_history}
        """
        user_template = "{query}"
        
        # Create the messages for the chat immediate
        messages = (
            SystemMessagePromptTemplate.from_template(system_template),
            HumanMessagePromptTemplate.from_template(user_template)
        )
        
        # Create the chat immediate
        qa_prompt = ChatPromptTemplate.from_messages(messages)
        
        # Initialize the RAG chain with the customized immediate
        self.chain = ConversationalRetrievalChain.from_llm(
            llm=self.llm,
            retriever=self.vector_store.as_retriever(
                search_kwargs={"okay": 5}
            ),
            reminiscence=self.reminiscence,
            combine_docs_chain_kwargs={"immediate": qa_prompt},
            verbose=True
        )
        
        return len(chunks)
    
    def set_chat_history(self, chat_history):
        """Set chat historical past from exterior session state."""
        if not self.reminiscence:
            return
            
        # Clear present reminiscence
        self.reminiscence.clear()
        
        # Convert commonplace chat historical past format to LangChain message format
        for message in chat_history:
            if message("position") == "consumer":
                self.reminiscence.chat_memory.add_user_message(message("content material"))
            elif message("position") == "assistant":
                self.reminiscence.chat_memory.add_ai_message(message("content material"))
    
    def ask(self, query, chat_history=None):
        """Ask a query to the RAG system."""
        if not self.chain:
            increase ValueError("Vector retailer not initialized. Name create_vector_store first.")
        
        # If chat historical past is offered, replace the reminiscence
        if chat_history:
            self.set_chat_history(chat_history)
        
        # Get response
        response = self.chain.invoke({"query": query})
        return response("reply")

Sehen Sie sich den folgenden Schnappschuss der Streamlit -App an und zeigen die interaktive Konversationsschnittstelle mit dem Video an.

Snapshot zeigt Konversationsschnittstellen und interaktive Lerninhalte (Bild des Autors)

Der folgende Snapshot zeigt eine Konversation mit dem Video mit Spracheingabe und Textual content+Sprachausgabe.

Characteristic -Generierung

Die Anwendung generiert drei Merkmale: hierarchische Zusammenfassung, Quiz und Karteikarten. Bitte beachten Sie ihre jeweiligen kommentierten Codes in der Github Repo.

Der SummaryGenerator Klasse in abstract.py Bietet eine strukturierte Inhaltsübersicht, indem eine hierarchische Darstellung der Videoinhalte erstellt wird, um den Benutzern schnelle Einblicke in die Hauptkonzepte und unterstützenden Particulars zu geben. Das System ruft wichtige Kontextsegmente aus dem Transkript mit RAG ab. Verwenden einer Eingabeaufforderung (siehe generate_summary), es erzeugt eine hierarchische Zusammenfassung mit drei Ebenen: Hauptpunkte, Unterpunkte und zusätzliche Particulars. Der create_summary_popup_html Die Methode transformiert die generierte Zusammenfassung mithilfe von CSS und JavaScript in eine interaktive visuelle Darstellung.

# abstract.py
class SummaryGenerator:
    def __init__(self):
        move
    
    def generate_summary(self, rag_system, api_key, mannequin="gpt-4o", temperature=0.2):
        """
        Generate a hierarchical bullet-point abstract from the video transcript
        
        Args:
            rag_system: The RAG system with vector retailer
            api_key: OpenAI API key
            mannequin: Mannequin to make use of for abstract technology
            temperature: Creativity degree (0.0-1.0)
            
        Returns:
            str: Hierarchical bullet-point abstract textual content
        """
        if not rag_system:
            st.error("Please transcribe the video first earlier than making a abstract!")
            return ""
        
        with st.spinner("Producing hierarchical abstract..."):
            # Create LLM for abstract technology
            summary_llm = ChatOpenAI(
                openai_api_key=api_key,
                mannequin=mannequin,
                temperature=temperature  # Decrease temperature for extra factual summaries
            )
            
            # Use the RAG system to get related context
            attempt:
                # Get broader context since we're summarizing the entire video
                relevant_docs = rag_system.vector_store.similarity_search(
                    "summarize the details of this video", okay=10
                )
                context = "nn".be a part of((doc.page_content for doc in relevant_docs))
                
                immediate = """Based mostly on the video transcript, create a hierarchical bullet-point abstract of the content material.
                Construction your abstract with precisely these ranges:
                
                • Details (use • or * in the beginning of the road for these top-level factors)
                  - Sub-points (use - in the beginning of the road for these second-level particulars)
                    * Further particulars (use areas adopted by * for third-level factors)
                
                For instance:
                • First important level
                  - Vital element concerning the first level
                  - One other essential element
                    * A selected instance
                    * One other particular instance
                • Second important level
                  - Element about second level
                
                Be in keeping with the precise formatting proven above. Every bullet degree should begin with the precise character proven (• or *, -, and areas+*).
                Create 3-5 details with 2-4 sub-points every, and add third-level particulars the place applicable.
                Give attention to an important data from the video.
                """
                
                # Use the LLM with context to generate the abstract
                messages = (
                    {"position": "system", "content material": f"You're given the next context from a video transcript:nn{context}nnUse this context to create a hierarchical abstract in keeping with the directions."},
                    {"position": "consumer", "content material": immediate}
                )
                
                response = summary_llm.invoke(messages)
                return response.content material
            besides Exception as e:
                # Fallback to the common RAG system if there's an error
                st.warning(f"Utilizing commonplace abstract technology as a result of error: {str(e)}")
                return rag_system.ask(immediate)
    
    def create_summary_popup_html(self, summary_content):
        """
        Create HTML for the abstract popup with correctly formatted hierarchical bullets
        
        Args:
            summary_content: Uncooked abstract textual content with markdown bullet formatting
            
        Returns:
            str: HTML for the popup with correctly formatted bullets
        """
        # As an alternative of counting on markdown conversion, let's manually parse and format the bullet factors
        strains = summary_content.strip().break up('n')
        formatted_html = ()
        
        in_list = False
        list_level = 0
        
        for line in strains:
            line = line.strip()
            
            # Skip empty strains
            if not line:
                proceed
                
            # Detect if it is a markdown header
            if line.startswith('# '):
                if in_list:
                    # Shut any open lists
                    for _ in vary(list_level):
                        formatted_html.append('</ul>')
                    in_list = False
                    list_level = 0
                formatted_html.append(f'<h1>{line(2:)}</h1>')
                proceed
                
            # Examine line for bullet level markers
            if line.startswith('• ') or line.startswith('* '):
                # Prime degree bullet
                content material = line(2:).strip()
                
                if not in_list:
                    # Begin a brand new checklist
                    formatted_html.append('<ul class="top-level">')
                    in_list = True
                    list_level = 1
                elif list_level > 1:
                    # Shut nested lists to get again to high degree
                    for _ in vary(list_level - 1):
                        formatted_html.append('</ul></li>')
                    list_level = 1
                else:
                    # Shut earlier checklist merchandise if wanted
                    if formatted_html and never formatted_html(-1).endswith('</ul></li>') and in_list:
                        formatted_html.append('</li>')
                        
                formatted_html.append(f'<li class="top-level-item">{content material}')
                
            elif line.startswith('- '):
                # Second degree bullet
                content material = line(2:).strip()
                
                if not in_list:
                    # Begin new lists
                    formatted_html.append('<ul class="top-level"><li class="top-level-item">Second degree objects')
                    formatted_html.append('<ul class="second-level">')
                    in_list = True
                    list_level = 2
                elif list_level == 1:
                    # Add a nested checklist
                    formatted_html.append('<ul class="second-level">')
                    list_level = 2
                elif list_level > 2:
                    # Shut deeper nested lists to get to second degree
                    for _ in vary(list_level - 2):
                        formatted_html.append('</ul></li>')
                    list_level = 2
                else:
                    # Shut earlier checklist merchandise if wanted
                    if formatted_html and never formatted_html(-1).endswith('</ul></li>') and list_level == 2:
                        formatted_html.append('</li>')
                        
                formatted_html.append(f'<li class="second-level-item">{content material}')
                
            elif line.startswith('  * ') or line.startswith('    * '):
                # Third degree bullet
                content material = line.strip()(2:).strip()
                
                if not in_list:
                    # Begin new lists (all ranges)
                    formatted_html.append('<ul class="top-level"><li class="top-level-item">Prime degree')
                    formatted_html.append('<ul class="second-level"><li class="second-level-item">Second degree')
                    formatted_html.append('<ul class="third-level">')
                    in_list = True
                    list_level = 3
                elif list_level == 2:
                    # Add a nested checklist
                    formatted_html.append('<ul class="third-level">')
                    list_level = 3
                elif list_level < 3:
                    # We missed a degree, alter
                    formatted_html.append('<li>Lacking degree</li>')
                    formatted_html.append('<ul class="third-level">')
                    list_level = 3
                else:
                    # Shut earlier checklist merchandise if wanted
                    if formatted_html and never formatted_html(-1).endswith('</ul></li>') and list_level == 3:
                        formatted_html.append('</li>')
                        
                formatted_html.append(f'<li class="third-level-item">{content material}')
            else:
                # Common paragraph
                if in_list:
                    # Shut any open lists
                    for _ in vary(list_level):
                        formatted_html.append('</ul>')
                        if list_level > 1:
                            formatted_html.append('</li>')
                    in_list = False
                    list_level = 0
                formatted_html.append(f'<p>{line}</p>')
        
        # Shut any open lists
        if in_list:
            # Shut ultimate merchandise
            formatted_html.append('</li>')
            # Shut any open lists
            for _ in vary(list_level):
                if list_level > 1:
                    formatted_html.append('</ul></li>')
                else:
                    formatted_html.append('</ul>')
        
        summary_html = 'n'.be a part of(formatted_html)
        
        html = f"""
        <div id="summary-popup" class="popup-overlay">
            <div class="popup-content">
                <div class="popup-header">
                    <h2>Hierarchical Abstract</h2>
                    <button onclick="closeSummaryPopup()" class="close-button">×</button>
                </div>
                <div class="popup-body">
                    {summary_html}
                </div>
            </div>
        </div>
        
        <model>
        .popup-overlay {{
            place: fastened;
            high: 0;
            left: 0;
            width: 100%;
            peak: 100%;
            background-color: rgba(0, 0, 0, 0.5);
            z-index: 1000;
            show: flex;
            justify-content: middle;
            align-items: middle;
        }}
        
        .popup-content {{
            background-color: white;
            padding: 20px;
            border-radius: 10px;
            width: 80%;
            max-width: 800px;
            max-height: 80vh;
            overflow-y: auto;
            box-shadow: 0 4px 8px rgba(0, 0, 0, 0.2);
        }}
        
        .popup-header {{
            show: flex;
            justify-content: space-between;
            align-items: middle;
            border-bottom: 1px strong #ddd;
            padding-bottom: 10px;
            margin-bottom: 15px;
        }}
        
        .close-button {{
            background: none;
            border: none;
            font-size: 24px;
            cursor: pointer;
            shade: #555;
        }}
        
        .close-button:hover {{
            shade: #000;
        }}
        
        /* Model for hierarchical bullets */
        .popup-body ul {{
            padding-left: 20px;
            margin-bottom: 5px;
        }}
        
        .popup-body ul.top-level {{
            list-style-type: disc;
        }}
        
        .popup-body ul.second-level {{
            list-style-type: circle;
            margin-top: 5px;
        }}
        
        .popup-body ul.third-level {{
            list-style-type: sq.;
            margin-top: 3px;
        }}
        
        .popup-body li.top-level-item {{
            margin-bottom: 12px;
            font-weight: daring;
        }}
        
        .popup-body li.second-level-item {{
            margin-bottom: 8px;
            font-weight: regular;
        }}
        
        .popup-body li.third-level-item {{
            margin-bottom: 5px;
            font-weight: regular;
            font-size: 0.95em;
        }}
        
        .popup-body p {{
            margin-bottom: 10px;
        }}
        </model>
        
        <script>
        perform closeSummaryPopup() {{
            doc.getElementById('summary-popup').model.show = 'none';
            
            // Ship message to Streamlit
            window.mum or dad.postMessage({{
                kind: "streamlit:setComponentValue",
                worth: true
            }}, "*");
        }}
        </script>
        """
        return html

Heirarchische Zusammenfassung (Bild des Autors)

Speak-to-Video-App generiert Quiz aus dem Video über die QuizGenerator Klasse in quiz.py. Der Quizgenerator erstellt A number of-Selection-Fragen, die auf bestimmte Fakten und Konzepte im Video abzielen. Im Gegensatz zu Rag, bei dem ich eine Null -Temperatur verwende, habe ich die LLM -Temperatur auf 0,4 erhöht, um eine gewisse Kreativität bei der Quizgenerierung zu fördern. Eine strukturierte Eingabeaufforderung führt den Prozess der Quizgenerierung. Der parse_quiz_response Methoden extrahiert und validiert die generierten Quizelemente, um sicherzustellen, dass jede Frage alle erforderlichen Komponenten enthält. Um zu verhindern, dass die Benutzer das Muster erkennen und ein echtes Verständnis fördern, mischt der Quizgenerator die Antwortoptionen. Fragen werden nacheinander Fragen vorgestellt, gefolgt von sofortigem Suggestions zu jeder Antwort. Nachdem alle Fragen gestellt wurden, die calculate_quiz_results Die Methode bewertet Benutzerantworten und der Benutzer wird mit einer Gesamtpunktzahl, einer visuellen Aufschlüsselung von Richtigen gegenüber falschen Antworten und Suggestions auf die Leistungsebene angezeigt. Auf diese Weise verwandelt die Quiz -Era -Funktionalität das passive Video -Ansehen in aktives Lernen, indem sie die Benutzer dazu fordern, Informationen zu erinnern und im Video angegeben.

# quiz.py
class QuizGenerator:
    def __init__(self):
        move
    
    def generate_quiz(self, rag_system, api_key, transcript=None, mannequin="gpt-4o", temperature=0.4):
        """
        Generate quiz questions based mostly on the video transcript
        
        Args:
            rag_system: The RAG system with vector store2
            api_key: OpenAI API key
            transcript: The total transcript textual content (elective)
            mannequin: Mannequin to make use of for query technology
            temperature: Creativity degree (0.0-1.0)
            
        Returns:
            checklist: Record of query objects
        """
        if not rag_system:
            st.error("Please transcribe the video first earlier than making a quiz!")
            return ()
        
        # Create a short lived LLM with barely greater temperature for extra artistic questions
        creative_llm = ChatOpenAI(
            openai_api_key=api_key,
            mannequin=mannequin,
            temperature=temperature
        )

        num_questions = 10
        
        # Immediate to generate quiz
        immediate = f"""Based mostly on the video transcript, generate {num_questions} multiple-choice questions to check understanding of the content material.
        For every query:
        1. The query must be particular to data talked about within the video
        2. Embrace 4 choices (A, B, C, D)
        3. Clearly point out the right reply
        
        Format your response precisely as follows for every query:
        QUESTION: (query textual content)
        A: (choice A)
        B: (choice B)
        C: (choice C)
        D: (choice D)
        CORRECT: (letter of right reply)
       
        Be sure all questions are based mostly on info from the video."""
        
        attempt:
            if transcript:
                # If we've the total transcript, use it
                messages = (
                    {"position": "system", "content material": f"You're given the next transcript from a video:nn{transcript}nnUse this transcript to create quiz questions in keeping with the directions."},
                    {"position": "consumer", "content material": immediate}
                )
                
                response = creative_llm.invoke(messages)
                response_text = response.content material
            else:
                # Fallback to RAG strategy if no transcript is offered
                relevant_docs = rag_system.vector_store.similarity_search(
                    "what are the principle matters lined on this video?", okay=5
                )
                context = "nn".be a part of((doc.page_content for doc in relevant_docs))
                
                # Use the artistic LLM with context to generate questions
                messages = (
                    {"position": "system", "content material": f"You're given the next context from a video transcript:nn{context}nnUse this context to create quiz questions in keeping with the directions."},
                    {"position": "consumer", "content material": immediate}
                )
                
                response = creative_llm.invoke(messages)
                response_text = response.content material
        besides Exception as e:
            # Fallback to the common RAG system if there's an error
            st.warning(f"Utilizing commonplace query technology as a result of error: {str(e)}")
            response_text = rag_system.ask(immediate)
        
        return self.parse_quiz_response(response_text)

    # The remainder of the category stays unchanged
    def parse_quiz_response(self, response_text):
        """
        Parse the LLM response to extract questions, choices, and proper solutions
        
        Args:
            response_text: Uncooked textual content response from LLM
            
        Returns:
            checklist: Record of parsed query objects
        """
        quiz_questions = ()
        current_question = {}
        
        for line in response_text.strip().break up('n'):
            line = line.strip()
            if line.startswith('QUESTION:'):
                if current_question and 'query' in current_question and 'choices' in current_question and 'right' in current_question:
                    quiz_questions.append(current_question)
                current_question = {
                    'query': line(len('QUESTION:'):).strip(),
                    'choices': (),
                    'right': None
                }
            elif line.startswith(('A:', 'B:', 'C:', 'D:')):
                option_letter = line(0)
                option_text = line(2:).strip()
                current_question.setdefault('choices', ()).append((option_letter, option_text))
            elif line.startswith('CORRECT:'):
                current_question('right') = line(len('CORRECT:'):).strip()
        
        # Add the final query
        if current_question and 'query' in current_question and 'choices' in current_question and 'right' in current_question:
            quiz_questions.append(current_question)
        
        # Randomize choices for every query
        randomized_questions = ()
        for q in quiz_questions:
            # Get the unique right reply
            correct_letter = q('right')
            correct_option = None
            
            # Discover the right choice textual content
            for letter, textual content in q('choices'):
                if letter == correct_letter:
                    correct_option = textual content
                    break
            
            if correct_option is None:
                # If we will not discover the right reply, hold the query as is
                randomized_questions.append(q)
                proceed
                
            # Create a listing of choices texts and shuffle them
            option_texts = (textual content for _, textual content in q('choices'))
            
            # Create a duplicate of the unique letters
            option_letters = (letter for letter, _ in q('choices'))
            
            # Create a listing of (letter, textual content) pairs
            options_pairs = checklist(zip(option_letters, option_texts))
            
            # Shuffle the pairs
            random.shuffle(options_pairs)
            
            # Discover the brand new place of the right reply
            new_correct_letter = None
            for letter, textual content in options_pairs:
                if textual content == correct_option:
                    new_correct_letter = letter
                    break
            
            # Create a brand new query with randomized choices
            new_q = {
                'query': q('query'),
                'choices': options_pairs,
                'right': new_correct_letter
            }
            
            randomized_questions.append(new_q)
        
        return randomized_questions
    
    def calculate_quiz_results(self, questions, user_answers):
        """
        Calculate quiz outcomes based mostly on consumer solutions
        
        Args:
            questions: Record of query objects
            user_answers: Dictionary of consumer solutions keyed by question_key
            
        Returns:
            tuple: (outcomes dict, right depend)
        """
        correct_count = 0
        outcomes = {}
        
        for i, query in enumerate(questions):
            question_key = f"quiz_q_{i}"
            user_answer = user_answers.get(question_key)
            correct_answer = query('right')
            
            # Solely depend as right if consumer chosen a solution and it matches
            is_correct = user_answer will not be None and user_answer == correct_answer
            if is_correct:
                correct_count += 1
            
            outcomes(question_key) = {
                'user_answer': user_answer,
                'correct_answer': correct_answer,
                'is_correct': is_correct
            }
        
        return outcomes, correct_count

Speak-to-Movies generiert auch Karteikarten aus dem Videoinhalt, die den aktiven Rückruf und die Wiederholungstechniken der Wiederholung unterstützen. Dies geschieht durch die FlashcardGenerator Klasse in flashcards.pyAnwesend Dies schafft eine Mischung aus verschiedenen Karteikarten, die sich auf wichtige Termsdefinitionen, konzeptionelle Fragen, Aussagen von Anweisungen und wahre/falsche Fragen mit Erklärungen konzentrieren. Eine Eingabeaufforderung führt das LLM, um Flitzkarten in einem strukturierten JSON -Format auszugeben, wobei jede Karte unterschiedliche „Vorder-“ und „Rücken“ enthält. Der shuffle_flashcards Erstellt eine randomisierte Präsentation, und jede Flitzkarte wird validiert, um sicherzustellen, dass sie sowohl vordere als auch hintere Komponenten enthält, bevor sie dem Benutzer präsentiert werden. Die Antwort auf jede FlashCard ist zunächst versteckt. Es wird im Eingang des Benutzers mithilfe einer klassischen Flashcard -Enthüllungsfunktionalität enthüllt. Benutzer können für mehr Übung einen neuen Satz von Karteikarten generieren. Die Flascard- und Quiz -Systeme sind miteinander verbunden, sodass Benutzer nach Bedarf zwischen ihnen wechseln können.

# flashcards.py
class FlashcardGenerator:
    """Class to generate flashcards from video content material utilizing the RAG system."""
    
    def __init__(self):
        """Initialize the flashcard generator."""
        move
    
    def generate_flashcards(self, rag_system, api_key, transcript=None, num_cards=10, mannequin="gpt-4o") -> Record(Dict(str, str)):
        """
        Generate flashcards based mostly on the video content material.
        
        Args:
            rag_system: The initialized RAG system with video content material
            api_key: OpenAI API key
            transcript: The total transcript textual content (elective)
            num_cards: Variety of flashcards to generate (default: 10)
            mannequin: The OpenAI mannequin to make use of
            
        Returns:
            Record of flashcard dictionaries with 'entrance' and 'again' keys
        """
        # Import right here to keep away from round imports
        from langchain_openai import ChatOpenAI
        
        # Initialize language mannequin
        llm = ChatOpenAI(
            openai_api_key=api_key,
            mannequin=mannequin,
            temperature=0.4
        )
        
        # Create the immediate for flashcard technology
        immediate = f"""
        Create {num_cards} academic flashcards based mostly on the video content material.
        
        Every flashcard ought to have:
        1. A entrance facet with a query, time period, or idea
        2. A again facet with the reply, definition, or rationalization
        
        Give attention to an important and academic content material from the video. 
        Create a mixture of various kinds of flashcards:
        - Key time period definitions
        - Conceptual questions
        - Fill-in-the-blank statements
        - True/False questions with explanations
        
        Format your response as a JSON array of objects with 'entrance' and 'again' properties.
        Instance:
        (
            {{"entrance": "What's photosynthesis?", "again": "The method by which vegetation convert gentle vitality into chemical vitality."}},
            {{"entrance": "The three branches of presidency are: Govt, Legislative, and _____", "again": "Judicial"}}
        )
        
        Be sure your output is legitimate JSON format with precisely {num_cards} flashcards.
        """
        
        attempt:
            # Decide the context to make use of
            if transcript:
                # Use the total transcript if offered
                # Create messages for the language mannequin
                messages = (
                    {"position": "system", "content material": f"You're an academic content material creator specializing in creating efficient flashcards. Use the next transcript from a video to create academic flashcards:nn{transcript}"},
                    {"position": "consumer", "content material": immediate}
                )
            else:
                # Fallback to RAG system if no transcript is offered
                relevant_docs = rag_system.vector_store.similarity_search(
                    "key factors and academic ideas within the video", okay=15
                )
                context = "nn".be a part of((doc.page_content for doc in relevant_docs))
                
                # Create messages for the language mannequin
                messages = (
                    {"position": "system", "content material": f"You're an academic content material creator specializing in creating efficient flashcards. Use the next context from a video to create academic flashcards:nn{context}"},
                    {"position": "consumer", "content material": immediate}
                )
            
            # Generate flashcards
            response = llm.invoke(messages)
            content material = response.content material
            
            # Extract JSON content material in case there's textual content round it
            json_start = content material.discover('(')
            json_end = content material.rfind(')') + 1
            
            if json_start >= 0 and json_end > json_start:
                json_content = content material(json_start:json_end)
                flashcards = json.masses(json_content)
            else:
                # Fallback in case of improper JSON formatting
                increase ValueError("Didn't extract legitimate JSON from response")
            
            # Confirm we've the anticipated variety of playing cards (or alter as wanted)
            actual_cards = min(len(flashcards), num_cards)
            flashcards = flashcards(:actual_cards)
            
            # Validate every flashcard has required fields
            validated_cards = ()
            for card in flashcards:
                if 'entrance' in card and 'again' in card:
                    validated_cards.append({
                        'entrance': card('entrance'),
                        'again': card('again')
                    })
            
            return validated_cards
        
        besides Exception as e:
            # Deal with errors gracefully
            print(f"Error producing flashcards: {str(e)}")
            # Return a number of primary flashcards in case of error
            return (
                {"entrance": "Error producing flashcards", "again": f"Please attempt once more. Error: {str(e)}"},
                {"entrance": "Tip", "again": "Strive regenerating flashcards or utilizing a distinct video"}
            )
    
    def shuffle_flashcards(self, flashcards: Record(Dict(str, str))) -> Record(Dict(str, str)):
        """Shuffle the order of flashcards"""
        shuffled = flashcards.copy()
        random.shuffle(shuffled)
        return shuffled

Mögliche Erweiterungen und Verbesserungen

Diese Anwendung kann auf verschiedene Weise erweitert und verbessert werden. Zum Beispiel:

Die Integration visueller Funktionen in Video (z. B. Keyframes) kann mit Audio untersucht werden, um aussagekräftigere Informationen zu extrahieren.
Teambasierte Lernerfahrungen können aktiviert werden, wenn Workplace-Kollegen oder Klassenkameraden Notizen, Quiz-Ergebnisse und Zusammenfassungen teilen können.
Erstellen von navigierbaren Transkripten, mit denen Benutzer auf bestimmte Abschnitte klicken können, um zu diesem Punkt im Video zu springen
Erstellen Sie schrittweise Aktionspläne für die Implementierung von Konzepten aus dem Video in realen Geschäftsumgebungen
Ändern Sie die Eingabeaufforderung, die Antworten auszuarbeiten und schwierige Erklärungen für schwierige Konzepte zu geben.
Erzeugen Sie Fragen, die metakognitive Fähigkeiten bei den Lernenden anregen, indem sie anregen, über ihren Denkprozess und Lernstrategien nachzudenken und gleichzeitig mit Videoinhalten in Kontakt zu treten.

Das sind alles Leute! Wenn Ihnen der Artikel gefallen hat, folgen Sie mir bitte weiter Medium Und LinkedIn.

Sprechen Sie mit Movies | Auf Information Science

Von admin

Schreibe einen Kommentar Antworten abbrechen

Versäumt

Die Zukunft von Datentechnik und Datenpipelines in der KI -Ära

KI -Instruments fordern soziale Medien für Anwälte um

Wie benutze ich MCP mit Cursor AI?

So messen Sie reale Modellgenauigkeit, wenn Etiketten laut sind

About

Categories

Tags

Recent Post

Die Zukunft von Datentechnik und Datenpipelines in der KI -Ära

KI -Instruments fordern soziale Medien für Anwälte um

Von admin

Ähnlicher Beitrag

Schreibe einen Kommentar Antworten abbrechen

Versäumt