Integrating an RTSP Client DirectShow Source Filter

How to Build an RTSP Client DirectShow Source Filter

Building a custom Real-Time Streaming Protocol (RTSP) client as a DirectShow source filter allows Windows applications to ingest live video and audio streams directly into the DirectShow ecosystem. This architecture enables seamless integration with media players, processing filters, and video management software.

This guide outlines the technical architecture, core components, and step-by-step implementation strategy for developing a robust RTSP source filter.

Architecture Overview

A DirectShow RTSP source filter acts as the entry point of a media pipeline. It connects to an external RTSP server, negotiates media tracks, receives network packets, depacketizes and parses them, and pushes the resulting media samples downstream.

+--------------------------------------------------------------+
|                   DirectShow Source Filter                   |
|                                                              |
|  +--------------------+      +------------------------+      |      +---------------+
|  | RTSP Client        | ---> | Circular Media Buffer  |      | ---> | Output Pin(s) |
|  | (LiveMedia/FFmpeg) |      | (Sinks network jitter) |      |      | (Push Mode)   |
|  +--------------------+      +------------------------+      |      +---------------+
+--------------------------------------------------------------+              |
                                                                              v
                                                                    +-------------------+
                                                                    | Downstream Filter |
                                                                    | (Decoder/Render)  |
                                                                    +-------------------+

The Push Model (CSource and CSourceStream)

Unlike file source filters that use a pull model (IAsyncReader), live streaming filters must use a push model. The filter proactively pushes incoming network data down the pipeline using an internal streaming thread.

Core Components Required

To implement this filter, you need three primary components. The first two extend classes from the DirectShow Base Classes; the third is a networking layer that you integrate yourself.

1. The Filter Class (CSource)

Inherits from CSource.

Exposes IBaseFilter (implemented by the base class) and interfaces for configuration (e.g., a custom IRtspConfig to pass the RTSP URL).

Manages the filter state transitions (Run, Pause, Stop).

Instantiates and manages the output pins.

2. The Output Pin Class (CSourceStream)

Inherits from CSourceStream.

Manages the streaming thread loop (FillBuffer).

Negotiates media types (GetMediaType, CheckMediaType) with downstream filters (e.g., exposing H.264, H.265, or AAC media types).

Handles allocator negotiation (DecideBufferSize) to manage sample buffers.

3. The RTSP/RTP Protocol Engine

DirectShow does not natively handle RTSP or RTP/RTCP. You must integrate a third-party network library within your filter framework. Popular choices include:

Live555 Media Server/Client: A highly compliant, open-source C++ library dedicated to RTSP/RTP.

FFmpeg (libavformat/libavcodec): Provides robust RTSP demuxing and handles packet parsing natively.

Step-by-Step Implementation Guide

Step 1: Handle Filter Configuration and URL Parsing

Your filter needs a mechanism to receive the target RTSP URL (e.g., rtsp://192.168.1.100:554/stream1). Define a custom COM interface or implement standard property pages.

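```cpp
// Example of a custom interface for setting the URL.
// A real interface also needs its own IID (GUID) so that QueryInterface can return it.
#include <windows.h>
#include <unknwn.h>

interface IRtspConfig : public IUnknown
{
    STDMETHOD(SetUrl)(const wchar_t* wszUrl) = 0;
    STDMETHOD(GetUrl)(wchar_t* wszUrl, DWORD cchUrl) = 0;
};
```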
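SetUrl is a natural place to reject malformed URLs before any network activity starts. The following is a minimal sketch of such a check, assuming C++17 and plain rtsp://host[:port][/path] URLs. The names RtspEndpoint and ParseRtspUrl are hypothetical helpers, not part of DirectShow or any RTSP library, and the sketch deliberately rejects embedded credentials and IPv6 literals rather than parsing them.

```cpp
#include <cstddef>
#include <cwctype>
#include <optional>
#include <string>

// Hypothetical helper type; not part of DirectShow or any RTSP library.
struct RtspEndpoint {
    std::wstring host;
    unsigned short port = 554;  // default RTSP port
    std::wstring path;          // includes the leading '/', may be empty
};

// Returns the parsed endpoint, or std::nullopt if the text is not a plain
// rtsp://host[:port][/path] URL. Credentials ("user:pass@") and IPv6
// literals are rejected here for simplicity.
std::optional<RtspEndpoint> ParseRtspUrl(const std::wstring& url) {
    const std::wstring scheme = L"rtsp://";
    if (url.size() <= scheme.size()) return std::nullopt;
    for (std::size_t i = 0; i < scheme.size(); ++i) {
        // Scheme comparison is case-insensitive.
        if (static_cast<wchar_t>(std::towlower(url[i])) != scheme[i]) return std::nullopt;
    }

    const std::size_t authorityStart = scheme.size();
    std::size_t pathStart = url.find(L'/', authorityStart);
    if (pathStart == std::wstring::npos) pathStart = url.size();

    const std::wstring authority = url.substr(authorityStart, pathStart - authorityStart);
    if (authority.empty() || authority.find(L'@') != std::wstring::npos) return std::nullopt;
    if (authority.find(L'[') != std::wstring::npos) return std::nullopt;

    RtspEndpoint endpoint;
    const std::size_t colon = authority.find(L':');
    if (colon == std::wstring::npos) {
        endpoint.host = authority;
    } else {
        endpoint.host = authority.substr(0, colon);
        const std::wstring portText = authority.substr(colon + 1);
        if (portText.empty() || portText.size() > 5) return std::nullopt;
        unsigned long port = 0;
        for (wchar_t c : portText) {
            if (c < L'0' || c > L'9') return std::nullopt;
            port = port * 10 + static_cast<unsigned long>(c - L'0');
        }
        if (port == 0 || port > 65535) return std::nullopt;
        endpoint.port = static_cast<unsigned short>(port);
    }
    if (endpoint.host.empty()) return std::nullopt;

    endpoint.path = url.substr(pathStart);
    return endpoint;
}
```

A SetUrl implementation could call this helper and return E_INVALIDARG when it yields no value, storing the parsed endpoint otherwise.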

When SetUrl is called, validate the URL format and prepare the internal RTSP network client state.

Step 2: Establish the RTSP Session

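When the filter graph transitions to the Pause or Run state, initialize your network engine to perform the standard RTSP handshake:

OPTIONS: Query server capabilities.

DESCRIBE: Retrieve the Session Description Protocol (SDP) description of the stream.

SETUP: Negotiate the transport (UDP multicast, UDP unicast, or RTP interleaved over the RTSP TCP connection) for each audio and video track. Each track that is set up successfully corresponds to one output pin.

PLAY: Instruct the server to start sending RTP media streams.

Output pins have to exist before the graph can be connected, which happens while the filter is still stopped. One common arrangement is therefore to issue OPTIONS and DESCRIBE as soon as the URL is set, create the pins from the SDP, and defer SETUP and PLAY until the graph is paused or run.

Step 3: Parse SDP and Expose Media Types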

The SDP description returned during the DESCRIBE phase contains critical media format metadata (such as H.264 SPS/PPS parameters). Your output pin must convert this metadata into a DirectShow AM_MEDIA_TYPE structure. For an H.264 video stream:

Major Type: MEDIATYPE_Video

Subtype: MEDIASUBTYPE_H264 or FOURCC('H264')

Format Type: FORMAT_MPEG2Video or FORMAT_VideoInfo2
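For H.264, the SPS and PPS usually travel in the sprop-parameter-sets attribute of the track's a=fmtp line, as comma-separated base64 strings (RFC 6184). The sketch below shows one way to pull them out of a single fmtp line. Base64Decode and ExtractH264ParameterSets are hypothetical helper names, and the parsing is deliberately simple: it assumes the whole attribute sits on one line and does not validate the decoded NAL units.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Decodes standard base64. Decoding stops at '=' padding; any other
// character outside the base64 alphabet yields an empty result.
std::vector<std::uint8_t> Base64Decode(const std::string& text) {
    std::vector<std::uint8_t> out;
    std::uint32_t accumulator = 0;
    int bits = 0;
    for (char c : text) {
        int value = 0;
        if (c >= 'A' && c <= 'Z') value = c - 'A';
        else if (c >= 'a' && c <= 'z') value = c - 'a' + 26;
        else if (c >= '0' && c <= '9') value = c - '0' + 52;
        else if (c == '+') value = 62;
        else if (c == '/') value = 63;
        else if (c == '=') break;
        else return {};
        accumulator = (accumulator << 6) | static_cast<std::uint32_t>(value);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((accumulator >> bits) & 0xFF));
        }
    }
    return out;
}

// Returns the decoded parameter sets (typically SPS first, then PPS) found in
// one "a=fmtp:..." line, or an empty vector if the attribute is absent.
std::vector<std::vector<std::uint8_t>> ExtractH264ParameterSets(const std::string& fmtpLine) {
    std::vector<std::vector<std::uint8_t>> sets;
    const std::string key = "sprop-parameter-sets=";
    const std::size_t keyPos = fmtpLine.find(key);
    if (keyPos == std::string::npos) return sets;

    std::size_t begin = keyPos + key.size();
    // The value ends at the next fmtp parameter separator or at end of line.
    std::size_t end = fmtpLine.find_first_of("; \r\n", begin);
    if (end == std::string::npos) end = fmtpLine.size();

    while (begin < end) {
        std::size_t comma = fmtpLine.find(',', begin);
        if (comma == std::string::npos || comma > end) comma = end;
        std::vector<std::uint8_t> nal = Base64Decode(fmtpLine.substr(begin, comma - begin));
        if (!nal.empty()) sets.push_back(std::move(nal));
        begin = comma + 1;
    }
    return sets;
}
```

The low five bits of the first byte of each decoded unit give the NAL unit type (7 for SPS, 8 for PPS), which is a cheap sanity check before copying the bytes into the media type's format block.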

Append the SPS/PPS bytes to the format block if the downstream decoder requires out-of-band initialization.

Step 4: Implement the Jitter Buffer

Network delivery is unpredictable. Packets may arrive out of order or with variable delay. You must implement a thread-safe, circular Jitter Buffer between your network reading thread and the DirectShow streaming thread.
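The hand-off between the two threads can be sketched with standard C++ primitives. The class below is a simplified, bounded frame queue rather than a full jitter buffer: it does not reorder packets by sequence number or pace playout, and it drops the oldest frame when full. FrameQueue, its methods, and the FrameData fields are hypothetical names chosen for this illustration.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical frame record used for illustration only.
struct FrameData {
    std::vector<std::uint8_t> data;
    std::int64_t rtTimestamp = 0;  // presentation time in 100 ns units
    std::int64_t rtDuration = 0;   // duration in 100 ns units
    bool bIsKeyframe = false;
};

class FrameQueue {
public:
    // A capacity of zero is treated as one so PushFrame never pops an empty queue.
    explicit FrameQueue(std::size_t capacity)
        : m_capacity(capacity == 0 ? 1 : capacity) {}

    // Called from the network thread. When the queue is full the oldest frame
    // is dropped, so a stalled consumer cannot make memory grow without bound.
    void PushFrame(FrameData frame) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            if (m_frames.size() == m_capacity) m_frames.pop_front();
            m_frames.push_back(std::move(frame));
        }
        m_ready.notify_one();
    }

    // Called from the streaming thread. Blocks until a frame is available or
    // Stop() has been called. Returns false only when stopped and drained.
    bool PopFrame(FrameData* out) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_ready.wait(lock, [this] { return !m_frames.empty() || m_stopped; });
        if (m_frames.empty()) return false;
        *out = std::move(m_frames.front());
        m_frames.pop_front();
        return true;
    }

    // Wakes any waiting consumer so the streaming thread can exit promptly.
    void Stop() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stopped = true;
        }
        m_ready.notify_all();
    }

private:
    const std::size_t m_capacity;
    std::deque<FrameData> m_frames;
    std::mutex m_mutex;
    std::condition_variable m_ready;
    bool m_stopped = false;
};
```

Waiting on a condition variable keeps the streaming thread idle without polling, and calling Stop() from the filter's stop path releases it. A production buffer would additionally reorder by RTP sequence number and hold frames for a short playout delay.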

Network Thread: Receives RTP packets, strips the RTP headers, reassembles fragmented payloads into complete units (e.g., H.264 NAL units split across FU-A packets), and pushes them into the buffer.

Streaming Thread: Pulls completed frames out of the buffer inside the FillBuffer loop.

Step 5: Deliver Data in FillBuffer

The CSourceStream class automatically creates a worker thread that loops continuously while active, calling the FillBuffer method. This is where data is pushed downstream.

```cpp
HRESULT CRtspOutputPin::FillBuffer(IMediaSample *pSample)
{
    BYTE *pBuffer = nullptr;
    HRESULT hr = pSample->GetPointer(&pBuffer);
    if (FAILED(hr)) return hr;
    const LONG lBufferSize = pSample->GetSize();

    // 1. Fetch a complete frame from your jitter buffer.
    //    PopFrame is assumed to wait until a frame is available and to return
    //    false only when the stream has ended or the buffer was shut down.
    //    Returning S_FALSE makes CSourceStream deliver end-of-stream, so it
    //    should not be returned for a momentary underrun.
    FrameData frame;
    if (!m_pJitterBuffer->PopFrame(&frame)) {
        return S_FALSE;
    }

    // 2. Copy payload data into the DirectShow sample buffer.
    if (frame.size > lBufferSize) return E_FAIL;
    memcpy(pBuffer, frame.data, frame.size);
    pSample->SetActualDataLength(frame.size);

    // 3. Set timestamps. frame.rtTimestamp and frame.rtDuration are assumed
    //    to be already converted from RTP ticks to 100-nanosecond units.
    REFERENCE_TIME rtStart = frame.rtTimestamp;
    REFERENCE_TIME rtEnd = rtStart + frame.rtDuration;
    pSample->SetTime(&rtStart, &rtEnd);

    // 4. Set sync point (keyframe check).
    pSample->SetSyncPoint(frame.bIsKeyframe);

    return S_OK;
}
```

Critical Engineering Pitfalls to Avoid

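Timestamp Conversion Discrepancies: RTP timestamps use a clock rate specific to the payload format (e.g., 90 kHz for H.264), and they are 32-bit values that start at a random offset and wrap around. DirectShow expresses stream time in 100-nanosecond units (REFERENCE_TIME), so one 90 kHz tick is about 111.1 units. Ensure your math handles both the conversion and the wraparound accurately to prevent audio-video desynchronization.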
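As an illustration of the arithmetic, the sketch below converts RTP timestamps to 100-nanosecond units relative to the first packet seen on a track, unwrapping the 32-bit counter as it goes. RtpClockConverter is a hypothetical name. The sketch covers a single track only; aligning audio and video against each other normally relies on RTCP sender reports, which are not shown.

```cpp
#include <cstdint>

class RtpClockConverter {
public:
    // clockRateHz comes from the SDP rtpmap line, e.g. 90000 for H.264.
    explicit RtpClockConverter(std::uint32_t clockRateHz)
        : m_clockRateHz(clockRateHz) {}

    // Returns the time elapsed since the first timestamp seen, in 100 ns units.
    std::int64_t ToReferenceTime(std::uint32_t rtpTimestamp) {
        if (!m_started) {
            m_started = true;
            m_last = rtpTimestamp;
        }
        // The unsigned subtraction is modular, so reinterpreting it as a signed
        // 32-bit value yields the correct step across a wraparound and for
        // slightly out-of-order timestamps.
        const std::int32_t delta = static_cast<std::int32_t>(rtpTimestamp - m_last);
        m_elapsedTicks += delta;
        m_last = rtpTimestamp;

        // Split into whole seconds and a remainder so the multiplication by
        // 10,000,000 cannot overflow on long-running streams.
        const std::int64_t unitsPerSecond = 10000000;
        const std::int64_t rate = static_cast<std::int64_t>(m_clockRateHz);
        return (m_elapsedTicks / rate) * unitsPerSecond +
               (m_elapsedTicks % rate) * unitsPerSecond / rate;
    }

private:
    std::uint32_t m_clockRateHz;
    std::uint32_t m_last = 0;
    std::int64_t m_elapsedTicks = 0;
    bool m_started = false;
};
```

With a 90 kHz clock, an advance of 3000 ticks (one frame at 30 fps) maps to 333,333 units, and an advance of 90,000 ticks maps to exactly 10,000,000 units, that is, one second.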

Blocking the Streaming Thread: Never perform blocking network socket calls inside FillBuffer. If no data is available in the jitter buffer, sleep briefly or use thread signaling events to prevent freezing the entire DirectShow graph.

Teardown Failures: Ensure that when the graph stops, the RTSP TEARDOWN command is executed cleanly. Unreleased sockets or lingering threads will cause the host application to hang on exit.

Conclusion

Building an RTSP Client DirectShow source filter bridges the gap between modern network streaming protocols and the legacy, yet highly performant, Windows media pipeline. By combining standard libraries like Live555 or FFmpeg for network transport with the push-model primitives of CSourceStream, you can achieve low-latency, production-ready live stream ingestion.

