Traditionally, when the resolution of an analogue CCTV camera had to be defined, the specification was given as a percentage of the observed image. This percentage varied, of course, because different objectives require different values.
For example, when using CCTV equipment to detect a space, an object or a person within a scene, an average-height person might need to fill 50% of the scene to achieve recognition, and 120% for identification.
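As a rough sketch, the scene-fill percentage is simply the target's height relative to the height of the field of view. The 1.7 m person and 3.4 m scene below are illustrative assumptions, not figures from any standard:

```python
def scene_fill_percent(target_height_m: float, scene_height_m: float) -> float:
    """Percentage of the scene height occupied by a target."""
    return target_height_m / scene_height_m * 100.0

# An average 1.7 m person in a 3.4 m-high field of view fills 50% of the
# scene, the example threshold for recognition. A 120% fill (identification)
# implies the field of view must be narrower than the person is tall.
print(scene_fill_percent(1.7, 3.4))  # 50.0
```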
When given the task of designing video surveillance, the most challenging aspect is managing the owner’s expectations, due in large part to the fictional creativity displayed on shows like CSI and NCIS. Through such portrayals, viewers come to expect that CCTV surveillance systems can be deployed anywhere and will provide crystal-clear images, no matter the environmental factors or other situational conditions.
We watch the news commentator on our home television in incredible detail, and football players performing on the field in close-up, slow-motion and even stop-action clarity; we have become so used to high-definition screens that we recoil from analogue viewing. Furthermore, we see the CSI lab scientist take a distorted reflection of an assailant off the chrome fender of a car and “enhance it” until it becomes a courtroom-worthy image of the perpetrator. Viewers then compare this to real life and ask: why shouldn’t security video cameras provide the same performance? While most realise that television and movie makers take creative licence and exaggerate reality, daily exposure to this media creates and embeds unrealistic expectations.
So, we need to move from unrealistic perceptions to realistic abilities...
Performance in varied conditions
When considering which CCTV camera to use, and what it is capable of, we must first consider the light conditions. The capabilities of cameras differ vastly: one may perform quite well in low light, where another may struggle. It’s a matter of physics. Specifically, when larger pixel counts are spread over a sensor of a given size, each pixel receives less of the incoming light. When you divide that sensor into 307,200 pixels (640 x 480), the pixels on the chip* are much larger than when you divide the same size sensor into 5,038,848 pixels (2592 x 1944). Therefore, the larger the pixel surface area, the more light it receives from the lens.
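The relationship is straightforward to quantify. A minimal sketch, using a normalised sensor area (only the ratio between the two layouts matters, not the absolute sensor dimensions):

```python
# Normalised sensor area: the same physical sensor divided two different ways.
SENSOR_AREA = 1.0

def pixel_area(width_px: int, height_px: int, sensor_area: float = SENSOR_AREA) -> float:
    """Light-gathering area available to each pixel on a shared sensor."""
    return sensor_area / (width_px * height_px)

vga = pixel_area(640, 480)        # 307,200 pixels
five_mp = pixel_area(2592, 1944)  # 5,038,848 pixels

# Each VGA pixel has roughly 16x the surface area of a 5 MP pixel,
# and so gathers correspondingly more of the incoming light.
print(f"VGA pixel area is {vga / five_mp:.1f}x that of a 5 MP pixel")
```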
Larger pixel counts therefore cause an inherent loss of light sensitivity. Improved sensor designs have compensated for this, but only up to around 1.3 megapixels. Beyond this pixel count, the available light must be considerably greater to prevent excessive image degradation and increased bandwidth consumption.
Increasing ambient light levels, and ensuring that the light strikes potential intruders from the same direction as the video coverage, can in many cases ensure high-quality video. If there is any concern that this lighting may be interrupted by nighttime light levels, light pollution restrictions or energy use, supplemental light in the form of infrared lighting may be the solution.
However, care must be taken to ensure the chosen camera has a night-mode function, allowing the infrared (IR) lighting to be of use, and to consider whether the camera can handle the focus shift that occurs when IR is used. Because IR has a different wavelength to visible light, it can affect the depth of field, causing the focus to shift.
*Chips can be defined as – camera sensors such as CCD or CMOS
CCD – charge-coupled device
CMOS - complementary metal-oxide-semiconductor
We have discussed performance in terms of light conditions; now, image detail...
Extracting image detail from live or recorded video is limited by the fact that the detail must be present in the image from the start. For example: if we have a video image that contains 640 x 480 pixels (VGA), take a quarter of the image (320 x 240 pixels) and enlarge it (or “blow it up”) for display on a 1920 x 1080 full-HD screen, the video processor will produce many identical pixels for each single VGA pixel (effectively just enlarging each pixel), creating a pixelated image. Image detail can also be referred to as pixel density on a target (pixels per foot or pixels per metre).
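A minimal sketch of what that enlargement does: nearest-neighbour scaling simply repeats each source pixel, which is why no new detail appears. Pure Python on a toy 2x2 "image"; no imaging library is assumed:

```python
def enlarge(image, factor):
    """Enlarge a 2-D list of pixel values by integer `factor` via pixel repetition."""
    out = []
    for row in image:
        # Repeat each pixel `factor` times across the row...
        wide = [px for px in row for _ in range(factor)]
        # ...then repeat the widened row `factor` times down the image.
        out.extend([list(wide) for _ in range(factor)])
    return out

tiny = [[1, 2],
        [3, 4]]
print(enlarge(tiny, 2))
# Each original pixel becomes a 2x2 block of identical values -- pixelation.
```

No interpolation scheme can recover detail that was never captured; smarter scalers (bilinear, bicubic) only blur the blocks rather than add information.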
In the illustration below, you will see that the 20 pixels/foot image is an example of enlarging an image and causing pixelation. To obtain the level of image detail needed for a usable video image, a camera and lens combination must be selected that provides enough pixels for the requirement.
Image detail requirements vary with the task, from 20 pixels per foot (66 pixels per metre) up to 100 pixels per foot (328 pixels per metre). The levels of image detail shown here should be retrievable from either a live or an archived image.
Where we have shown two options, defined as 1 and 2, use the following:
These calculations assume the scene is suitably lit throughout. Image rate and compression are also factors; if there is any uncertainty, we suggest listing these on the specification you provide. In such situations, increase everything by one step: for example, Monitoring 1 moves up to 40 pixels per foot, and Identification 1 increases to 125 pixels per foot (410 pixels per metre). We always suggest going for the highest level; if unsure, use our suggestions and increase to the next step as mentioned above. For automatic facial pattern recognition, we suggest a resolution of 500 to 750 pixels per metre. Please refer to the manufacturer's requirements for their minimum standards.
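The arithmetic behind these figures can be sketched as follows: pixel density on target is the camera's horizontal pixel count divided by the scene width it covers, and the foot/metre figures are related by a simple unit conversion. The 1920-pixel camera and 16 ft scene below are illustrative assumptions:

```python
FEET_PER_METRE = 3.28084

def pixels_per_foot(horizontal_px: int, scene_width_ft: float) -> float:
    """Pixel density achieved on target for a given horizontal resolution and scene width."""
    return horizontal_px / scene_width_ft

def required_horizontal_px(density_ppf: float, scene_width_ft: float) -> float:
    """Horizontal resolution needed to hit a target pixel density."""
    return density_ppf * scene_width_ft

# A 1920-pixel-wide camera covering a 16 ft wide scene:
print(pixels_per_foot(1920, 16))          # 120.0 pixels/foot
print(120.0 * FEET_PER_METRE)             # ~394 pixels/metre
# Resolution needed for 100 pixels/foot over the same scene:
print(required_horizontal_px(100, 16))    # 1600.0 pixels
```

Note the conversion runs ppf x 3.28084 = ppm, which is why 100 pixels per foot corresponds to the 328 pixels per metre quoted above.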
We hope this paper has given you an insight into the more technical side of CCTV imagery and the science behind creating those perfect, usable images.