Managing Multimedia and Unstructured Data in the Oracle Database

Definitions

Digital objects come in a variety of formats, and most are compressed or encrypted. This section defines some of the most common characteristics associated with a digital object.

Raw format

When a digital photo is taken, a video is recorded, or a document is scanned, the resulting stored data is referred to as the raw format. Some cameras immediately compress the raw format before saving it to storage. Most videos are compressed immediately, because the storage required for the raw format would exceed the device's storage limit. The raw format image is also referred to as the original image. The original should never be changed. If it is modified or transformed, the resulting image should be saved as a derivative.

Compression

Compression is an algorithm used to encode digital information to reduce its storage size. With the introduction of high-megapixel cameras, it's possible for a photo to be over 1 GB in size in its raw format, and for an uncompressed video to be over 100 GB. For each media type, there are a large number of compression formats available. Each format aims to save the maximum amount of storage with minimal loss of quality.
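To see how an uncompressed video can pass 100 GB, the arithmetic can be sketched as follows. The frame size, color depth, frame rate, and duration used here are illustrative assumptions, not figures from the text, and audio is ignored.

```python
def raw_video_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Uncompressed size: every pixel of every frame is stored in full."""
    bytes_per_frame = width * height * bytes_per_pixel
    return bytes_per_frame * fps * seconds


# Assumed example: 1920 x 1080 frames, 3 bytes per pixel (24-bit color),
# 30 frames per second, 10 minutes of footage.
size = raw_video_bytes(1920, 1080, 3, 30, 10 * 60)
print(f"{size / 1e9:.1f} GB")  # roughly 112 GB
```

Under these assumptions, ten minutes of raw footage already needs about 112 GB, which is why video is almost always compressed as it is recorded.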

Lossy data compression

Lossy compression is a term indicating that information is lost during compression and cannot be recovered on decompression. Repeated compression and decompression may result in image degradation. Most lossy algorithms aim to discard only data whose loss is imperceptible to human senses. The best-known lossy compression is JPEG image compression, which is used by most cameras and was the first popular compression format used in web browsers. MP3 audio compression removes data that is beyond the audio frequency range most people can hear. JPEG and MPEG-4 both use lossy compression.
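The idea of losing information on a round trip can be illustrated with a toy quantizer. This is only a sketch of the principle and is not how JPEG or MP3 work (real codecs use transforms and perceptual models); the function names and the step size of 16 are hypothetical choices made for the example.

```python
def lossy_encode(samples, step=16):
    """Keep only which step-sized bucket each 0-255 sample falls in."""
    return [s // step for s in samples]


def lossy_decode(codes, step=16):
    """Rebuild each sample as the midpoint of its bucket."""
    return [c * step + step // 2 for c in codes]


original = [0, 7, 100, 101, 102, 200, 255]
restored = lossy_decode(lossy_encode(original))
errors = [abs(a - b) for a, b in zip(original, restored)]

print(restored)     # [8, 8, 104, 104, 104, 200, 248]
print(max(errors))  # 8
```

Each encoded value needs 4 bits instead of 8, but the samples 100, 101, and 102 all come back as 104. The differences between them are gone for good, which is the trade-off a lossy format makes.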

This compression sacrifices quality for storage. Lossy formats enable a movie to be stored on a DVD. The compression also sacrifices quality for delivery speed. It enables sites such as YouTube and Hulu to stream large amounts of high-quality video over average-speed network connections.

In the mid-1990s, when Internet bandwidth was very low, lossy compression proved very popular for the delivery of images.

There are four main issues with this compression:

  • Long-term archival: Museums always focus on the future, and they are aware of the variable and changing nature of technology. Even though it might not be cost-effective to store digital objects in their raw format, keeping a small lossy equivalent and destroying the original will be a long-term issue, as information in the image will have been lost. The goal is to not lose information, because technology will improve in the future, making it easier to store large numbers of digital objects. Five years ago, a low-cost 3 TB disk was unheard of; now they are in common use. Ten years ago, computer monitors had a resolution of 800 x 600; now they have three times that resolution, enabling high-definition viewing. Back then, it took a lot of bandwidth to transfer an original image of 5 MB, and there was a delay in displaying it due to limited CPU speed. The focus was on reducing the size to make the image quicker to deliver and faster to display. Now the same original image can be downloaded in seconds and displayed instantly. Today, downloading an HD video can take an hour or longer, so it has to be compressed to enable it to be downloaded and viewed quickly. In 10 years' time, with anticipated improvements in broadband and CPU speed, mobile devices will be able to routinely download and play HD video in real time. This is why it's important not to destroy the original and replace it with a compressed format.
  • Legal: In a court case, if a digital object is used as evidence, it's important that the object has not been modified or tampered with. Software is now available to test whether an image has been edited using Adobe Photoshop (commonly referred to as being photoshopped) (1). Lossy compression effectively changes the image; even though it looks like the original, it is not the original image, and it cannot be trusted.
  • Medical: Small sections of a digital image can be crucial for a diagnosis when looking at a digital X-ray. If information is lost on compression, then the doctors analyzing the image will not be able to trust what they see. Is that blur or slight shadow on the image the result of a tumor, or is it due to information lost when the image was compressed? The information shown in a medical image has to be accurate, and there should be no lost information.
  • Compression and decompression times: It can take a lot of CPU time to compress a digital object. Most compression algorithms aim to have faster decompression times than compression times. In the case of a video, it's possible that a high-speed CPU is used for the compression, but the computer that plays it might have a low-speed CPU. Also, if a mobile device runs on a battery, then the less CPU time decompression needs, the less it drains the battery, and the longer the device can play the video. This is an important business directive, and the market has clearly shown that mobile devices that have a longer battery life and can play most video formats are more popular and easier to sell. The MPEG format can employ variable compression, where an operator can choose which scenes have stronger compression than others.

Lossless data compression

Lossless compression is a term indicating that compressing and then decompressing results in no data loss. Lossless compression algorithms are typically not as efficient as lossy ones. Some TIF compression formats are lossless. The JPEG-2000 compression standards enable both lossy and lossless compression.

Lossless compression algorithms can be broken down into ones that are designed to look for repetitive patterns and ones that are designed to look for structures within a digital image. Additionally, some algorithms are designed to look for differences between images or frames (such as video), but generally these ones fall into the lossy category.
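A minimal sketch of the "repetitive patterns" approach is run-length encoding, shown below. It is offered only as an illustration of the lossless round trip; the function names and the sample row of pixel bytes are hypothetical, and the formats named in this section use more sophisticated schemes.

```python
def rle_encode(data):
    """Collapse each run of identical bytes into a (count, value) pair."""
    runs = []
    for byte in data:
        if runs and runs[-1][1] == byte:
            runs[-1] = (runs[-1][0] + 1, byte)
        else:
            runs.append((1, byte))
    return runs


def rle_decode(runs):
    """Expand every (count, value) pair back into the original bytes."""
    return b"".join(bytes([value]) * count for count, value in runs)


# Assumed example: one row of a two-color icon, 20 bytes long.
row = b"\x00" * 12 + b"\xff" * 3 + b"\x00" * 5
runs = rle_encode(row)

print(runs)                     # [(12, 0), (3, 255), (5, 0)]
print(rle_decode(runs) == row)  # True
```

Twenty bytes become three pairs, and decoding reproduces the input exactly, byte for byte.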

The traditional lossless compression algorithms were initially designed for text. They could achieve very high compression on text, especially if it contained a lot of blank space or many similar characters. When applied to icons with a small color range, they worked quite well and were adopted for them (GIF is a good example), but they do not work well for digital photos or videos, as the data is binary and generally appears random. When fractal geometry is applied, only then can patterns be seen. These early compression and archive formats include:

  • ZIP
  • Gzip
  • RAR
  • TAR

In most cases, when a JPEG image is zipped, the result is no smaller than the original and might even be slightly bigger. Zipping a set of JPEG images is useful mainly for grouping the files into one larger file for easier distribution, in which case compression can be disabled.

Codec

A codec is a device or computer program capable of encoding or decoding a digital data stream or signal. Audio and video files contain streams of data. They are encoded and decoded using a codec. A video file typically uses an audio codec and a video codec, one for each of its two separate data streams.

A codec can be lossless or lossy. A codec can also be used to decrypt an encrypted format.

Container

A container or wrapper format is a metafile format, whose specification describes how different data elements and metadata coexist in a computer file. Video formats, such as MPG (.mpg), Flash (.flv), and AVI (.avi) are containers, meaning that the compression formats they use can vary. It is possible for two files of MPG type to use completely different audio and video compression algorithms. TIF is also a container and uses a number of different compression algorithms.

The goal of a container is to simplify and hide the complexity of the codec from the user. An AVI video is a container, and it supports a large number of audio and video codecs within it. A TIF digital image is a container. It supports a variety of encoding algorithms within it. In both these cases, the user only has to deal with the fact that the digital object is a TIF or an AVI. Flash (.flv) is now a container, as it supports both the Flash codec and the MPEG codec.
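The relationship between a container and its codecs can be modeled with a small data structure. This is a hypothetical sketch: the class names, file names, and the specific codec pairings are illustrative assumptions, not a description of any real library or file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    kind: str   # "video" or "audio"
    codec: str  # name of the codec used to encode this stream


@dataclass(frozen=True)
class MediaFile:
    name: str
    container: str  # what the file extension advertises
    streams: tuple  # the separately encoded streams inside the file


def codecs_used(media):
    """Map each stream kind to the codec that encoded it."""
    return {s.kind: s.codec for s in media.streams}


# Two hypothetical files: same container type, different codecs inside.
a = MediaFile("clip_a.avi", "AVI",
              (Stream("video", "MJPEG"), Stream("audio", "PCM")))
b = MediaFile("clip_b.avi", "AVI",
              (Stream("video", "MPEG-4"), Stream("audio", "MP3")))

print(a.container == b.container)        # True
print(codecs_used(a) == codecs_used(b))  # False
```

Both files look identical by extension, yet a player needs a different pair of codecs to decode each one.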

Most of these digital image and video formats were not designed to be containers. Most evolved this way to encompass new technology while providing backwards compatibility for older codecs. The file extension one sees for a digital image or video does not necessarily indicate which codec was used to encode it.

The DICOM digital object format is not a container.