Sunday, May 5, 2013

Book Review: Managing Multimedia and Unstructured Data in the Oracle Database

There is a large amount of unstructured data in the real world. In 1998, Merrill Lynch cited a rule of thumb that somewhere around 80-90% of all potentially usable business information may originate in unstructured form[1].

There is no doubt that we need to process this information to extract meaning and create structured data about it. One approach is to use Oracle Database to manage such data, which includes multimedia (aka "rich media"). This book[7] aims to help readers understand and manage unstructured data using Oracle Database.


Dealing with Unstructured Data


Unstructured data refers to information that either does not have a pre-defined data model or does not fit well into relational tables[1].

Relational data can be considered a subset of structured data. Besides relational databases, structured data can also be stored in non-relational formats such as XML, inverted list databases[2], or object databases[4]. XML does not conform to the formal structure of data models, but it nonetheless contains tags that separate semantic elements and enforce hierarchies of records and fields within the data. Therefore, one can consider XML semi-structured.

It's possible to store unstructured data in a column of a relational table, which is itself structured. The traditional approach has been to treat it as a BLOB (binary large object), but with a greater understanding of the variety of unstructured data types that exist (e.g., video, audio, photographs, and documents), the need to manage them has grown.
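To make the "BLOB in a relational column" idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in for Oracle Database, and the table name, column names, and placeholder bytes are all assumptions made for illustration, not anything taken from the book.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE media_object (
        id        INTEGER PRIMARY KEY,
        mime_type TEXT NOT NULL,   -- structured: what kind of object this is
        content   BLOB NOT NULL    -- unstructured: opaque bytes
    )
    """
)

# Placeholder bytes standing in for a real image file.
photo_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

conn.execute(
    "INSERT INTO media_object (mime_type, content) VALUES (?, ?)",
    ("image/png", photo_bytes),
)
conn.commit()

# The database can return the bytes, but it knows nothing about what they depict.
mime_type, content = conn.execute(
    "SELECT mime_type, content FROM media_object WHERE id = 1"
).fetchone()
stored_size = len(content)
print(mime_type, stored_size)
```

The row is structured (an id and a type), while the content column is just bytes. That gap is what the rest of this post, and the book, is concerned with.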

Metadata


To manage unstructured data, metadata is crucial. It is the data that describes the unstructured data and gives meaning to it. Metadata can be used for:
  • Searching (covered in Chapter 4, Searching the Multimedia Warehouse)
  • Annotation
    • Adding meaning to unstructured data objects
  • Relating unstructured data objects (or adding structure)
  • Matching data stored in relational databases
It is envisaged that in the future technology will improve to the point that algorithms will be able to identify objects and people in a video or photo, and understand sounds and complex speech in audio files. When that point is reached, the need for metadata may be reduced or limited to a smaller scope.
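The metadata uses listed above (searching and relating objects) can be sketched with a simple name/value table kept alongside the objects. This is only an illustration under stated assumptions: it uses sqlite3 as a stand-in for Oracle, and the schema, tag names, and the find_objects helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE media_object (
        id      INTEGER PRIMARY KEY,
        content BLOB NOT NULL
    );
    -- One row per descriptive tag; this is the metadata.
    CREATE TABLE media_metadata (
        object_id INTEGER NOT NULL REFERENCES media_object(id),
        name      TEXT NOT NULL,
        value     TEXT NOT NULL
    );
    """
)

# Placeholder bytes; real objects would be photos, audio, and so on.
conn.executemany(
    "INSERT INTO media_object (id, content) VALUES (?, ?)",
    [(1, b"photo-1"), (2, b"photo-2"), (3, b"audio-1")],
)
conn.executemany(
    "INSERT INTO media_metadata (object_id, name, value) VALUES (?, ?, ?)",
    [
        (1, "subject", "harbour bridge"),
        (1, "event", "conference-2013"),
        (2, "subject", "keynote speaker"),
        (2, "event", "conference-2013"),
        (3, "subject", "keynote speaker"),
    ],
)


def find_objects(name, value):
    """Return ids of objects whose metadata has the given name/value pair."""
    rows = conn.execute(
        "SELECT object_id FROM media_metadata"
        " WHERE name = ? AND value = ? ORDER BY object_id",
        (name, value),
    )
    return [row[0] for row in rows]


# Searching: which objects are about the keynote speaker?
speaker_ids = find_objects("subject", "keynote speaker")
# Relating: which objects belong to the same event?
event_ids = find_objects("event", "conference-2013")
print(speaker_ids, event_ids)
```

Neither query looks inside the BLOB. All the meaning comes from the metadata rows, which is why the quality of the metadata matters so much.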

Oracle Database


In the past few years, with changes in database technology and improvements in disk performance and storage, it now makes business sense to use the Oracle database to store and manage all of an organization's unstructured data.

For a database management system to handle unstructured data correctly, it must have support for objects[4]. A database that supports objects makes it much easier to manage large volumes of digital objects. Though these objects can be stored in a file system, there are now advantages to storing them inside the database. Oracle supports both relational data and objects. After adding Online Analytical Processing (OLAP)[5] and XML[6], the Oracle database grew from being relational to one supporting most structures.

Oracle Multimedia uses BLOBs and new types, which can be accessed and used as required. In addition, it supports a variety of methods that simplify loading and manipulating digital objects. This is covered in Chapter 7, Techniques for Creating a Multimedia Database.

Most databases allow unstructured data to be stored in them, but do not support the management, control, and manipulation of that data. Even though Oracle is a market leader in unstructured data management, a large number of major improvements are still needed. This is covered in Chapter 9, Understanding the Limitations of Oracle Products.

Scalability


When working with multimedia and unstructured data, a row in the database can be 10 GB in size, which could be greater than an entire relational database.  Therefore, traditional tuning techniques might fail as the rules regarding them no longer make sense.

For example, in a multimedia warehouse (covered in Chapter 4), logical data consistency is not attempted, because fuzzy data makes up the bulk of most digital objects. Novel solutions to tuning problems are therefore needed.

Oracle Multimedia also has built-in support for scalability. For example, the new 11g SecureFiles BLOBs use parallel techniques that allow files to be loaded much faster than with traditional BLOBs.

On the hardware front, technology also offers help. With the recent introduction of low-cost terabyte SATA disks and the use of low-cost SANs, the ability to store a petabyte is within the reach of a number of organizations.

When handling a large amount of unstructured data, the issues seen and the solutions provided by Oracle include:

  • Hitting limits on the image size
    • An Oracle BLOB can be unlimited in size
  • Reaching internal structural limits within the database (max number of files that store data)
    • Oracle's use of tablespaces allows a large number of multimedia files to be stored
    • With the ability to control where a blob is stored, files can be split across multiple tablespaces and devices 
  • Dealing with fragmentation 
    • By using locally managed tablespaces, fragmentation is removed as a performance issue. 
  • The efficient management of those images (for example, backup/recovery)
    • Using partitioning on LOBs allows a very large number of multimedia files to be stored and efficiently managed
    • Using RMAN, Oracle can be configured to back up large amounts of data
When dealing with multimedia, one has to look at the different dimensions of scalability to understand how the Oracle Database handles it. These include managing the CPU, memory, disk I/O, and network bandwidth. All of the above are covered in Chapter 8, Tuning.
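As a conceptual illustration of the partitioning idea in the list above (spreading objects across multiple tablespaces and devices), the following sketch assigns object ids to storage locations with a stable hash. It is not how Oracle implements partitioning internally. The tablespace names and the choose_tablespace helper are hypothetical and exist only to show why hashing gives an even spread.

```python
import hashlib
from collections import Counter

# Hypothetical tablespace names, used only for illustration.
TABLESPACES = ["media_ts1", "media_ts2", "media_ts3", "media_ts4"]


def choose_tablespace(object_id: int) -> str:
    """Map an object id to a tablespace with a stable hash."""
    digest = hashlib.sha256(str(object_id).encode("ascii")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(TABLESPACES)
    return TABLESPACES[bucket]


# Place 10,000 hypothetical objects and count how many land in each tablespace.
placement = Counter(choose_tablespace(i) for i in range(10_000))
for name in TABLESPACES:
    print(name, placement[name])
```

Because the hash is stable, the same object id always maps to the same location, and the counts come out roughly equal, so no single device has to hold or serve most of the data.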

References

  1. Unstructured data (Wikipedia)
  2. ADABAS (Wikipedia)
  3. 3D Printing (Wikipedia)
  4. Object Database (Wikipedia)
  5. Oracle OLAP
  6. Oracle XML DB
  7. Managing Multimedia and Unstructured Data in the Oracle Database (Reviewed Book)
