A data repository model may be used to provide structure to the storage. Before data are sent to their storage locations, they are selected, extracted, and manipulated. Many different techniques have been developed to optimize the backup procedure. These include optimizations for dealing with open files and live data sources as well as compression, encryption, and de-duplication, among others.

Any backup strategy starts with a concept of a data repository. The backup data needs to be stored, and probably should be organized to a degree. A more sophisticated setup could include a computerized index, catalog, or relational database.

Unstructured

An unstructured repository may simply be a stack of tapes or CD-Rs or DVD-Rs with minimal information about what was backed up and when. This is the easiest to implement, but probably the least likely to achieve a high level of recoverability as it lacks automation.
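To make the contrast with a catalogued repository concrete, the following is a minimal sketch of a computerized index that records what was backed up, onto which piece of media, and when. It is an illustration only: the table layout, the function names (`open_catalog`, `record_backup`, `find_copies`), and the media labels are hypothetical and are not taken from any particular backup product.

```python
import sqlite3
from datetime import datetime, timezone


def open_catalog(path=":memory:"):
    """Open (or create) a small backup catalog. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS backups ("
        " file_path TEXT NOT NULL,"
        " media_label TEXT NOT NULL,"
        " backed_up_at TEXT NOT NULL)"
    )
    return conn


def record_backup(conn, file_path, media_label, when=None):
    """Note that file_path was written to the given tape or disc at time `when`."""
    when = when or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO backups VALUES (?, ?, ?)",
        (file_path, media_label, when.isoformat(timespec="seconds")),
    )
    conn.commit()


def find_copies(conn, file_path):
    """Return (media_label, backed_up_at) pairs for a file, newest first."""
    rows = conn.execute(
        "SELECT media_label, backed_up_at FROM backups"
        " WHERE file_path = ? ORDER BY backed_up_at DESC",
        (file_path,),
    )
    return rows.fetchall()
```

With such an index, answering "which tape holds the latest copy of this file?" becomes a lookup rather than a manual search through a stack of media. The sketch assumes all timestamps are recorded in UTC so that sorting the stored text values matches chronological order.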

Regardless of the repository model that is used, the data has to be copied onto some data storage medium.

Magnetic tape

Magnetic tape has long been the most commonly used medium for bulk data storage, backup, archiving, and interchange. Tape has typically had an order of magnitude better capacity-to-price ratio when compared to hard disk, but the ratios for tape and hard disk have become closer.

Regardless of the data repository model, or data storage media used for backups, a balance needs to be struck between accessibility, security and cost. These media management methods are not mutually exclusive and are frequently combined to meet the user’s needs. Using on-line disks for staging data before it is sent to a near-line tape library is a common example.

On-line

On-line backup storage is typically the most accessible type of data storage, which can begin a restore in milliseconds.

A successful backup job starts with selecting and extracting coherent units of data. Most data on modern computer systems is stored in discrete units, known as files. These files are organized into filesystems. Files that are actively being updated can be thought of as “live” and present a challenge to back up.
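The difficulty with live files can be illustrated with a short sketch. It copies a file and then checks whether the source's size or modification time changed while the copy was in progress, which would suggest the copy may not be a coherent unit. This is a simple heuristic chosen for illustration, not a technique described in the text, and the function name `copy_if_stable` is hypothetical.

```python
import os
import shutil


def copy_if_stable(src, dst):
    """Copy src to dst and report whether src looked unchanged during the copy.

    Returns True if the source's size and modification time were the same
    before and after copying, and False if either changed (the file was
    probably "live" and the copy may be inconsistent).
    """
    before = os.stat(src)
    shutil.copyfile(src, dst)
    after = os.stat(src)
    return (before.st_size, before.st_mtime_ns) == (after.st_size, after.st_mtime_ns)
```

A caller might retry the copy when the function returns False. The check can miss changes on filesystems with coarse timestamp resolution, so it should be read as a demonstration of the problem rather than a complete solution.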