Data Compression MM Seminar 1

Data compression reduces the number of bits needed to represent data, saving storage space and improving file transfer speeds. It can be lossless or lossy, with lossless retaining all original data and lossy eliminating unnecessary bits. Compression is crucial for efficient data management, especially in backup systems, but can impact system performance due to CPU and memory resource usage.

Uploaded by

cse students df


Data compression

What is data compression?

Data compression is a reduction in the number of bits needed to represent data. Compressing
data can save storage capacity, speed up file transfer and decrease costs for storage hardware
and network bandwidth.

How compression works

Compression is performed by a program that uses a formula or algorithm to determine how to
shrink the size of the data. For instance, an algorithm may represent a string of bits -- or 0s
and 1s -- with a smaller string of 0s and 1s by using a dictionary for the conversion between
them. The formula may also insert a reference or pointer to a string of 0s and 1s that the
program has already seen.
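
As a rough illustration of the dictionary idea described above, the following sketch uses an LZ78-style scheme: each output pair refers to a phrase the program has already seen, plus one new character. The function names `lz78_encode` and `lz78_decode` are hypothetical, and real compressors pack these pairs into bits rather than Python tuples.

```python
def lz78_encode(data: str) -> list[tuple[int, str]]:
    """Encode text as (index of a previously seen phrase, next character) pairs."""
    dictionary = {"": 0}   # phrase -> index; index 0 is the empty phrase
    output = []
    phrase = ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch   # keep extending a phrase that was already seen
        else:
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:
        # Flush the trailing phrase as (index of its prefix, its last character).
        output.append((dictionary[phrase[:-1]], phrase[-1]))
    return output


def lz78_decode(pairs: list[tuple[int, str]]) -> str:
    """Rebuild the text by replaying the same dictionary construction."""
    phrases = [""]
    pieces = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        pieces.append(phrase)
    return "".join(pieces)


text = "abababababab"
pairs = lz78_encode(text)
print(len(text), "characters ->", len(pairs), "pairs")
```

The decoder never needs the dictionary to be transmitted, because it rebuilds the same dictionary from the pairs as it goes.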

Text compression can be as simple as removing all unneeded characters, inserting a single
repeat character to indicate a string of repeated characters and substituting a smaller bit string
for a frequently occurring bit string. Data compression can shrink a text file by 50% or
significantly more of its original size.
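
The "single repeat character" idea is commonly known as run-length encoding. A minimal sketch, assuming the input is a plain string and using the hypothetical helper names `rle_encode` and `rle_decode`:

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into (character, run length)."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (character, run length) pairs back into the original text."""
    return "".join(ch * count for ch, count in pairs)


print(rle_encode("aaaabbbcc"))  # [('a', 4), ('b', 3), ('c', 2)]
```

Run-length encoding only pays off when the data contains long runs. On text with few repeated characters, the pairs can take more space than the original.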

For data transmission, compression can be performed on the data content or on the entire
transmission unit, including header data. When information is sent or received via the
internet, larger files -- either singly or with others as part of an archive file -- may be
transmitted in a ZIP, GZIP or other compressed format.
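
For example, a sender might compress a payload into GZIP format before transmission and the receiver might expand it on arrival. The sketch below uses Python's standard `gzip` module. The `pack` and `unpack` names and the sample message are illustrative assumptions.

```python
import gzip


def pack(payload: bytes) -> bytes:
    """Compress a payload into GZIP format before sending it."""
    return gzip.compress(payload)


def unpack(blob: bytes) -> bytes:
    """Expand a received GZIP blob back into the original payload."""
    return gzip.decompress(blob)


message = b"status=ok;" * 500   # hypothetical, highly repetitive payload
blob = pack(message)
print(len(message), "bytes ->", len(blob), "bytes on the wire")
```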

Why is data compression important?

Data compression can dramatically decrease the amount of storage a file takes up. For
example, at a 2:1 compression ratio, a 20 megabyte (MB) file takes up 10 MB of space. As a
result of compression, administrators spend less money and less time on storage.
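
The arithmetic behind a compression ratio can be written out as follows. The function names are hypothetical, and sizes can be in any unit as long as both use the same one.

```python
def compression_ratio(original_size: float, compressed_size: float) -> float:
    """Return the ratio N in 'N:1', e.g. 2.0 for a 2:1 ratio."""
    return original_size / compressed_size


def space_saving(original_size: float, compressed_size: float) -> float:
    """Return the fraction of storage saved, e.g. 0.5 for 50%."""
    return 1 - compressed_size / original_size


# The 20 MB file from the example above, stored in 10 MB.
print(compression_ratio(20, 10))  # 2.0
print(space_saving(20, 10))       # 0.5
```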

Compression optimizes backup storage performance and has recently shown up in primary
storage data reduction. Compression will be an important method of data reduction as data
continues to grow exponentially.

Virtually any type of file can be compressed, but it's important to follow best practices when
choosing which ones to compress. For example, some files may already come compressed, so
compressing those files would not have a significant impact.
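
A quick way to see this effect is to compress the same data twice. The sketch below uses Python's standard `zlib` module on made-up sample text. The word list and sizes are assumptions chosen only for illustration.

```python
import random
import zlib

# Hypothetical sample data: pseudo-random words, so the text has ordinary redundancy.
rng = random.Random(0)
words = ["backup", "storage", "block", "file", "data",
         "index", "chunk", "stream", "pointer", "archive"]
text = " ".join(rng.choice(words) for _ in range(20000)).encode("ascii")

once = zlib.compress(text)    # first pass removes most of the redundancy
twice = zlib.compress(once)   # second pass has almost nothing left to remove

print("original:", len(text))
print("compressed once:", len(once))
print("compressed twice:", len(twice))
```

The second pass typically leaves the size about the same, and can even add a few bytes of overhead, which is why already-compressed formats are usually skipped.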

Data compression methods: lossless and lossy compression

Compressing data can be a lossless or lossy process. Lossless compression enables the
restoration of a file to its original state, without the loss of a single bit of data, when the file is
uncompressed. Lossless compression is the typical approach with executables, as well as text
and spreadsheet files, where the loss of words or numbers would change the information.

Lossy compression permanently eliminates bits of data that are redundant, unimportant or
imperceptible. Lossy compression is useful with graphics, audio, video and images, where the
removal of some data bits has little or no discernible effect on the representation of the
content.
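
One simple form of lossy compression is quantization, which stores each value with less precision than the original. The sketch below is only an illustration of the principle, not how any particular image or audio format works. The sample values and the `step` parameter are assumptions.

```python
def quantize(samples: list[int], step: int) -> list[int]:
    """Map each sample to the nearest multiple of `step` (stored as a small level number)."""
    return [round(s / step) for s in samples]


def dequantize(levels: list[int], step: int) -> list[int]:
    """Approximate the original samples; the discarded detail cannot be recovered."""
    return [level * step for level in levels]


samples = [12, 13, 200, 203, 77]   # hypothetical 8-bit sample values
step = 8                           # larger step = smaller data, larger error
restored = dequantize(quantize(samples, step), step)

print(samples)
print(restored)
```

Unlike a lossless round trip, the restored values differ from the originals, but each one is within half a step of its source value.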

Graphics image compression can be lossy or lossless. Graphic image file formats are
typically designed to compress information since the files tend to be large. JPEG is an image
file format that supports lossy image compression. Formats such as GIF and PNG use lossless
compression.

Compression vs. data deduplication

Compression is often compared to data deduplication, but the two techniques operate
differently. Deduplication is a type of compression that looks for redundant chunks of data
across a storage or file system and then replaces each duplicate chunk with a pointer to the
original. Data compression algorithms, by contrast, reduce the size of bit strings within a
data stream. They work over a far smaller scope and generally remember no more than the last
megabyte or less of data.
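
The limited memory of a stream compressor can be observed directly. DEFLATE, the algorithm behind Python's standard `zlib` module, looks back at most 32 KB, so a repeated block is only detected when the two copies are close together. The sketch below uses seeded pseudo-random bytes as stand-in data, and the block sizes are assumptions chosen to fall on either side of that window.

```python
import random
import zlib


def repeated_block_size(block_len: int) -> int:
    """Compress two back-to-back copies of a random block and return the output size."""
    rng = random.Random(0)
    block = bytes(rng.getrandbits(8) for _ in range(block_len))
    return len(zlib.compress(block + block))


near = repeated_block_size(16 * 1024)   # second copy lies inside the 32 KB window
far = repeated_block_size(64 * 1024)    # second copy lies outside the window

print("2 x 16 KB compresses to", near, "bytes")
print("2 x 64 KB compresses to", far, "bytes")
```

In the first case the second copy is replaced by back-references and costs almost nothing. In the second case the compressor has already forgotten the first copy, which is the kind of long-range duplicate that deduplication is designed to catch.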

File-level deduplication eliminates redundant files and replaces them with stubs pointing to
the original file. Block-level deduplication identifies duplicate data at the subfile level. The
system saves unique instances of each block, uses a hash algorithm to process them and
generates a unique identifier to store them in an index. Deduplication typically looks for
larger chunks of duplicate data than compression, and systems can deduplicate using a fixed
or variable-sized chunk.

Deduplication is most effective in environments that have a high degree of redundant data,
such as virtual desktop infrastructure or storage backup systems. Data compression tends to
be more effective than deduplication in reducing the size of unique information, such as
images, audio, videos, databases and executable files. Many storage systems support both
compression and deduplication.
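
Block-level deduplication with fixed-size chunks can be sketched as follows. This is a simplified in-memory illustration: the `deduplicate` and `restore` names, the 4 KB chunk size and the choice of SHA-256 as the hash algorithm are all assumptions, and a real system would persist the index and handle variable-sized chunks.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4096) -> tuple[dict[str, bytes], list[str]]:
    """Split data into fixed-size chunks and store each unique chunk only once."""
    index: dict[str, bytes] = {}   # identifier -> unique chunk
    recipe: list[str] = []         # ordered identifiers (the "pointers")
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        index.setdefault(digest, chunk)   # keep only the first instance
        recipe.append(digest)
    return index, recipe


def restore(index: dict[str, bytes], recipe: list[str]) -> bytes:
    """Rebuild the original data by following the pointers in order."""
    return b"".join(index[digest] for digest in recipe)


data = b"A" * 4096 * 3 + b"B" * 4096   # three identical chunks plus one different chunk
index, recipe = deduplicate(data)
print(len(recipe), "chunks referenced,", len(index), "chunks stored")
```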

Data compression and backup

Compression is often used for data that's not accessed much, as the process can be intensive
and slow down systems. Administrators, though, can seamlessly integrate compression in
their backup systems.

Backup is a redundant type of workload, as the process captures the same files frequently. An
organization that performs full backups will often have close to the same data from backup to
backup.

There are major benefits to compressing data prior to backup:

• Data takes up less space. Compression ratios can reach 100:1, although ratios between 2:1
and 5:1 are more common.
• If compression is done on a server prior to transmission, the time needed to transmit the
data and the total network bandwidth are drastically reduced.
• On tape, the compressed, smaller file system image can be scanned faster to reach a
particular file, reducing restore latency.
• Compression is supported by backup software and tape libraries, so there is a choice of
data compression techniques.

Pros and cons of compression

The main advantages of compression are a reduction in storage hardware, data transmission
time and communication bandwidth -- and the resulting cost savings. A compressed file
requires less storage capacity than an uncompressed file, and the use of compression can lead
to a significant decrease in expenses for disk and/or solid-state drives. A compressed file also
requires less time for transfer, and it consumes less network bandwidth than an uncompressed
file.

The main disadvantage is the performance impact from the use of CPU and memory
resources to compress the data. Many vendors have designed their systems to try to minimize
the impact of the processor-intensive calculations associated with compression. If the
compression runs inline, before the data is written to disk, the system may offload
compression to preserve system resources. For instance, IBM uses a separate hardware
acceleration card to handle compression with some of its enterprise storage systems.

If data is compressed after it is written to disk, known as post-process compression, it may run in
the background to reduce the performance impact. Although post-process compression can
reduce the response time for each I/O, it still consumes memory and processor cycles and can
affect the overall number of I/Os a storage system can handle. In addition, data initially must
be written to disk or flash drives in an uncompressed form, so the physical storage savings
are not as great as they are with inline compression.

File system compression

File system compression takes a fairly straightforward approach to reducing the storage
footprint of data by transparently compressing each file as it is written.

Many of the popular Linux file systems -- including Reiser4, ZFS and btrfs -- and Microsoft
NTFS have a compression option. The server compresses chunks of data in a file and then
writes the smaller fragments to storage.
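
The chunk-by-chunk approach can be sketched as follows. Compressing each chunk independently means a read only has to expand the fragments that cover the requested byte range. This is an in-memory illustration and not how any specific file system is implemented. The function names and the 64 KB chunk size are assumptions.

```python
import zlib

CHUNK_SIZE = 64 * 1024  # assumed chunk size for this sketch


def compress_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Compress each fixed-size chunk of a file independently."""
    return [zlib.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def read_at(fragments: list[bytes], offset: int, length: int,
            chunk_size: int = CHUNK_SIZE) -> bytes:
    """Read a byte range, expanding only the fragments that cover it."""
    first = offset // chunk_size
    last = min((offset + length - 1) // chunk_size, len(fragments) - 1)
    expanded = bytearray()
    for i in range(first, last + 1):
        expanded += zlib.decompress(fragments[i])
    start = offset - first * chunk_size
    return bytes(expanded[start:start + length])


data = bytes(range(256)) * 1000        # hypothetical 256,000-byte file
fragments = compress_chunks(data)
print(read_at(fragments, 70000, 4))    # expands a single fragment
```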

Read-back involves a relatively small latency to expand each fragment, while writing adds
substantial load to the server, so compression is usually not recommended for data that is
volatile. File system compression can weaken performance, so users should deploy it
selectively on files that are not accessed frequently.

Historically, with the expensive hard drives of early computers, data compression software --
such as DiskDoubler and SuperStor Pro -- was popular and helped establish mainstream file
system compression.

Storage administrators can also combine compression and deduplication for improved data
reduction.

Data differencing

Data differencing is a general term for comparing the contents of two data objects. In the
context of compression, it involves repeatedly searching through the target file to find
similar blocks and replacing them with a reference to a library object. This process repeats
until it finds no additional duplicate objects. Data differencing can result in many compressed
files with just one element in the library representing each duplicated object.
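
One way to approximate this idea with standard tools is to compress a target against a shared reference object, so that content already present in the reference is stored as short back-references. The sketch below uses the preset-dictionary (`zdict`) option of Python's `zlib` module. This is a swapped-in technique chosen for illustration, not necessarily what a storage product does, and it only benefits from roughly the last 32 KB of the reference. The function names and sample data are assumptions.

```python
import random
import zlib


def make_delta(reference: bytes, target: bytes) -> bytes:
    """Compress `target`, letting the compressor point back into `reference`."""
    comp = zlib.compressobj(zdict=reference)
    return comp.compress(target) + comp.flush()


def apply_delta(reference: bytes, delta: bytes) -> bytes:
    """Rebuild the target from the same reference plus the delta."""
    decomp = zlib.decompressobj(zdict=reference)
    return decomp.decompress(delta) + decomp.flush()


# Hypothetical data: a reference object and a target that differs by one small edit.
rng = random.Random(1)
words = ["desktop", "image", "profile", "driver", "update", "policy", "cache", "volume"]
reference = " ".join(rng.choice(words) for _ in range(1500)).encode("ascii")
target = reference[:5000] + b" [local change] " + reference[5000:]

delta = make_delta(reference, target)
print("standalone:", len(zlib.compress(target)), "bytes; delta:", len(delta), "bytes")
```

The delta is only useful to a reader that holds the same reference, which mirrors the role of the single library object described above.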

In virtual desktops, this technique can achieve a compression ratio of as much as 100:1. The
process is often more closely aligned with deduplication, which looks for identical files or
objects rather than looking within the content of each object.

Data differencing is sometimes referred to as deduplication.

