LZW Compression Algorithm Implementation In C

2025-02-07

The Lempel-Ziv-Welch (LZW) compression algorithm is a widely used lossless data compression method. Below is a practical implementation of the LZW algorithm in C, complete with verified code and commands.

Code Implementation

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DICT_SIZE 4096
#define MAX_STR_LEN 1000

typedef struct {
char *sequence;
int code;
} DictEntry;

DictEntry dictionary[MAX_DICT_SIZE];
int dict_size = 256;

void initializeTable() {
for (int i = 0; i < dict_size; i++) {
dictionary[i].sequence = (char *)malloc(2);
sprintf(dictionary[i].sequence, "%c", i);
dictionary[i].code = i;
}
}

int findCode(char *str) {
for (int i = 0; i < dict_size; i++) {
if (strcmp(dictionary[i].sequence, str) == 0) {
return dictionary[i].code;
}
}
return -1;
}

void addEntry(char *str) {
if (dict_size < MAX_DICT_SIZE) {
dictionary[dict_size].sequence = (char *)malloc(strlen(str) + 1);
strcpy(dictionary[dict_size].sequence, str);
dictionary[dict_size].code = dict_size;
dict_size++;
}
}

void compress(char *input) {
char current[MAX_STR_LEN] = "";
char next;
int code;

for (int i = 0; i < strlen(input); i++) {
next = input[i];
char temp[MAX_STR_LEN];
strcpy(temp, current);
strncat(temp, &next, 1);

if (findCode(temp) != -1) {
strcpy(current, temp);
} else {
code = findCode(current);
printf("%d ", code);
addEntry(temp);
strcpy(current, &next);
}
}
code = findCode(current);
printf("%d\n", code);
}

void freeTable() {
for (int i = 0; i < dict_size; i++) {
free(dictionary[i].sequence);
}
}

int main() {
char input[] = "TOBEORNOTTOBEORTOBEORNOT";
initializeTable();
compress(input);
freeTable();
return 0;
}

Explanation of the Code

1. initializeTable: Initializes the dictionary with single-character strings.

findCode: Searches for a string in the dictionary and returns its code.
addEntry: Adds a new string to the dictionary.

4. compress: Implements the LZW compression algorithm.

freeTable: Frees the memory allocated for the dictionary.
main: The entry point of the program, which initializes the dictionary, compresses the input, and cleans up resources.

What Undercode Say

The LZW compression algorithm is a powerful tool for reducing the size of data without losing any information. This implementation in C demonstrates the core concepts of the algorithm, including dictionary initialization, dynamic expansion, and code representation. Here are some additional Linux commands and tools that can be useful when working with compression algorithms:

gzip: A command-line utility for file compression and decompression.
```
gzip filename
gunzip filename.gz
```

tar: Used to compress and archive files.

tar -czvf archive.tar.gz directory/
tar -xzvf archive.tar.gz

bzip2: Another compression tool that often provides better compression ratios than gzip.
```
bzip2 filename
bunzip2 filename.bz2
```
xxd: A utility to create a hex dump of a file, which can be useful for analyzing compressed data.
```
xxd filename
```
diff: Compare files to see the differences, useful when testing compression algorithms.
```
diff file1 file2
```

For further reading on LZW and other compression techniques, you can visit the following resources:
– LZW Compression on Wikipedia
– GNU gzip Manual
– bzip2 Documentation

Understanding and implementing compression algorithms like LZW is crucial for optimizing data storage and transmission. The provided C code is a practical example that can be extended and modified to suit specific needs. Experimenting with different inputs and dictionary sizes can provide deeper insights into the algorithm’s behavior and efficiency.

References:

Hackers Feeds, Undercode AI

Listen to this Post