dmlfw (Daniyal Machine Learning Framework)
dmlfw_data_encoder.h File Reference

Categorical and binary feature encoding utilities.

#include <dmlfw_vector.h>

Functions

void dmlfw_encoder_encode (char *source, char *target, dmlfw_row_vec_string *columns_to_encode, char *algorithm)
 Generic encoder API. Dispatches by algorithm string.
 
void dmlfw_encoder_encode_binary (char *source, char *target, dmlfw_row_vec_string *columns_to_encode)
 Encodes specified columns in a CSV file using binary encoding.
 
void dmlfw_encoder_encode_one_hot (char *source, char *target, dmlfw_row_vec_string *columns_to_encode)
 Encodes specified columns in a CSV file using one-hot encoding.
 

Detailed Description

Categorical and binary feature encoding utilities.

Version
1.0
Date
2025-09-25

This module provides functions for encoding categorical/textual data columns from CSV files into numerical formats suitable for ML models. Supported encoding schemes include one-hot and binary encoding, with a generic dispatcher for algorithm selection.
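Both encoding schemes start from a mapping between the distinct values of a column and integer indices. The sketch below shows one plausible way to build that mapping, assuming indices are assigned in order of first appearance. The helper category_index and the MAX_CATEGORIES limit are hypothetical and are not part of dmlfw; the ordering the library actually uses is not documented here.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CATEGORIES 16 /* arbitrary cap for this sketch */

/* Hypothetical helper, not part of dmlfw: returns the index of `value`
 * in `categories`, appending it if it has not been seen before.
 * Returns -1 when the table is full. The table stores pointers only,
 * so the strings must outlive it. */
static int category_index(const char *categories[], size_t *count,
                          const char *value)
{
    for (size_t i = 0; i < *count; i++) {
        if (strcmp(categories[i], value) == 0) {
            return (int)i; /* already known */
        }
    }
    if (*count == MAX_CATEGORIES) {
        return -1;
    }
    categories[*count] = value;
    *count += 1;
    return (int)(*count - 1);
}
```

Feeding the values "red", "green", "red" through this helper yields the indices 0, 1, 0, which an encoder can then turn into one-hot or binary columns.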

Error Handling:

All functions report errors via the centralized error API. Use dmlfw_error() after function calls to check for errors. Retrieve error and diagnostic details with dmlfw_get_error_string() and dmlfw_get_debug_string().

Ownership:

Resources allocated internally are released both on error and on normal completion. No ownership is transferred to the caller.

Function Documentation

◆ dmlfw_encoder_encode()

void dmlfw_encoder_encode ( char *  source,
char *  target,
dmlfw_row_vec_string *  columns_to_encode,
char *  algorithm 
)

Generic encoder API. Dispatches by algorithm string.

Parameters
source [in] Input CSV file path (must not be NULL).
target [in] Output CSV file path (must not be NULL).
columns_to_encode [in] Vector of column names to encode (must not be NULL).
algorithm [in] Algorithm name ("one-hot" or "binary", case-insensitive).
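To illustrate what case-insensitive dispatch on the algorithm string could look like, here is a minimal sketch. The names algo_t, equals_ignore_case and parse_algorithm are hypothetical and do not come from dmlfw; the library's actual dispatch logic may differ.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { ALGO_UNKNOWN, ALGO_ONE_HOT, ALGO_BINARY } algo_t;

/* Returns 1 when a and b are equal ignoring ASCII case, else 0. */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Hypothetical dispatcher step: maps an algorithm name to an enum value.
 * Unknown or NULL names map to ALGO_UNKNOWN. */
static algo_t parse_algorithm(const char *name)
{
    if (name == NULL) {
        return ALGO_UNKNOWN;
    }
    if (equals_ignore_case(name, "one-hot")) {
        return ALGO_ONE_HOT;
    }
    if (equals_ignore_case(name, "binary")) {
        return ALGO_BINARY;
    }
    return ALGO_UNKNOWN;
}
```

Under this sketch, "One-Hot" and "BINARY" select the one-hot and binary encoders respectively, while any other string falls through to an error path.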

Usage example:

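char err[512], dbg[512];
/* One column to encode, so the name vector has length 1. */
dmlfw_row_vec_string *cols = dmlfw_row_vec_string_create_new(1);
dmlfw_row_vec_string_set(cols, 0, "feature");
dmlfw_encoder_encode("data.csv", "encoded.csv", cols, "binary");
if (dmlfw_error()) {
    dmlfw_get_error_string(err, sizeof(err));
    dmlfw_get_debug_string(dbg, sizeof(dbg));
    printf("Error in encoding: %s\nDebug info: %s\n", err, dbg);
    dmlfw_row_vec_string_destroy(cols);
    return EXIT_FAILURE;
}
dmlfw_row_vec_string_destroy(cols);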

Functions and types referenced in the example:

void dmlfw_get_error_string(char *error_string, uint32_t size): Copies the last error message into the provided character buffer.
uint8_t dmlfw_error(void): Checks if the last framework call resulted in an error.
void dmlfw_get_debug_string(char *debug_string, uint32_t size): Copies detailed debug information about the last error into the provided character buffer.
void dmlfw_row_vec_string_destroy(dmlfw_row_vec_string *vector): Destroys a row vector of strings and frees its memory.
dmlfw_row_vec_string * dmlfw_row_vec_string_create_new(dimension_t columns): Creates a new row vector of strings of specified length.
void dmlfw_row_vec_string_set(dmlfw_row_vec_string *vector, index_t index, char *string): Sets a string element in the row vector.
struct __dmlfw_row_vec_string dmlfw_row_vec_string: Opaque structure representing a row vector of strings. Defined at dmlfw_vec_string.h:82.

◆ dmlfw_encoder_encode_binary()

void dmlfw_encoder_encode_binary ( char *  source,
char *  target,
dmlfw_row_vec_string *  columns_to_encode
)

Encodes specified columns in a CSV file using binary encoding.

Parameters
source [in] Input CSV file path (must not be NULL).
target [in] Output CSV file path (must not be NULL).
columns_to_encode [in] Vector of column names to encode (must not be NULL).
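As background, binary encoding in the usual sense writes each category's integer index in base 2 and spreads the digits over roughly log2(n) columns. The sketch below illustrates that idea under the assumption of zero-based indices and most-significant-bit-first output. The helpers bits_needed and binary_encode are hypothetical; the exact column layout produced by dmlfw_encoder_encode_binary() is not specified here and may differ.

```c
#include <stddef.h>

/* Number of bits needed to give each of n categories a distinct code.
 * Always at least 1, so a single-category column still gets one bit. */
static size_t bits_needed(size_t n)
{
    size_t bits = 1;
    while (((size_t)1 << bits) < n) {
        bits++;
    }
    return bits;
}

/* Writes the binary digits of `index` into out[0..bits-1],
 * most significant bit first. */
static void binary_encode(size_t index, size_t bits, int out[])
{
    for (size_t i = 0; i < bits; i++) {
        out[i] = (int)((index >> (bits - 1 - i)) & 1u);
    }
}
```

With five categories this needs three columns, and the category with index 4 becomes 1, 0, 0. The appeal over one-hot encoding is the much smaller number of output columns for high-cardinality features.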

Usage example:

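char err[512], dbg[512];
/* One column to encode, so the name vector has length 1. */
dmlfw_row_vec_string *columns = dmlfw_row_vec_string_create_new(1);
dmlfw_row_vec_string_set(columns, 0, "product");
dmlfw_encoder_encode_binary("data.csv", "binary.csv", columns);
if (dmlfw_error()) {
    dmlfw_get_error_string(err, sizeof(err));
    dmlfw_get_debug_string(dbg, sizeof(dbg));
    printf("Error in binary encoding: %s\nDebug info: %s\n", err, dbg);
    dmlfw_row_vec_string_destroy(columns);
    return EXIT_FAILURE;
}
dmlfw_row_vec_string_destroy(columns);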

◆ dmlfw_encoder_encode_one_hot()

void dmlfw_encoder_encode_one_hot ( char *  source,
char *  target,
dmlfw_row_vec_string *  columns_to_encode
)

Encodes specified columns in a CSV file using one-hot encoding.

Parameters
source [in] Input CSV file path (must not be NULL).
target [in] Output CSV file path (must not be NULL).
columns_to_encode [in] Vector of column names to encode (must not be NULL).
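For readers unfamiliar with the technique, one-hot encoding replaces a category with a vector of n values, exactly one of which is 1. The following sketch shows that transformation for a single value, assuming the category has already been mapped to a zero-based index. The helper one_hot_encode is hypothetical and is not the library's implementation; how dmlfw_encoder_encode_one_hot() names or orders the generated columns is not covered here.

```c
#include <stddef.h>

/* Fills out[0..n-1] with zeros and sets out[index] to 1.
 * Returns 0 on success, -1 if index is out of range
 * (in which case out is left untouched). */
static int one_hot_encode(size_t index, size_t n, int out[])
{
    if (index >= n) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        out[i] = (i == index) ? 1 : 0;
    }
    return 0;
}
```

A column with the three categories small, medium and large (indices 0, 1, 2) would thus map medium to 0, 1, 0.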

Usage example:

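char err[512], dbg[512];
/* Two columns to encode, so the name vector has length 2. */
dmlfw_row_vec_string *columns = dmlfw_row_vec_string_create_new(2);
dmlfw_row_vec_string_set(columns, 0, "size");
dmlfw_row_vec_string_set(columns, 1, "shape");
dmlfw_encoder_encode_one_hot("data.csv", "ohe.csv", columns);
if (dmlfw_error()) {
    dmlfw_get_error_string(err, sizeof(err));
    dmlfw_get_debug_string(dbg, sizeof(dbg));
    printf("Error in one-hot encoding: %s\nDebug info: %s\n", err, dbg);
    dmlfw_row_vec_string_destroy(columns);
    return EXIT_FAILURE;
}
dmlfw_row_vec_string_destroy(columns);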
Examples
example_dmlfw_data_encoder.c.