gm2

Re: What is the equivalent type in GM2?


From: Benjamin Kowarsch
Subject: Re: What is the equivalent type in GM2?
Date: Sat, 28 Mar 2020 21:43:08 +0900


On Sat, 28 Mar 2020 at 17:10, Hưng Hưng <address@hidden> wrote:
The C version doesn't impose any max length.

Strictly speaking, C the language considers char* to be a pointer to a single character, thus a maximum length of one.

However, the C compiler doesn't care about type safety and allows you to read and write past the end of that character string of length one.

This is the single most important reason why the major operating systems today are so vulnerable to cyber attacks. The vast majority of security vulnerabilities are based on buffer overflow exploits in C.

By contrast, Modula-2 was specifically designed as a type safe language. For this reason, the compiler does not permit you to read and write past the declared capacity limit of a type.

This is not a bug, but a feature. And a very important feature that we want to keep.
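To make the point about char* concrete, here is a minimal C sketch. The helper name bounded_copy is made up for illustration and is not part of any library mentioned in this thread. It shows that the capacity has to travel alongside the pointer as a separate argument, because the pointer type itself carries none.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration only).
   Copies src into dst without ever writing more than cap bytes.
   Returns 0 on success, -1 if src plus its NUL terminator does not fit. */
int bounded_copy(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t len = strlen(src);
    if (len + 1 > cap) {
        return -1;               /* refuse instead of overflowing */
    }
    memcpy(dst, src, len + 1);   /* len characters plus the NUL terminator */
    return 0;
}
```

Given char buf[8], the call bounded_copy(buf, sizeof buf, "fox") succeeds, while bounded_copy(buf, sizeof buf, "the lazy dog") is rejected and leaves buf untouched. Nothing in C forces a caller to go through such a check, which is exactly the difference to Modula-2 described above.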

So I have to guess and pick a max length value that I find suitable? It's like we impose an imaginary, artificial limitation on our binding for no reason at all.

The use of static character arrays goes back to the 1960s. Nowadays we tend to use dynamic collection types where the capacity is determined during allocation at runtime. But this isn't built into the language. Instead, it has to be supplied in the form of libraries.

You may want to consider writing yourself a dynamic string library and then use that.

Or perhaps you can find one that meets your requirements and use that.

The indeterminate record types Gaius and I were talking about in the other thread are specifically designed to allow easy implementation of dynamic collection types while preserving type safety.

TYPE DynString = POINTER TO RECORD
  length : LONGCARD;
+ string : ARRAY OF CHAR
END;

then ...

VAR str : DynString;

NEW str CAPACITY 1000;

after which

CAPACITY(str) will return 1000

and LENGTH(str) will return 0.

Alternatively, with initialisation string ...

NEW str := "The quick brown fox jumps over the lazy dog.";

after which

CAPACITY(str) and LENGTH(str) will both return 44.

However, that's not available yet. So, in the meantime, you will have to either use dangerous pointer arithmetic in your dynamic type implementation, or if you want to keep type safety you will need to be creative.
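For comparison only, here is a rough C99 analogue of the DynString record sketched above, using a flexible array member. All names (DynString, dynstr_new, dynstr_from, dynstr_at) are assumptions made up for this sketch. Unlike the proposed record, the capacity is stored in an explicit field, since C has no built-in CAPACITY function. The bounds check in dynstr_at is the part that C leaves to the programmer and that the Modula-2 proposal would get from the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C analogue of the DynString record (illustration only). */
typedef struct {
    size_t capacity;   /* fixed when the string is allocated */
    size_t length;     /* characters currently in use */
    char   data[];     /* flexible array member, capacity + 1 bytes */
} DynString;

/* Roughly "NEW str CAPACITY n": an empty string with room for n characters. */
DynString *dynstr_new(size_t capacity)
{
    if (capacity > SIZE_MAX - sizeof(DynString) - 1) {
        return NULL;                       /* size computation would overflow */
    }
    DynString *s = malloc(sizeof *s + capacity + 1);
    if (s == NULL) {
        return NULL;
    }
    s->capacity = capacity;
    s->length = 0;
    s->data[0] = '\0';
    return s;
}

/* Roughly "NEW str := literal": capacity and length both equal the text length. */
DynString *dynstr_from(const char *text)
{
    assert(text != NULL);
    size_t len = strlen(text);
    DynString *s = dynstr_new(len);
    if (s == NULL) {
        return NULL;
    }
    memcpy(s->data, text, len + 1);        /* includes the NUL terminator */
    s->length = len;
    return s;
}

/* Checked read: returns the character at index, or -1 if index is out of range. */
int dynstr_at(const DynString *s, size_t index)
{
    if (s == NULL || index >= s->length) {
        return -1;
    }
    return (unsigned char)s->data[index];
}
```

A string created with dynstr_new(1000) reports capacity 1000 and length 0, mirroring the first Modula-2 example. The caller releases either kind of string with free().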

I have implemented a dynamic string library for interned strings in one of my projects, which is available on GitHub.

PIM version
https://github.com/m2sf/m2pp/blob/master/src/String.pim.def
https://github.com/m2sf/m2pp/blob/master/src/imp/String.pim.mod

ISO version
https://github.com/m2sf/m2pp/blob/master/src/String.iso.def
https://github.com/m2sf/m2pp/blob/master/src/imp/String.iso.mod

This uses a Passepartout, which is French for a key that matches multiple locks.

TYPE Passepartout = POINTER TO StrBlank.Largest;

TYPE StringDescriptor = RECORD
  length : CARDINAL;
  intern : Passepartout
END;

where StrBlank.Largest is defined in

https://github.com/m2sf/m2pp/blob/master/src/StrBlank.def
https://github.com/m2sf/m2pp/blob/master/src/imp/StrBlank.mod

which contains a number of length specific character array types.

Type Largest is the largest character array type available.

When a new dynamic string is allocated, the library picks the character array type whose capacity is the closest match and allocates a block of that type. The block is then linked to the intern field using a CAST, since the formal type of field intern is a pointer to the largest character array type. The benefit is that we can still use array subscript notation to address individual characters in the string instead of having to use pointer arithmetic. The casting between type Largest and the actually allocated character array type is confined to this one library and happens only in two or three places. Outside the library the strings are only accessible via the library's API.
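The selection step described above is language independent, so here is a small sketch of it in C. The size classes are invented for illustration and are not the ones defined in StrBlank, and I am assuming that "closest match" means the smallest array type that still fits the requested capacity.

```c
#include <stddef.h>

/* Made-up size classes (an assumption). The real StrBlank module
   defines its own set of length specific character array types. */
static const size_t size_classes[] = { 8, 16, 32, 64, 128, 256, 1024, 4096 };

#define CLASS_COUNT (sizeof size_classes / sizeof size_classes[0])

/* Returns the smallest size class that can hold `needed` characters,
   or 0 if `needed` exceeds the largest class. */
size_t closest_capacity(size_t needed)
{
    for (size_t i = 0; i < CLASS_COUNT; i++) {
        if (size_classes[i] >= needed) {
            return size_classes[i];
        }
    }
    return 0;   /* larger than the largest available array type */
}
```

With these classes, a request for 9 characters would be served from the 16 character class, and anything above 4096 would be refused. The largest class plays the role of type Largest in the library.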

This is a reasonable compromise between readability, convenience and type safety. Besides, using pointer arithmetic would be less readable, less convenient and less safe. So it is the best you can do with classical Modula-2 at this time.







On Sat, 28 Mar 2020 at 14:47, Benjamin Kowarsch <address@hidden> wrote:
A pointer to char in C is not equivalent to a pointer to CHAR in Modula-2.

In C a string may be either a char array or a pointer to a single char where the lack of type safety is then EXPLOITED to ignore the fact that the pointer type points to a single char, not a character string, and with DEVASTATING CONSEQUENCES !!!

By contrast, in Modula-2 a string is a character array with a maximum capacity associated with the type, and type safety is enforced, thus a pointer to a single character is always interpreted correctly as having a payload of only one single character.

Thus, the closest equivalent of

char* str;

in Modula-2 would be

POINTER TO ARRAY [0..MaxStrLen] OF CHAR;

where MaxStrLen must be a compile time constant, that is, it cannot be changed dynamically at runtime.

And if you have a static character array string in Modula-2, like

VAR str : ARRAY [0..80] OF CHAR;

then you can't just pass str to a char* parameter of a C function. Instead you need to pass a pointer to it.

TYPE Str80 = ARRAY [0..80] OF CHAR;
VAR str : Str80;

TYPE Str80Ptr = POINTER TO Str80;
VAR strPtr : Str80Ptr;

then

str := "the quick brown fox jumps over the lazy dog.";
strPtr := ADR(str); (* ADDRESS is assignment compatible with any pointer type *)

then

passToC(strPtr);

assuming

void passToC(const char* s);

That said, GM2 may already map an argument of a character array type to char* when the DEFINITION MODULE FOR "C" syntax is used to map C functions. Even if it does, it likely won't do the same for char** and char***.

Thus, if the C function parameters are char** then you need

POINTER TO POINTER TO ARRAY [0..MaxStrLen] OF CHAR;

Likewise for char*** you need

POINTER TO POINTER TO POINTER TO ARRAY [0..MaxStrLen] OF CHAR;
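It may help to see what the extra levels of indirection mean on the C side. The two functions below are stand-ins I made up with the same parameter shapes as the prototypes quoted further down; they are not IUP functions and say nothing about what IUP actually does. A char** is typically a pointer to the first element of an array of string pointers, and a char*** is typically a pointer to such a char** variable, passed so that the callee can modify the caller's variable. Neither is a 2D or 3D array of characters.

```c
#include <stddef.h>

/* Stand-in with the shape "char** names, int n" (hypothetical).
   Counts the entries that are neither NULL nor empty strings. */
int count_nonempty(char **names, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (names[i] != NULL && names[i][0] != '\0') {
            count++;
        }
    }
    return count;
}

/* Stand-in with the shape "int *argc, char ***argv" (hypothetical).
   Drops the first argument by adjusting the caller's own variables,
   which is only possible because their addresses are passed in.
   Returns 0 on success, -1 if there is nothing to drop. */
int drop_first_arg(int *argc, char ***argv)
{
    if (argc == NULL || argv == NULL || *argc < 1) {
        return -1;
    }
    *argc -= 1;
    *argv += 1;   /* the caller's char** now points at the next string pointer */
    return 0;
}
```

A C caller would hold an int argc and a char **argv and call drop_first_arg(&argc, &argv). On the Modula-2 side, the nested POINTER TO types above describe exactly these levels of indirection.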

As I have mentioned before, the best way to interface to C APIs is to use a layered approach where the lowest level interfaces directly with the C API and a user level provides a wrapped Modula-2 representation that is independent of the C API. In the lower level library you can then convert and cast types as needed to pass between C and Modula-2.





On Sat, 28 Mar 2020 at 03:27, Hưng Hưng <address@hidden> wrote:
Let me add some additional information. If I use the pointer trick, e.g. PChar, PPChar, PPPChar, then I can't pass an M2 string to the C function, as it requires a C string. If I try to do so, the compiler complains because it expects the CHAR to have length 1 only. M2 and C process strings in very different ways; as far as I can see, the equivalent of C's pointer-to-char trick does not work in M2.

There is a procedure in module DynamicStrings that converts between M2 strings and C strings, but again, how do I translate these data types correctly? If we go with the pointer trick, we then have to figure out how to represent PPChar and PPPChar, as the procedure in DynamicStrings only gets us this far. It's circular reasoning. I feel like my head is going to explode.

On Sat, 28 Mar 2020 at 01:15, Hưng Hưng <address@hidden> wrote:
One C function returns or takes a C string as a parameter, which is an array of char or a pointer to unsigned char.

Another function returns or takes an array of C strings as a parameter, which is a 2D array of char or a pointer to pointer to unsigned char.

Another function returns or takes an array of arrays of C strings as a parameter, which is a 3D array of char or a pointer to pointer to pointer to unsigned char.

It's too complex. C code tends to abuse pointers too much.

For example:

void      IupResetAttribute(Ihandle* ih, const char* name);

int       IupGetAllAttributes(Ihandle* ih, char** names, int n);

int       IupOpen          (int *argc, char ***argv);
