StormC Runtime: Stringlib

A quick display

Here is a very clear example of what the library does


#include <storm/storm.h>

int main(void)
{
	scrt_init_strings();
	SString a = scrt_string("this string --substring here-- is special");
	SString b = scrt_string("--substring here-");
	SString c = scrt_string("this string --substring here-- is special");
	printf("a starts at: %d\n", a.str);
	printf("b starts at: %d\n", b.str);
	printf("c starts at: %d\n", c.str);
}

output:
a starts at: 0
b starts at: 12
c starts at: 0

As you can see, a and c are deduplicated. They are both the same string, not a new allocation.
Additionally, not only entire strings get deduped, but also substrings. The string b starts from within a

Adding New Strings

Adding / dedupping these strings is done fast by this function, using the immintrinn.h header, for SIMD instructions.


static Eq_Payload is_eq_at(const char * restrict needle, u32 needle_len)
{
static Eq_Payload is_eq_at(const char * restrict needle, u32 needle_len)
	{
	   u64 payload = 0 | NOT_MATCH;
	   u64 cand_loop_idx, start;

	   const u8 * restrict pool = scrt_strings->pool;
	   u64 cand_ct = 0;
	   u64 begin = 0;
	   u64 index_found_at = 0;

	   u64 off = scrt_strings->off;

	   if (needle_len == 0) return (u64)(0 | NOT_MATCH);

	   size_t pool_current_size_padded = sizeof(u64) * ((off + 31) & ~31);
	   u64 *candidates;
	   int result_alloc = posix_memalign((void **)&candidates, 32, pool_current_size_padded);
	   if(result_alloc != 0){
		perror("posix_memalign failed");
		exit(1);
	   }

	   const simd_32_u8 first_needle_char = _mm256_set1_epi8(needle[0]);
	   while (begin + 32 < off){
		simd_32_u8 chunk = _mm256_loadu_si256((simd_32_u8 *)(pool + begin));
		simd_32_u8 cmp = _mm256_cmpeq_epi8(chunk, first_needle_char);
		u32 mask = _mm256_movemask_epi8(cmp);

		while(mask){
			int bit = __builtin_ctz(mask);
			candidates[cand_ct++] = begin + bit;
			mask &= (mask - 1);
		}

		begin += 32;
	   }

	   for(cand_loop_idx = 0; cand_loop_idx < cand_ct; cand_loop_idx++){
		u64 start = 0;
		bool is_match = true;
		index_found_at = candidates[cand_loop_idx];
		while (start + 32 < needle_len){
			simd_32_u8 pool_segment = _mm256_loadu_si256((simd_32_u8 *)(pool + index_found_at + start));
			simd_32_u8 needle_segment = _mm256_loadu_si256((simd_32_u8 *)(needle + start));
			simd_32_u8 cmp = _mm256_cmpeq_epi8(pool_segment, needle_segment);
			u32 mask = _mm256_movemask_epi8(cmp);
			if (mask != -1){
				is_match = false;
				break;
			}

			start += 32;
		}

		while (start < needle_len){
			if(pool[index_found_at + start] != needle[start]){
				is_match = false;
				break;
			}
			start++;
		}

		if (is_match){
			payload = ((index_found_at << 8) | MATCH);
			goto defer;
		}

	   }

	defer:
	   free(candidates);
	   return payload;
	}
}

The first pass is to find candidate indexes, which is "at what indexes do I find this starting char?"
For our string b, which starts with "-", it will find all indexes where "-" exists.
It is then loading 32 chars at a time from both the haystack and the current string ( pads if too small ) and compares them
from that starting points of the candidate index. If everything matches, then the string exists, either as a standalone string or part of a string (aka a substring).

Length Comparisons

This has the massive advantage of very quick comparisons between strings. It is just a two-comparison operation, as such:


static inline bool scrt_strcmp(SString target, SString source)
{
	return target.str == source.str && target.len == source.len;
}

If it starts at the same position and has the same length, then it is a match. There is no need for itearting and comparing char by char.
Regardless of the string size, this is always the same amount of operations for the CPU.

Now, there is actually a lot to unpack in this code, but I'll keep it simple here

The return type is a u64, that is OR'ed with MATCH or NOT_MATCH. The actual index is set starting 8 bits to the left

The size is padded up to match 32 size-alignment ( for simd purposes, so I don't have to do a scalar tail loop )

I use posix_memalign to align the memory of the pointer, which is also beneficial for SIMD. This is basically the index in the memory address space

Happy Coding!