Wednesday 25 November 2015

Template metaprogramming and indices trick for variadic template expansion.

Recently I was revisiting some of the code I wrote for my 2d game engine and I saw one thing in the reflection library I was developing alongside the engine that, at the time of coding it, I was not very happy about. There was a  code that was generating stringified signature of a reflected method. This was really minor issue that could easily stay the way it was but since I was already looking at it and I had this urge to try something new I decided to change they way it worked. I had this idea of a compile time string generation (with respect to some pattern) that I wanted to check if was even doable - the results are in this blogpost.

Background

The inteface of my reflection API allows you to easily access type information and from there collections of various entities related to the type like its fields, properties and methods. The API allowed you to iterate over the objects or explicitly Find one by name.

Each reflected element like those fields or methods were a named entity, therefore you could find each by name. This was fine for fields and properties but appeared to be the problem with functions with method overloading. Let's say I agreed upon not adding overloading support but it was still doable but the syntax was ugly and it was kinda hacky.

Anyway later on for debugging purposes I decided to finally add full signature support that was supposed to eventually resolve the problems with ambiguity when overloaded methods were in question. The idea was simple, add API for finding methods by signature (somewhat similar to Java's JNI but without those cryptic shortcut names for built-in types that I always remember wrong. I mean who calls boolean a Z?!).

So I went for it and ended up with something like this:

Base class por MethodInfo
    class MethodInfo : public CallableBase, public Reflectable
    {
        twin_embed_typeinfo(MethodInfo)

    public:
        // (...) ctors removed

        TW_INLINE const Type* get_class() const {
            return m_class;
        }

        TW_INLINE const Type* get_return_type() const {
            return m_return;
        }

        TW_INLINE const FixedArray& get_arguments() const {
            return *m_args;
        }

        string ToString() const
        {
            return m_signature;
        }

    protected:
        string                            m_signature;
        const Type*                       m_class;
        const Type*                       m_return;
        FixedArray<const Type*>          m_args;
    };


And this was the derived base for a template:
  template<typename C, typename R, typename... A>
    class TMethodInfoBase : public MethodInfo
    {
    public:
        TMethodInfoBase(const char* name_)
            : MethodInfo(name_, typeof<R>(), typeof<C>(), &m_args_)
        {
            details::reflection::get_argument_types(m_args_);

            m_signature = _construct_signature_string();
        }

    private:
        template<unsigned... N>
        string _to_string_format(const string& format, details::index_tuple) const
        {
            return Format(format.c_str(), m_return->GetName(),
                m_name, (m_args_[N]->GetName())...);
        }

        TFixedArray<const Type*>* m_args_;

        string _construct_signature_string()
        {
            // ------------------ This construction of format block hurt my eyes
            string format("% %(");
            for (s32 i = 0; i < (s32)(get_arguments().size() - 1); ++i)
            {
                format += "%, ";
            }
            if (get_arguments().size() > 0)
                format += "%";
            format += ");";
            // -----------------------------------------------------------------
            return _to_string_format(format, expand_indices<N>());
        }
    };

The problem

The "ugly part" was the design that actually stored each method signature in memory, which involved allocations as well as not-so-free construction of the signature string during construction of the instance (_construct_signature_string function). This was actually quite bad because you paid the cost of it even if you never ever wanted to touch the string signature for a given method. With possibly hundreds of functions it was totally unnecessary waste of memory.

What I wanted to "fix" this time were actually three things. Get rid of the memory overhead, therefore, remove the signature and construct it on-demand whenever someone wants it (this wasn't suppose to happen often, probably in debug tools or so). Moreover, replace the signature with a hash of a signature that is only 4 bytes and would allow me to still lookup by signature. Finally the last part was actually the topic of this post. Test if I can actually construct the signature without dynamic memory allocation or at least removal of the format string construction (code isolated with comment sections)

The solution 

I decided to at least try to achieve this format string generation for method signature a compile time operation by using template metaprogramming. If anyone is wondering my string class supports a Format function that tried to mimick C# format in a way but replacing the position based placeholders ({0}/{1} etc.) with more 'C' style % fashion. Since this was a template funtion that would deduce the argument types on its own it was type safe without the need to specify the type in format string. So say goodbye to those %s, %i, %f etc.

With that in mind, each member function had the following signature common: "% %( ... );" where first argument was obviously the return type and second one was the name of the function. It was ended with ");" and what was left to fill was the part between parenthesis based on the number of arguments. For each argument except the last we wanted to add three characters - %, comma and a space. Last method argument should just add %.

Knowing our requirements lets jump into the implementation.

In order to keep things simple lets show the solution that is already broken down with the comments (that, like with any decent programmer, is with 0-based indexing ;) )

// Implementation details
namespace impl
{
    struct _fn_sig_arg_pattern
    {
    protected:
        // 0.1) This is declared as static member in base class to avoid static initialization fiasco.
        static const char value[];
    };
    const char _fn_sig_arg_pattern::value[] = "%, ";

    // 0) template derived from _fn_sig_arg_pattern. Not really a is-a relationship but I dont need to instantiate
    // this "%, " string for each template instantiation
    template<unsigned N, unsigned... Idx>
    struct signature_format : _fn_sig_arg_pattern
    {
        static const char value[];
    };

    // 0.2) Initializer for the const char array. Note the expansion of variadic by indexing into the pattern array with Idx for each value in variadic pack
    template<unsigned N, unsigned... Idx>
    const char signature_format<N, Idx...>::value[] = { '%', ' ', '%', '(', (_fn_sig_arg_pattern::value[Idx])..., ')', '\0' };

    // 3) Our sequence generation templates. For each level of N it recurses decrementing the N count and appending the sequence '0, 1, 2' to sequence pack
    template<unsigned N, unsigned... Is>
    struct arg_sequence : arg_sequence<N - 1, 0, 1, 2, Is...> {};

    // 3.1) The recursion terminating partial specialization that when N reaches 0 derives from indices<> template appending 0 to the already generated sequence
    template<unsigned... Is>
    struct arg_sequence<0, Is...> : indices<Is..., 0> {};

    // When instantiated the arg_sequence object given <2> and downcasted to indices will generate object indices<0, 1, 2, 0> 
    // Given <3> will generate indices<0, 1, 2, 0, 1, 2, 0> and so on and so forth.

    // 2) The second argument of this template - the pack, will get automatically deduced from the provided argument to the function
    template<unsigned N, unsigned... Idx>
    const char* _get_signature_format(indices<Idx...> seq)
    {
        // 2.1) We expand the generated indices to the format function that will lated on use these while initializing array to fill in the gap we need to fill
        return signature_format<N, Idx...>::value;
    }
}

// 1) public API.
template<unsigned N>
const char* get_signature_format()
{
    // 1.1) This just calls the implementation function providing number of arguments and running the index sequence generator
    return impl::_get_signature_format<N>(impl::arg_sequence<N - 1>());
}

/*
    4) Example usage:

    const char* args2 = get_signature_format<2>();
    const char* args5 = get_signature_format<5>();

    template<typename... TArgs>
    void foo()
    {
        const char* fnFormat = get_signature_format<sizeof...(TArgs)>();
        auto fullSignatureString = Format(fnFormat, returnType, functionName, typeof<TArgs>().get_name()...);
    }
*/

I have no clue on how to title this post so if anyone reading this has an idea feel free to give your ideas.

EDIT: It seems worth noting, that actually C++14 has utilities for this indices expansion that work out of the box (I have overslept the C++14 and soon to be C++17 additons as, as of late, I have been busy working with C# and Unity3d). Anyways, in my case, where the sequence was not just 0..N it wouldn't help but in most cases you don't have to do these structs and derivations just to get what you need. These are defined in <utility> and are called index_sequence, make_integer_sequence and make_index_sequence. Together with tuples they are powerful tool worth toying with.

No comments:

Post a Comment