#1
  1. Nihilistic Nerd
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Posts
    15
    Rep Power
    0

    A Dynamic Language With Complete Static Analysis


    There was a video regarding the future of the Ruby programming language that I had watched a long time ago in which they discussed the idea of grafting a structural type system onto the language that would allow for improved static analysis and better code generation.

    Now, I've never really cared much for Ruby, but this idea intrigued me. I've developed a philosophy over the years that programming languages are a form of user interface, and that different problems or constraints require different solutions.

    For the past few years, I've been deeply studying the design and implementation of many different types of programming languages, with the goal to create a very easy to understand dynamic language that is type-safe and able to be statically proven. You could call it a 'quest for the one true programming language'. There's a good chance that I'm wasting my time, but it's fun to me, so who cares?

    Among the problems I've managed to find solutions to are implicit declaration, type inference, prevention of null-references, and error/exception handling. By "solution", I mean finding a way to write code that seems untyped, unbureaucratic, terse, and noiseless (think JavaScript or TCL) while remaining statically provable, deterministic, and safe. I won't go too much into detail, but the solutions to these problems were heavily influenced by Haskell, VB6, Rust, and Swift.

    There are a few remaining problems to be solved though.

    • Object Ownership and Lifetimes - Whilst studying Rust, one thing I can't say I fully understand yet, but catches my eye is the connection between object ownership and lifetimes. I have a strong hunch that it may be possible to infer the ownership of an object from it's use and use the ownership to statically prove the lifetime of an object. If possible, I could design the language to have high-performance automatic memory management without the latency of garbage collection.
    • Concurrency - I haven't quite figured out how I'm going to implement it yet, but I really like the concept behind goroutines in the Go programming language. It makes sense to me that context switches should happen on I/O, where blocking is already going to happen. Typed message channels are also a very good idea.
    • Namespacing - I've long been an admirer of the work of the game developer Jonathan Blow (creator of "Braid" and "The Witness") and have repeatedly watched his presentations and demos of his "Jai" programming language. In his Jai language, the one feature that stood out the most to me was his use of the "using" keyword to alias members of objects into the current scope. I believe this has a huge amount of potential, as it can be used to transform object oriented code into imperative code for the duration of a scope. That one feature aside, I'm not quite decided on how packages should work yet. Different problems require different solutions, and when I need a script that only executes 4 or 5 lines of code, it's silly to have to create a package, class, and method to do so merely out of convention.
    • Data Manipulation - A lot of the programs I write tend to revolve around analyzing, munging, and converting data. Because of this, I spent a lot of time studying parsers, AWK, SQL-based languages, and LINQ. LINQ is fairly cool, and I might do something based on it, but I haven't quite found what I'm looking for, sadly. Most of these technologies are made for processing textual data, and I'm more interested in binary data.


    If you have and ideas about how to approach the above problems, please share them
  2. #2
  3. Nihilistic Nerd
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Posts
    15
    Rep Power
    0
    Well, I decided to spark some discussion, I'd give a sneak peak at what I've been working on.


    First of all, let's talk about the type system. Before a symbol has been declared, it has a type of "Undefined". Instances of the "Undefined" type do not have a value. Trying to compare or perform any sort of arithmetic on something that is "Undefined" will result in a compiler error (This includes comparing 2 "Undefined" symbols. By definition, they have no value, methods, or properties to form a basis of comparison). The only thing that doesn't result in an error is checking if a symbol is "Undefined" or not. Such an expression is evaluated at compile time, and will result in the conditional compilation of code.

    The second type you should know about is the "Nothing" type. Every instance of the "Nothing" type will always have a value of "nothing". As such, it has a size of 0 and does not need to be initialized. It is essentially the equivalent of 'void' in C++.

    Every object is an instance of a class, where the ancestor of all classes is 'Something'. Anything that is an instance of "Something" must be initialized to a valid state. "Something" and "Nothing" are mutually exclusive. There is no such thing as a null reference in this language.

    Now we're going to step into some of the more interesting bits. There is a generic type called known as "Either", which is a tagged union of 2 or more types. For example, "Either<Something, Nothing>" is a tagged union of an instance of "Something" and an instance of "Nothing". Which one is it? You can check using the "is" operator like "if object is Something", and cast an "Either" to it's proper type. Performing a cast on an Either type makes an assertion, which if false, will result in an Exception being thrown. For example, if you cast "Either<Something, Nothing>" into "Something" when it was actually "Nothing", then you will get an Exception.

    A special case of the "Either" type where one of the types is "Something" (or a derived class of it) and the other type is "Nothing" is also referred to as a "Maybe". For example, "Maybe<FileStream>" is equivalent to "Either<FileStream, Nothing>". Because a "Maybe" can be assigned "nothing", it doesn't need to be initialized.

    Error handling works incredibly well in this type system. All errors are derived from the "Error" class. An Error does not become an Exception until you make an assertion. For example:

    Code:
    CopyFile (from: Path, to: Path) -> Maybe<Error>
    { ... }
    
    OpenFile (at: Path) -> Either<FileStream, Error>
    { ... }
    
    local fileStream = OpenFile("~/test.txt") as FileStream;
    The last line in the above example will cast the result of OpenFile to a FileStream. This asserts that the result is not an error.

    The entire standard library will use this form of error handling, so you can choose to completely opt-out of having exceptions merely by propagating the errors up the call stack or doing some sanity checking.
  4. #3
  5. Lord of the Dance
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Oct 2003
    Posts
    4,109
    Rep Power
    2010
    Interesting topic.

    I am a bit unclear on how 'Nothing' is different from 'null?

    At some point, it sounds like 'Empty' could be a better naming?
  6. #4
  7. Nihilistic Nerd
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Posts
    15
    Rep Power
    0
    Since my last post didn't work for some reason (forum glitch?) I'll go ahead and repeat it:

    The Nothing and Something types are mutually exclusive. You cannot assign 'nothing' to an instance of Something; they are completely different types. It'd be like assigning 5 to a string. This means that any user defined class (all of which will derive from Something) must be initialized to a valid value. The only way to have a null reference is by creating an instance of Maybe<Something> (aka. Either<Something, Nothing>). Assigning 'nothing' to a Maybe<Something> will change the type of the tagged union into 'Nothing'.

    Note that this is completely different from the notion of 'null' in most other languages. A value of 'nothing' is not the same thing as an object at the machine address 0x00000000. It is a unique constant that can only be assigned to an instance of 'Nothing'. If you specify that a parameter of a function has a type of "String", then the type system absolutely guarantees that what the function receives is actually a string, and not a null reference. If the caller tries to pass "nothing" or an instance of "Maybe<String>" that currently a "Nothing", the cast to a String will result in an Exception being thrown. That said, it won't guarantee that the String is not an empty string.

    Exceptions only occur when the code makes a false assertion about it's current state. Any function that could result in an error or a value being returned should return an Either of the two. Any function that could result in an error, but not a value should return a Maybe error.
  8. #5
  9. Not An Expert
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2015
    Posts
    404
    Rep Power
    3
    Originally Posted by Wajideus
    Since my last post didn't work for some reason (forum glitch?) I'll go ahead and repeat it:
    Aye, that's a glitch sorry. We have an automated filter that holds posts back in a moderation queue to protect against spam. The filter should be ignoring you because of your user status, but it may be going haywire because you have a lower post count than your back-end status would imply (i.e. you have higher privileges than someone with 10 posts would ordinarily have).

    It will probably be resolved automatically as the filter "gets to know you." She's a neurotic old girl but she's friendly enough once she knows ya

    In the mean-time I'll keep a close eye on the mod queue to make sure your posts get approved as soon as possible. Sorry for the inconvenience!
  10. #6
  11. Nihilistic Nerd
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Posts
    15
    Rep Power
    0
    Aye, that's a glitch sorry. We have an automated filter that holds posts back in a moderation queue to protect against spam. The filter should be ignoring you because of your user status, but it may be going haywire because you have a lower post count than your back-end status would imply (i.e. you have higher privileges than someone with 10 posts would ordinarily have).
    I figured as much

    Another thing I should bring up is that the language does support type inference. Not just for declaring variables like:

    Code:
    local fileStreamOrError = OpenFile("test.txt")
    
    # being equivalent to
    
    local fileStreamOrError : Either<FileStream, Error> = OpenFile("test.txt")
    But also on the return type of functions themselves. A return statement without a value is considered equivalent to 'return nothing', which has a type of 'Nothing'. If you don't explicitly state the return type of the function, the parser will make a list of all the types of the value that might be returned by the function. If there is more than 1 type, it will infer the return type of the function to be an Either of the types.

    Code:
    /* The return type of this function is Either<Error, Integer> */
    divide(x: Integer, y: Integer)
    {
        if y == 0 {
            return Error.DivideByZero;      // because this is an Error
        }
        return x / y;                           // and this is an Integer
    }
    
    // cast from Either<Error, Integer> -> Integer asserts
    // that the result of "divide" was not an Error.
    
    local result : Integer = divide(1, 2);

    ---

    EDIT #1:

    Going back to what I mentioned in the first post, I'm currently working on how to implement automatic memory management, modules, concurrency, and pattern matching. My rule of thumb is that a solution should solve at least 80% of the problems you'd face very well, but not lock you into it. For example, I decided that all instances of Something should be passed by reference (or as Java puts it, the value of Something is a reference, and everything is passed by value); because anyone who's used C++ knows that the vast majority of the time, you're working with references, not values. If you want a value type, just duplicate the object. It has the same effect.


    EDIT #2:

    I was thinking about threading today, and made some decisions regarding the design and implementation of it. The fundamental unit of work is the "Task". A "Worker" performs a Task, but can only execute one Task at a time. A Worker switching between Tasks is equivalent to the concept of coroutines or cooperative multitasking (if you're familiar with such).

    While a Worker can only execute one Task at a time, it is possible to have multiple Workers working in parallel. In this sense, a Worker is equivalent to a Thread, and having multiple Workers requires that the machine is capable of executing code in parallel (eg. having multiple cores) or implementing preemptive multitasking somehow (eg. a timed programmable interrupt that can trigger a context switch).

    A Job (name pending) is a set of one or more Tasks being executed by at least one Worker. Coworkers (Workers within the same Job) will typically share resources with each other; however, resources cannot be shared between Jobs. At best, Jobs can communicate immutable information with each other. Thus, a Job is analogous to a process, which has it's own address space and can only communicate with other processes via pipes or sockets.
    Last edited by Wajideus; April 14th, 2017 at 06:08 PM.
  12. #7
  13. Nihilistic Nerd
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Posts
    15
    Rep Power
    0
    Alrighty, so, another small update regarding the language syntax.

    Firstly, the division operator of all numeric types will return an Either of an Error and the type. For example, the result of '5 / 5' has a type of Either<Error, Integer>. This works together with typecast assertion, such that an implicit cast from Either<Error, Integer> -> Integer where the result was an Error (DividedByZero); an exception will be thrown. The primary reason for this decision has to do with preventing a kind of error that typically requires hardware support for traps.

    Secondly, the language will be integrate a dynamic type system onto it's existing static type system. The way that it works is that the default parameter type for functions is a `Variant`; and if any return statement within a function returns a Variant, the return type of the function will default to a variant. Thus:

    Code:
    on add (x, y) {
        get x + y;
    }
    will be inferred as:

    Code:
    on add (x: Variant, y: Variant) -> Variant {
        get x + y;
    }
    Note that this only applies to type inference. We could explicitly state the return type:

    Code:
    on add(x, y) -> Integer {
        get x + y;
    }
    In this case, the expression (x + y), which evaluates to a Variant since 'x' and 'y' are Variants, will be typecast asserted into an Integer at runtime.

    The most fascinating thing about this is that you can choose when and where you want static typing vs dynamic typing. The two type systems interoperate seamlessly. When you need a robust and high-performance component in your system, use static typing. When you just want to glue some components together or prototype something quickly with iterative testing, use dynamic typing. You can have the best of both worlds.
    Last edited by Wajideus; April 16th, 2017 at 02:52 AM.
  14. #8
  15. Nihilistic Nerd
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Posts
    15
    Rep Power
    0
    This is just a bit of a syntactical revision:


    Code:
    InputStream Class<T>
    {
        Final Operator >> T( )
        {
            value = new T[1];
            get_T_Array(value);
        }
        get_T_Array (values T[ ]);
    }
    
    FileInputStream Class : InputStream<Byte>
    {
        Static Open Either<FileInputStream, Error>(filePath String);
        close ();
    }
    One interesting aspect to mention here is that the underscore is not part of a symbol name; it's a symbol concatenator. So the get_T_Array method of an InputStream<Byte> is actually evaluated as "getByteArray".

IMN logo majestic logo threadwatch logo seochat tools logo