Page 1 of 2 12 Last
  • Jump to page:
    #1
  1. Nihilistic Nerd
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Posts
    23
    Rep Power
    0

    A Dynamic Language With Complete Static Analysis


    There was a video regarding the future of the Ruby programming language that I had watched a long time ago in which they discussed the idea of grafting a structural type system onto the language that would allow for improved static analysis and better code generation.

    Now, I've never really cared much for Ruby, but this idea intrigued me. I've developed a philosophy over the years that programming languages are a form of user interface, and that different problems or constraints require different solutions.

    For the past few years, I've been deeply studying the design and implementation of many different types of programming languages, with the goal to create a very easy to understand dynamic language that is type-safe and able to be statically proven. You could call it a 'quest for the one true programming language'. There's a good chance that I'm wasting my time, but it's fun to me, so who cares?

    Among the problems I've managed to find solutions to are implicit declaration, type inference, prevention of null-references, and error/exception handling. By "solution", I mean finding a way to write code that seems untyped, unbureaucratic, terse, and noiseless (think JavaScript or TCL) while remaining statically provable, deterministic, and safe. I won't go too much into detail, but the solutions to these problems were heavily influenced by Haskell, VB6, Rust, and Swift.

    There are a few remaining problems to be solved though.

    • Object Ownership and Lifetimes - Whilst studying Rust, one thing I can't say I fully understand yet, but catches my eye is the connection between object ownership and lifetimes. I have a strong hunch that it may be possible to infer the ownership of an object from it's use and use the ownership to statically prove the lifetime of an object. If possible, I could design the language to have high-performance automatic memory management without the latency of garbage collection.
    • Concurrency - I haven't quite figured out how I'm going to implement it yet, but I really like the concept behind goroutines in the Go programming language. It makes sense to me that context switches should happen on I/O, where blocking is already going to happen. Typed message channels are also a very good idea.
    • Namespacing - I've long been an admirer of the work of the game developer Jonathan Blow (creator of "Braid" and "The Witness") and have repeatedly watched his presentations and demos of his "Jai" programming language. In his Jai language, the one feature that stood out the most to me was his use of the "using" keyword to alias members of objects into the current scope. I believe this has a huge amount of potential, as it can be used to transform object oriented code into imperative code for the duration of a scope. That one feature aside, I'm not quite decided on how packages should work yet. Different problems require different solutions, and when I need a script that only executes 4 or 5 lines of code, it's silly to have to create a package, class, and method to do so merely out of convention.
    • Data Manipulation - A lot of the programs I write tend to revolve around analyzing, munging, and converting data. Because of this, I spent a lot of time studying parsers, AWK, SQL-based languages, and LINQ. LINQ is fairly cool, and I might do something based on it, but I haven't quite found what I'm looking for, sadly. Most of these technologies are made for processing textual data, and I'm more interested in binary data.


    If you have and ideas about how to approach the above problems, please share them
  2. #2
  3. Nihilistic Nerd
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Posts
    23
    Rep Power
    0
    Well, I decided to spark some discussion, I'd give a sneak peak at what I've been working on.


    First of all, let's talk about the type system. Before a symbol has been declared, it has a type of "Undefined". Instances of the "Undefined" type do not have a value. Trying to compare or perform any sort of arithmetic on something that is "Undefined" will result in a compiler error (This includes comparing 2 "Undefined" symbols. By definition, they have no value, methods, or properties to form a basis of comparison). The only thing that doesn't result in an error is checking if a symbol is "Undefined" or not. Such an expression is evaluated at compile time, and will result in the conditional compilation of code.

    The second type you should know about is the "Nothing" type. Every instance of the "Nothing" type will always have a value of "nothing". As such, it has a size of 0 and does not need to be initialized. It is essentially the equivalent of 'void' in C++.

    Every object is an instance of a class, where the ancestor of all classes is 'Something'. Anything that is an instance of "Something" must be initialized to a valid state. "Something" and "Nothing" are mutually exclusive. There is no such thing as a null reference in this language.

    Now we're going to step into some of the more interesting bits. There is a generic type called known as "Either", which is a tagged union of 2 or more types. For example, "Either<Something, Nothing>" is a tagged union of an instance of "Something" and an instance of "Nothing". Which one is it? You can check using the "is" operator like "if object is Something", and cast an "Either" to it's proper type. Performing a cast on an Either type makes an assertion, which if false, will result in an Exception being thrown. For example, if you cast "Either<Something, Nothing>" into "Something" when it was actually "Nothing", then you will get an Exception.

    A special case of the "Either" type where one of the types is "Something" (or a derived class of it) and the other type is "Nothing" is also referred to as a "Maybe". For example, "Maybe<FileStream>" is equivalent to "Either<FileStream, Nothing>". Because a "Maybe" can be assigned "nothing", it doesn't need to be initialized.

    Error handling works incredibly well in this type system. All errors are derived from the "Error" class. An Error does not become an Exception until you make an assertion. For example:

    Code:
    CopyFile (from: Path, to: Path) -> Maybe<Error>
    { ... }
    
    OpenFile (at: Path) -> Either<FileStream, Error>
    { ... }
    
    local fileStream = OpenFile("~/test.txt") as FileStream;
    The last line in the above example will cast the result of OpenFile to a FileStream. This asserts that the result is not an error.

    The entire standard library will use this form of error handling, so you can choose to completely opt-out of having exceptions merely by propagating the errors up the call stack or doing some sanity checking.
  4. #3
  5. Lord of the Dance
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Oct 2003
    Posts
    4,118
    Rep Power
    2010
    Interesting topic.

    I am a bit unclear on how 'Nothing' is different from 'null?

    At some point, it sounds like 'Empty' could be a better naming?
  6. #4
  7. Nihilistic Nerd
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Posts
    23
    Rep Power
    0
    Since my last post didn't work for some reason (forum glitch?) I'll go ahead and repeat it:

    The Nothing and Something types are mutually exclusive. You cannot assign 'nothing' to an instance of Something; they are completely different types. It'd be like assigning 5 to a string. This means that any user defined class (all of which will derive from Something) must be initialized to a valid value. The only way to have a null reference is by creating an instance of Maybe<Something> (aka. Either<Something, Nothing>). Assigning 'nothing' to a Maybe<Something> will change the type of the tagged union into 'Nothing'.

    Note that this is completely different from the notion of 'null' in most other languages. A value of 'nothing' is not the same thing as an object at the machine address 0x00000000. It is a unique constant that can only be assigned to an instance of 'Nothing'. If you specify that a parameter of a function has a type of "String", then the type system absolutely guarantees that what the function receives is actually a string, and not a null reference. If the caller tries to pass "nothing" or an instance of "Maybe<String>" that currently a "Nothing", the cast to a String will result in an Exception being thrown. That said, it won't guarantee that the String is not an empty string.

    Exceptions only occur when the code makes a false assertion about it's current state. Any function that could result in an error or a value being returned should return an Either of the two. Any function that could result in an error, but not a value should return a Maybe error.
  8. #5
  9. Not An Expert
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2015
    Posts
    404
    Rep Power
    3
    Originally Posted by Wajideus
    Since my last post didn't work for some reason (forum glitch?) I'll go ahead and repeat it:
    Aye, that's a glitch sorry. We have an automated filter that holds posts back in a moderation queue to protect against spam. The filter should be ignoring you because of your user status, but it may be going haywire because you have a lower post count than your back-end status would imply (i.e. you have higher privileges than someone with 10 posts would ordinarily have).

    It will probably be resolved automatically as the filter "gets to know you." She's a neurotic old girl but she's friendly enough once she knows ya

    In the mean-time I'll keep a close eye on the mod queue to make sure your posts get approved as soon as possible. Sorry for the inconvenience!
  10. #6
  11. Nihilistic Nerd
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Posts
    23
    Rep Power
    0
    Aye, that's a glitch sorry. We have an automated filter that holds posts back in a moderation queue to protect against spam. The filter should be ignoring you because of your user status, but it may be going haywire because you have a lower post count than your back-end status would imply (i.e. you have higher privileges than someone with 10 posts would ordinarily have).
    I figured as much

    Another thing I should bring up is that the language does support type inference. Not just for declaring variables like:

    Code:
    local fileStreamOrError = OpenFile("test.txt")
    
    # being equivalent to
    
    local fileStreamOrError : Either<FileStream, Error> = OpenFile("test.txt")
    But also on the return type of functions themselves. A return statement without a value is considered equivalent to 'return nothing', which has a type of 'Nothing'. If you don't explicitly state the return type of the function, the parser will make a list of all the types of the value that might be returned by the function. If there is more than 1 type, it will infer the return type of the function to be an Either of the types.

    Code:
    /* The return type of this function is Either<Error, Integer> */
    divide(x: Integer, y: Integer)
    {
        if y == 0 {
            return Error.DivideByZero;      // because this is an Error
        }
        return x / y;                           // and this is an Integer
    }
    
    // cast from Either<Error, Integer> -> Integer asserts
    // that the result of "divide" was not an Error.
    
    local result : Integer = divide(1, 2);

    ---

    EDIT #1:

    Going back to what I mentioned in the first post, I'm currently working on how to implement automatic memory management, modules, concurrency, and pattern matching. My rule of thumb is that a solution should solve at least 80% of the problems you'd face very well, but not lock you into it. For example, I decided that all instances of Something should be passed by reference (or as Java puts it, the value of Something is a reference, and everything is passed by value); because anyone who's used C++ knows that the vast majority of the time, you're working with references, not values. If you want a value type, just duplicate the object. It has the same effect.


    EDIT #2:

    I was thinking about threading today, and made some decisions regarding the design and implementation of it. The fundamental unit of work is the "Task". A "Worker" performs a Task, but can only execute one Task at a time. A Worker switching between Tasks is equivalent to the concept of coroutines or cooperative multitasking (if you're familiar with such).

    While a Worker can only execute one Task at a time, it is possible to have multiple Workers working in parallel. In this sense, a Worker is equivalent to a Thread, and having multiple Workers requires that the machine is capable of executing code in parallel (eg. having multiple cores) or implementing preemptive multitasking somehow (eg. a timed programmable interrupt that can trigger a context switch).

    A Job (name pending) is a set of one or more Tasks being executed by at least one Worker. Coworkers (Workers within the same Job) will typically share resources with each other; however, resources cannot be shared between Jobs. At best, Jobs can communicate immutable information with each other. Thus, a Job is analogous to a process, which has it's own address space and can only communicate with other processes via pipes or sockets.
    Last edited by Wajideus; April 14th, 2017 at 06:08 PM.
  12. #7
  13. Nihilistic Nerd
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Posts
    23
    Rep Power
    0
    Alrighty, so, another small update regarding the language syntax.

    Firstly, the division operator of all numeric types will return an Either of an Error and the type. For example, the result of '5 / 5' has a type of Either<Error, Integer>. This works together with typecast assertion, such that an implicit cast from Either<Error, Integer> -> Integer where the result was an Error (DividedByZero); an exception will be thrown. The primary reason for this decision has to do with preventing a kind of error that typically requires hardware support for traps.

    Secondly, the language will be integrate a dynamic type system onto it's existing static type system. The way that it works is that the default parameter type for functions is a `Variant`; and if any return statement within a function returns a Variant, the return type of the function will default to a variant. Thus:

    Code:
    on add (x, y) {
        get x + y;
    }
    will be inferred as:

    Code:
    on add (x: Variant, y: Variant) -> Variant {
        get x + y;
    }
    Note that this only applies to type inference. We could explicitly state the return type:

    Code:
    on add(x, y) -> Integer {
        get x + y;
    }
    In this case, the expression (x + y), which evaluates to a Variant since 'x' and 'y' are Variants, will be typecast asserted into an Integer at runtime.

    The most fascinating thing about this is that you can choose when and where you want static typing vs dynamic typing. The two type systems interoperate seamlessly. When you need a robust and high-performance component in your system, use static typing. When you just want to glue some components together or prototype something quickly with iterative testing, use dynamic typing. You can have the best of both worlds.
    Last edited by Wajideus; April 16th, 2017 at 02:52 AM.
  14. #8
  15. Nihilistic Nerd
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Posts
    23
    Rep Power
    0
    This is just a bit of a syntactical revision:


    Code:
    InputStream Class<T>
    {
        Final Operator >> T( )
        {
            value = new T[1];
            get_T_Array(value);
        }
        get_T_Array (values T[ ]);
    }
    
    FileInputStream Class : InputStream<Byte>
    {
        Static Open Either<FileInputStream, Error>(filePath String);
        close ();
    }
    One interesting aspect to mention here is that the underscore is not part of a symbol name; it's a symbol concatenator. So the get_T_Array method of an InputStream<Byte> is actually evaluated as "getByteArray".
  16. #9
  17. Nihilistic Nerd
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Posts
    23
    Rep Power
    0
    So, this is just a neat idea I came up with for how to handle line continuation in a language without rely on escape sequences or statement-terminating characters like semicolons:


    Condition:
    • Start with 2 lines of code.
    • If the indentation of any line contains tabs, all of the tabs must precede any spaces


    Rules:
    1. If the first line opens a new block, then there is no line continuation. [DONE]
    2. If the number of tabs is the same in both lines, the indentation level for both lines is equal to the number of spaces. [GOTO 5]
    3. If the number of spaces is the same in both lines, the indentation level for both lines is equal to the number of tabs. [GOTO 5]
    4. For both lines, add the number of spaces to the number of tabs multiplied by the number of spaces in the line that has the most spaces. The result is the indentation level. [GOTO 5]
    5. If the first line has an indentation level that's greater than or equal to the indentation level of the second line, then there is no line continuation. [DONE]
    6. If the attempted line continuation results in a syntax error, then the lines should be parsed again separately.


    This idea basically works on the assumption that a continuation line will always be indented more than the line that it continues, that people won't mix tabs and spaces in dumb ways (like: tab-space-tab-tab-space), and that the number of spaces used will mostly likely never be greater than the tab size.

    Note that this idea doesn't use indentation to determine scope like python does. All it does is give a very accurate way of determining when one line is continuing another line.
  18. #10
  19. Contributing User
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Aug 2011
    Posts
    5,805
    Rep Power
    508
    python indentation works fine. Line continuations using open parenthesis are best python line continuation.
    [code]Code tags[/code] are essential for python code and Makefiles!
  20. #11
  21. Nihilistic Nerd
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Posts
    23
    Rep Power
    0
    I don't personally have anything against Python-style indentation, though having an "end" keyword makes it a little easier to identify the end of a block when you have several levels of nesting. What I was talking about doesn't have to do with detecting what block a statement is a part of though; it has to do with detecting a line continuation (as opposed to using semicolons to terminate statements or using an escape character like \ or _ ). It would actually be much easier to implement this idea in Python because the language already doesn't let you mix tabs and spaces; which is where much of the complexity comes in.

    I'm not sure if I'm actually going to implement this idea in my language yet. The syntax doesn't matter as much to me as the features do. I was just thinking about how I dislike having a lot of noisy programs, and it's quite silly to have to type a semicolon on every single statement when most statements don't span multiple lines. Most people who argue against semicolon insertion seem to think that the way that JavaScript does it is the only way it can be done for some reason.
  22. #12
  23. Nihilistic Nerd
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Posts
    23
    Rep Power
    0
    So, here's a new idea I've been working on lately; replacing structs with sets.

    The only rule regarding the construction of a set is that each member of the set must have a type. Names and values are completely optional.
    Code:
    args := (:int, :[*char]);
    The above defines a valid set called 'args', where the first member is an int and the second member is a list of pointers to chars.

    Each member of a set can be accessed by index. For example, `args[2]` would be the second member of the set called 'args'. If a member of a set has a name, the name comes before the colon.
    Code:
    args := (argc:int, argv:[*char]);
    Named members can be accessed via the '.' member operator just like you would a struct. eg. `args.argc`.

    Lastly, sets can be constructed from values via type-inference.
    Code:
    oddity := (11, "hello world");
    Here, the members of the set do not use colons. This means that they're values and the types are inferred such that in actuality, the code is equivalent to:
    Code:
    oddity := (:=11, :="hello world");
    which is equivalent to:
    Code:
    oddity := (:int=11, :str="hello world");
    This means that sets can also be used as tuples. Note the parallelism between declaration inside and outside of a set. This new syntax means that braces are used exclusively only for control-flow and not for declaring / assigning lists and sets in the language. It also unifies structs with parameter lists for functions, which I personally think helps to clean up lambda syntax a bit.


    EDIT:

    I forgot to mention, I changed the pointer and array syntax a little bit. When declaring a pointer, the asterisk comes before the typename. eg.
    Code:
    str: *char;
    Declares a variable called 'str' which is a pointer to a sequence of chars. Additionally, the types of lists are placed inside of the brackets.
    Code:
    people: [person];
    The above declares a variable called 'people' which is a list of 'person' items. By default, lists are dynamically allocated, akin to having a counter/pointer tuple that's automatically managed by a user-specified allocator. If you want a list with a specific number of elements, you place the number of elements in parentheses after the list:
    Code:
    people: [person](4);
    Lists that are declared like this have a fixed size and are thus equivalent to C arrays. Another thing to note is that the (4) is in fact, a set. What we're actually doing is calling the list constructor which has a parameter set of: `(allocator:=default_allocator, count:int=0)`.
    Last edited by Wajideus; June 26th, 2017 at 02:53 PM.
  24. #13
  25. Nihilistic Nerd
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Posts
    23
    Rep Power
    0
    This is just a peculiar idea I had earlier today. Assignment statements are really nothing more than a statement of equality when you think about it. So what about statements of inequality? Consider the following truth tables:

    For (x /= y):
    • x = anything # false, undeterminable
    • x /= (anything = y) # true
    • x /= (anything /= y) # false, undeterminable
    • x < anything # false, undeterminable
    • x <= anything # false, undeterminable
    • x > anything # false, undeterminable
    • x >= anything # false, undeterminable


    For (x < y):
    • x = anything # false, undeterminable
    • x /= (anything >= y) # true
    • x /= (anything < y) # false, undeterminable
    • x < (anything <= y) # true
    • x < (anything > y) # false
    • x <= (anything <= y) # true
    • x <= (anything > y) # false
    • x > anything # false, undeterminable
    • x >= anything # false, undeterminable


    For (x > y):
    • x = anything # false, undeterminable
    • x /= (anything <= y) # true
    • x /= (anything > y) # false, undeterminable
    • x < anything # false, undeterminable
    • x <= anything # false, undeterminable
    • x > (anything >= y) # true
    • x > (anything < y) # false
    • x >= (anything >= y) # true
    • x >= (anything < y) # false


    A potential application for this feature would be mathematical correctness. For example, if you limit yourself to only having statements of equality (traditional assignment), then your answers will be blatantly wrong when a register overflow or underflow occurs because the machine cannot calculate or represent the exact value. But, with inequality, you can do something like:
    Code:
    return > int32.max;
    Which doesn't give you an exact answer, but does give you a correct answer.

    I'm not saying I'm going to implement this in my language, but it is interesting and could prevent a huge class of errors that are extremely common in all programming languages.
  26. #14
  27. Nihilistic Nerd
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Posts
    23
    Rep Power
    0
    I've been toying with yet another idea today; which is basically that all identifiers are types. For example:

    Code:
    idiv := (dividend:int, divisor:int) -> (quotient:int, remainder:int);
    In the above function declaration, one might typically read this as though the arguments and the results are all integers. However, this is not necessarily true, because the divisor cannot be 0 but an integer can.

    What we're subliminally implying here is that dividend, divisor, quotient, and remainder are all subtypes of integers. We could have instead defined 4 constructor functions (for dividend, divisor, quotient, and remainder respectively) and then written our function declaration as:

    Code:
    idiv := (dividend, divisor) -> (quotient, remainder);
    It's less typing, makes more sense, and prevents propagation of runtime errors due to malformed input. the arguments and results could be given to the body of the function as arrays in cases where there's more than one argument or result with the same type. eg.

    Code:
    moveTowards: (point, point, distance)
    {
        vector := point[1] - point[2];        // subtype the result of point[1] - point[2]
        direction := vector.normalized;       // subtype the result of vector.normalized
        distance = (distance, vector.magnitude).min;
        point[2] += direction * distance;
    }
    Last edited by Wajideus; July 2nd, 2017 at 05:12 PM.
  28. #15
  29. Nihilistic Nerd
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Posts
    23
    Rep Power
    0
    So, I came up with a new type-classification for functions today which I'm about 90% sure will actually end up in the final version of this language.


    First, you have regular functions, which are not associated with types (aside from the types of their arguments).

    Second, you have member functions, which are bound to a type via an implicit first argument called 'this'. There are 2 subtypes of member functions: operators and iterators.

    An operator is essentially the equivalent of the traditional object method. When you see `x + 5` or `object.destroy()`, the '+' and 'destroy' symbols are operators of `x` and `object`.

    An iterator is similar to an operator, but instantiates a continuation that allows it to yield multiple values; and can be used to iterate across infinite sequences.


    I've been leaning in the direction of a more Go-like syntax, which would represent these things with 'func', 'oper', and 'iter' keywords respectively:

    Code:
    struct Procedure
    {
        // overload the () operator
        oper ()
        {
            puts("you called me!");
        }
    }
    
    interf Array<V>
    {
        oper [index int] V { get; set }
        oper append(value V);
        iter values V;
    }
    
    func main()
    {
        var proc Procedure;
        var arr Array<int>;
        proc();       // "you called me!"
        arr.append(1);
        arr.append(2);
        arr.append(3);
        for (var value : arr.values)
        {
            puts(value);
        }
    The cool thing about this distinction between functions, operators, and iterators is that it makes the transition back and forth between object-oriented and functional code a lot smoother, and the type system won't allow you to implicitly cast one kind of function into another.

    I've also been working on developing a form of RTTI that allows for reflection and code generation (which is related to one of my other topics entitled "Object File Format's Suck"); but I haven't quite worked everything out yet. The general idea is that you'll be able to load and store dynamically generated native code to and from disk.
    Last edited by Wajideus; July 10th, 2017 at 11:43 PM.
Page 1 of 2 12 Last
  • Jump to page:

IMN logo majestic logo threadwatch logo seochat tools logo