I’ve never worked with XML for more than a few seconds until recently.

A new project came up and I have to convert some data in one format into XML. The destination of the data provides an XML schema which is nice. People are very vocal in their hatred towards XML and I have certainly had my share of frustrations so far but I think there are some good ideas in there.

Sequence

The sequence tag in XSD came as quite a shock to me. Comparing the following illustrates my surprise:

{
    "person": {
        "name": {
            "first": "Andrew",
            "last": "Consroe"
        }
    }
}

vs

<person>
    <name>
        <first>Andrew</first>
        <last>Consroe</last>
    </name>
</person>

If the schema for person uses sequence as opposed to all, then the order of first and last matters! Crazy talk.

To get the same semantics in JSON, you could do the following:

{
    "person": {
        "name": [
            {"first": "Andrew"},
            {"last": "Consroe"}
        ]
    }
}

But the former enforces the nice property that there are no duplicate keys.

However, being able to specify a sequence on unique keys could be useful when specifying a wire protocol for instance. But it does complicate the common case where you don’t care what order things arrive in.

Namespaces

Namespaces are a great idea (though being tied to domains seems a bit iffy to me; domains are not forever) and I wish more data models could accomodate this concept. We commonly use the same words for different things and different words for the same things. Namespaces gives us a way to specify which name we mean.

While namespaces gives us a way to reduce ambiguity, it doesn’t really help communication between entities unless we can be sure that the email-address item in my namespace is the same as yours. This is something I would really like to get “solved” in the coming years. It lies at the heart of API versioning, compatibility, and ease of integration for both distributed systems using network API’s as well as shared libraries that get linked into your program. This pushes me to believe that it needs to be incorporated at the heart of a programming language to reap its full potential. Something like protobuf as your core data primitives.

I think there is a weak connection between newtype and namespaces just instead of reusing the same name for a different thing with a namespace, you use the same type for a different thing but are explicitly stating they are different somehow. The same desire for converting between namespaces exists for newtypes: the person/entity that declares a newtype presumably controls the constructors and conversions, so then automatic conversion should be possible by just following the types.

Running along with newtype, I’m ever curious about making its widespread use common but without pain. I think it has great utility for permitting us the ability to assert two things are different without having a full proof at hand. Structurally, any pair of strings may be equivalent, but only some of those are valid email addresses (name x domain) (and same for a bare string). Or perhaps its easier to see in functions: add and subtract both have type int -> int -> int but we humans have some notion of why those are distinct. I think this concept could even be carried so far as to give a newtype for commonly used variable names, like “needle” and “haystack”. Upgrading common names into a shared standard should give us greater ability to automatically, for example, suggest the next function you probably want to use.

I guess I’m saying that types are too coarse as they are now. We need to be able to refine them and we need to be able to do so without writing out a full proof (a la dependent typing) of how they are different. We need to collect a global and shared database of these types and we need to collect functions which convert between these types.


That’s all I got for now; mostly inchorent but maybe something useful in there.