Flattening JSON in JSON.NET

Posted by Matthew Watkins on April 3, 2018

The use case

I’ve got an interesting problem at work where I need to take any arbitrary JSON blob (object or array) and represent the leaf nodes in memory as a collection of key/value pairs. For example, given this JSON:

[
  {
    "Name": "Fish",
    "Color": "Silver",
    "Attributes": [
      {
        "Name": "Environment",
        "Value": "Aquatic"
      },
      {
        "Name": "Parts",
        "Value": [
          {
            "Type": "fin",
            "Length": 3
          }
        ]
      }
    ]
  }
]

I want to see something like this output:

[0].Name = Fish
[0].Color = Silver
[0].Attributes[0].Name = Environment
[0].Attributes[0].Value = Aquatic
[0].Attributes[1].Name = Parts
[0].Attributes[1].Value[0].Type = fin
[0].Attributes[1].Value[0].Length = 3

Here’s what I did, and the lessons I learned.

Lesson 1: Not everything is on StackOverflow

“This sounds like a pretty common use case,” I said to myself. “Surely there is something in the documentation or on StackOverflow.”

Nope. I searched StackOverflow for quite a while, and while I found a few answers referring to Java libraries, I couldn’t find one for the Json.NET library we are using here. The most popular NuGet package on the internet and no one has ever faced this issue before? Seriously?!

It took a few hours and lots of debugging, but I eventually wrote an extension method to allow me to grab the leaf node values of arbitrary JSON:

    using System.Collections.Generic;
    using Newtonsoft.Json.Linq;

    public static class JExtensions
    {
        public static IEnumerable<JValue> GetLeafValues(this JToken jToken)
        {
            if (jToken is JValue jValue)
            {
                yield return jValue;
            }
            else if (jToken is JArray jArray)
            {
                foreach (var result in GetLeafValuesFromJArray(jArray))
                {
                    yield return result;
                }
            }
            else if (jToken is JProperty jProperty)
            {
                foreach (var result in GetLeafValuesFromJProperty(jProperty))
                {
                    yield return result;
                }
            }
            else if (jToken is JObject jObject)
            {
                foreach (var result in GetLeafValuesFromJObject(jObject))
                {
                    yield return result;
                }
            }
        }
        
        #region Private helpers
    
        static IEnumerable<JValue> GetLeafValuesFromJArray(JArray jArray)
        {
            for (var i = 0; i < jArray.Count; i++)
            {
                foreach (var result in GetLeafValues(jArray[i]))
                {
                    yield return result;
                }
            }
        }
    
        static IEnumerable<JValue> GetLeafValuesFromJProperty(JProperty jProperty)
        {
            foreach (var result in GetLeafValues(jProperty.Value))
            {
                yield return result;
            }
        }
    
        static IEnumerable<JValue> GetLeafValuesFromJObject(JObject jObject)
        {
            foreach (var jToken in jObject.Children())
            {
                foreach (var result in GetLeafValues(jToken))
                {
                    yield return result;
                }
            }
        }
        
        #endregion
    }

Then in my calling code, I just extract the Path and Value properties from the JValue objects returned:

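    // JToken.Parse accepts any valid JSON root: an object, an array, or a bare value.
    var jToken = JToken.Parse("blah blah json here");
    foreach (var jValue in jToken.GetLeafValues())
    {
        // The $ prefix makes this an interpolated string.
        Console.WriteLine($"{jValue.Path} = {jValue.Value}");
    }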

Awesome!
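
As an aside, the four type checks in the extension method can probably be collapsed, because Children() is defined on the JToken base class and a JValue simply has no children. Here is a minimal sketch of that idea as a small console program. The names Program and Leaves and the shortened sample JSON are made up for illustration; this is a variant under those assumptions, not the extension method above.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

public static class Program
{
    // Hypothetical compact variant of GetLeafValues. Children() exists on every
    // JToken, so arrays, objects, and properties can share one recursive branch.
    static IEnumerable<JValue> Leaves(JToken token)
    {
        if (token is JValue value)
        {
            yield return value;
            yield break;
        }

        foreach (var child in token.Children())
        {
            foreach (var leaf in Leaves(child))
            {
                yield return leaf;
            }
        }
    }

    public static void Main()
    {
        // Shortened version of the sample JSON from the top of the post.
        const string json = @"[
          {
            ""Name"": ""Fish"",
            ""Attributes"": [ { ""Name"": ""Environment"", ""Value"": ""Aquatic"" } ]
          }
        ]";

        // Leaves are produced lazily, one at a time, as the loop pulls them.
        foreach (var leaf in Leaves(JToken.Parse(json)))
        {
            Console.WriteLine($"{leaf.Path} = {leaf.Value}");
        }
    }
}
```

Run against the shortened sample, this should print path/value lines in the same shape as the desired output listed earlier, starting with the [0].Name entry.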

Lesson 2: But it’s always on StackOverflow

So it turns out there is an answer on StackOverflow for this use case (link). I was searching for terms like “get all leaf nodes” or “get all values with paths,” but the magic keyword to make the answer appear is “flatten.” Here’s the answer code that was posted:

JObject jsonObject = JObject.Parse(theJsonString);
IEnumerable<JToken> jTokens = jsonObject.Descendants().Where(p => p.Count() == 0);
Dictionary<string, string> results = jTokens.Aggregate(new Dictionary<string, string>(), (properties, jToken) =>
{
    properties.Add(jToken.Path, jToken.ToString());
    return properties;
});

Lesson 3: But you can’t always just copy what’s on StackOverflow

Wow, that code snippet is a lot shorter than my solution, so I tried it out. But I ultimately went back to my own. Here’s why:

  1. This solution doesn’t work, at least not for my case. See, I need it to handle an arbitrary JSON blob. I can’t promise it’s going to be a JObject; it could be an array or something else, so this solution, unfortunately, fails for me out of the gate with my first test case (an array). And JToken doesn’t have a handy little Descendants() method I can call like JObject does, so I’d have to do some type checking anyway. Yuck.
  2. Another problem: this solution builds a dictionary in memory to represent the flattened structure. I’m dealing with some pretty massive objects, and it’s already painful enough to load up that initial JToken. I’d really rather not add the memory pressure of the dictionary on top of that.
  3. Speaking of memory, I’d like to (eventually) only return the JValue if it’s not null or default for the value type.
  4. That .Count() looks really expensive, since it’s called on every single descendant whether or not you end up using that descendant. It’s probably safer to select only the descendants you know are JValue objects: .Descendants().OfType<JValue>(). Once you have a JValue object, you can call .Value and get the underlying primitive (or pseudo-primitive string) value without going through .ToString().
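
For what it’s worth, the points above can be combined into a Descendants()-based version that still copes with a non-object root. Descendants() lives on JContainer, the shared base class of JObject and JArray, so a single type check covers both. The following is a rough sketch under that assumption; FlattenSketch and Flatten are hypothetical names, and it only skips JSON nulls rather than every “default” value.

```csharp
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json.Linq;

public static class FlattenSketch
{
    // Hypothetical helper: lazily yields the non-null leaf values of any JSON root.
    public static IEnumerable<JValue> Flatten(JToken root)
    {
        IEnumerable<JValue> leaves;

        if (root is JContainer container)
        {
            // Objects and arrays: walk every descendant and keep only the leaves.
            leaves = container.Descendants().OfType<JValue>();
        }
        else if (root is JValue single)
        {
            // The root itself is a bare value such as "abc" or 42.
            leaves = new[] { single };
        }
        else
        {
            leaves = Enumerable.Empty<JValue>();
        }

        // No dictionary is built; callers read Path and Value as they iterate.
        return leaves.Where(leaf => leaf.Type != JTokenType.Null);
    }
}
```

Calling code would look much like before: loop over FlattenSketch.Flatten(JToken.Parse(json)) and read Path and Value from each JValue. Filtering out default values such as 0 or an empty string would need an extra predicate that this sketch leaves out.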

This post first appeared on Another Dev Blog