Structured data in Hugo templates

August 06, 2022 5 minutes reading time Development go

Structured data helps search engines better understand the content of a website. It is information that is organized and tagged in a standardized format. You can find a good introduction in Google’s Search Central documentation. This is a personal blog, so I want to include information about me as a person.

This blog is built with the static site generator Hugo, the world’s fastest framework for building websites - according to them. I did not find an existing Hugo theme I liked, so I created one myself (drop me a note if you are interested). My theme now needs a partial, which is Hugo’s name for a reusable fragment with optional logic that can be embedded in a web page. This partial should include my personal data in the correct format.

There is a specific format for personal data, which is described in the person schema. Note that this schema includes references to other schemas. If you work for a company, you can use the organization schema to provide more information with a worksFor entry. I’ve selected the fields that I found most relevant and ended up with the following JSON structure:

{
  "@context":            "https://schema.org",
  "@type":               "Person",
  "name":                "John Doe",
  "familyName":          "Doe",
  "givenName":           "John",
  "image":               "https://example.org/john_doe.webp",
  "telephone":           "+491231234567",
  "email":               "john.doe@example.org",
  "url":                 "https://john_doe.example.org/",
  "vatID":               "DE123456789",
  "knowsLanguage":       ["de-DE", "en-US"],
  "address": {
    "@type":             "PostalAddress",
    "streetAddress":     "Am Hauptbahnhof",
    "postalCode":        "60329",
    "addressLocality":   "Frankfurt am Main",
    "addressRegion":     "Hesse",
    "addressCountry": {
      "@type":           "Country",
      "name":            "DE"
    }
  }
}

Since a theme should be like a blueprint that displays not only my personal data but the data of every person who might use it, I separated the actual data from the presentation. The config.toml file is an ideal place to store such configuration parameters. I created a new section named params.person_ld - ld stands for linked data - that contains all the personal data and can be changed easily.

[params.person_ld]
  person_name            = "John Doe"
  person_family_name     = "Doe"
  person_given_name      = "John"
  person_image           = "https://example.org/john_doe.webp"
  person_telephone       = "+491231234567"
  person_email           = "john.doe@example.org"
  person_url             = "https://john_doe.example.org/"
  person_vatid           = "DE123456789"
  person_knows_language  = ["de-DE", "en-US"]
  address_street         = "Am Hauptbahnhof"
  address_postal_code    = "60329"
  address_locality       = "Frankfurt am Main"
  address_region         = "Hesse"
  address_country        = "DE"

On to the partial. The structured data is embedded in the web page as JSON inside a <script> tag with a special MIME type: application/ld+json. Since some configuration values can contain characters that would be escaped in HTML, I tried to convince Hugo/Go with safeHTML that the values can be safely rendered as-is.

<script type="application/ld+json">
{
  "@context":          "https://schema.org",
  "@type":             "Person",
  "url":               "{{ safeHTML .Site.Params.Person_ld.Person_url }}",
  <!-- More linked data in JSON format goes here -->
}
</script>

The problem

Contrary to what I had expected, this didn’t work at all. Even though I had used safeHTML, some portions of the generated output still contained escaped characters. That was not what I wanted. I wanted the strings from the configuration to be shown as they were. Instead, I ended up with this (examples, expected output in parentheses):

https:\/\/example.org\/john_doe.webp   (https://example.org/john_doe.webp)
\u002b491231234567                     (+491231234567)

The solution

After intensive Internet research and some testing of my own, it turned out that the <script> tag was the culprit: inside a <script> element, the Hugo/Go rendering engine escapes values for a JavaScript context. That led to the escaped characters no matter what I tried, safeHTML included.

I know that it’s unsafe to allow this in the context of rendering arbitrary input, potentially from unsafe sources - like user input. But that’s not the case here: all content comes from the config.toml file. Normally, there is no chance of web user input reaching this place.

So I tried a solution for the partial person-linked-data.html that no longer has an explicit <script> element. Instead, the element is printed as plain text, so that the Hugo/Go parser doesn’t know that it is inside a <script> block:

{{ safeHTML "<script type=\"application/ld+json\">" }}
{
  "@context":          "https://schema.org",
  "@type":             "Person",
  "url":               "{{ safeHTML .Site.Params.Person_ld.Person_url }}",
  <!-- More linked data in JSON format goes here -->
}
{{ safeHTML "</script>" }}

While this works, it doesn’t feel right. I really do want a valid <script> element in my partial. The only other working solution I found is described in this post by Sanmay Joshi. His solution uses the printf function. Be aware that the surrounding double quotes are added automatically, so you have to omit them in the partial:

<script type="application/ld+json">
{
  "@context":          "https://schema.org",
  "@type":             "Person",
  "url":               {{ printf "%s" .Site.Params.Person_ld.Person_url }},
  <!-- More linked data in JSON format goes here -->
}
</script>

I made sure the partial is only used if the configuration section params.person_ld exists. For my theme, the following lines are included at the end of the footer.html partial:

{{ if .Site.Params.Person_ld }}
  <!-- Display linked structured data -->
  {{ partial "person-linked-data.html" . }}
{{ end }}

Conclusion

This solved the problem for me. I had hoped for a more elegant solution, for instance that safeHTML would work even under these circumstances. But well, that wasn’t the case …

For a blog, there is another schema that is also very useful and should be used to provide information about the articles. Have a look at the blog posting schema for more information.

If you know a better or more elegant solution to the problem, I would love to hear from you!