Hi, I'm Pedro Pimenta, freelance graphic designer and front-end developer.

Be careful when copying and pasting

12 October 2016

At my job (and probably yours) we're always copying and pasting stuff. Be it little snippets, big chunks of code from old projects or from the web (of course you never do that), or just plain client content, we're doing it. This post focuses mainly on copying content, because that's when you're copying from different sources, programs and interfaces, but it applies to everything.

Where it began

On one bug-fixing morning I got this screenshot, saying there was a bug with the word "código" on Firefox:

First occurrence of the bug

As obvious and visible as a bug can be, I didn't see it. I opened the faulty page on my machine with Chromium: it looked OK. Weird. I opened Chrome, Safari and Firefox and it only occurred in Firefox. I looked at the code in Sublime Text and it looked fine: código. Wow.

So I got stuck looking at it, inspecting it with every browser tool I could, and couldn't find any lead. I typed an ó beside the word código and it looked fine. It was definitely a problem with that specific ó. I could have deleted the word and moved on, but no, I needed to get to the bottom of this.

So I copied the word from Sublime Text and searched the web for "translate unicode" and "copy characters reveal unicode" (you can see I was very lost on this) and I was brought to a couple of pages that helped.

One is r12a's Unicode code converter, which converted the copied ó to a plain "o" followed by a separate accent character. This is two characters, not one as intended. The other page is Grant McLean's Unicode Character Finder, which showed this when I pasted the culprit character:

The Unicode Character Finder shows a "Combining acute accent"

It dropped the first "o" because that box only shows the last character you paste into it. Definitely two characters. How can this be? I don't know.
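If you'd rather not rely on a web page for this, a few lines of script can reveal the same thing. This is only a minimal sketch using Python's standard `unicodedata` module; the `describe` helper is a name I made up for illustration, and the strings are built from escapes so the example doesn't depend on how this page itself is encoded.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]


# "o" followed by a combining acute accent (what the pasted text contained)
decomposed = "co\u0301digo"
# a single precomposed "ó" (what typing the letter normally produces)
precomposed = "c\u00f3digo"

# Both strings look the same on screen but differ in length.
print(len(decomposed), len(precomposed))  # 7 6

for entry in describe(decomposed):
    print(entry)
```

The second and third entries printed for the pasted version are `U+006F LATIN SMALL LETTER O` and `U+0301 COMBINING ACUTE ACCENT`, which matches what the Unicode Character Finder reported.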

Suddenly it hit me

This is client text, I copied this from somewhere. I can't recall exactly, but I think this particular text was forwarded from a client email. So it's weird character handling on either my end, the client's end, or the man in the middle. And only Firefox shows it incorrectly. Weird.

Note: it seems that some fonts handle these characters in different ways. When writing this post I noticed that the font face I'm using in Sublime renders these two characters as the single one it should be, but if I change it to, for example, Inconsolata, it shows up differently:

Differences between Courier and Inconsolata

This is because Inconsolata doesn't have this character in its table, so it falls back to the default font for it.
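I didn't do this at the time, but one possible way to clean up pasted text like this is to normalize it to Unicode's composed form (NFC), which merges an "o" plus a combining acute accent into a single ó. The sketch below uses Python's standard `unicodedata` module; `to_nfc` and `has_combining_marks` are hypothetical helper names, and whether normalizing is right for your content is an assumption you should check first.

```python
import unicodedata


def has_combining_marks(text: str) -> bool:
    """True if any code point in text is a combining mark (e.g. U+0301)."""
    return any(unicodedata.combining(ch) != 0 for ch in text)


def to_nfc(text: str) -> str:
    """Compose base letters and combining marks where a single character exists."""
    return unicodedata.normalize("NFC", text)


pasted = "co\u0301digo"  # "o" + combining acute accent, as in the client text
cleaned = to_nfc(pasted)

print(has_combining_marks(pasted))   # True
print(has_combining_marks(cleaned))  # False
print(len(pasted), len(cleaned))     # 7 6
print(cleaned == "c\u00f3digo")      # True
```

Running something like `has_combining_marks` over client content before it goes into a template would at least flag the suspicious words, even if you prefer to fix them by hand.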

Contact

My entire name is Pedro da Gama Lança Guerra Pimenta and my email is pedro@pimenta.co.
You can find me on Twitter, LinkedIn, Dribbble, Github, Designer News and Hacker News.