helencousins.com

Excel's Gene Naming Dilemma: When Science Meets Software Quirks


Chapter 1: The Love-Hate Relationship with Excel

Creating spreadsheets is a passion of mine. I enjoy organizing neat columns of figures and crafting formulas to manipulate them. It's a delightful blend of coding and note-taking. I maintain sheets for my finances, various projects, vacations, and even hobbies. There’s even a spreadsheet cataloging the items in my loft. My New Year’s resolutions? They’re neatly filed away in a spreadsheet too. Whenever I begin to contemplate a new idea, I instinctively open a sheet to arrange my thoughts into structured rows and columns. As the saying goes, "If all you have is a spreadsheet, everything appears as a cell" (a twist on a quote from Abraham Maslow).

Using Excel for any extended period teaches you its quirks. Enter a phone number and it may turn into something like 8E+09, or lose its leading zero. Numbers can morph into dates, and dates into numbers. I’ve grown accustomed to seeing #N/A errors.

While these issues are frustrating, they’re manageable. For geneticists, however, such problems can be significant. Entering most gene names into Excel is straightforward: take "Myosin regulatory light chain interacting protein," abbreviated as MYLIP. But enter "Membrane associated ring-CH-type finger 1," abbreviated MARCH1, and Excel interprets it as a date, converting it into something like March 1, 2020 (Excel assumes the current year).
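To make the pattern concrete, here is a minimal Python sketch of a check for symbols that resemble a month name followed by a day number. It is only a rough heuristic: the month list and the `looks_like_date` helper are my own assumptions for illustration, not Excel's actual parsing rules.

```python
import re

# Month spellings assumed to trigger date conversion. This is an illustrative
# list, not Excel's real parsing table.
_MONTHS = (
    "JAN|FEB|MARCH|MAR|APRIL|APR|MAY|JUNE|JUN|JULY|JUL|AUG|SEPT|SEP|OCT|NOV|DEC"
)
_DATE_LIKE = re.compile(rf"^(?:{_MONTHS})[-\s]?(\d{{1,2}})$", re.IGNORECASE)


def looks_like_date(symbol: str) -> bool:
    """Return True if a gene symbol resembles a month name plus a day number."""
    match = _DATE_LIKE.match(symbol.strip())
    if match is None:
        return False
    day = int(match.group(1))
    return 1 <= day <= 31


for symbol in ["MYLIP", "MARCH1", "SEPT1"]:
    print(symbol, looks_like_date(symbol))
```

Under these assumptions, MYLIP passes through untouched while MARCH1 and SEPT1 are flagged as date-like.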

This situation amuses me. It’s a peculiar edge case that intrigues me. When the initial Excel developers created the feature to convert certain values to dates, who could have predicted it would disrupt scientific documentation? I also find comfort in knowing I’m not the only one grappling with Excel’s oddities. Yet this gene formatting issue is more than an amusing glitch; it poses a significant challenge. A 2016 study found that around 20% of articles in major genomics journals with supplemental Excel gene lists contained incorrect gene name conversions. In fact, scientists have been discussing Excel's complications since 2004. This quirky anomaly has been causing chaos in genomics publications for the better part of two decades.

Recently, the HUGO Gene Nomenclature Committee (HGNC) opted to rename these troublesome genes to prevent them from being misinterpreted as dates in Excel. MARCH1 was changed to MARCHF1, SEPT1 became SEPTIN1, and so forth. In other words, geneticists became so frustrated with Excel's interference that they modified the official gene names to be more compatible with the software.

There’s a Kafkaesque quality to this situation. The profound intersects awkwardly with the mundane: significant scientific research meets Excel’s formatting limitations. It’s strange to see individual struggles mirrored on a larger scale, especially since genetics as a field appears to face the same issues I do as an individual.


Chapter 2: Responses to the Excel Gene Naming Crisis

After the initial amusement online, I noticed three primary reactions to this issue. The first is the “learn to use Excel correctly” perspective. This view posits that there’s nothing inherently wrong with Excel; rather, scientists should use it properly. If they wish to retain their data format, they can prefix values with an apostrophe or set the column type to text. According to this argument, the data mishaps reflect poorly on the scientific community’s computer skills.

The second viewpoint is the idea that scientists should not even be using Excel. It’s deemed too simplistic for their needs, and they should employ more advanced tools like Matlab or R, which would alleviate these problems.
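As a small illustration of this second viewpoint, here is a sketch in Python (my choice for the example; the argument itself names Matlab and R). It writes a few gene symbols to an in-memory CSV and reads them back with the standard `csv` module, which returns every field as a plain string and does not guess at dates. The sample symbols come from this article; the variable names are illustrative.

```python
import csv
import io

symbols = ["MYLIP", "MARCH1", "SEPT1"]

# Write the symbols to an in-memory CSV file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["symbol"])
for s in symbols:
    writer.writerow([s])

# Read them back. The csv module returns each field as a string,
# so nothing is reinterpreted as a date along the way.
buffer.seek(0)
reader = csv.DictReader(buffer)
round_tripped = [row["symbol"] for row in reader]

print(round_tripped == symbols)
```

The point is not that Python is special, only that a tool which leaves text as text avoids this particular problem.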

Finally, there are the critics of Microsoft, who argue that the issue stems from Microsoft’s software design. They contend that Excel should not only address the specific 27 genes that coincide with dates but also refrain from altering any data formats altogether. This perspective paints Excel as a widespread nuisance, leading to calls for a collaborative effort to persuade Microsoft to modify its software.

I empathize with all these viewpoints, yet the reality likely lies somewhere in between. The HGNC’s decision to rename these genes arose from a broader context where both scientists and their assistants have varying levels of computer expertise. While many researchers know how to preserve their data formats, errors still occur. CSV files may become corrupted when reopened in Excel, and less experienced researchers might forget these precautions. “It’s incredibly frustrating,” one geneticist shared with The Verge. The formatting issues have broken the researchers’ resolve.

For Microsoft, this is an unusual edge case. The 27 genes happen to match terms that can be interpreted as dates. To be fair to Microsoft, the names of the months predate these genes (in fact, when Excel was developed, these genes had not yet been named). Perhaps there could have been a scenario where this issue gained traction and Microsoft updated Excel to prevent these specific names from being misinterpreted as dates. However, such changes are complex and would take years to reach educational institutions as they renew their software agreements. More likely, if Microsoft were made aware of this issue, it would simply direct users to the relevant knowledge base article.

Ultimately, geneticists found themselves at a crossroads: adapt to the world or insist on changing it. They chose to adapt.


Chapter 3: The Broader Implications of Software Limitations

This entire situation has captivated me, as it symbolizes a larger issue: our collective powerlessness against technology. I ponder how software—often limited and challenging to navigate—has infiltrated every sector, making it nearly impossible to escape its reach. Every desk and home now hosts computers, as do shops and offices. Every action and thought is mediated by software. Just as sailors navigate according to tidal charts, researchers must work around software limitations.

I’ve taken to downloading gene data spreadsheets—meaningless to me—just to explore and identify formatting errors. It’s akin to a game of "Where’s Waldo" but for incorrectly formatted genes. While I indulge in this philosophical musing, I recognize the industry must continue progressing. The HGNC made a sensible and pragmatic decision to address what was essentially an unfortunate, albeit amusing, naming conflict.
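For anyone who wants to play the same game without squinting, here is a minimal Python sketch that scans a column of values for entries shaped like "1-Mar" or "2-Sep", the form a date-converted symbol typically takes once it is displayed or exported. The pattern, the `find_mangled` helper, and the sample column are my own assumptions for illustration, and a match is only a hint that a value deserves a second look.

```python
import re

# Pattern assumed to be typical of date-converted symbols, e.g. "1-Mar" or "2-Sep".
_MANGLED = re.compile(
    r"^\d{1,2}-(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)


def find_mangled(values):
    """Return (index, value) pairs for entries that look like converted dates."""
    return [(i, v) for i, v in enumerate(values) if _MANGLED.match(v.strip())]


# Hypothetical column from a supplementary gene list.
column = ["MYLIP", "1-Mar", "MARCHF1", "2-Sep", "SEPTIN1"]
print(find_mangled(column))
```

Note that a flagged value cannot always be mapped back to a single gene, since the conversion discards the original spelling.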

There’s an amusing epilogue to this tale. While sifting through gene names, I’ve come across some bizarre entries. One gene is humorously named “Sonic Hedgehog,” inspired by both a video game character and the band Sonic Youth. Others include “Bag of Marbles,” “Cheap Date,” “Buttonhead,” and “Dunce.” Although these names may seem amusing, they can pose challenges for doctors who must convey serious health concerns to parents while explaining a mutation in a gene called “One-Eyed Pinhead.”

Before learning about these names, I had assumed gene designations were carefully chosen by scientists, making it seem outrageous that they would need to be altered for the trivial reason of Excel formatting. Now, it’s hard to view scientists as the logical, research-driven individuals we often imagine them to be. The outrage expressed online regarding the Excel debacle has largely been on behalf of geneticists, who, for the most part, appear relieved by the changes. This raises a more human question about our assumptions. The Excel story resonates because of the reverence we hold for science and our belief that scientists are rational individuals, rather than people who enjoy a good laugh and occasionally assign whimsical names to genes. Ultimately, they, like all of us, are simply striving to navigate the limitations of the software available to them.
