As our translation and localization needs evolve, our tools evolve too. The creators of computer-assisted translation (CAT) tools keep releasing new updates, and in recent weeks we've been exploring one of them. It's always a nice turbocharge to learn and experiment with new ways to organize terminology and content, especially if it means preventing avoidable detail errors or inconsistent word choices.
Lately, we've been experimenting with a new feature that slices up and analyzes large quantities of text (English text, in this case). Based on its findings, it guesses which words and phrases might be appropriate as glossary terms.
This sounds like a rough application of semantic analysis, a process that lets users discover trends or prioritize issues within text: the machine looks for structural patterns and relationships among words and phrases in a large document. Our software uses mathematical algorithms to produce a sortable list of words and phrases the machine believes are important. The list also includes the number of occurrences, a score describing each term's likely relevance, and sentences containing the term (to show context). A human can then evaluate the system's suggestions, weeding out the silly ones. For more details about how this works, here's a piece published in the Journal of Artificial Intelligence Research.
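To make the idea concrete, here's a toy sketch of how such term extraction might work. This is only an illustration under simple assumptions (plain n-gram counting with a crude frequency-times-length relevance score); real CAT tools use far more sophisticated statistical and semantic models.

```python
import re
from collections import Counter

# A tiny stopword list for illustration; real tools use much larger ones.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with", "each"}

def extract_candidates(text, max_ngram=3):
    """Return candidate glossary terms as (term, count, score, example_sentence)
    tuples, sorted by a crude relevance score. Longer phrases that still recur
    are treated as more likely to be real terminology."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    counts = Counter()
    contexts = {}
    for sent in sentences:
        words = re.findall(r"[a-z]+", sent.lower())
        for n in range(1, max_ngram + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                # Skip phrases that start or end with a stopword.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                term = " ".join(gram)
                counts[term] += 1
                contexts.setdefault(term, sent)  # keep first context sentence
    # Keep only repeated terms; score = occurrences * phrase length.
    scored = [(term, c, c * len(term.split()), contexts[term])
              for term, c in counts.items() if c > 1]
    scored.sort(key=lambda row: row[2], reverse=True)
    return scored
```

Feeding this a few sentences that repeat a phrase like "translation memory" would surface that phrase at the top of the list, along with its count and a context sentence — exactly the kind of sortable output a human reviewer can then accept or reject.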
This tool will help tremendously when we build glossaries for clients with specialized terminology... collections of terms we'd like to keep consistent across future projects. It's possible to build a list manually while working on a project, picking out phrases that are specialized industry terms, but for long-term clients with multiple projects, it helps to prepare a trove of glossary terms ahead of time. A computer can slice the document into phrases and sort them before we even approach it.
One creator of semantic analysis software, Janya, Inc., describes it as the extraction of "critical information from unstructured and semi-structured data to create actionable intelligence" (read more here). A company called Expert System has a similar product called Cogito, which has been used to analyze news about terrorism (lower on the page is an explanation of the data visualization).
You may have heard of semantic analysis's cousin, Latent Semantic Indexing (LSI) – same concept, different purpose: LSI is designed to improve information retrieval. The system analyzes a large collection of documents to produce more intelligent search results or to group documents according to "conceptual similarity".
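For the curious, here's a minimal sketch of the LSI idea, assuming NumPy is available. It builds a term-document count matrix, truncates its singular value decomposition to a few dimensions, and compares documents by cosine similarity in the reduced space; the weighting and dimensionality here are deliberately simplistic.

```python
import numpy as np

def lsi_similarity(docs, k=2):
    """Return a function cos(i, j) giving the cosine similarity of
    documents i and j in a k-dimensional latent semantic space."""
    # Vocabulary and raw term-document count matrix (terms x docs).
    vocab = sorted({w for d in docs for w in d.lower().split()})
    A = np.array([[d.lower().split().count(w) for d in docs] for w in vocab],
                 dtype=float)
    # Truncated SVD: keep the k strongest latent "concepts".
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dim vector per document

    def cos(i, j):
        a, b = doc_vecs[i], doc_vecs[j]
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return cos
```

With a handful of documents about two unrelated topics, documents on the same topic end up close together in the latent space and documents on different topics end up far apart — which is what lets an LSI-backed search engine group results by conceptual similarity rather than exact keyword overlap.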
Both semantic analysis and semantic indexing have dozens of uses related to search engines, programming languages, plagiarism detection, Biblical texts, and even customer service. (Intesa Sanpaolo, a financial services organization in Europe, uses this technology to look for different sentiments in customer satisfaction data; some companies transcribe customer calls and identify keywords.) Such technology might make our jobs disappear someday, but for now, smart information design is giving us new uses and applications for the same knowledge. We're applying it to translation and software localization, and it's serving us well.