How to Search on SensoGal

You can search the SensoGal Corpus for translation examples by word, lemma or concept, either in a single language or in several languages at once.

The result of the search contains the translation units (i.e. the equivalent sentences) in which the queried term or concept appears. These equivalent sentences are displayed in the results both in their original written form and in the analyzed version with grammatical and semantic annotations. In the layout of the results, the downward arrow symbol [↓] is used to display the annotations of all the words in the analyzed version, while the upward arrow symbol [↑] hides the annotations again. Initially, the only words with visible annotations are those specifically highlighted by the search. The annotations can also be viewed individually by clicking on the words in blue.

The results can be viewed horizontally (by default) or vertically. You can switch between these two display modes using the controls at the top right of the results column.

Above each translation unit in the results is its sorting number in the list, a text code linked to its bibliographic reference and the index number of the translation unit within the text from the corpus. Beside those descriptors is the option to listen to the audio file that corresponds to the English translation unit if the search result comes from the LITTERA Audio-Textual Corpus of English-Spanish literary texts.

If you select the "Wider Context" option in the search menu, the results also include the sentences that precede and follow each sentence matched by the query.

To make the query results easier to interpret, you can limit the number of translation units returned by selecting "20", "50", "100" or "All" in the "Number of Results" option of the search menu. Please note that, to avoid technical problems, the online application never returns more than 500 results for a query.
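The interaction between the "Number of Results" choice and the 500-result ceiling can be sketched as follows. This is only an illustration of the rule stated above: the function and constant names are hypothetical and do not come from the SensoGal application.

```python
MAX_RESULTS = 500  # ceiling stated in this guide; the constant name is hypothetical


def effective_limit(choice: str) -> int:
    """Return the largest number of translation units a query may return.

    `choice` is one of the menu values "20", "50", "100" or "All".
    """
    if choice == "All":
        # "All" is still subject to the ceiling.
        return MAX_RESULTS
    return min(int(choice), MAX_RESULTS)


print(effective_limit("50"))
print(effective_limit("All"))
```

Under these assumptions, "All" behaves like a request for at most 500 translation units.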

Finally, for the examination of the LITTERA Audio-Textual Corpus, the set of texts explored can be limited by specifying the variety of spoken English (American or British) used in the audiobooks via the "Spoken English Variety" option in the search menu.

Search by word

The "search by word" option searches the corpus texts for the exact spelling of what is written in the query cell. To search by word, the "Search Lemma" option must be unchecked in the query form.

If you search for a word in only one language, the application will return the sentences containing that word in the texts of the corpus in that language, accompanied by the corresponding sentences in the texts in the other language or languages of the translation. In this case, the queried word will appear highlighted in yellow in the sentences of the language for which the query was made. Likewise, the words in the translations whose semantic tag coincides with that of the word found in the searched language will also appear in yellow. This double highlighting indicates the location of the searched word within the text and its likely interlinguistic lexical equivalents in the other languages of the translation.

If you search for words in several languages at once, the application will return the equivalent sentences of the corpus containing those words in the corresponding languages of the search. In this case, the queried words will be highlighted in yellow in the sentences in their respective languages. Therefore, in this type of query the highlighting serves only to draw attention to the location of the queried words in the text.

Whether you search for a word in a single language or in several languages at once, you can restrict the scope of the query by specifying the part of speech (noun, verb, adjective or adverb) of the queried words.

Search by lemma

The "search by lemma" option searches the corpus texts for lexical forms annotated with the lemma indicated in the query cell. To search by lemma, the option "Search Lemma" in the query form must be checked.

If you search for a lemma in only one language, the application will return the sentences containing that lemma in the texts of the corpus in that language, accompanied by the corresponding sentences in the texts in the other language or languages of the translation. In this case, the queried lemma will appear highlighted in yellow in the sentences of the language for which the query was made. Likewise, the lemmas in the sentences of the translations whose semantic tag coincides with that of the lemma found in the searched language will also appear in yellow. This double highlighting indicates the location of the searched lemma within the text and its likely interlinguistic lexical equivalents in the other languages.

The results of searching for a lemma in a single language include a lexicographic header, in the format of the Galnet Dictionary, with the multilingual equivalents of the lemma in the language combination of the corpus explored.

If the search is made by looking for lemmas in several languages at once, the application will return the equivalent sentences from the corpus containing those lemmas in the corresponding languages of the search. In this case, the queried lemmas will be highlighted in yellow in the sentences of their respective languages. Therefore, in this type of query the highlighting serves only to draw attention to the location of the queried lemmas in the text.

Whether you search for a lemma in a single language or in several languages at once, you can restrict the scope of the query by specifying the part of speech (noun, verb, adjective or adverb) of the queried lemmas.

To simplify the search for lemmas and words, the system displays a predictive drop-down list of the available lemmas as you type the query.
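The difference between searching by word and searching by lemma can be sketched as follows. This is a minimal illustration, not the application's actual implementation: the `Token` structure, the function names, the sample sentence and the part-of-speech codes are all assumptions, and whether the real interface compares spellings case-sensitively is not stated in this guide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    form: str            # spelling as it appears in the text
    lemma: str           # lemma annotation
    pos: Optional[str]   # "n", "v", "a", "r", or None for other words (assumed codes)


def find_hits(sentence: List[Token], query: str, search_lemma: bool,
              pos: Optional[str] = None) -> List[int]:
    """Return the positions of the tokens that match the query.

    With search_lemma=False the query is compared with the exact spelling;
    with search_lemma=True it is compared with the lemma annotation.
    An optional part of speech restricts the scope of the query.
    """
    hits = []
    for index, token in enumerate(sentence):
        if pos is not None and token.pos != pos:
            continue
        target = token.lemma if search_lemma else token.form
        if target == query:
            hits.append(index)
    return hits


# Hypothetical annotated sentence: "He answers questions quickly"
sentence = [
    Token("He", "he", None),
    Token("answers", "answer", "v"),
    Token("questions", "question", "n"),
    Token("quickly", "quickly", "r"),
]

print(find_hits(sentence, "answer", search_lemma=False))          # [] (no exact spelling "answer")
print(find_hits(sentence, "answer", search_lemma=True))           # [1]
print(find_hits(sentence, "answer", search_lemma=True, pos="n"))  # [] (the hit is a verb)
```

In this sketch, unchecking "Search Lemma" corresponds to `search_lemma=False`, so only the exact spelling is matched, while checking it matches every inflected form annotated with the lemma.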

Search by concept

The "search by concept" option looks for the lexical forms in the corpus annotated with the ILI (the WordNet 3.0 Inter-Lingual Index) indicated in the query cell. When searching by concept, the system ignores whatever is entered in the query cells for lemma or word, as well as the "Search Lemma" option in the query form.

The ILI used to search for a concept in the SensoGal Corpus must be written in the format shown, for example, in the concept entries of Galnet, the Galician WordNet. To look up a concept in Galnet, enter a term that expresses that concept in Galician, Portuguese, Catalan, Basque, Spanish, English, German, Latin, French, Italian or Chinese in the "Search variant" cell of the query form, and indicate the language of the term. If Galnet contains a concept for the search term, the system will give access to an entry with the semantic information associated with it. The ILI appears at the top of the entry, in a format similar to "ili-30-06743506-n": "ili-30-" is a fixed string of characters, "06743506" can be any eight-digit number, and the final letter is "n", "v", "a" or "r", depending on whether the concept corresponds to a noun, verb, adjective or adverb.
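The ILI format described above can be checked with a short regular expression before a query is submitted. The following is a minimal sketch based only on the format given in this guide; the function name `parse_ili` is hypothetical and not part of SensoGal or Galnet.

```python
import re
from typing import Optional, Tuple

# Fixed prefix "ili-30-", eight digits, and a final part-of-speech letter.
ILI_PATTERN = re.compile(r"ili-30-([0-9]{8})-([nvar])")

POS_NAMES = {"n": "noun", "v": "verb", "a": "adjective", "r": "adverb"}


def parse_ili(ili: str) -> Optional[Tuple[str, str]]:
    """Return (eight-digit number, part of speech), or None if the format is wrong."""
    match = ILI_PATTERN.fullmatch(ili)
    if match is None:
        return None
    number, pos_letter = match.groups()
    return number, POS_NAMES[pos_letter]


print(parse_ili("ili-30-06743506-n"))  # ('06743506', 'noun')
print(parse_ili("ili-30-6743506-n"))   # None (only seven digits)
print(parse_ili("06743506-n"))         # None (missing the "ili-30-" prefix)
```

A check of this kind only confirms that the string is well formed; it does not tell you whether the concept actually exists in Galnet or is annotated in the corpus.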

For example, the ILI used in this explanation (ili-30-06743506-n) is the ILI of the concept that can be glossed as "solution to a problem", which is expressed in Galician with the terms resposta, solución and resultado; in English with the terms answer, solution and result; in Basque with emaitza and erantzun; and in Chinese with 答案 and , as can be seen in the entry for this concept in Galnet. Looking for this ILI through the SensoGal interface in the original English texts of the English-Galician SEMCOR Corpus, for example, we can see the different expressions of the concept in English and its different translations in Galician in the corpus texts.

If you search by concept in only one language, the application will return the sentences containing words with the ILI specified in the corpus texts in that language, accompanied by the corresponding sentences in the texts in the other language or languages of the translation. In this case, the ILI will be highlighted in yellow in the sentences of the language for which the query was made. Similarly, the ILI in the translation that matches the queried ILI will appear in yellow. This double highlighting indicates the location of the searched concept within the text and its likely interlinguistic lexical equivalents in the other languages of the translation.

The results of searching by concept in a single language include a lexicographic header, in the format of the Galnet Dictionary, with the multilingual equivalents of the concept in the language combination of the corpus in question.

If concepts are searched in several languages at once, the application will return the equivalent sentences of the corpus containing the ILI specified in the corresponding languages of the search. In this case, the queried ILIs will be highlighted in yellow in the sentences in their respective languages. Therefore, in this type of query the highlighting serves only to draw attention to the location in the text of the concept’s terms in each language.

Semantic correspondence filter

The list of results of a query in the SensoGal Corpus is preceded by the number of translation units that meet the search conditions and the number of translation units with no correspondence at the semantic level. In the context of this query application, a translation unit is understood to have a semantic correspondence when it is possible to identify the ILI of the searched term in the equivalent sentence of the translation.

The funnel symbol, which appears at the top right of the results column, allows you to filter out results where an ILI cannot be identified as matching the queried word, lemma or concept in the target languages. This is a mechanism designed to maximize the number of correct semantic analyses of the queried term in the display of the results, based on the hypothesis that the likelihood of success in the automatic semantic analysis will be greater when the analysis identifies the same concept (i.e. the same ILI) in the original and in the translation.

The SensoGal Corpus query interface allows you to toggle between displaying and not displaying the translation units unmatched at the semantic level using the controls at the top right of the results column.
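The idea behind the semantic correspondence filter can be sketched as follows. This is an illustration under stated assumptions, not the application's actual code: the `TranslationUnit` structure and the function names are hypothetical, and the second ILI in the sample data is a made-up placeholder.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class TranslationUnit:
    source_ilis: Set[str]  # ILIs of the queried term found in the searched language
    target_ilis: Set[str]  # ILIs annotated in the equivalent sentence of the translation


def has_semantic_correspondence(unit: TranslationUnit) -> bool:
    """True when an ILI of the queried term is also identified in the translation."""
    return bool(unit.source_ilis & unit.target_ilis)


def apply_funnel(units: List[TranslationUnit]) -> List[TranslationUnit]:
    """Keep only the translation units matched at the semantic level."""
    return [unit for unit in units if has_semantic_correspondence(unit)]


units = [
    # Same concept identified in the original and in the translation.
    TranslationUnit({"ili-30-06743506-n"}, {"ili-30-06743506-n"}),
    # Placeholder ILI in the translation: no correspondence.
    TranslationUnit({"ili-30-06743506-n"}, {"ili-30-00000001-n"}),
]

matched = apply_funnel(units)
print(len(units), "translation units,", len(units) - len(matched), "with no semantic correspondence")
```

The two counts printed at the end mirror the figures that precede the list of results: the number of translation units that meet the search conditions and the number with no correspondence at the semantic level.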