
#corpus


→ Is a free #artificial #intelligence possible?
linuxfr.org/news/une-intellige

"Let us ask ourselves for a moment: what is the #source #code of a #neural network? […] The #GPL provides a definition: the source code is the preferred form of the work for making #modifications to it. In that sense, the source code of a neural network would be the training #algorithm, the initial neural network, and the #corpus on which the network was trained."


🚨 New preprint 🚨
"Does corpus size influence normalised frequencies?"

doi.org/10.31219/osf.io/tr8de

It may sound like a silly question, but many #corpus linguistic measures are influenced by corpus size. So we asked ourselves: Does this also hold for normalised #frequencies, a measure that is meant to correct raw frequencies for the size of the underlying corpus?

We approached this by checking the association between lists of normalised frequencies for samples of different sizes.
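
For readers new to the measure: a normalised frequency is usually a raw count divided by the corpus size and scaled to a fixed base such as one million tokens. The sketch below is only an illustration of that idea and of comparing two samples of different sizes. It is not the preprint's method. The function names, the toy data, and the choice of Pearson correlation over the shared vocabulary are all assumptions made for the example.

```python
from collections import Counter
import math


def normalised_frequencies(tokens, per=1_000_000):
    """Return {word: occurrences per `per` tokens} for a list of tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total * per for word, count in counts.items()}


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def compare_samples(small, large):
    """Correlate normalised frequencies over the words both samples share."""
    freq_small = normalised_frequencies(small)
    freq_large = normalised_frequencies(large)
    shared = sorted(set(freq_small) & set(freq_large))
    return pearson([freq_small[w] for w in shared],
                   [freq_large[w] for w in shared])


# Toy data, invented for illustration: the large sample is the small one repeated,
# so relative frequencies are identical and the correlation should be close to 1.
small_sample = "the cat sat on the mat".split()
large_sample = small_sample * 10
print(compare_samples(small_sample, large_sample))
```

With real text the two samples would not share identical proportions, and rare words that are missing from the smaller sample drop out of the comparison entirely, which is one way corpus size could still leak into the result.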

Here is a bit of backstory on the #Eroski Linguistic #Corpus in #Catalan, #Galician, #Basque and #Spanish, built from the texts of #RevistaConsumer. Thanks to @fracofavor for sharing the corpus in this toot:

mastodon.gal/@fracofavor/11188

I quote my colleague Iker Merchán, who at the time was the director of the Consumer.es website:

"One day someone showed up saying they had brute-force scraped the website's contents and built the corpus. They had been working on the project for more than a year..."

+

mastodon.gal, hélio (-e) 🐗 (@fracofavor@mastodon.gal): Fun fact about Eroski: for years their website has been translating its articles into Basque, Galician and Catalan, and at some point someone had the idea of building a corpus from all those texts, so Eroski now has THE ONLY PARALLEL CORPUS OF THE FOUR MOST WIDELY SPOKEN LANGUAGES OF THE SPANISH STATE https://corpus.consumer.es/corpus/aurkezpena
Replied in thread

@theklan part of those frequency lists is based on the Bible. For each of the 1001 #languages, #Unilex scraped various *open* online resources: Wikipedias, Bible translations, WordPress blogs. It is used in all our Android phones for text autocomplete. Sometimes only the Bible was available, so the frequency list reflects it. I would prefer larger, more diverse raw data, but they made do with what they had. I do not know of any better hyperlingual open #corpus. cc @alexture
github.com/unicode-org/unilex


New profile, new #introduction

I'm Ártemis, aka #queerterpreter. This is my academic/languagery account, or that's what I'm going for. I'm a #sociolinguist and a PhD Candidate researching non-binary Spanish. I'm also an ES<>EN #translator and #interpreter. My diss leans heavily on #corpus linguistics, specifically corpora scraped from the birdsite. The data is safe but I'm in mourning.

My more personal acct is at @queerterpreter@queer.party, feel free to follow both or just one (or none) :Schwerified: