sgidi

PORTAL SGIDI

Sistema de Gestão da Investigação, Desenvolvimento e Inovação

Ultima atualização: 5 outubro 2025

Portal NI&D

DOCUMENTOS

REGISTO DE IDEIAS E PROJETOS

VIGILÂNCIA TECNOLÓGICA E ECONÓMICA, GESTÃO DO CONHECIMENTO

DOCUMENTOS

Repositório: Google Drive: pasta NI&D/SGIDI (link)
Repositório Github: link
Gravações das reuniões NID (Zoom): aqui

Norma Portuguesa NP 4457 2021 (pdf)
Norma Portuguesa NP 19011 (2019) (pdf)

MG 01_00: Manual SGIDI (pdf) (versão controlada: pdf)
MG 02_00: Manual de funções (doc)

RG 07_01: Plano de auditoria interna (junho 2023, pdf)
RG 08_01: Relatório da auditoria interna (junho 2023, doc) (auditor: cv0, cv1, cv2)
RG 07:02: Plano de auditoria externa, fase 1 (julho 2023, pdf)
RG 08_02: Relatório de auditoria externa, fase I (julho 2023, pdf)
RG 07_03: Plano de auditoria externa, fase 2 (setembro 2023, pdf)
RG 08_03: Relatório da auditoria (fase II) de concessão (setembro 2023, pdf); Certificados (pdf): pt, en, es
RG 08_04: Relatório da auditoria de acompanhamento (setembro 2024, pdf)
RG 07_04: Plano de auditoria interna (dezembro 2024, pdf)
RG 08_05: Relatório da auditoria interna (dezembro 2024, pdf) (auditor: cv0, cv1, cv2)
RG 08_06: Relatório da auditoria de acompanhamento (setembro 2025, pdf)

Mapas de processos:
- PR 01_00: Controlo da informação documentada (doc)
- PR 02_00: Controlo da conceção (doc)
- PR 03_00: Gestão de ideias e iniciativas de inovação (doc)
- PR 04_00: Gestão das interfaces (doc)
- PR 05_00: Recursos humanos (doc)
- PR 06_00: Auditorias interna (doc)
- PR 07_00: Não conformidades, ações corretivas e de melhoria (doc)

Registos:
- RG 01_00 Matriz de uso e controlo de documentos (xls)
- RG 04_01 Matriz de de ideias e projetos (xls)
- RG 05_00 Fichas de ideias e projetos (aqui) e respetivos cronogramas (aqui, botão)
- RG 06_00 Ocorrências e planos corretivos (doc) (xls)
- RG 07_00 Plano de auditoria (doc)
- RG 08_00 Relatório de auditoria (doc)
- RG 09_01: Matriz de riscos e oportunidades (doc)

Planos:
- PL 01_02: Plano de formação 2025 (doc)
- PL 01_01: Plano de formação 2024 (doc)
- PL 01_00: Plano de formação 2023 (doc)
  - - Ação de formação: Gestão da Investigação, Desenvolvimento e Inovação - NP 4457:2021 (Iberogestão, 11 abril 2023) (pdf). Certificados (pdf).
    - Ação de formação: Auditorias a sistemas de gestão ISO 19011:2018 (Iberogestão, 10 julho 2023) (pdf). Certificados (pdf).
- PL 02_00: Orçamento IDI (xls)

Atas de acompanhamento e revisão pela gestão:

Ata #12, 21 de julho de 2025, revisão pela gestão (doc)
Ata #11, 21 de dezembro de 2024, revisão pela gestão (doc)
Ata #10, 2 de julho de 2024, revisão pela gestão (doc)
Ata #9c, 31 de março de 2024 (doc)
Ata #9, 22 de dezembro de 2023, revisão pela gestão (doc)
Ata #8b, 31 de outubro de 2023 (doc)
Ata #8, 1 de Agosto de 2023 (doc)
Ata #7, 1 de Julho de 2023 (doc)
Ata #6, 23 de junho de 2023, revisão pela gestão (doc)
Ata #5, 1 de Junho de 2023 (doc)
Ata #4, 1 de Maio de 2023 (doc)
Ata #3, 1 de Abril de 2023 (doc)
Ata #2, 1 de março de 2023 (doc)
Ata #1, 1 fevereiro de 2023 (doc)

Documentos preparatórios:

Registos online: workflow e sistema documental (doc)
Contexto externo e fontes para vigilância tecnológica (doc)
[v11 (doc) ]

REGISTO DE IDEIAS E PROJETOS

RG 03 e RG05

RG04 MATRIZ DE IDEIAS E PROJETOS

RG04 CRONOGRAMAS

ID25. 20 de janeiro de 2025, EB
ia.arq: consorcio e plataforma, ID25, PJ13
Organizar um consorcio de empresas e entidades académicas para apresentar uma candidatura aos fundos (SIID empresarial em copromoção) para desenvolver e testar uma plataforma na cloud para arquivos documentais e jornais com uma nova arquitetura IA e recorrendo a agentes de IA para funcionalidades avançadas de procura e investigação.

ID24. 4 de novembro de 2024, AA
Serviço GCV OCR na cloud, II24, PJ12
Uma primeira versao de um aplicativo local para digitalização de documentos usando os serviços avançados da Google Vision via a respetiva API foi desenvolvido no ambito do projeto PJ1 (jornais). Uma comparação recente dos resultados obtidos com o de outras facilidades na cloud mostra o interesse da solução. Pretende-se desenvolver agora a ideia na forma de uma app na cloud e facilmente integrada com serviços como Googler Drive (ou equivalentes).

ID23. 3 de setembro de 2024, EB PROJETO CONCLUIDO
Workflows "low cost" de arquivo e exploração de arquivos documentais históricos, II23,PJ11
As facilidades de arquivo de documentos digitalizados e as facilidades de procura em serviços como o Google Drive e IA abrem novas perpetivas para a investigação em dominios como a historiografia ou em coleções de história natural. Ferramentas adicionais faceis de integrar no ambiente do Google Drive facilitam ainda mais o trabalho de recolha de informação e a transcrição de textos.

ID22. 5 de fevereiro de 2024, EB
Desenvolver programa de digitalização à distancia de fichas de um conjunto distribuido de coleções de história natural, II22
Desenvolvimento de processos low cost e de instalação e operação fácil e rápida para digitalização à distância e visualização online de conteúdos documentais relativos à história natural, em especial fichas de coleções de história natural e fichas bibliográficas (manuscritas ou datilografadas). Instalar uma plataforma online na cloud com facilidades de visualização e procura. Fazer candidatura e organizar consorcio / plano.

ID21. 20 de janeiro de 2024, EB
Geração de dados em formato gbif em fichas de herbário
Desenvolver app com facilidades de geração de quadros em Excel e etiquetas em formato padrão gbif a partir das fichas de herbário transcritas pela plataforma (dados e metadados)

https://www.gbif.org/standards

ID20. 15 de dezembro de 2023, EB
Explorar os limites de OCR/HRT de textos manuscritos, II20
Caso da escrita de A.O. Salazar, um caso limite de dificuldade. Usar o material dos Diários (imagens do Arquivo Salazar e transcrições publicadas em livro) para aplicar metodologias de ML (machine learning)
A caligrafia de Salazar é um caso extremo de caligrafia "dificil", mas uma funcionalidade de transcrição por AI tornaria muito mais facil o acesso ao imenso material do Arquivo Salazar cuja decifração é problemática.

11 de março de 2025, JCA

OCR de caligrafias pessoais

Uma ideia para resolver a questão do Salazar (e outras fichas manuscritas):

os modelos da Google tem uma janela de contexto grande: 1 milhão de tokens e muito baratos
escolher umas páginas do Salazar, já transcritas manualmente por alguém correctamente, que sejam uma amostra diversa de palavras
converter para base64 as imagens e juntar as transcrições correctas
enviar uma imagem que se quer transcrever, junto com este prompt antes:
“tendo em conta estas 10 imagens, em base 64, e a transcrição correcta a seguir a cada imagem, transcreve esta nova imagem que enviamos tb em base 64 e tendo em conta o texto das outras imagens, aprende a reconhecer também o texto desta imagem que enviamos.”

Janelas de contexto maiores abrem oportunidades desde género, dispensando fine-tuning. O desafio aqui é escolher em qualidade e quantidade suficiente amostras suficientes que sejam o mais representativas suficientes do ‘handwriting’ de alguém.
Com diminuição de custos de APIs este é o melhor caminho de investigação a seguir.

ID19. 15 de dezembro de 2023, EB
Integrar facilidades de reconstrução de textos de noticias de jornais via ChatGPT na plataforma de visualização e procura de jornais antigos, II19,PJ9 PROJETO CONCLUIDO
Expandir os resultados obtidos em PJ6 para integrar de forma interativa as facilidades de reconstrução de textos selecionados em páginas de jornais usando os recursos de chatGPT

ID18. 3 de dezembro de 2023, EB
Usar Trankribus para layouting de documentos em jornais com layout não em colunas regulares, II18, PJ8 PROJETO CONCLUIDO
Geração de hOCRs de páginas de jornal usando o txt obtido pelo Google Vision com json/XML obtidos pelas funções de layouting da aplicação Transcribus e criar dois casos de demonstração

Usar as funcionalidades do Trankribus para gerar XML files de layouts de páginas de formato arbitrario através das opções de "block segmentation + M1 model" da versão local da app.

https://www.transkribus.org

31 de Março de 2023, EB

Modelos fundacionais e tradução

O GPT & Ca. é considerado "uma maravilha de multilinguismo" (ultimo The Economist) e uma ferramenta muito capaz de tradução, especialmente em linguas "com mais recursos". Parece ser consensual que o futuro da tradução será hibrido com "pós processamento editorial" humano e que a tradução trivial será feita por AI (LLM combinado com outros modelos MT). Ler os jornais antigos é, ao fim e ao cabo, um problema de tradução: passar do texto com erros para o texto sem erros e eventualmente convertido para a linguagem (estilo) atual. As experiencias de "tradução" que fizemos (reescrita de partes de textos de noticias originais) parecem confirmar isto.
Logo uma app que use um modelo LLM refinado para o contexto dos jornais (o universo linguistico de base) deverá permitir consultar / traduzir o texto dos jornais e responder a perguntas (prompts) dentro desse universo. Isto significa passar por cima da questão do reconhecimento correto dos carateres gráficos mesmo correndo o risco de alguns erros. Parece semelhante a traduzir crioulo ou mirandes para portugues corrente ou vice versa. Esta abordagem à leitura e "search" nos jornais ultrapassarria as dificuldades do reconhecimento de carateres e permite "salvar" scannings já feitos e com eventual com pior qualidade (caso dos nossos jornais).
O q compreendi do VertexAI e do GenAppBuilder da Google é que permitem fazer isto com facilidade e rapidez e são o "frontend" de abertura da tecnologia pela Google para inserção de facilidades AI nas apps de processos correntes. O "tuning" (ou "fine tuning") do Vertex parece-me ser isso, um processo de ML machine learning automatico que compreendo corresponder a restringir o universo de aplicação do LLM ao contexto proporcionado pelos dados (neste caso, os textos txt das colunas dos jornais). Presumo que "tokens" que não aparecem nos dados não devem ser usadas (seja o que for q se entenda por token neste caso) pela app: parece corresponder a gerar um modelo LLM simplificado para os dados submetidos. Este treino do modelo parece ser feito automáticamente pelo Vertex AI com a possibilidade de escolha de diferentes modelos fundacionais (o "model garden").

- Potencial do Vertex AI como "tradutor" de ocr com erros (jornais antigos)
- Comentários: aqui, aqui

ID17. 30 de Março de 2023, JCA
Modelos fundacionais de IA aplicados a um universo especifico de conteudos

sugiro que se faça uma experiência inicial mais simples. Pegar num livro, não traduzido, algo original. Criar algo em que se pode processar esse documento PDF e depois usar GPT para efectuar questões sobre o mesmo. Se resultar, ver como se pode escalar para jornais.
Ver este vídeo de youtube, 11 minutos, python notebooks e alguns serviços externos (tem o custo da base de dados vectoriais, produzir embedings na OpenAI e depois cada query/questão): https://www.youtube.com/watch?v=h0DHDp1FbmQ

ID16. 10 de março de 2023, EB

Experiencias VR/MR em museologia digital com hololens

Potencialidades da VR/MR com holow lens (Microsoft) para museologia digital
aqui e aqui

ID15, 1 de março de 2023, EB. II15, PJ7 PROJETO CONCLUIDO

Listar os produtos da loja online Inovatec Press no Google Merchant Center

ID14. 29 de Fevereiro de 2023, EB

Expansão da Inovatec Press para os mercados de lingua portuguesa

Integrar produtos da Inovatec Press em plataformas brasileiras com print on demand e plataformas nacionais com distribuição digital para os paises de lingua portuguesa

ID13. 28 de Fevereiro de 2023, JB; II13

Potencial de tecnologias holow lens para performances e instalações Datagrama

Oportunidades para novas experiências de realidade virtual/realidades mistas em instalações artisticas Datagrama usando a tecnologia "holow lens" da Microsoft
https://www.microsoft.com/en-us/hololens

ID12. 28 de Fevereiro de 2023, JB

Potencial de tecnologias audio 3D para performances e instalações Datagrama

Potencial das tecnologias audio ambisonics 3D em performances e instalações artisticas Datagrama (camada adicional nos modelos Datagrama de VR/MR: speakers com geometrias espaciais de alinhamento, complementaridade sensorial)
https://youtu.be/51dUnndYwo4

ID11. 13 de Fevereiro de 2023, JB; II11

memTSI:UP Memória das tecnologias de informação na Universidade do Porto

Uma década depois, a experiencia adquirida com o projeto memTSI pode ser interessante para uma nova iniciativa desse tipo, mas agora restrita ao universo da Universidade do Porto e ao papel da UP na inovação tecnológica neste domínio ao longo de um período histórico de três décadas e do seu impacto territorial e nacional. Muitos dos primeiros atores desta história, desde os anos 60, estão ainda vivos, pelo que uma iniciativa deste tipo parece oportuna e segue as tendências internacionais para preservar a memória das tecnologias da informação com base em história oral (entrevistas) dos protagonistas

ID10. 18 de Janeiro de 2023, JCA; II10, PJ6 PROJETO CONCLUIDO

Reconstrução de textos antigos por OpenGPT

Potencial do OpenGPT conseguir "reconstruir" o texto de uma noticia dos "jornais" em portugues antigo e com erros de ocr para portugues corrente.
Ver teste aqui

ID9. 20 de novembro de 2022, NB; II9

Emidio Biel e a linha do Tua

As fotografias de Emilio Biel sobre a linha do Tua continuam pouco conhecidas. Uma exposição dessas fotos com um programa associado de sessões sobre E. Biel e sobre a a história da linha do Tua pode constituir uma oportunidade com a Universidade do Porto (Galeria da Biodiversidade) e para um "road show" pelos cinco concelhos do vale do Tua (em colaboração com a Agencia Regional do Vale do Tua) e assim difundir os trabalhos desenvolvidos no projeto FOZTUA sobre o assunto (Cf. livro "A linha do Tua, 1887, e as fotografias de E. Biel"

ID8. 15 de novembro de 2022, NB; II8

Segurança de bens em festivais (Matilha)

Os participantes dos festivais de musica, arte, etc têm habitualmente o problema da segurança (carteira, documentos, computador, etc.) assim como dificuldade de acesso a redes wifi de internet. Propõe-se explorar a ideia de um camião com cacifos/cofres de segurança onde um participante possa guardar os seus bens usando tecnologias digitais para chaves de acesso (via telemovel). O camião (trailer apenas, sem cabine) deverá ter segurança 24 horas. O espaço envolvente deve ter condições para proporcionar wifi de qualidade aos utilizadores, assim como facilidades de refrescos e café (espaço de relax).

ID7. 31 de julho de 2022, EB; II7

Digitalização de manuscritos e fichas da Sociedade de Geografia de Lisboa

A coleção de manuscritos da SGL justifica a sua digitalização urgente. A vasta biblioteca não está disponivel online para consulta. Os testes feitos em visita recente sugerem a viabilidade da iniciativa e o interesse institucional. As plataformas online desenvolvidas até aqui podem ser fácilmente aplicadas neste caso

ID6. 15 de julho de 2022, EB; II6, PJ5 PROJETO CONCLUIDO

Reorganizar e melhorar o site inovatec.pt

Reorganizar e expandir o site da empresa para ser mais facil a aplicação de técnicas SEO, incluindo landing pages para segmentos especificos de procura. Incluir versão em lingua inglesa das paginas.

ID5. 1 de junho de 2022, EB; II5, PJ4 PROJETO CONCLUIDO

Digitalização de fichas de coleções de história natural

Muitas coleções de história natural estão catalogadas em fichas manuais (geralmente manuscritas, por vezes datilografadas) cuja digitação se tem mostrado inviável por falta de recursos. A experiência com as fichas bibliográficas do Ateneu Comercial do Porto sugere a sua extensão às fichas de coleções do Museu de História Natural e Ciência da Universidade do Porto e a extensão da plataforma na cloud desenvolvida para as fichas bibliográficas do Ateneu para este tipo de universo.

ID4. 13 de março de 2022, EB; II4, PJ3 PROJETO CONCLUIDO

Palaforma (cloud) para consulta e acesso a jornais diários

Tendo sido criada uma aplicação local capaz de procura e acesso a jornais diários (Diário Popular e Jornal do Comércio, 1883-1893) propõe-se a sua reformulação em termos on-line (na cloud) para acesso através de browser na internet.

ID3. 10 de novembro de 2021, EB; II3, PJ2 PROJETO CONCLUIDO; INSTALAÇÃO

Digitalização de fichas bibliográficas manuscritas

O catálogo da biblioteca do Ateneu Comercial do Porto (mais de 50 mil livros) em fichas manuscritas pode ser um bom teste para o uso dos scanners de nova geração (CZUR para documentos planos e Epson ES para fotografias / fichas). Uma extensão da plataforma cloud desenvolvida para jornais (séc. XIX) poderá ser adaptada para permitir a procura e acesso às fichas bibliográficas sem necessidade da sua digitação manual.

ID2. 11 de março de 2021, EB; II2, PJ1 PROJETO CONCLUIDO

JornaisXIX: Digitalização de jornais diários (século XIX)

Os jornais do século XIX têm problemas de digitalização devido à qualidade e dimensão da impressão. Temos coleções completas de dez anos dois jornais diários (Diário Popular e Jornal do Comércio, 1883-1893) digitalizados pela Biblioteca Nacional (projeto FOZTUA) que podem ser um bom testbed para desenvolver competencias de OCR desses documentos e criar competencias internas nessas tecnologias (ocr, layouting, etc.).

ID1. 1 de dezembro de 2018, EB; II1

memDÃO: Memória da linha do Dão

Ao longo de quase um século as linhas do Dão (entre Santa Comba Dão e Viseu) e a linha do Vouga (entre Espinho e Viseu) foram importantes vias de transporte de passageiros e carga, que ajudaram ao desenvolvimento económico e social da cidade e contribuíram para a atual configuração da região. Muita da experiência humana de região foi influenciada pelas linhas do Dão e do Vouga, sendo importante recolher, preservar e valorizar a memória dessas vivências. Com base na experiência do projeto FOZTUA propõe-se um projeto para produzir materiais sobre a história e património ferroviário da região que valorizem infraestruturas regionais (ecopista do Dão, por exemplo).

VIGILÂNCIA TECNOLÓGICA E ECONÓMICA,
GESTÃO DO CONHECIMENTO

Sistema de alertas

1 de outubro de 2025, EB

https://mail.google.com/mail/u/0/#inbox/FMfcgzQcpwwLnCmMxwkzMCtSjHhgDNNt

Ex-OpenAI CTO Makes a $12B Bet on Fine-Tuning at Scale

What’s happening: Ex-OpenAI CTO Mira Murati’s Thinking Machines just unveiled Tinker, a cloud service that industrializes fine-tuning. Instead of forcing teams to spin up full retrains, Tinker applies LoRA adapters—tiny parameter layers—so developers can spin out task-specific models without duplicating a 70B-weight beast. Multiple teams can share the same backbone, error recovery is automated, and a “Cookbook” of pre-built workflows cuts setup from weeks to hours.

How this hits reality: If this scales, it flips the power dynamic. OpenAI and Anthropic still monetize access to frontier models, but Tinker makes open-source stacks commercially viable for anyone with a credit card. Fine-tuning at scale means startups, labs, even enterprises can deploy “good-enough” custom models without paying per-token taxes to closed platforms. Giants that rely on API lock-in suddenly look exposed.

Key takeaway: If Tinker works, the moat isn’t model size, it’s who commoditizes customization.

Tinker is a flexible API for efficiently fine-tuning open source models with LoRA. It's designed for researchers and developers who want flexibility and full control of their data and algorithms without worrying about infrastructure management.

LoRA is an efficient approach to fine-tuning that trains a streamlined adapter instead of updating all base model weights. Our research demonstrates that with the right setup, LoRA matches the learning performance of full fine-tuning while providing more flexibility and requiring less compute.

https://thinkingmachines.ai/tinker/

https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/

26 de setembro de 2025, EB

https://www.towardsdeeplearning.com/leann-the-worlds-smallest-vector-index-that-could-redefine-rag-75d5aa3708c8

LEANN: The World’s Smallest Vector Index That Could Redefine RAG

LEANN: The smallest vector index in the world. 97% storage savings, no accuracy loss. RAG just got lighter

Here’s the problem:

Storing embeddings for even modest datasets quickly balloons into hundreds of GBs.

Running a personal vectorDB often requires a heavy cloud setup.

For hobbyists, researchers, and small teams, this feels overkill.

Enter LEANN, a Python library that flips the script.

LEANN calls itself the “smallest vector index in the world”, boasting 97% storage savings with no accuracy loss. Instead of storing every embedding, it uses a graph-based selective recomputation strategy. In simple terms:

Embeddings are computed on-demand rather than stored for all documents.

You don’t need to hold hundreds of GBs just to index your files.

📊 Example:

Indexing 60 million text chunks normally takes 201 GB.

With LEANN, it only takes 6 GB.

Who Should Care About LEANN?

Researchers & Students → finally run large-scale search experiments without massive GPUs or storage.

Developers & Hackers → build private, local-first AI assistants that actually fit on a laptop.

Startups → save thousands on infrastructure while keeping data secure.

Power Users → turn your personal digital life into a private, searchable AI memory.

https://github.com/yichuan-w/LEANN?tab=readme-ov-file&source=post_page-----75d5aa3708c8---------------------------------------

25 de setembro de 2025, EB

https://medium.com/neo4j/why-im-replacing-my-rag-based-chatbot-with-agents-1718ec7e384c

“The rumors of RAG’s demise have been greatly exaggerated."

That was the response from Alexy Khrabrov, Neo4j’s eyes and ears on the ground in San Francisco, when I asked him whether I should be worried about our GenAI curriculum on GraphAcademy.

That said, public opinion has turned on retrieval-augmented generation (RAG), and I can’t move for thought pieces on how RAG is dead. I see this as mostly hyperbole, but I do think the position of RAG within a GenAI stack is changing.

25 de setembro de 2025, EB

https://alain-airom.medium.com/the-future-of-document-scanning-a-look-at-llm-powered-ocr-b7d92a0556f9

The Future of Document Scanning: A Look at LLM-Powered OCR

In summary, this process demonstrates a powerful and pragmatic approach to intelligent document processing. I leveraged local LLMs via Ollama, and I was able to combine the raw transcription power of OCR with the contextual intelligence of vision models. My testing with Granite and Llama showed that while different models have distinct strengths — speed versus descriptive detail — both can provide high-quality, multilingual results. Ultimately, this approach allows for the creation of customized, RAG-enabled systems that can transform raw data into a truly valuable, searchable, and conversational knowledge base for any industrial application.

24 de setembro de 2025, EB

https://medium.com/coding-nexus/googles-embeddinggemma-a-lightweight-open-source-embedding-model-you-can-run-anywhere-b218d4b4ac9d

Google’s EmbeddingGemma: A Lightweight, Open-Source Embedding Model You Can Run Anywhere

EmbeddingGemma takes a different path. It’s a small, open-source embedding model (308M parameters) that runs right on your laptop, desktop, or even your phone. No servers. No internet. Just you and the model.

https://ai.google.dev/gemma/docs/embeddinggemma/inference-embeddinggemma-with-sentence-transformers#:~:text=EmbeddingGemma%20is%20a%20lightweight%2C%20open,with%20no%20internet%20connection%20required.

https://developers.googleblog.com/ja/introducing-embeddinggemma/#:~:text=Gemma-,Introducing%20EmbeddingGemma:%20The%20Best%2Din%2DClass%20Open,Model%20for%20On%2DDevice%20Embeddings&text=We're%20excited%20to%20introduce,models%20nearly%20twice%20its%20size.

EmbeddingGemma, a new open embedding model that delivers best-in-class performance for its size. Designed specifically for on-device AI, its highly efficient 308 million parameter design enables you to build applications using techniques such as Retrieval Augmented Generation (RAG) and semantic search that run directly on your hardware. It delivers private, high-quality embeddings that work anywhere, even without an internet connection.

Built to flexibly work offline: Small, fast, and efficient, it offers customizable output dimensions (from 768 to 128 via Matryoshka representation) and a 2K token context window to run on everyday devices like mobile phones, laptops, desktops, and more. Designed to work with Gemma 3n, together they unlock new use cases for mobile RAG pipelines, semantic search, and more.

Integrated with popular tools: To make it easy to get started with EmbeddingGemma, it already works with your favorite tools, such as sentence-transformers, llama.cpp, MLX, Ollama, LiteRT, transformers.js, LMStudio, Weaviate, Cloudflare, LlamaIndex, LangChain and more!

14 de setembro de 2025, EB

https://medium.com/@Micheal-Lanham/gpt-5-mini-just-changed-the-rag-game-forever-e50944297799

How OpenAI’s latest compact powerhouse is revolutionizing retrieval-augmented generation and why developers are already switching

Want to build lightning-fast AI agents that can search, analyze, and synthesize information like a research assistant on steroids? You’re in the right place.

Picture this: You’re building an AI application that needs to search through thousands of documents, pull relevant information, and generate accurate answers in real-time. Until recently, you had to choose between speed and intelligence. Fast models gave you quick but shallow responses. Smart models made you wait… and wait… and wait.

That trade-off just ended.

12 de setembro de 2025, EB

https://medium.com/data-science-in-your-pocket/google-gemma3-270m-the-best-smallest-llm-for-everything-efcf927a74be

Google Gemma3 270M : The best Smallest LLM for everything

How to use Google Gemma3 270M for free?

This isn’t just a lightweight model, it’s a tool built to do real work, efficiently, without draining your device or your wallet.

But don’t let the number fool you, Gemma 3 270M isn’t built to win size contests. It’s built to follow instructions, run on tiny devices, and get fine-tuned for laser-focused tasks.

Gemma 3 270M works on this same logic, it’s perfect when you need one thing done really well. Things like:

Turning messy text into structured data

Classifying emails or support tickets

Extracting entities from legal documents

Filtering toxic content in a multilingual app

Powering simple creative tools like a bedtime story generator

Why waste compute running a multi-billion-parameter model on that?

https://medium.com/coding-nexus/googles-new-llm-runs-on-just-0-5-gb-ram-here-s-how-to-fine-tune-it-locally-ab910fa39732

In a world chasing bloat, Gemma 3 270M is a return to minimalism, with brains. If you’ve got a focused task, don’t over-engineer it. Start small. Start with Gemma 3 270M.

https://blog.stackademic.com/i-built-a-fully-offline-ai-agent-that-answers-questions-from-pdf-images-and-audio-no-cloud-2e8d71b246d6

I trained my own local RAG system with OCR, Whisper, CLIP, and open-source language models — all running on a mid-tier laptop.

29 de agosto de 2025, EB

https://levelup.gitconnected.com/accurate-audio-transcription-is-almost-free-now-35438edd028b

Accurate Audio Transcription Is [Almost] Free Now.

Batch process hours of audio transcription in seconds.

I thought audio transcription was an expensive task.

Companies that do these make huge bucks. Eleven labs charge $22 a month, and Whisper costs about the same. Isn’t that expensive?

But what if we can do it for (literally) free?

Well, open-source transcription models have existed for a long time. But they weren’t preferred because of their lack of accuracy. But we now have a model that is both free and as accurate as leading closed models.

Well, the story doesn’t finish there.

This model needs only 2GB RAM to work. That (again, literally) means every device we use today has twice the required memory. Even my grandmom’s smartphone has twice the required memory.

It’s fast.

How could it not be? It’s a mere 600 million parameter model. On larger RAMs, this can be as fast as lightning.

Of course, I’m talking about NVIDIA’s new Parakeet 2

https://medium.com/data-science-in-your-pocket/nvidia-parakeet-v2-the-smallest-fastest-free-speech-recognition-asr-model-28ee2ccfac51

While the AI world’s been partying with flashy multimodal giants like Google Gemini 2.5 Pro, Qwen 3, LLaMA 4, and DeepSeek V3.1, NVIDIA’s rolled out something refreshingly compact — but equally powerful. Meet Parakeet-v2: a 600 million parameter Automatic Speech Recognition (ASR) model that’s free, open-source, and built for real-world audio tasks.
In a world obsessed with billion-parameter giants, Parakeet-v2 proves that smart design > sheer size. With its incredible accuracy, lightning-fast transcription, and plug-and-play usability, it’s one of the most practical ASR models released in recent memory. And since it’s fully open-source, there’s no excuse not to give it a spin

22 de agosto de 2025, EB

https://blog.gdeltproject.org/building-a-story-index-of-a-decade-of-tv-evening-news-with-geocoding-sentiment-frame-narrative-analysis-in-one-hour-for-154/

Building A Story Index Of A Decade Of TV Evening News With Geocoding, Sentiment, Frame & Narrative Analysis In 1 Hour For $154

https://blog.gdeltproject.org/

https://www.gdeltproject.org/

13 de agosto de 2025, EB
Dots.ocr : The best small-sized OCR ever

https://medium.com/data-science-in-your-pocket/dots-ocr-the-best-small-sized-ocr-ever-b069d92153c2

Where other models uses YOLO-style detectors with a language model, dots.ocr uses just one VLM to handle layout detection, text parsing, reading order, and even formulas. No switching between models. No feature misalignment. Just a clean, prompt-based interface to switch between tasks.

Deployment Is Surprisingly Clean

You can deploy it via vLLM or Huggingface APIs. The docs are actually usable, but here’s what caught my eye:

No TensorRT drama. No need to baby-sit CUDA.

Prompt-based task switching means you don’t need a custom inference script for every document type.

Docker support if you’re lazy (I was).

No picture parsing. That’s still a gap. If your docs embed infographics, you’re out of luck.

Throughput’s limited for bulk jobs. It’s not yet optimized for high-scale PDF ingestion.

30 de julho de 2025, EB

https://mail.google.com/mail/u/0/#inbox/FMfcgzQbgcKklJwGzLkPfRnTXCbGwrxH

But by answering more questions directly—whether through featured snippets, knowledge panels, or now AI summaries—Google transitions from being a mere index to an oracle.

Advertisers, desperate for access to the dwindling pool of users who actually leave Google’s domain, find themselves bidding up the price. For Google, this would be a masterstroke—extracting greater rents from advertisers while providing users with what appears to be a more efficient search and knowledge acquisition experience.

Their post-click behavior becomes a crucial determinant of value for advertisers

This dynamic has analogues in the past history of digital advertising: consider the shift from indiscriminate banner ads to targeted, intent-driven search ads in the early 2000s, which dramatically increased both the value-per-click and the efficacy of online marketing.

Google’s response: if anyone is going to disrupt our business by doing this, it is going to be us.

We will thus cannibalize its own legacy business and simultaneously define the next paradigm—one in which the search page itself becomes the user’s destination. One thinks of IBM’s embrace of the PC, or Apple’s willingness to let the iPhone kill the iPod

Imagine a world in which one confronts not a list of blue links, but a cogent synthesis, drawing on JSTOR, Wikipedia, and the latest Substack polemic, all in one go. Frictionless knowledge, delivered at the speed of thought, with the AI acting as both librarian and tutor.

But the losers in this brave new world are the websites.

The web risks becoming a substrate for AI extraction, rather than a living network of destinations. And then we get a new form of SEO, dedicated to maximizing AI-slop.

But there is a price: obscured nuance, flattened debate, and consensus prioritized over dissent. Moreover, there is the loss of “productive friction”—the serendipity and discovery that comes from wandering through primary sources.

Of course, this strategy risks eroding the web’s ecosystem of content creators, fragmenting user journeys, and triggering regulatory backlash.

the long-term consequences for information quality, diversity, and innovation remain uncertain.

Could publishers respond by designing landing pages specifically for “AI-primed” visitors—pages that acknowledge the summary, offer deeper context, or present unique value not captured in the initial answer

Might we see a bifurcation of the web: one layer optimized for human navigation and engagement, another for AI ingestion and synthesis?

How can users maintain agency, critical thinking, and serendipitous discovery in a world where the MAMLM’s output is tailored, persuasive, and frictionless?

13 de julho de 2025, EB

https://medium.com/@pankaj_pandey/a5afc9aa564e

Unlike traditional LLMs that exclusively process text, VLMs are designed to understand and interact with both visual data (like images and videos) and textual data simultaneously. At their core, these models integrate a vision encoder, which extracts meaningful features and representations from visual inputs (e.g., recognizing objects, textures, and spatial relationships), with a language model, which excels at understanding and generating human-like text. These two components work in conjunction, often through sophisticated alignment and fusion mechanisms, to map visual and textual information into a shared embedding space. This allows VLMs to perform a variety of complex tasks, such as generating descriptive captions for images, answering questions about visual content, and even enabling visual search. By unifying perception and expression, VLMs enable AI systems to interpret and communicate about the world in a more holistic and intuitive manner, much closer to how humans perceive information.

https://blog.gopenai.com/small-model-big-impact-ibm-granite-vision-dominates-document-understanding-547a505a0874

10 de julho de 2025, EB

https://arxiv.org/html/2407.16833v1

While LC demonstrate superior performance in long-context understanding, RAG remains a viable option due to its lower cost and advantages when the input considerably exceeds the model’s context window size. Our proposed method, which dynamically routes queries based on model self-reflection, effectively combines the strengths of both RAG and LC, achieving comparable performance to LC at a significantly reduced cost. We believe our findings contribute valuable insights for the practical application of long-context LLMs and pave the way for future research in optimizing RAG techniques.

https://www.jellyfishtechnologies.com/building-rag-agent-with-google-adk-and-vertex-ai/

23 de junho de 2025, EB

https://mail.google.com/mail/u/0/#inbox/FMfcgzQbfpLLXhWLrFNQJGQGljxKPCpn

Outset.ai is a groundbreaking startup that has revolutionized customer research by utilizing artificial intelligence to automate qualitative research processes. Founded by Aaron Cannon and Michael Hess in late 2022, the platform enables enterprises to perform interviews at speeds and costs unattainable through traditional methods, offering insights that enhance competitive advantage. With a successful prototype developed in just one weekend and subsequent acceptance into Y Combinator, Outset.ai has rapidly grown to serve major clients like Nestlé and Microsoft.

The company’s innovative approach eliminates the need for human moderators, allowing for the completion of significantly more interviews in a fraction of the time and expense. By addressing the cold-start problem through a design-partner program and embedding its branding throughout the user experience, Outset.ai has built credibility and awareness in the market. Its focus on metrics, compliance, and multilingual support further enhances its appeal, allowing for expansion into regulated industries and global markets.

Having raised a total of $21 million, Outset.ai has demonstrated impressive growth, achieving substantial cost reductions for clients while completing projects much faster than traditional methods. With over a million minutes of AI-moderated interviews conducted and a lean operational model, Outset.ai is poised to solidify its leadership in AI-driven customer research, showcasing the potential of innovative solutions to transform industry standards.

Key Takeaway: Outset.ai exemplifies a remarkable business opportunity in the realm of customer insights, illustrating how AI can create significant efficiencies and cost savings for enterprises, while opening new markets and redefining qualitative research methodologies.

23 de junho de 2025, EB

https://machine-learning-made-simple.medium.com/fine-tuning-llms-is-a-huge-waste-of-time-bd0b98fcc282

Fine-tuning isn’t knowledge injection — it’s knowledge overwrite. For advanced LLMs, neurons are no longer neutral placeholders; they’re highly specialized, densely interconnected repositories of valuable information. Carelessly updating them risks catastrophic, invisible damage.
If your goal is to build adaptable, scalable, and robust systems, treat fine-tuning with the caution it deserves. Embrace modular solutions (software principles don’t dissapear just b/c we’re working on AI) that maintain the integrity of your network’s foundational knowledge. Otherwise, you’re simply dismantling your carefully constructed knowledge ecosystem — one neuron at a time.

19 de junho de 2025, EB

Remember when everyone was obsessing over Retrieval-Augmented Generation (RAG)? Building vector databases, chunking documents, creating embeddings, and dealing with complex pipelines just to answer questions about your company docs? Well, I may have some news that might make you rethink everything.

The goal of this article is to explore the current state of Retrieval-Augmented Generation (RAG) in 2025 and dive into the ongoing debate around whether RAG is “dead.” We’ll do this by building a simple alternative to RAG and comparing it with the traditional approach. As a bonus, we will also discuss the topic of “Thinking vs Non-Thinking Models”.

To explore this idea, we’ll build a simple Q&A agent in just 30 minutes that completely skips RAG and goes straight to search APIs (Tavily)+ large context windows . And guess what? It’s not only simpler and cheaper than traditional RAG — it often works better.

https://itnext.io/you-dont-need-rag-build-a-q-a-agent-in-30-minutes-and-without-a-thinking-model-52545408f495

This project showcases a simpler, more practical alternative to traditional RAG systems - demonstrating how modern search APIs combined with large context windows can eliminate the complexity of Retrieval-Augmented Generation for many documentation Q&A use cases.

As we enter 2025, there's growing evidence that search-first approaches are becoming more cost-effective and simpler than traditional RAG. With models like Gemini 2.5 Flash offering 5M token context windows at competitive prices, many developers are discovering: "Why build complex RAG pipelines when you can just search and load relevant content into context?"
https://github.com/javiramos1/qagent

12 de junho de 2025, EB

Aprender com os prompts "leaked" dos outros:

https://simonwillison.net/2025/May/25/claude-4-system-prompt/
https://github.com/asgeirtj/system_prompts_leaks/blob/main/OpenAI/app/o3.md
https://github.com/jujumilk3/leaked-system-prompts

2 de junho de 2025, EB

Open-source LLMs aren’t free — they’re deferred-cost systems disguised as freedom. You save on licenses, and pay in engineering time, architectural rigidity, and operational complexity.
The download is free. The deployment is a commitment.
Know what you’re signing up for — before it signs you up for a multi-million dollar maintenance contract with your own team.

https://machine-learning-made-simple.medium.com/the-costly-open-source-llm-lie-f83fdc5d5701

2 de junho de 2025, EB

Pinpoint Updates: Faster audio transcription and new AI features
You can now transcribe audio up to 20 times faster! We've also added automatic language detection, so you can breeze through your audio files without needing to specify the language beforehand. This enhancement will significantly speed up your workflow.
The team is also introducing time-saving generative AI features, available to early access users:

Effortless Labeling: You can now automatically label up to 100 documents at a time by simply setting your criteria for Pinpoint to follow.
Streamlined Data Extraction: Need specific information from multiple documents? You can now extract data from multiple documents by telling Pinpoint what you need. The results will be delivered in an easy-to-use spreadsheet, complete with links back to the original text.
Quick Collection Summaries: Get the gist of your entire collection or specific documents in a flash with our new summarization feature.
Easy Document Comparison: Quickly identify key differences and similarities by comparing up to three documents within a collection side-by-side.

https://mail.google.com/mail/u/0/#inbox/FMfcgzQbfVBKJrfwWGTBLlNtCqMBsbbb

2 de junho de 2025, EB

Streamlining Document Conversion: The Modern Document Processing Stack for LLM-Powered Applications

The stack leverages a set of specialized open-source libraries that each address specific aspects of document processing:

Format Conversion: Convert PDFs, DOCX and spreadsheets into Markdown.

Visual Understanding: Use vision-based models (e.g., Zerox) to handle image-heavy or complex layouts.

Web Scraping: Employ tools like the Jina AI Reader to extract clean text from web pages, even when dynamic content is present.

Metadata Extraction: Automatically tag documents with language, token count and other metadata to improve later processing.

Service-Oriented Architecture: Expose functionality over HTTP using

https://medium.com/@pankaj_pandey/90e3374b95ee

2 de junho de 2025, EB

Microsoft’s hybrid AI architecture represents a monumental shift in PC computing, fostering an innovative ecosystem that encourages the development of AI-driven applications. While challenges remain, the potential for a more intelligent and adaptive operating system is significant

Microsoft’s new Windows architecture dynamically distributes AI tasks between local neural processing units (NPUs) and the cloud, optimizing performance, enhancing user privacy, and improving energy efficiency.

The Windows AI Foundry serves as an ecosystem for developers to create, manage, and deploy AI models. It features over 1,900 optimized AI models, end-to-end workflow tools, and strong security compliance measures, making AI development more accessible and efficient.

Microsoft envisions Windows as a platform for autonomous agents capable of performing complex tasks independently, enhancing user productivity while safeguarding data privacy. The new standards facilitate communication between these agents, enabling them to collaborate in achieving user goals.

https://aisecret.us/duolingos-ai-torn-apart/?ref=ai-secret-newsletter

2 de maio de 2025, EB

I Tried Every PDF Parser For My Chat App — Only One Worked

My ultimate goal is to create an open-source tool that anyone can deploy without API key requirements or significant costs.I’m also exploring running my own PDF parsing microservice, likely with vanilla Gemini for parsing,

Client-side parsing has potential: Though quality suffers, client-side parsing scales remarkably well for free apps since processing happens on users’ machines. With more effort, this approach could be optimized.

Creating a full-stack PDF chat application on a budget turned out to be surprisingly challenging. Here’s everything I tried that DOESN’T work, and the thing that finally did.

https://medium.com/data-science-collective/i-tried-every-pdf-parser-for-my-chat-app-only-one-worked-e20613835d27

https://github.com/oidlabs-com/Lexoid?tab=readme-ov-file

https://oidlabs-com.github.io/Lexoid/

https://oidlabs-lexoid.hf.space/?__theme=system

4 de abril de 2025, EB

Biologia de um LLM:
we don’t understand how models do most of the things they do.
Knowing how models like Claude think would allow us to have a better understanding of their abilities, as well as help us ensure that they’re doing what we intend them to. For example:

Claude can speak dozens of languages. What language, if any, is it using "in its head"?
Claude writes text one word at a time. Is it only focusing on predicting the next word or does it ever plan ahead?
Claude can write out its reasoning step-by-step. Does this explanation represent the actual steps it took to get to an answer, or is it sometimes fabricating a plausible argument for a foregone conclusion?

https://www.anthropic.com/research/tracing-thoughts-language-model?utm_source=substack&utm_medium=email

https://transformer-circuits.pub/2025/attribution-graphs/biology.html

https://mail.google.com/mail/u/0/#inbox/FMfcgzQZTzXtxNHzCzDljKhxmdwSCtdJ?compose=CllgCJTNqGGftdXqNtbnqfCJXvCSnmwLpnmSKNpSbgbkTcszwwDvxPZWlzrjbxLfhNjWhNtTWWg

4 de abril de 2025, EB

in Section 1 by looking at supply chain bottlenecks and estimates that the globally available AI-relevant compute will grow by a factor of 10x by December 2027 (2.25x per year) relative to March 2025 to 100M H100e.

We expect usage to concentrate in the hands of leading AGI companies (e.g., OpenAI, Anthropic, xAI) and the AGI-focused development efforts within larger tech companies (e.g., Google, Meta), the largest two or three of which will have a 15-20% share of the globally available AI compute by the end of 2027 (around 15-20M H100e) up from a 5-10% share today (around 500k H100e). we expect a leading AI company to be able to deploy about 1M copies of superintelligent AIs at 50x human thinking speed (500 words per second), using 6% of their compute resources, mostly with specialized inference chips.

https://ai-2027.com/research/compute-forecast?utm_source=substack&utm_medium=email

11 de março de 2025, JCA

Este tweet é interessante:

https://x.com/WesternMishima/status/1897726036470907315

Mistral OCR não foi bom para manuscritos (ex: escrita do Salazar), mas GPT 4.5 (lançado há umas semanas) funcionou bem.

7 de março de 2025, JCA

Um novo modelo, e o novo nº 1 em benchmarks, para OCR: https://mistral.ai/fr/news/mistral-ocr

Do laboratório francês equivalente à OpenAI: 1.000 páginas por $1 USD ou 2.000 páginas por $1 USD, se forem feitos pedidos em batch (opção de os resultados não serem imediatos e são retornados num par de horas, serem corridos em alturas que o hardware está livre.

Converter 1 milhão de páginas para Markdown, fica por €930 ou €465, bons preços.

O link tem bastantes exemplos, incluindo figuras e tabelas.

28 de fevereiro de 2025, EB

https://research.ibm.com/blog/docling-generative-AI

Docling, IBM’s new open-source toolkit, is designed to more easily unearth that information for generative AI applications. The toolkit streamlines the process of turning unstructured documents into JSON and Markdown files that are easy for large language models (LLMs) and other foundation models to digest. Once machine-readable, this data can be used to train and customize AI models and AI agents for enterprise tasks and to ground them on the latest, most accurate facts through retrieval-augmented generation (RAG).

Docling features a command-line interface, a Python API, and is small enough to run on a standard laptop. It takes just five lines of code to set up, and it integrates seamlessly with open-source LLM frameworks like LlamaIndex and LangChain for RAG and question-answering applications. Its permissive, open-source MIT license allows developers to collaborate and expand the project to meet their needs. Docling sidesteps OCR when it can, in favor of computer vision models trained to recognize and categorize the visual elements on a page.

26 de fevereiro de 2025, EB

When comparing eScriptorium and Transkribus, it's essential to understand that they are both powerful tools for Handwritten Text Recognition (HTR), but they have some key differences:

**Key Differences:**

* **Open Source vs. Proprietary:**

* eScriptorium is open-source, providing users with greater flexibility and freedom. This means it's free to use and modify.

* Transkribus, while offering some free features, operates on a subscription-based model for full access.

* **Underlying Technology:**

* eScriptorium utilizes Kraken, an open-source OCR engine. This provides users with fine-grained control over model training and customization.

* Transkribus primarily uses PyLaia as its HTR engine, alongside some newer "Super Models".

* **Model Customization:**

* eScriptorium is known for its high degree of customizability, allowing users to build and fine-tune models to suit their specific needs.

* Transkribus also enables model training, but eScriptorium is known for having a high level of flexability concerning the training of ones own models.

* **Data Handling:**

* One of the points of difference is the way that the applications manage and export data, including the formatting of exported XML files. There have been tools created to help users migrate their data from transkribus to escrtiptorium.

https://www.reddit.com/r/Automate/comments/10gc3mi/i_built_an_aipowered_web_scraper_that_can/

https://www.reddit.com/r/computervision/comments/1et5fex/i_made_a_model_to_isolate_news_articles_in/https://app.roboflow.com/projects-qzwxj/newspaper-article-classification/2

https://escriptorium.inria.fr/

https://kadoa.co

https://escripta.hypotheses.org/about

https://gitlab.inria.fr/scripta/escriptorium

16 de fevereiro de 2025, JCA

Um benchmark e análise do novo modelo da Google, Gemini Flash 2, para extrair conteúdo de PDFs. $1 USD por ~6.000 páginas e 0.84 de precisão, é um preço e resultado excelente.

https://www.sergey.fyi/articles/gemini-flash-2

5 de fevereiro de 2025, EB

Based on this analysis, the Google Gemini model is:

10x cheaper than OpenAI’s o3-mini and 7x cheaper than DeepSeek’s R1

Many orders of magnitude faster

Has a much higher context window

Is extremely capable, arguably more so than O3-mini in a subset of complex SQL tasks

This model is undeniable proof that we’re moving to cheaper, more capable large language models. It decisively outclassed its competitors proving that the era of expensive, resource-intensive models is over. This breakthrough marks a transformative shift in the industry, paving the way for more accessible, high-performance AI solutions in real-world applications.

https://medium.com/tech-and-me/grammarly-vs-chatgpt-for-proofreading-ac1d7e5008f3

15 de janeiro de 2025, EB

In contrast, China follows a state-driven, diverse AI development plan. Like the United States, China also invests in LLMs but simultaneously pursues alternate paths to GAI, including those more explicitly brain-inspired. This report draws on public statements by China’s top scientists, their associated research, and on PRC government announcements to document China’s multifaceted approach.

The Chinese government also sponsors research to infuse “values” into AI intended to guide autonomous learning, provide AI safety, and ensure that China’s advanced AI reflects the needs of the people and the state. This report concludes by recommending U.S. government support for alternative general AI programs and for closer scrutiny of China’s AI research.

New approaches such as training not based on human feedback but using different value models might be suitable for specific, well-defined scenarios, but it is unclear how such a strategy could generalize to broader sets of ethical dilemmas, let alone lead to LLM responses that are consistent with a specific set of values. It is therefore currently far from clear if and how particular sets of values can be “trained into” an LLM, given that, for LLMs, “good” and “bad” are just words to be predicted, with no grounding in any kind of valuative framework.

A final distinction between Chinese and western AI research evidenced throughout this paper needs to be made explicit, namely, the real possibility that China’s directed, strategic approach may be more effective—all other things being equal—than the western profit-driven approach that focuses on quick wins at the possible expense of more successful strategies that require a longer time horizon

https://cset.georgetown.edu/publication/chinese-critiques-of-large-language-models/?utm_source=Center+for+Security+and+Emerging+Technology&utm_campaign=3f3e8aec6f-Chinese+Critiques+of+Large+Language+Models&utm_medium=email&utm_term=0_fcbacf8c3e-3f3e8aec6f-438422560

31 de janeiro de 2025, EB

Our research suggests that knowledge distillation from reasoning models presents a promising direction for post-training optimization. While our current work focuses on distilling data from mathematics and coding domains, this approach shows potential for broader applications across various task domains. The effectiveness demonstrated in these specific areas indicates that long-CoT distillation could be valuable for enhancing model performance in other cognitive tasks requiring complex reasoning. Further exploration of this approach across different domains remains an important direction for future research.

DeepSeek report: https://arxiv.org/pdf/2412.19437#figure.caption.3

3 de setembro de 2024, EB

I never liked writing with Grammarly correcting me in the moment because that interferes with my flow. I didn’t like pasting my text in after finishing because it annoyed me to look through the text for errors, and it especially annoyed me when Grammarly wanted to change something for no good reason at all.
When ChatGPT arrived, I switched to using it and have been much happier with the results. Here are the instructions I gave ChatGPT before pasting in my text.

https://medium.com/tech-and-me/grammarly-vs-chatgpt-for-proofreading-ac1d7e5008f3

21 de maio de 2024, EB

From The New York Times:
One of the weirder, more unnerving things about today’s leading artificial intelligence systems is that nobody — not even the people who build them — really knows how the systems work.

That’s because large language models, the type of A.I. systems that power ChatGPT and other popular chatbots, are not programmed line by line by human engineers, as conventional computer programs are.

https://www.nytimes.com/2024/05/21/technology/ai-language-models-anthropic.html

14 de maio de 2024, EB

From The New York Times:
OpenAI Unveils New ChatGPT That Listens, Looks and Talks
Chatbots, image generators and voice assistants are gradually merging into a single technology with a conversational voice.

https://www.nytimes.com/2024/05/13/technology/openai-chatgpt-app.html?smid=em-share

23 de abril de 2024, EB

From The New York Times:

Microsoft Makes a New Push Into Smaller A.I. Systems
The company that has invested billions in generative A.I. pioneers like OpenAI says giant systems aren’t necessarily what everyone needs.

https://www.nytimes.com/2024/04/23/technology/microsoft-ai.html?smid=em-share

12 de março de 2024, EB

From The New York Times:

Now, a start-up founded by three former OpenAI researchers is using the technology development methods behind chatbots to build A.I. technology that can navigate the physical world.

Covariant, a robotics company headquartered in Emeryville, Calif., is creating ways for robots to pick up, move and sort items as they are shuttled through warehouses and distribution centers. Its goal is to help robots gain an understanding of what is going on around them and decide what they should do next.

The technology also gives robots a broad understanding of the English language, letting people chat with them as if they were chatting with ChatGPT.

The technology, still under development, is not perfect. But it is a clear sign that the artificial intelligence systems that drive online chatbots and image generators will also power machines in warehouses, on roadways and in homes.

https://www.nytimes.com/2024/03/11/technology/ai-robots-technology.html

24 de janeiro de 2024, EB

A Inteligência Artificial deu o próximo passo: já consegue copiar a sua caligrafia

https://zap.aeiou.pt/a-inteligencia-artificial-deu-o-proximo-passo-ja-consegue-copiar-a-sua-caligrafia-578747

14 de dezembro de 2023, EB

ChatGPT to summarize Politico and Business Insider articles in ‘first of its kind’ deal

OpenAI to pay German media group Axel Springer to use its material, including stories behind paywalls

https://www.theguardian.com/technology/2023/dec/13/openai-axel-springer-chatgpt-story-writing-business-insider-politico

19 de outubro de 2023, EB

From The New York Times:

An Industry Insider Drives an Open Alternative to Big Tech’s A.I.

The nonprofit Allen Institute for AI, led by a respected computer scientist who sold his company to Apple, is trying to democratize cutting-edge research.

https://www.nytimes.com/2023/10/19/technology/allen-institute-open-source-ai.html?smid=em-share

19 de outubro de 2023, EB

Digital artist Daniel Ambrosi has created an exhibition that interprets quintessentially English, eighteenth-century vistas with AI

https://www.theguardian.com/technology/gallery/2023/oct/09/ai-and-the-landscapes-of-capability-brown-in-pictures?CMP=share_btn_link

8 de outubro de 2023, EB

Projeto Traprinq (com Transcribus): transcrever os procesos da inquisição portuguesa (1536-1821)

https://traprinq.mozellosite.com

Projeto CLARA-HD (com Transcribus): Diário de Madrid (1788-1825):

- - https://clara-nlp.uned.es

12 de setembro de 2023, EB

From FT:
Generative AI exists because of the transformer. This is how it: Writes, Works, Learns, Thinks and Hallucinates

https://ig.ft.com/generative-ai/

13 de Agosto de 2023, EB
‘Only AI made it possible’: scientists hail breakthrough in tracking British wildlife
Technology proves able to identify dozens of species in thousands of hours of recordings.

https://www.theguardian.com/technology/2023/aug/13/only-ai-made-it-possible-scientists-hail-breakthrough-in-tracking-british-wildlife?CMP=share_btn_link

8 de Agosto de 2023, EB
AI can identify passwords by sound of keys being pressed, study suggests

Researchers create system using sound recordings that can work out what is being typed with more than 90% accuracy

https://www.theguardian.com/technology/2023/aug/08/ai-could-identify-passwords-by-sound-of-keys-being-pressed-study-suggests?CMP=share_btn_link

2 de Agosto de 2023, EB

From The New York Times:

A.I.’s Inroads in Publishing Touch Off Fear, and Creativity

The technology has the potential to affect nearly every aspect of how books are produced — even the act of writing itself.

https://www.nytimes.com/2023/08/02/books/ais-inroads-in-publishing-touch-off-fear-and-creativity.html?smid=em-share

12 de junho de 2023, JCA

Novidades de Machine Learning da Apple na WWDC, incluido o seu software com UI, que não necessita de código:

https://developer.apple.com/machine-learning/create-ml/

Versão 2 de um produto de geração de vídeo via prompts, dos mesmos investigadores que criaram o Stable Diffusion:

https://research.runwayml.com/gen2

6 de junho de 2023, EB

Vision Pro: Novo headset da Apple para realidade virtual:

15 de maio de 2023, JCA

Um bom guia, extraído da experiência interna de uma empresa, sobre Large Language Models e conselhos sobre ‘prompts’:

https://github.com/brexhq/prompt-engineering

11 de maio de 2023, JCA

Uma equipa, que saiu da OpenAI há uns anos, anunciou um modelo que suporta até 100k tokens (cerca de 75k palavras), muito além dos 4k disponíveis actualmente no GPT3 da OpenAI. Talvez seja útil experimentar com jornais inteiros, com as colunas por ordem, assim os textos ficam com o contexto inteiro, ajudando na correcção.

Está aqui uma notícia: https://www.anthropic.com/index/100k-context-windows
Encontram preços neste PDF: https://cdn2.assets-servd.host/anthropic-website/production/images/apr-pricing-tokens.pdf

9 de maio de 2023, AA

Sobre coleções de antropologia colonial

https://www.publico.pt/2023/05/07/culturaipsilon/noticia/esqueleto-humano-nao-objecto-museu-2048630

24 de abril de 2023, AA

Sobre a geração de prompts nos modelos AI: "In this blog post, I will make the argument that prompt engineering is a real skill that can be developed based on real experimental methodologies. I will use a realistic example to walk through the process of prompt engineering a solution to a problem that provides practical value to an application."

https://mitchellh.com/writing/prompt-engineering-vs-blind-prompting

21 de abril de 2023, EB

Como funcionam os LLM e o GPT: magnifico (mas longo) ensaio de Stephen Wolfram: https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work (pdf).
Ver também no The Economist: "Large, creative AI models will transform lives and labour markets": https://www.economist.com/interactive/science-and-technology/2023/04/22/large-creative-ai-models-will-transform-how-we-live-and-work (pdf)
Ligação OpenGPT com Wolfram: "ChatGPT + Wolfram is something very new—really a completely new kind of technology.": https://writings.stephenwolfram.com/2023/03/chatgpt-gets-its-wolfram-superpowers (pdf)
Sobre implicações dos LLM e apps generativas, ver o excelente ensaio no The Economist: "How AI could change computing, culture and the course of history", https://www.economist.com/essay/2023/04/20/how-ai-could-change-computing-culture-and-the-course-of-history (pdf) (keep)

14 de abril de 2023, JCA

Anuncio de novo serviço da Amazon: AWS BEDROCK, serviços AI de modelos fundacionais para objetivos especializados

- https://aws.amazon.com/bedrock

29 de Março de 2023, EB

- "What has actually been achieved on this video call? It takes Jared Spataro just a few clicks to find out. Microsoft’s head of productivity software pulls up a sidebar in Teams, a video-conferencing service. There is a 30-second pause while somewhere in one of the firm’s vast data centres an artificial-intelligence (ai) model analyses a recording of the virtual meeting so far. Then an impressively accurate summary of your correspondent’s questions and Mr Spataro’s answers appears. Mr Spataro can barely contain his enthusiasm. “This is not your daddy’s ai,” he beams."
- https://www.economist.com/business/2023/03/26/big-tech-and-the-pursuit-of-ai-dominance?itm_source=parsely-api
- Potencial do GPT para resumir entrevistas e reuniões.
- Surpreendente potencial multilingua do GPT

3 de Março de 2023, JCA

- Novas API para OpenGPT e Whisper (https://openai.com/blog/introducing-chatgpt-and-whisper-apis)
- Customização do modelo GPT para datasets especificos (https://openai.com/blog/customizing-gpt-3) .
- Ver ELICIT, The AI research assistant (https://elicit.org)

3 de Março de 2023, JCA

- Novas API para OpenGPT e Whisper (https://openai.com/blog/introducing-chatgpt-and-whisper-apis)
- Customização do modelo GPT para datasets especificos (https://openai.com/blog/customizing-gpt-3) .
- Ver ELICIT, The AI research assistant (https://elicit.org)

2 de Março de 2023, EB

- Arquivo Municipal de V N Gaia (https://arquivo.cm-gaia.pt) : "O Comércio do Porto" digitalizado
- Empresa de prestação de serviços: gestão de sistemas de arquivos (http://gisa.paradigmaxis.pt)

27 de Janeiro de 2023, AA

Tecnologias OCR. digitalização de jornais

- https://ocr-d.de/en (OCR Development, DFG project)
- https://www.newseye.eu (NewsEye. A Digital Investigator for Historical Newspapers. EU project)

21 de setembro de 2022, JCA

"Speech recognition", identificação de palavras em ficheiros de audio, exemplos de aplicação Whisper da OpenAI (https://openai.com/research/whisper )

30 de julho de 2022, EB

Reconstrução histórica das notícias sobre “Carrazeda de Ansiães” (um concelho do vale do Tua, o “interior” do interior do norte de Portugal) a partir das ocorrências nos dois periódicos entre 1883 e 1893, parte de uma investigação em curso sobre a história da região nos finais do século XIX (a ortografia dos textos originais foi corrigida com a ajuda do corretor do Google Docs)