Wednesday, July 16, 2025

Google DeepMind Unveils Groundbreaking Benchmark to Enhance LLM Factual Accuracy and Slash Hallucinations!

January 10, 2025

Enhancing Factual Accuracy in Language Models: The FACTS Grounding Benchmark

Hallucinations, or erroneous outputs, remain a significant hurdle for large language models (LLMs), particularly when tasked with intricate challenges that require precise and thorough answers.

A Breakthrough in Model Evaluation

Researchers at Google DeepMind have made strides toward ensuring factual correctness in foundational models by introducing the FACTS Grounding benchmark. This innovative assessment evaluates how well LLMs produce accurate information derived from extensive documents. Additionally, it measures the sufficiency of detail provided in their responses to ensure they meet users’ informational needs.

Accompanying this new benchmark is the FACTS leaderboard, which has been launched on Kaggle to engage the data science community.

Current Leaders in Factuality Scores

The latest rankings show Gemini 2.0 Flash at the top of the leaderboard with a factuality score of 83.6%. Other entries among the top nine include Google’s Gemini 1.5 Flash and Gemini 1.5 Pro; Anthropic’s Claude 3.5 Sonnet and Claude 3.5 Haiku; and OpenAI’s various GPT-4o versions, all achieving accuracy scores above 61.7%.

![FACTS Leaderboard](https://venturebeat.com/wp-content/uploads/2025/01/Screenshot-118.png?w=800)

Evolving Metrics for Model Performance

The developers assert their commitment to continuously updating this leaderboard as new models emerge and existing ones evolve over time.

“This benchmark aims to bridge gaps by assessing a broader spectrum of model behaviors related to factuality compared to existing benchmarks that focus solely on specific use cases like summarization,” stated researchers in a report published recently.

Tackling Misinformation Challenges

A key obstacle to guaranteeing fact-based responses lies in modeling and measurement: aspects such as architecture and evaluation metrics significantly affect outcomes. Pre-training traditionally centers on predicting the next token from the preceding text. While this teaches models useful knowledge, it does not directly optimize them for diverse factuality scenarios, and it often leads them to generate text that seems plausible but is not actually accurate or relevant.
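The limitation of next-token prediction can be illustrated with a toy calculation. The sketch below is not taken from the researchers’ report; the function name and the probability values are hypothetical, chosen only to show that the training objective rewards likely text and has no term for whether a claim is supported.

```python
import math


def next_token_loss(token_probs):
    """Average negative log-likelihood of the tokens that actually came next.

    token_probs: the probability a model assigned to each observed next
    token (hypothetical values, for illustration only).
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


# Two hypothetical continuations of the same prompt. The objective sees
# only how likely each token was, not whether the statement is grounded.
fluent_but_unsupported = [0.9, 0.8, 0.9]
grounded_but_unusual = [0.9, 0.3, 0.4]

loss_fluent = next_token_loss(fluent_but_unsupported)
loss_grounded = next_token_loss(grounded_but_unusual)
```

Under these assumed numbers the fluent continuation gets the lower loss, which is the sense in which plausibility, rather than factuality, is what the objective optimizes.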

The Structure of the FACTS Dataset

This challenge is addressed through a dataset of 1,719 examples, split into public (860) and private (859) sets, each of which demands a comprehensive answer grounded firmly in the context document provided alongside the query. Each example contains:

  • System instructions: General guidelines directing models to draw information only from the supplied context;
  • User queries: Specific inquiries requiring detailed exploration;
  • Context documents: In-depth texts containing the data needed to answer the user’s question.
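The three-part layout listed above can be sketched as a small data structure. This is a minimal illustration, not the benchmark’s actual schema: the class name, field names, prompt layout and sample strings are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundingExample:
    """One benchmark item (hypothetical field names)."""
    system_instruction: str  # tells the model to rely on the context only
    user_request: str        # the specific question or task
    context_document: str    # the long source text the answer must rest on


def build_prompt(example: GroundingExample) -> str:
    """Join the three parts into one prompt string (assumed layout)."""
    return "\n\n".join([
        example.system_instruction,
        "Context document:\n" + example.context_document,
        "User request:\n" + example.user_request,
    ])


example = GroundingExample(
    system_instruction="Answer using only the provided context document.",
    user_request="Why did revenue fall in Q3?",
    context_document="(long financial report text would go here)",
)
prompt = build_prompt(example)
```

Keeping the context document as its own field makes it straightforward to check later whether each claim in a response can be traced back to it.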

A response qualifies as “accurate” when it engages thoroughly with the long-form document and is entirely attributable to it. It is marked “inaccurate” if its claims lack direct support from the document or are not relevant to the request.

For instance, suppose a model is asked why a company’s revenue dropped during Q3 and is given the company’s detailed financial accounts as reference material. If it responds vaguely, “The company faced challenges affecting revenue,” that output would be classified as inaccurate, because it fails to specify reasons, such as market shifts or increased competition, that are likely present in the documentation.

An Example Application: Asking for Financial Advice

By contrast, if a model is asked for money-saving strategies and given a set of tips organized specifically for students, an accurate response would offer clear, actionable advice drawn from that document: using free campus activities, buying supplies in bulk, cooking rather than dining out, monitoring expenditures diligently, avoiding excessive credit card spending and using resources sparingly.

![Financial Strategies](https://venturebeat.com/wp-content/uploads/2025/01/Screenshot-120.png?w=800)

Diverse Inputs Across Various Fields

The team used documents of varying lengths, up to 32k tokens, covering sectors including finance, technology, medicine and law, among others. User queries are similarly wide-ranging and include requests for summaries and rewrites.

Responses are judged in two phases. First, they are evaluated for eligibility: those that do not satisfy the user’s request are excluded. Second, the remaining responses are scored against the source document and the original instructions, with the requirement that they be free of hallucinations and fully grounded. Scores come from multiple judge models and are averaged, which helps avert the bias of any individual judge.
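The two-tier judging and averaging described above can be sketched in a few lines. This is an illustration under stated assumptions, not the benchmark’s implementation: the judge names, the boolean verdicts and both function names are hypothetical, and in practice each verdict would come from an LLM judge rather than a hard-coded table.

```python
from statistics import mean


def judge_score(verdicts):
    """Fraction of responses that one judge accepts.

    verdicts: list of (eligible, grounded) booleans, one pair per response.
    A response counts only if it passes both phases.
    """
    accepted = [eligible and grounded for eligible, grounded in verdicts]
    return sum(accepted) / len(accepted)


def factuality_score(verdicts_by_judge):
    """Average the per-judge scores so no single judge dominates."""
    return mean(judge_score(v) for v in verdicts_by_judge.values())


# Hypothetical verdicts from three judges on the same four responses.
verdicts_by_judge = {
    "judge_a": [(True, True), (True, False), (False, True), (True, True)],
    "judge_b": [(True, True), (True, True), (False, True), (True, True)],
    "judge_c": [(True, True), (True, False), (True, True), (True, True)],
}
score = factuality_score(verdicts_by_judge)
```

With these assumed verdicts the judges accept 2, 3 and 3 of the four responses, so the averaged score sits between the strictest and most lenient judge.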


Further enhancements to testing capabilities are expected to raise performance, with collaboration yielding progressive innovation even as rivalry among developers continues, the writers note.


© 2015-2024 Tech-News.info