Machine Learning
-
How to convert a positionally encoded predicted embedding from a decoder to its matching token?
When training a transformer on positionally encoded embeddings, should the tgt output embeddings also be positionally encoded? If so, wouldn't the predicted/decoded embeddings also be positionally encoded?
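In standard decoder setups the predicted hidden state isn't "decoded back" to a position-free embedding; a learned output projection (often weight-tied to the input embedding matrix) maps it straight to vocabulary logits, and training makes that projection work on the positionally-influenced states directly. A minimal sketch of the weight-tied case with NumPy (all shapes and values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 10, 64

# Hypothetical embedding matrix; with weight tying, the output
# projection reuses these same weights.
embedding = rng.normal(size=(vocab_size, d_model))

# Stand-in for the decoder's final hidden state: a slightly noisy
# copy of token 7's embedding, purely for illustration.
hidden = embedding[7] + 0.01 * rng.normal(size=d_model)

# Project onto the vocabulary: logits are dot products with every
# token embedding; argmax picks the matching token id.
logits = embedding @ hidden
token_id = int(np.argmax(logits))
```

So the answer to "how do I match the predicted embedding to a token" is usually: you don't search the embedding table by hand; the trained projection plus argmax (or sampling over softmax(logits)) does it.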
- huggingface.co From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
-
Torrent tracker for open models
Someone (Dreamertist on Reddit) got tired of depending on Hugging Face for downloading models and proposes a torrent tracker to share these huge blobs more efficiently.
It just started, with only a few models uploaded yet, but I think it's worth it for all of us to put our local stashes online there. Making a new torrent is super easy (one missing step though: when "re-downloading" the model, you need to save it in the directory where it already exists; this way it will "resume" at 100% completion and switch to seeding mode).
-
Where do these stains come from and how can I fix them?
Hey guys,
I have been experimenting with self-supervised visual learning a bit. Until now I have only ever used U-Nets and related architectures.
No matter what specific task, images or other parameters I changed I always encountered these stains on my output-images (here marked with green), although sometimes more, sometimes less.
Now I wondered if anybody could tell me where they came from and how I could prevent them?
In the attached picture the input (left) and target (right) are the same, so that I can be sure these stains do not come from a badly designed learning task, yet they still appear (output is the middle image).
Thanks in advance and all the best :D
Edit: added line breaks
-
What are your thoughts on Microsoft Copilot?
Copilot sounds amazing on paper. The free (to 365 subs) version on the web is just GPT-4, so that's familiar enough. The integration with 365 applications is really what grabs me. Stuff like tossing it 10 spreadsheets and asking it to analyze and compare the data, having a virtual assistant to remind me of upcoming actionables, and summarizing a meeting when I zone out - it all sounds really handy.
I met with Microsoft last week and they're down for giving me a 90 day trial if I want to take it for a spin. Any thoughts or suggestions? I ideally want to determine if this will improve productivity for my end users enough to be worth the insane cost of $30/user/mo.
-
Looking for a specific OpenAI employee personal blog
Hi all,
I think around 1 or 2 years ago, I stumbled upon the personal blog of an Asian woman (I think) working at OpenAI. She had numerous extensive, fascinating posts on a dark-themed blog, going into the technical details of embeddings of language models and such.
I can no longer find that blog and have no other information to go by. Would anyone possibly know which blog I'm referring to? It would be very much appreciated.
-
Where Is Noether's Principle in Machine Learning? | 2024-02-29
2024-02-29 | Christopher Gadzinski writes:
> Physics likes optimization! Subject to its boundary conditions, the time evolution of a physical system is a critical point for a quantity called an action. This point of view sets the stage for Noether's principle, a remarkable correspondence between continuous invariances of the action and conservation laws of the system.
>
> In machine learning, we often deal with discrete "processes" whose control parameters are chosen to minimize some quantity. For example, we can see a deep residual network as a process where the role of "time" is played by depth. We may ask:
>
> 1. Does Noether's theorem apply to these processes?
> 2. Can we find meaningful conserved quantities?
>
> Our answers: "yes," and "not sure!"
-
Gemini 1.5
blog.google Our next-generation model: Gemini 1.5
Gemini 1.5 delivers dramatically enhanced performance, with a breakthrough in long-context understanding across modalities.
Anybody got to try it?
- pythonspeed.com NumPy 2 is coming: preventing breakage, updating your code
NumPy 2 is coming, and it’s backwards incompatible. Learn how to keep your code from breaking, and how to upgrade.
Itamar Turner-Trauring writes:
> These sorts of problems are one of the many reasons you want to "pin" your application's dependencies: make sure you only install a specific, fixed set of dependencies. Without reproducible dependencies, as soon as NumPy 2 comes out your application might break when it gets installed with new dependencies.
>
> The really short version is that you have two sets of dependency configurations:
>
> - A direct dependency list: A list of libraries you directly import in your code, loosely restricted. This is the list of dependencies you put in pyproject.toml or setup.py.
> - A lock file: A list of all dependencies you rely on, direct or indirect (dependencies of dependencies), pinned to specific versions. This might be a requirements.txt, or some other file, depending on which tool you're using.
>
> At appropriate intervals you update the lock file based on the direct dependency list.
>
> I've written multiple articles on the topic, in case you're not familiar with the relevant tools:
>
> - "Faster Docker builds with pipenv, poetry, or pip-tools" covers using those three tools to maintain lockfiles.
> - For Conda, see "Reproducible and upgradable Conda environments with conda-lock".
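The two-file setup the quote describes looks roughly like this (file contents and version numbers are purely illustrative; `pip-compile` from pip-tools is one of the tools the article covers):

```
# requirements.in: direct dependencies only, loosely restricted
numpy>=1.24,<2
pandas>=2.0

# requirements.txt: lock file generated with `pip-compile requirements.in`;
# every dependency, direct or indirect, pinned to an exact version
numpy==1.26.4
pandas==2.2.1
python-dateutil==2.9.0
pytz==2024.1
```

You install from the lock file in production and regenerate it deliberately, so a NumPy 2 release can't sneak into a deploy.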
-
Theoretical Foundations of Graph Neural Networks - Seminar
YouTube Video
cross-posted from: https://slrpnk.net/post/3892266
> Institution: Cambridge
> Lecturer: Petar Velickovic
> University Course Code: seminar
> Subject: #math #machinelearning #neuralnetworks
> Description: Deriving graph neural networks (GNNs) from first principles, motivating their use, and explaining how they have emerged along several related research lines.
-
Full MIT Lectures on Machine Learning in Genomics
cross-posted from: https://slrpnk.net/post/3863486
> Institution: MIT
> Lecturer: Prof. Manolis Kellis
> University Course Code: MIT 6.047
> Subject: #biology #computationalbiology #machinelearning
More at [email protected]
-
Hoping for an intro to machine learning for object detection
Hi! Hopefully this is a good place to ask. I've been googling around a fair bit, but haven't had much luck: I'm either finding ELI5-type articles, or in-depth tutorials on setting up a model to tell the difference between a frog and a dog. I'm not sure if those are relevant to my concept.
I would like to implement a ML algorithm to detect a particular type of defect on a production line. Our current camera system isn't quite up to the task, but gives good, consistent imagery, and I have a good historical dataset. The product moves past the camera, it snaps a single black and white image, then the product moves on. This means that most of my images are more or less the same. These defects are obvious to the human eye.
Could someone please give me, a noob, a bird's eye view of how I would go about using ML to create a model for this? There's so many choices of tools and tutorials that I don't know which would be best suited to this use case.
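Given how consistent the imagery is, a useful bird's-eye-view starting point (before any deep model) is a reference-image anomaly check: average many known-good frames into a "golden" image and flag frames that deviate too much. A hedged NumPy sketch with synthetic stand-in data (all sizes and thresholds here are made up):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in data: 50 known-good grayscale frames, 64x64, as the camera
# might produce for defect-free product (values are arbitrary here).
good = rng.normal(0.5, 0.02, size=(50, 64, 64))

# "Golden" reference: per-pixel mean and std over the good set.
mean = good.mean(axis=0)
std = good.std(axis=0) + 1e-6  # avoid division by zero

def defect_score(image):
    """Fraction of pixels deviating more than 5 sigma from the reference."""
    z = np.abs(image - mean) / std
    return float((z > 5.0).mean())

def is_defective(image, threshold=0.001):
    """Flag the frame if more than 0.1% of pixels are outliers."""
    return defect_score(image) > threshold

# A clean frame stays under the threshold; a simulated blemish trips it.
clean = rng.normal(0.5, 0.02, size=(64, 64))
blob = clean.copy()
blob[20:30, 20:30] += 0.5  # simulated defect patch
```

If a baseline like this isn't enough, the usual next step is a small CNN classifier (defect / no defect) trained on your labeled historical images, e.g. with PyTorch, since you already have a good dataset.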
- news.mit.edu Machine-learning system based on light could yield more powerful, efficient large language models
An MIT machine-learning system demonstrates greater than 100-fold improvement in energy efficiency and a 25-fold improvement in compute density compared with current systems.
I've had my eyes on optoelectronics as the future hardware foundation for ML compute (and not just interconnect) for a few years now, and it's exciting to watch the leaps and bounds occurring at such a rapid pace.
-
What are you reading this week?
Hello Machine Learning Community,
The intention of this post is to replicate a similar tradition from r/MachineLearning and to trigger engagement. This post will be created weekly.
What are you reading this week, and any thoughts to share?
-
PyTorch Lightning is bottlenecked by the CPU
When I train my PyTorch Lightning model on two GPUs on jupyter lab with strategy="ddp_notebook", only two CPUs are used and their usages are 100%. How can I overcome this CPU bottleneck?
Edit: I tested with PyTorchProfiler and it was because of the old SSDs used on the server
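For anyone hitting the same wall: before blaming the CPU, it's worth timing raw storage, since DDP data-loading workers are easily starved by slow disks. A rough stdlib-only sketch (not a rigorous benchmark; the OS page cache will inflate re-read numbers, and the file size here is arbitrary):

```python
import os
import tempfile
import time

# Write ~64 MiB of data, then time reading it back in 1 MiB chunks.
size = 64 * 1024 * 1024
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(size))
    path = f.name

start = time.perf_counter()
read = 0
with open(path, "rb") as f:
    while chunk := f.read(1024 * 1024):
        read += len(chunk)
elapsed = time.perf_counter() - start
os.remove(path)

throughput_mb_s = read / elapsed / 1e6  # rough sequential read rate
```

If the number is far below what the drive is rated for, raising `num_workers` on the DataLoader won't help; the storage itself is the ceiling.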
-
What are you reading this week?
Hello Machine Learning Community,
The intention of this post is to replicate a similar tradition from r/MachineLearning and to trigger engagement. This post will be created weekly.
What are you reading this week, and any thoughts to share?
- www.deepmind.com RT-2: New model translates vision and language into action
Introducing Robotic Transformer 2 (RT-2), a novel vision-language-action (VLA) model that learns from both web and robotics data, and translates this knowledge into generalised instructions for robotic control, while retaining web-scale capabilities. This work builds upon Robotic Transformer 1 (RT-1...
-
What are you reading this week?
Hello Machine Learning Community,
The intention of this post is to replicate a similar tradition from r/MachineLearning and to trigger engagement. This post will be created weekly.
What are you reading this week, and any thoughts to share?
- github.com GitHub - aerdem4/lofo-importance: Leave One Feature Out Importance
Leave One Feature Out Importance. Contribute to aerdem4/lofo-importance development by creating an account on GitHub.
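The repo name describes the idea: leave one feature out, re-fit, and score the performance drop. The library itself wraps scikit-learn estimators with cross-validation; the toy sketch below only illustrates the concept, using plain least squares and synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3

# Synthetic data: feature 0 matters a lot, feature 1 a little,
# feature 2 is pure noise.
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

def mse_with_features(cols):
    """Fit least squares on the given feature columns, return training MSE."""
    A = X[:, cols]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((A @ coef - y) ** 2))

baseline = mse_with_features(list(range(p)))

# LOFO importance: how much worse does the model get without feature j?
importance = {
    j: mse_with_features([k for k in range(p) if k != j]) - baseline
    for j in range(p)
}
```

Dropping the informative feature hurts a lot; dropping the noise feature barely moves the error, which is exactly the signal LOFO importance reports.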
-
Almost All Research on the Mind is in English. That May Be a Problem
www.wired.com Almost All Research on the Mind Is in English. That’s a Problem
Language can shape how you think in subtle and profound ways. But most researchers only study English speakers.
-
What are you reading this week?
Hello Machine Learning Community,
The intention of this post is to replicate a similar tradition from r/MachineLearning and to trigger engagement. This post will be created weekly.
What are you reading this week, and any thoughts to share on it?
-
What do you all think about a weekly "what are you reading?" post?
I'd love to know what others are reading and why they think it's awesome (or not), and in general get exposure to other subgenres of ML. Most of the papers I read are in the computer vision domain because of work, so I'd appreciate reading more about others.
So...
- Are you all interested in such a post?
- If yes, which day of the week?
-
Gaussian processes from scratch
peterroelants.github.io Gaussian processes (1/3) - From scratch
This post explores some concepts behind Gaussian processes, such as stochastic processes and the kernel function. We will build up deeper understanding of Gaussian process regression by implementing them from scratch using Python and NumPy.
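The first post in the series centers on the kernel function; for a taste, here is a minimal RBF kernel and a draw from the GP prior in NumPy (hyperparameters are arbitrary):

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0, variance=1.0):
    """Exponentiated-quadratic (RBF) kernel matrix between two 1-D inputs."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

x = np.linspace(-3, 3, 25)
K = rbf_kernel(x, x)

# Draw functions from the zero-mean GP prior; the jitter keeps the
# covariance numerically positive-definite for the Cholesky factor.
rng = np.random.default_rng(0)
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(x)))
samples = L @ rng.normal(size=(len(x), 3))  # 3 sampled functions
```

Each column of `samples` is one smooth random function; the length scale controls how quickly samples wiggle, which the blog series explores in detail.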
-
Great series by Andrej Karpathy on machine learning and training
Great series on machine learning. Posting for anyone interested in more of the details on AIs and LLMs and how they're built/trained.
-
Why Are There No Consumer Server GPUs?
I work with machine learning tasks daily, both as an ML researcher and as a hobby. The difference between what I can do at work and at home is significant - an A40 at work can do far more than the 3080 I have at home. This obviously makes sense, given the massively increased price point.
However, what I find odd is that there are no consumer-level server GPUs targeted towards ML on the market. The A40 is not just a scaled-up consumer GPU, and with machine learning growing as a hobby, consumer- and enthusiast-level server GPUs are a surprising market gap.
-
IBM Watson machine learning stopped development at PyTorch 1.7
https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/#/
On the face of it, the ability to run models larger than GPU memory would seem to be extremely valuable. Why did they give up? Not everyone has an 80GB GPU.
Was the performance too slow?
-
Vicuna 13B vs 33B
Hi! I posted this over at lemmy.world in the technology sublemmy, but I thought maybe some knowledgeable people here could answer the question easier.
Oh, and I know it's not really designed to give out functioning code, but the output looks good enough that I thought I'd ask.
I have both Vicuna-13B and the preview version of Vicuna-33B running on my desktop and asked both a simple coding question from another user:
"What can I use to draw curved text in an SVG?"
I got these two answers. I'm not really a coder, but while the code looks like it could work, I don't think it's correct?
Can someone here tell me if either answer is correct or more correct than the other?
Thanks!
33B:
.....
User: How are you doing today?
Vicky: Today, I'm doing well, thank you for asking! How about you?
User: I am doing great! what can I use to draw curved text in an SVG?
Vicky: To draw curved text in an SVG, you can use the <textPath> element. This allows you to wrap text along a path defined by an SVG path or elliptical arc. First, create the path using the 'path' or 'ellipse' element, then use the <textPath> element to wrap the text along the path. Here's an example:
+++
html
<svg xmlns="http://www.w3.org/2000/svg" width="800" height="600" viewBox="0 0 800 600">
  <path id="curve" d="M200,200 Q400,200 400,400" fill="none" stroke="black" stroke-width="2"/>
  <text text-anchor="middle" dominant-baseline="central" x="300" y="300" text-path="url(#curve)">
    The quick brown fox jumps over the lazy dog
  </text>
</svg>
+++
In this example, a quadratic curve is defined using the 'Q' command in the 'path' element, and the text is wrapped along the curve using the <textPath> element with the 'url(#curve)' attribute. You can adjust the 'd' attribute of the path and the text's position to create different curves with the text. .....
13B:
.....
User: what can I use to draw curved text in an SVG?
Vicky: In SVG, you can use the "textPath" element to create curved text. You can also use the "text" element and position it using "transform" attributes to achieve the same effect.
User:Can you give me a code example for that?
Vicky: Sure! Here's an example using the "textPath" element:
+++
php
<svg width="100" height="100">
  <text xlink:href="#text1" y="50" text-anchor="middle" font-size="20">
    This text has a curved path.
  </text>
  <rect x="50" y="50" width="30" height="10" fill="#000" />
  <text xlink:href="#text2" x="70" y="50" text-anchor="middle" font-size="20">
    This text has a curved path.
  </text>
</svg>
+++
And here's an example using the "text" element:
+++
php
<svg width="100" height="100">
  <text xlink:href="#text1" y="50" text-anchor="middle" font-size="20">
    This text has a curved path.
  </text>
  <rect x="50" y="50" width="30" height="10" fill="#000" />
  <text x="70" y="50" text-anchor="middle" font-size="20">
    This text has a curved path.
  </text>
</svg>
+++
*Note that you'll need to define the curved path for each text element using an SVG path string. In this case, the path strings are `#text1` and `#text2`. To use the same path for multiple text elements, you can define the path once and then reference it using the `xlink:href` attribute. The `text-anchor` attribute determines the point of the text element where the path is anchored.*.....
(no idea why Lemmy doesn't italicize that last text part)
13B-Vicky gave more code; no idea which example is better. 13B runs much faster than 33B, so it would be really interesting to know whether either or both code examples are valid, and whether 13B, despite being the simpler model, gave a correct answer. 33B is a preview model, not the final version.
What I am noticing is that neither has actually used the textPath element, though 33B used a text-path attribute; no idea if that exists.
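For what it's worth, the SVG spec's mechanism is a `<textPath>` child of `<text>` that references a path by `xlink:href` (plain `href` in SVG 2); as far as I know, neither a `text-path` attribute on `<text>` (33B) nor `xlink:href` directly on `<text>` (13B) is part of the spec. A minimal well-formed version, checked here with Python's stdlib XML parser:

```python
import xml.etree.ElementTree as ET

svg = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     width="400" height="200">
  <path id="curve" d="M50,150 Q200,20 350,150" fill="none" stroke="black"/>
  <text font-size="20">
    <textPath xlink:href="#curve">
      The quick brown fox jumps over the lazy dog
    </textPath>
  </text>
</svg>"""

root = ET.fromstring(svg)  # parses, so the markup is well-formed
ns = {"svg": "http://www.w3.org/2000/svg"}
text_path = root.find("svg:text/svg:textPath", ns)
```

Parsing only proves the markup is well-formed, not that a browser renders it as intended, but the structure above is the one the spec documents.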
P.S. By offloading 41 of the 63 layers to the GPU, 33B runs as fast as 13B does without offloading but with CUDA on my machine. Very interesting.
-
Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift
openreview.net Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift
Statistical properties such as mean and variance often change over time in time series, i.e., time-series data suffer from a distribution shift problem. This change in temporal distribution is one...
I found this paper when implementing TSMixer in darts and thought it was worth sharing.
As far as I know, it may not be relevant to non-time-series problems. That being said, does anybody know of instance normalization being used outside of time series? It is obviously similar to batch normalization but slightly different.
I added RIN to an N-HiTS model using the Energy benchmark dataset and got 21% and 22% reduction in MSE and MAE, respectively. This required two lines of code to be added to the model definition and increased compute time in no noticeable manner.
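For anyone curious, the mechanism is simple enough to sketch: normalize each instance by its own statistics before the model, then invert the transform on the model's output (the paper's full version also adds learnable affine parameters, omitted here). A NumPy sketch with an identity model standing in for the forecaster:

```python
import numpy as np

def rin_normalize(x, eps=1e-8):
    """Per-instance normalization over the time axis (last axis).

    Returns the normalized series plus the statistics needed to invert it.
    """
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True) + eps
    return (x - mean) / std, mean, std

def rin_denormalize(y, mean, std):
    """Invert the normalization on the model's output."""
    return y * std + mean

rng = np.random.default_rng(0)
batch = rng.normal(5.0, 2.0, size=(4, 32))  # 4 series, 32 time steps

normed, mean, std = rin_normalize(batch)
# ... the forecasting model would run on `normed` here; an identity
# mapping stands in for it, so denormalizing recovers the input ...
restored = rin_denormalize(normed, mean, std)
```

The "two lines of code" in the post correspond to these two calls wrapped around the model's forward pass.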
-
Machine Learning books recommendations?
I'm trying to learn machine learning from the more mathematical/theory side, just so it's easier for me to understand the AI/ML papers coming out and keep up with them. I'd say I have a basic understanding of AI/ML, but more so on the applied side (Keras, TF, and somewhat PyTorch); I feel like I'm lacking on the mathematical side. Any book or course recs for that?
-
Adaptive Resonance Theory vs Backpropagation-based Machine Learning
In recent weeks, I came across Adaptive Resonance Theory, which attempts to solve similar problems to those that mainstream backpropagation-based machine learning and deep learning address. There's also a 2017 paper on DeepART. The interesting part is the claim that ART overcomes many of the problems associated with backpropagation. Despite this, in my last 5 years of following machine learning, I've seen backpropagation much more and almost never Adaptive Resonance Theory, and Google Trends seems to agree with my experience.
So, are there inherent problems that Adaptive Resonance Theory does not solve, for which backpropagation-based machine learning has worked out well? Or is it just another of those tech-industry coincidences where one approach won out over another?
PS: I've replied to the comment, I can see the comment on lemmy.world but not on lemmy.ml
-
Researchers have created a small device that ‘sees’ and creates memories in a similar way to humans.
www.rmit.edu.au Tiny device mimics human vision and memory abilities
Researchers have created a small device that ‘sees’ and creates memories in a similar way to humans, in a step towards autonomous decision making.
-
QR Code AI Art Generator
huggingface.co QR Code AI Art Generator - a Hugging Face Space by huggingface-projects
Discover amazing ML apps made by the community
-
Meta AI -- I-JEPA: The first AI model based on Yann LeCun’s vision for more human-like AI
ai.facebook.com I-JEPA: The first AI model based on Yann LeCun’s vision for more human-like AI
I-JEPA learns by creating an internal model of the outside world, which compares abstract representations of images (rather than comparing the pixels themselves).
- yonom.substack.com Native JSON Output From GPT-4
When integrating LLMs in your products, you often want to generate structured data, like JSONs. With the help of function calling (released June 13th 2023), this process has become much simpler! In this post I will explore the new API. Thanks for reading Simon’s Substack! Subscribe for free to recei...
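The gist of the approach: instead of prompting "reply in JSON", you send a JSON-Schema function definition and the model returns arguments conforming to it. A sketch of the request payload shape using only the stdlib (the function and its schema are hypothetical; the field names follow the June 2023 API as I recall them):

```python
import json

# Hypothetical function schema: extract a contact from free text.
payload = {
    "model": "gpt-4-0613",
    "messages": [
        {"role": "user", "content": "Jane Doe can be reached at jane@example.com"}
    ],
    "functions": [
        {
            "name": "save_contact",
            "description": "Store a contact extracted from the text",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"},
                },
                "required": ["name", "email"],
            },
        }
    ],
    # Force the model to call this function rather than reply in prose.
    "function_call": {"name": "save_contact"},
}

body = json.dumps(payload)  # what would be POSTed to the API
```

The response then carries a `function_call` with an `arguments` string that parses as JSON matching the schema, which is the "native JSON output" the post describes.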
- news.mit.edu New model offers a way to speed up drug discovery
A model known as ConPLex can predict whether potential drug molecules will interact with specific protein targets, without having to perform the computationally intensive calculation of the molecules’ structures.
Very cool application of LLM to screen potential protein-drug interactions.
- distill.pub A Gentle Introduction to Graph Neural Networks
What components are needed for building learning algorithms that leverage the structure and properties of graphs?
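The core operation the article builds up to is message passing: each node aggregates its neighbours' features and combines them with its own. A minimal one-layer sketch in NumPy (random weights, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# A 4-node undirected graph given by its adjacency matrix.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

X = rng.normal(size=(4, 8))        # node features
W_self = rng.normal(size=(8, 8))   # transform for the node itself
W_neigh = rng.normal(size=(8, 8))  # transform for aggregated neighbours

# Mean-aggregate neighbour features, combine with the node's own
# transformed features, then apply a ReLU nonlinearity.
deg = A.sum(axis=1, keepdims=True)
neigh_mean = (A @ X) / deg
H = np.maximum(0.0, X @ W_self + neigh_mean @ W_neigh)
```

Stacking such layers lets information flow along edges, and the article's taxonomy (convolutional, attentional, message-passing GNNs) amounts to different choices of the aggregation step above.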