I actually read the privacy policy. There are basically three categories of data:
1. Data collected once, when you sign up.
2. Data collected every time you log in after signing up.
3. User-generated data.
For part one: they store your username and the IP address used when you create the account. They store a hashed version of your password, not the password itself; keeping the hash lets them check the password you type when you log in. They'll keep this info for as long as you have an account with lemmy.world (although they reserve the right to keep it for up to 12 months after you've deleted your account).
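To illustrate what storing "a hashed version of your password" means, here is a minimal sketch using Python's standard library. The thread doesn't say which hashing scheme Lemmy uses; PBKDF2-HMAC-SHA256 from `hashlib` is only a stand-in, and `hash_password`, `verify_password` and the iteration count are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # hypothetical work factor; real deployments tune this

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest). Only these two values are stored, never the password."""
    salt = os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)

salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

The server can confirm a login without ever being able to read back the original password.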
For part two: they keep a log of the times you sign in, the device you signed in from (iOS, Android, web) and the IP address you signed in from. They delete this data on a rolling basis: each login record is deleted 90 days after it was created (that is, 90 days after the login it records).
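A minimal sketch of what "rolling" deletion means, assuming each login is stored as its own timestamped record. `LoginRecord` and `purge_expired` are hypothetical names, not Lemmy's schema. Each record expires 90 days after its own login time, rather than the whole log being wiped every 90 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # window stated in the policy

@dataclass
class LoginRecord:
    ip: str
    device: str  # e.g. "iOS", "Android", "web"
    logged_in_at: datetime

def purge_expired(records: list[LoginRecord], now: datetime) -> list[LoginRecord]:
    """Drop every record that is 90 days old or older; each record ages on its own clock."""
    cutoff = now - RETENTION
    return [r for r in records if r.logged_in_at > cutoff]

now = datetime(2023, 7, 1, tzinfo=timezone.utc)
records = [
    LoginRecord("203.0.113.5", "web", now - timedelta(days=10)),
    LoginRecord("198.51.100.7", "Android", now - timedelta(days=100)),
]
kept = purge_expired(records, now)
print([r.device for r in kept])  # ['web']
```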
For part three: these are your posts, comments, upvotes, downvotes, etc. This data is stored until you delete your comment/post or undo your upvote/downvote. When you delete your account, any data you haven't deleted yourself has its association with your account severed. This means the comment will remain, but its username value will be null.
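One common way to get the "comment remains, username is null" behavior is a foreign key declared `ON DELETE SET NULL`. Below is a minimal sketch with SQLite, assuming hypothetical `person` and `comment` tables; this is not Lemmy's actual schema, and the policy doesn't say how Lemmy implements it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE comment (
    id INTEGER PRIMARY KEY,
    creator_id INTEGER REFERENCES person(id) ON DELETE SET NULL,
    content TEXT NOT NULL
);
""")
conn.execute("INSERT INTO person (id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO comment (creator_id, content) VALUES (1, 'hello')")

# Deleting the account clears the link, but the comment row itself survives.
conn.execute("DELETE FROM person WHERE id = 1")
row = conn.execute(
    "SELECT c.content, p.name FROM comment c LEFT JOIN person p ON p.id = c.creator_id"
).fetchone()
print(row)  # ('hello', None)
```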
tl;dr: I'm no expert, but I think they keep a very small amount of data. They probably do this to keep their costs as low as possible (but that is just my speculation).
If you're really worried about data mining and data logging, you can always go back to Reddit /s
Just what you'd expect, really (settings, profile data, posts/comments); not even your user agent (what browser you use) is stored. But keep in mind that any instance you sign up to could be running a forked version that inserts Google Analytics, a Facebook pixel or any other sort of tracking tech.
Data should never actually be deleted from the database; doing that breaks best practice. It can be overwritten with garbage, though, but the record itself should always be present.
For example, say you create a new account with an email, username and password and are assigned an id such as 42. If you later delete your account, the account row should stay intact and id 42 should remain occupied, but your email, username and password should be replaced with null values.
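A minimal sketch of the approach described above, using SQLite with a hypothetical `account` table and a hypothetical `anonymize_account` helper (this is the commenter's suggested pattern, not confirmed Lemmy behavior). The personal fields are nulled, the row with id 42 stays, and a later sign-up gets a new id instead of reusing 42.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, email TEXT, username TEXT, password_hash TEXT)"
)
conn.execute("INSERT INTO account VALUES (42, 'me@example.com', 'me', 'hash...')")

def anonymize_account(conn: sqlite3.Connection, account_id: int) -> None:
    """Null out the personal fields but keep the row, so the id stays occupied."""
    conn.execute(
        "UPDATE account SET email = NULL, username = NULL, password_hash = NULL WHERE id = ?",
        (account_id,),
    )
    conn.commit()

anonymize_account(conn, 42)
print(conn.execute("SELECT * FROM account").fetchall())  # [(42, None, None, None)]

# A new account gets the next id; 42 is not reused because its row still exists.
cur = conn.execute(
    "INSERT INTO account (email, username, password_hash) VALUES (?, ?, ?)",
    ("new@example.com", "newbie", "hash2"),
)
print(cur.lastrowid)  # 43
```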
There's far more data than that which can be picked up. To start with, there's which posts you interact with and how. I'm sure there are loads of other data points that can be tracked.
I have yet to go through it all myself, but from what I've seen of the Lemmy code it seems pretty straightforward. I doubt anything is being tracked beyond what is required.
Obviously your IP has to be known so they can route traffic back to you. Then there's your username and all the info you put in your profile or posts, your list of liked/disliked posts, the communities and people you've subscribed to or blocked, perhaps metadata of any photos or videos you upload, the package name of whatever mobile app you use, etc.
All the code is available on GitHub for you to check out if you'd like; 80% of it is written in Rust. I am looking through it myself to see what kind of privacy I can expect from Lemmy. It's already ahead of Reddit, though, where I couldn't view the source code and just had to trust what the company said.
I installed Jerboa and noticed that it grayed out the titles of posts I had already viewed, even though I had viewed them on the web. That told me (unless I am somehow confused) that the server tracks which posts you have read.
From my perspective, that seems like a terrible invasion of privacy. I can understand some benefit to showing the read status in the UI, but if it is stored at all, it should be stored exclusively on the client side. I also mentioned this in the "issues" thread and got no reaction, so maybe I'm missing something or am in error.
Thanks, that is good to know, but that is one kind of evil where I would hope Lemmy doesn't follow Reddit's lead. I sometimes posted to Reddit, but more often I read passively without logging in, partly to avoid some of the tracking.
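A toy model of the two designs being debated, using hypothetical `Server` and `Client` classes (it does not describe how Lemmy actually works). If read status is stored server-side per account, a post viewed in one client shows as read in every other client logged into that account; if it is stored only client-side, like browser history, each client keeps its own.

```python
class Server:
    """Hypothetical server that records which posts each user has read."""

    def __init__(self) -> None:
        self.read_posts: dict[str, set[int]] = {}

    def mark_read(self, user: str, post_id: int) -> None:
        self.read_posts.setdefault(user, set()).add(post_id)

    def is_read(self, user: str, post_id: int) -> bool:
        return post_id in self.read_posts.get(user, set())


class Client:
    """A client (browser, Jerboa) that either reports reads to the server or keeps them locally."""

    def __init__(self, server: Server, user: str, sync_to_server: bool) -> None:
        self.server = server
        self.user = user
        self.sync_to_server = sync_to_server
        self.local_read: set[int] = set()  # client-side only, like browser history

    def view(self, post_id: int) -> None:
        if self.sync_to_server:
            self.server.mark_read(self.user, post_id)
        else:
            self.local_read.add(post_id)

    def shows_as_read(self, post_id: int) -> bool:
        if self.sync_to_server:
            return self.server.is_read(self.user, post_id)
        return post_id in self.local_read


server = Server()
browser = Client(server, "me", sync_to_server=True)
phone = Client(server, "me", sync_to_server=True)
browser.view(7)
print(phone.shows_as_read(7))  # True: the read state came from the server

local_browser = Client(server, "you", sync_to_server=False)
local_phone = Client(server, "you", sync_to_server=False)
local_browser.view(7)
print(local_phone.shows_as_read(7))  # False: the history stayed in the browser
```

This is why seeing read marks carried over from one client to a different one points to server-side storage.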
I noticed previously that stuff I read in the browser does not show up as read in the mobile app. I also just tested it with different browsers and as far as I can see, read posts are marked as unread when I use another browser.
So are you actually sure about your claim? This is very easily testable, so I hope you actually confirmed it before accusing Lemmy of participating in a "terrible invasion" of privacy.
I have said several times that I am not completely sure. I will see if I can do some better tests. It is something that I noticed when I installed Jerboa, so I asked about it, and people seemed to confirm that there was server-side tracking.
Anyway, even if it's confirmed, unless there is deception involved (which I have no reason to suspect), there's not much of an "accusation" to be made. I would say, in the event that individual post views really are saved on the server, that Lemmy's designers made a policy choice that I don't agree with. I'd call that a description rather than an accusation. I'd try to open a discussion about getting the decision changed. If that didn't succeed, I'd look for technical workarounds and/or limit my reading on the site.
I can't say for sure that the backends don't track that, because I haven't looked at the source or anything. But keeping a history is something very commonly done in the client, just like web browsers do.
Right, what I saw (unless I'm mistaken which is possible) was reading posts on one client (Firefox browser on my laptop computer) and then seeing the read posts marked on a completely different client (Jerboa on my phone). That means the info must have somehow been communicated between the two clients. Suspicion points to the server. I will ask on /c/[email protected] about this and/or look at the code base.
I browsed and posted on Lemmy for a while through a desktop browser on my laptop, then installed Jerboa on my phone and started playing with it, and immediately noticed that posts I had previously read through the browser were marked in Jerboa. The only ways Jerboa could have gotten that info are: 1) the server recorded the info from the browser and relayed it to Jerboa, or 2) I was confused somehow and had also read those posts through Jerboa.
#2 above is something of a possibility, but that still leaves #1 as a suspicion that hasn't been dispelled. I was hoping that someone familiar with the implementation would comment.
The unfortunate thing is that eventually, as happens with mobile apps, owners of popular Lemmy instances will be contacted by marketers/data harvesters with offers like "hey, we'll give you $50,000 to install this data harvesting code on your site", which many people will find difficult to turn down.