Not related to you, but from a description in the first link, in the description for Plausible:
> Because it does not use cookies their is no need to show cookie banner for this service.
This is IMO a rather fundamental misunderstanding of the current situation.
I'd be hesitant to use a product from someone who I think has completely misunderstood what the rules are about. (Again, IMO, and also IANAL, but I have followed GDPR more closely than most people.)
GDPR is about collecting information; as far as I can see, the technical details of how you do it don't matter. It could be pure magic and it would still be illegal.
I've actually had this discussion with Plausible directly back in 2022[1], and more recently with the lawyer they had write a blog post[2] on the topic. I wrote an article on it, that was recently discussed here on HN [3].
The response from Plausible is essentially "we've checked with legal counsel, and stand by the statement". The conversation with the lawyer started out well, but he stopped responding when I asked about the ePD, not GDPR.
There generally seems to be a lot of confusion, even in legal circles, about what ePD requires informed consent for. Many think that only PII requires consent, or think that anonymization bypasses it. That amount of confusion makes it very easy for a layman (e.g. Plausible) to find _someone_ willing to back up their viewpoint.
The EDPB released a guideline in 2023 that explicitly states that what Plausible et al. are doing is covered by the ePD's consent requirement, but that's a little too late: the implementations in member countries already differ massively on whether it's covered[4].
> There generally seems to be a lot of confusion, even in legal circles, about what ePD requires informed consent for.
That seems to be true, going by this comment section and the other ones I've seen.
It's hard to get a non-hyperbolic answer to the question: if everyone is so confused, what's the real-world consequence of best-effort implementation?
Some would say it's the ultimate responsibility of the app owner to understand the law, but how much further can you go than hiring a lawyer?
If more diligence needed to be done than that none of us would get anything built, we'd all just be running around researching the laws around these dumb popups.
What are the real-world consequences of making a mistake here? What kind of boundary would you have to trip over to actually get the authorities to prosecute you for not having a consent popup or doing it badly?
That is unfortunate, and seems similar to ADA compliance, as far as what is truly compliant and what is not. It seems like it is up to the courts to decide (speaking as an American, I know GDPR is a European law). I try to do as much as possible to keep up to date with ADA compliance and best practices, but when it comes to tooling around scanning for non-compliance, there seem to be differences. I believe that showing that you made an effort to comply is usually enough to avoid a lawsuit, but it would be nice if things like this were spelled out more clearly for those who need to implement these features.
I recently went through a conversation with a client who has been told that in NY state (in the US), something similar to GDPR is coming for those who deal with PII. Both the client and the agency I work for have added various scripts to the website for dynamic forms, tracking (Google Analytics), and newsletter functionality. It's at a point where everything that is 3rd party has to be discovered first, then checked for whether it can all be anonymized (either by default, or with a user consent dialog). Even with current laws, it seems intentional to keep things vague.
Agreed. The company I work for has fought off two "ADA trolls" in the past ~3 years. I'm fully behind accessibility, and we design/develop our website specifically to conform with best-practice; I get, and generally accept, that civil remedies are (currently) the only way to enforce any kind of compliance. I nevertheless call the lawyers targeting us trolls, because their technical analysis was beyond incompetent, and their understanding of accessibility issues woefully out of date. It cost a few days of my + developer time, and I don't know how much lawyer-time, to make them go away.
We (I'm in the US) badly need clarifying regulation. Until then, compliance will mainly be about preventing yourself from being low-hanging fruit for opportunistic litigation - which, to be clear, can generate productive results, but is clearly inefficient.
It is not entirely clear who wrote these descriptions. Maybe it was not the vendor. At least their website https://plausible.io/ has a much better wording.
> No need for cookie banners or GDPR consent
>
> Plausible is privacy-friendly analytics. All the site measurement is carried out absolutely anonymously. Cookies are not used and no personal data is collected. There are no persistent identifiers. No cross-site or cross-device tracking either. Your site data is not used for any other purposes. All visitor data is exclusively processed with servers owned and operated by European companies and it never leaves the EU.
Correct, it's not so much about cookies, but about how data is collected and what is stored.
We have done a privacy risk analysis with an external lawyer and data protection officer, and concluded that Pirsch is in line with GDPR as we do not collect nor store personally identifiable information (PII). Processing things like IP addresses, for example, is legal as long as they are not stored and are only cached for a reasonable amount of time (a few milliseconds in our case).
If you're interested, we have extensive documentation on this. You can reach out to support@pirsch.io to get it :)
If anyone is interested in doing something similar. This did cost us about 8,000 € in Germany.
The apparently extensive legal assessment you just described cost just €8,000?
I am sorry, but that had to be a hasty review at best. Do you take on the full legal risk in case any of your customers is found in violation of privacy laws because of using your service?
For reference, with similar hourly rates as Germany, reviewing a standard apartment-purchase contract cost me ~3500 euro.
We had someone with a lot of experience in this field working for very large German corporations and got a discount/startup bonus. I wouldn't call it cheap.
Imagine starting a business in Germany. How are you supposed to pay €30-50k for legal questions before selling anything?
Analytics and other forms of tracking are not required to do business. Don't try to skirt the law and you won't have as many legal questions to answer.
You need consent for (not functionally necessary) cookies because of the ePrivacy Directive (the "cookie law"). Additionally, you also need consent for processing, storing or sharing personally identifying information (PII) because of the GDPR. Usually you do both in the same consent popup.
Plausible doesn't store visitors' IPs or any other PII, and doesn't set any cookies. The reasoning given in the quoted paragraph is incomplete, but the result is correct. You only need to mention them in your privacy policy; they don't require any opt-in popups.
PII isn’t a concept in GDPR. GDPR talks about personal data, which on its own might not be identifying, but which in combination with other personal data can successfully identify a person.
I'm curious: running a static website with no JS-based analytics whatsoever — only Apache logs in standard format (so including IP address and user agent string) — does GDPR require consent banners in this case? If so, doesn't essentially every website require consent banners due to the way websites work?
GDPR does not require a consent banner. If you want to process the user's personal data outside what is strictly necessary, you need permission. One way to get that permission is for the user to specifically consent to it. It does not have to be a banner. (In fact, many banners out there are probably not enough for informed consent anyway, as they provide no information about what data is collected or any reasonable way to opt out.)
Personally identifiable information has nothing to do with javascript, or analytics. Do you have GET requests with parameters containing enough to identify a specific individual? Then your logs are sensitive and you must have a valid contract, informed consent, or provide some important service where this information is necessary.
There are gray areas which can make this difficult, but the basic idea is enough information to identify an individual. A basic website where you log that IP address A viewed home.html is not enough. The knowledge that a 55-year-old woman with a particular name at a particular street address has an interest in photography and shoe size 9 probably is. The line is somewhere in between.
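One practical consequence of that line of reasoning is that raw access logs can quietly cross it via query strings. A minimal sketch (hypothetical function, not any particular product's behavior) of scrubbing query strings from combined-format log lines before retention:

```python
import re

def scrub_log_line(line: str) -> str:
    """Strip query strings from the request path in a combined-format
    access-log line, so parameters that might identify a person
    (names, emails, tokens) are never retained."""
    # Matches the quoted request section, e.g. "GET /page?user=alice HTTP/1.1"
    return re.sub(
        r'"(\w+) ([^?" ]+)\?[^" ]* (HTTP/[^"]+)"',
        r'"\1 \2 \3"',
        line,
    )

line = '203.0.113.7 - - [01/Jan/2024] "GET /search?name=jane+doe HTTP/1.1" 200 512'
print(scrub_log_line(line))
# 203.0.113.7 - - [01/Jan/2024] "GET /search HTTP/1.1" 200 512
```

Lines without a query string pass through unchanged; whether the remaining IP address still needs anonymizing is the separate question discussed elsewhere in this thread.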
If I install the Apache web server and accidentally expose the machine to the internet, am I violating GDPR by not having a cookie banner on the "Apache Default Page"?
GDPR is about collecting personally identifiable information, which is distinct from aggregate data that you can't trace back to the individual. Recital 26:
> The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.
So details definitely matter. Some self-hosted analytics do this by getting rid of the last octet of the IP address, though I doubt that's been tested in courts.
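That last-octet approach is simple to sketch (a rough illustration only, not legal advice, and not any specific product's implementation):

```python
import ipaddress

def anonymize_ip(addr: str) -> str:
    """Zero the host portion of an address before it is stored:
    the last octet for IPv4, the last 80 bits for IPv6 (the same
    convention Google Analytics' IP masking used)."""
    ip = ipaddress.ip_address(addr)
    prefix = 24 if ip.version == 4 else 48
    net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(net.network_address)

print(anonymize_ip("203.0.113.42"))         # 203.0.113.0
print(anonymize_ip("2001:db8:1234:5678::1"))  # 2001:db8:1234::
```

Whether a /24 is coarse enough to count as "anonymous" under Recital 26 is exactly the untested question: up to 256 hosts share the truncated address, which may or may not be enough to defeat re-identification when combined with a user agent and timestamps.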
I posted a quotation straight from the recitals of the GDPR that says anonymised data is out of scope. I even gave a reference that you can look up. The recital even ends with this:
> This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes.
There is no ambiguity here, aggregate data is completely fine as long as I can't trace it back to you with a reasonable amount of effort.
A DPO would disagree with you, depending on the circumstance: if you know a user is unique, then you have a fingerprint, and if you keep that fingerprint forever, it's trivial to know it's that user when they come back to the site.
Yeah, not using cookies is irrelevant if you use other means to track users. Also, people like to think they need to show the "cookie banner" for all cookies regardless of how they are used.
No, they are comparable, but it's an independent tool. When we started, Plausible wasn't as big as it now is. We also had a focus on deeper integrations via API from the get go, a nicer dashboard, and a few other minor details.
I basically started this for my personal use as a library for Go, which it still is:
how do you calculate the session duration? is it the delta between two page hits or similar events?
i tried a couple of the smaller analytics tools, like plausible, simpleanalytics, umami etc... and one thing that i always disliked was the way the session duration was calculated - i have a lot of longer articles where the visitor stays for a long time and then leaves. most of these tools will count that as a bounce, as there are no two hits to calculate the delta between. but for me it is a very important metric to get accurate numbers on, which is impossible with that implementation for sites like mine (very few but long page visits, not a lot of navigation between pages).
do you handle this the same way? that would be a feature i'd be willing to switch my current tool out for.
Yeah, we also use the delta. However, you can send a custom event on close to update the session duration. The session then won't be counted as bounced in our system, and the time is updated.
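To make the delta idea concrete, a toy sketch (hypothetical event shapes, not Pirsch's actual schema):

```python
from datetime import datetime, timedelta

def session_duration(events: list[datetime]) -> timedelta:
    """Delta between the first and last event of a session. With a
    single pageview it is zero, i.e. a 'bounce' -- unless the client
    sends one extra event (e.g. on page close) that extends the
    session to the real reading time."""
    if len(events) < 2:
        return timedelta(0)
    return max(events) - min(events)

t0 = datetime(2024, 1, 1, 12, 0, 0)
print(session_duration([t0]))                             # 0:00:00 -> bounce
print(session_duration([t0, t0 + timedelta(minutes=4)]))  # 0:04:00
```

The second call models the close event: one extra timestamp turns a long single-page read from a zero-duration bounce into a four-minute session.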
They simply don't work as well as non-web apps. People continue to insist that they do, but from my experience, they just don't have the same smoothness as a native app; you can always tell it's a web app.
sure for something you're spending hours on like instagram. for my business data analytics, I don't care. If I'm doing any serious work I'm on laptop anyway, mobile is just for casual checks
a native mobile app is a gigantic time, productivity, and cash investment. if a business can get most of the value from a PWA, they will be far better off investing that time and innovation into other parts of their business than building a native app for the "smoothness"
There are lots of ways to make it cross platform pretty easily if you plan to do so from the beginning, such as React Native and Flutter. Even now, if the site is in React, it is not too difficult to port it all to RN, which also has a web version that is quite similar to React proper. Plus, RN and Flutter have PWA support already too.
Try something simple like Instagram via the browser versus as an app, it's simply smoother on the app. I would have to dig up more examples but IG immediately comes to mind as a recent experience.
I used Umami and mention it in the video. Admittedly, it was a mistake for my use case. I had to heavily modify Umami due to lack of features and performance issues. There are also a lot of bugs in the project, which are immediately revealed simply by enabling TypeScript strict flags and some more linting rules. Granted, I was not really using Umami exactly as intended. I do think it's great this project exists, and whilst I had to heavily modify it for my use case, I did at least help the upstream project diagnose one issue: https://github.com/umami-software/umami/pull/2946#issuecomme...
They also gave him a job offer, but yes, Tempo has been pretty aggressive in trying to keep their game from getting "solved" by third party tooling collecting analytics on the game.
Been using it for my personal website for over a year as a self-hosted solution. Not great if you just want to set it up and forget about it. There are breaking changes every now and then in every part, the DB and the FE. So at some point it just broke for me and stopped showing relevant data. I ended up switching to piratepx, as it was enough for me to see if there were any visits.
Same here. My self-hosted instance is broken right now and I've not been able to find time to fix it. The pace of change was easy to keep up with when it was just 1 guy.
Now it appears they have built an entire team and raised some VC to build out their SaaS.
It's an incredibly bare-bones analytics tracker, but it's free and cloud-hosted which were the two things I was most looking for in an alternative to GA.
I run a website that gets about ~300k pageviews/month. Vercel was eating my wallet alive with their analytics offering. All I wanted with my tracker was to feel motivated by knowing that traffic was going up and to the right. I didn't want to pay hundreds a month for that and I didn't want to manage my own server just to have analytics. Goatcounter addressed my needs well!
I just stood up a new toy project (expected traffic is next to nothing, but I still want to be able to tell) and was just thinking I needed something like this. Thanks!
Something I've noticed about all these privacy-respecting analytics apps is they all seem to be using a similar UI; in fact, the only one I know of that uses a different UI is Matomo.
I wonder why that is? My suspicion is that this layout is becoming a "standard UI" for analytics software. Oftentimes I see companies in the same space largely mimic each other's UIs. Things like the "Find Care" option in most healthcare sites look largely the same. Same thing goes for LLM frontends and time tracking software. It just seems that each team has individually come to the same conclusion about what the "best UI" is for a given task.
To me it's just a case of if it ain't broke don't fix it. I don't agree that "each team has individually come to the same conclusion". You can develop a product way quicker by cloning.
In this case the value prop is in the open source, self host, privacy first. Why try to innovate on the UI?
in my exp, there's only a few good UI libraries out there, which is part of it, but I think generally the "standard UI" is a thing, much like it happened in architecture before we got the reset of brutalism / modernism, to highly generalize.
Just thinking aloud, choosing between novel, unknown, and interesting UI/UX vs proven, reliable, and commonplace (read: boring) is always a toss up, coming down to the audience. Biz interests usually tend to the latter, which imo is good because you want people using the product, not thinking about how to use the product.
I know it's dumb, but when a library requires me to install Yarn to install it, I think so much less of it.
I just hate the idea that I have to install an entire package manager simply to use some Node.js code, when NPM almost certainly could have done the job.
It's not a library, it's an application. You're quoting a section called "Installing from Source". Obviously that'll require you to use the build tools the developers happened to have chosen.
There's also a Docker option if you don't want to do that.
Seems like it’s because they’re using the resolutions feature to override a dependency resolution. The alternative would be forking at least 1 package, all the way down to the dependency, to fix the version.
There’s a reason there’s 3 popular package managers for Node that aren’t NPM. Yes, part of that is the culture/ecosystem, but not entirely.
True. Either way, Umami's been using `yarn` since 2020, before that release of NPM (although for what reason at that time, I don't know).
Being bad thereby creating desirable competition has lasting effects. We could get into when/why/what each thing supports all day, but it's not worth it.
Speaking of what's supported nowadays, installing other package managers is a corepack call away -- literally a whole other feature built into Node.js because NPM is/was/etc subpar. It's experimental, but this is all to say: it doesn't surprise me in the slightest that a project might use something that isn't NPM, and I actively expect it when picking up others' projects.
Something that specifically has documentation on bypassing anti-tracking security software[0] is not "privacy-focused". Your users have indicated that they do not want you to track them and have gone out of their way to stop you from doing so. Attempting to bypass that is specifically taking steps to undermine their privacy when you absolutely know they do not want that.
A "privacy-focused" solution (not that software that's specifically made for spying can be "privacy-focused". Let's call a spade a spade: it's spyware) would at least use standard endpoints to make it easy for users to opt-out by blocking those endpoints. In this way, GA is actually more privacy friendly.
"Some lists can be overly agressive" is also a bad attempt at gaslighting. Your software watches the way I browse, including tracking purely client-side events and outbound links (see "journeys"). You attempt to track things like device characteristics and what operating system I use. That's creepy and voyeuristic to me, and is exactly what spyware blockers are for.
Actual privacy-friendly analytics looks more like the Steam hardware & software survey where they ask if you'd like to tell them these things, and show you exactly what they are going to collect.
Absolutely, privacy-preserving analytics is an oxymoron. And besides surveys with informed consent, the best solution is to simply track less. Think hard about what information you actually need and will even look at more than once.
We recently migrated from Matomo to Umami at work after hitting scaling issues with Matomo, even after implementing various MySQL optimizations and archiving reports through cron at a decent interval. Even the most basic tasks like loading the dashboard were painfully slow (before you comment on the resource usage: our instances were quite huge and the load was alright).
Surprisingly, Umami has been handling our traffic volume without breaking a sweat on much smaller instances. I suspect PostgreSQL's superior handling of concurrent writes plays a big role here compared to MySQL/MariaDB. Except for the team/user management, everything feels much nicer on Umami.
Shameless plug: As part of the migration, I also took the opportunity to learn some Rust by writing a small utility that uses the Umami API to generate daily/weekly analytics reports and sends them via email[1]. Pretty happy with how it turned out, though I'm still learning Rust so any feedback or suggestions for improvement are welcome!
I am also curious about the traffic amount and server specs.
In my experience, MySQL still runs very well until you have 10-20m rows (on a single machine, like 8 vCPU and 32GB RAM); after that it gets trickier to get instant responses.
We had huge servers, with the database and the application itself running on separate instances. IIRC, we had a 32 core, 64GB instance just for the DB itself which we doubled when we started adding more sites to our configuration and it still wasn’t enough. As for the numbers, our site(s) get heavy traffic everyday, in millions daily, since we are a stock broker.
You’re right about MySQL performing alright for 10-20m rows, but from our perspective those numbers are not that big for a company this size.
> our site(s) get heavy traffic everyday, in millions daily
Yeah, it's hard to run aggregate queries on MySQL once you are talking about hundreds of millions of rows, or billions. Even though, if the server has modern CPU, enough RAM to store the entire DB and NVMe storage, it's still okish with the right indexes and if the queries are optimized.
We had separate database and app instances, the DB instance had 32 cores and 64GB memory, which we doubled to keep up with our requirements. We have tens of millions of visits daily, and our database was close to ~300GB within the first few months.
For Plausible, I believe that since it runs on Postgres, scaling should not be a problem as long as you scale the resources with it.
In all honesty, these optimizations are quite basic. We already used MariaDB instead of MySQL itself. Other things listed in the post are standard across all our databases, well, except for deleting data to speed up the database.
No, unfortunately our company’s and external regulatory compliance policies require us to host all data within the country itself, alongside it being required to be run on an infrastructure that is easily auditable. So as a policy within the company, all our internal services are open source and self hosted.
I have been using (and self-hosting) Umami for 3 websites for the better part of a year. While good for my use case, which is just having some 'fun' insights into how many visits my pages get and where that traffic is coming from, I reckon it's mostly aimed at profiles like mine. I would never use it for a business purpose. Also, the UI is still somewhat immature.
So all in all: total fan, otherwise I wouldn't be using it, but it's fairly limited in what it can do.
There are options missing from the admin panel or the dashboards that intuitively seem basic. Some examples:
- excluding certain data from your report (such as localhost visits)
- setting a default time range
- on the 'overview' of all your domains, setting a time range that applies to all and not just one domain individually (this particularly felt really counterintuitive)
There is more I'd come up with if I actually pulled it up, but the overall throughline is that it just feels 'too basic' at this moment. Especially for something that goes beyond tracking visits on your personal blog and/or hobby website.
I have checked them out and decided not to go with it, since it was overkill for my situation. Like I said, I don't need a whole lot more than visits, geo, referrer, and track events[1], which work really nicely. If I ever do need a GA4 alternative for 'serious' purposes, though, I might consider Matomo, as it seems a lot more complete.
As someone who doesn't often look at these sorts of charts... traffic coming via ChatGPT being higher than Bing was quite the surprise to me. It makes total sense, of course, but it's still astonishing to see the actual numbers in comparison.
Why can't people be more creative with finding good names? Salt? X? Yes, Apple is a terrible name too, but at least it was historically called Apple Computers and only changed once they had a huge brand already.
> Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting.
There was debate on the team back in ~2009 over changing the query params -- when you estimate how much bandwidth and disk is consumed per day at the scale of Analytics, just for the difference of a character or two on a handful of query params -- even well over a decade ago, it really was astounding.
The migration would never be total, though, because more than a few users had copied the analytics script to their CDN or were in some way depending on a local fork of it. Some backwards compatibility would need to be maintained, perhaps for as long as the product exists. For this and other reasons, changing the params (and supporting both, indefinitely) wasn't pursued.
I have been using Umami and I can't thank the people behind this project enough. It just works!! Not only does it work well, they pull no tricks when it comes to self-deployment and running it on your own. Many open source projects have a "run your own" option, but they would also prefer that you use the managed version instead. I understand that part; they want to make money. But Umami just works; you will never realize that there is an option where you need to pay $$. If I ever make enough from my work, I will surely become a paid subscriber to their managed analytics.
Providing some context which I think is missing: it originally started as a Show HN a little over 4 years ago [1]. It was a one-man side project by Mike Cao.
For a long time, maybe at least 2-3 years, it was simply open source, use-as-you-please MIT software with no hosting offering. I remember when it launched, the first thing I noticed was that the graphic design was so much better than other offerings. While I was interested at the time, I ultimately went with Plausible, mostly because it seemed to have sustainable backing, and another major reason was the usage of ClickHouse. If you want the best resource usage / scale / performance for analytics, I don't think there are many other choices. A lot of other solutions, including Plausible, settled on ClickHouse as well.
I recently went to check on them again, and it looks like things have progressed a lot. They have a hosting offering and they are also working on ClickHouse integration.
Between Urchin, Reinvigorated, and another one whose name I can't remember, it's been a long time since I was excited about analytics. I just hope Mike Cao reads this. Congrats and well done.
I've used Umami for my projects with low volume and it has worked great. Set it up on its own instance connected to a Postgres DB and it's been great for the past year.
There needs to be a convincing explanation of how Google-style analytics can be privacy-focused; otherwise I disregard this as a clone with feel-good branding.
2) who gives a shit, genuinely who cares about cookies?
3) so it's privacy oriented in that the dev doesn't send their data to google, but users send their data to someone anyways? And why would sending data to a self hosting rando be safer than sending it to google from a user and security perspective?
Google’s entire business is your personal data. I’d much rather my browsing info be sent to some small website for local analysis for their own purposes vs. hoovered up by the largest data broker in the world and aggregated with everything else about me.
Did you read the recent article about how reCAPTCHA is used for tracking you across the internet? I have no reason to believe GA isn’t similar. And just think about how much info they have on you from reading all of your emails.
It is privacy oriented from the perspective of the company, not the individual. I think there is some value in that, although it makes it no more likely to be secure or private for the individual end user visiting the site.
Privacy oriented from the perspective of the company is at least more privacy oriented from the perspective of the user. A company harvesting my data for analytics is more private than two companies harvesting my data for analytics.
If I'm going to a site, I'm willingly sharing some of my personal data with that site. I'm not implicitly consenting to third parties harvesting my data.
I mean, you can probably do something clever with this, like rainbow tables on fingerprints or something more probabilistic, so you never store individual fingerprints. Would be interesting to know what the solution is.
Sure, but any probabilistic approach is either relatively inaccurate, which reduces its usefulness for this use case, or accurate, which raises the same identifiability concerns cookies would introduce. I guess my point was: this has been thought about already. (Pseudo-)anonymized attribution is a bit of a solved problem and you can do it with or without cookies. That's mostly an implementation detail rather than a distinguishing feature.
> (Pseudo) anonymized attribution is a bit of a solved problem and you can do it with or without cookies.
How is it typically done without cookies then?
> any probabilistic approach is either relatively inaccurate, which reduces its usefulness for this use case, or accurate, which raises the same identifiability concerns cookies would introduce
How so? Even if it's accurate, you wouldn't be storing the information (random ID or fingerprint) for the individual user, so you would only be able to answer with reasonable certainty whether you saw the user before or not. You can't identify anyone from that (other than as a new vs. returning user), so there is no identifiability concern, unless of course one thinks that constitutes a concern in itself, which I don't think the GDPR does.
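One probabilistic shape that idea could take (a toy sketch, not what any of these products actually do): a Bloom filter answers "have I seen this fingerprint before?" without retaining any individual fingerprint, only flipped bits that many inputs could have produced.

```python
import hashlib

class BloomFilter:
    """Set membership with false positives but no stored elements:
    each item only flips k bit positions, so individual fingerprints
    cannot be recovered from the structure afterwards."""

    def __init__(self, size: int = 1 << 16, hashes: int = 4):
        self.size = size
        self.hashes = hashes
        self.bits = bytearray(size // 8)

    def _positions(self, item: str):
        # Derive k independent bit positions from the item.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def seen(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

bf = BloomFilter()
fp = "hash-of-ip-and-user-agent"  # hypothetical visitor fingerprint
print(bf.seen(fp))   # False -> count as a new visitor
bf.add(fp)
print(bf.seen(fp))   # True  -> returning visitor
```

This is exactly the trade-off described above: the structure can only say "probably seen before" (with a tunable false-positive rate), which is enough for new-vs-returning counts but useless for identifying anyone.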
Maybe we're disconnecting. Cookies are just a standardised way to communicate a small key/value set between client/browser and server through HTTP headers. It's not inherently (in)secure, sensitive, etc. There are zero things you can do with cookies that you cannot do without and there are no inherent differences in security, they're just very convenient if you're in HTTP world.
And yes what you said is exactly right; you're allowed to fingerprint a unique user and track data with that fingerprint as the sole unique identifier without any PII legislation (GDPR, CCPA, etc.) compliance issues. You just cannot store any information that allows linking PII data to that fingerprint in either direction. In other words, attribution to a random UUID that just happens to represent an anonymous user is not an issue.
Circling back to the original comment; there is no (good) argument against cookies if you're basically doing exactly what cookies are doing. Umami using it as a USP is, at best, a little odd.
> you're allowed to fingerprint a unique user and track data with that fingerprint as the sole unique identifier without any PII legislation (GDPR, CCPA, etc.) compliance issues.
I don't think this is correct, or at the least it's unfortunately phrased. If your fingerprint is so specific that it can distinguish unique users, it is covered under GDPR compliance. I don't know too much about the CCPA so not sure if it's the same there.
Yes, you are allowed to collect device statistics such as form factor, viewport size etc. But if you can distinguish between two different users with identical devices accessing your site at the same time, under GDPR you have an obligation to inform [14]. And if you can recognize a returning user across sessions, you also need consent.
If the random user ID is truly anonymous (so, cannot be linked back to an identifiable person even with other data you have), it is not personal data under GDPR and no obligation to inform or consent is needed. If the data processor stores any information that makes PII attribution possible then, and only then, does it fall under GDPR, CCPA, etc. That random ID being persisted on the device allowing for subsequent attribution is still not PII sensitive unless/until the aforementioned identifiability barrier is breached. This is exactly why prominent analytics platforms (Plausible, Matomo, Mixpanel if configured correctly, etc.) all offer data hygiene barriers.
I suspect what's happening here is that the word "user" is making things ambiguous. It was meant in the context of an attributable session, not as the data subject in GDPR language, for example.
I don't know about Umami, Plausible describes how they solved this here: https://plausible.io/data-policy, under the section "How we count unique users without cookies"
TL;DR: They derive an identifier from the IP address and User Agent using a hash, allowing them to have a tracking identifier without storing personal identifiers (the IP address)
They salt the values and compute id = hash(daily_salt + IP + UA). Then they remove those every 24 hours. I think it sounds like a perfectly reasonable solution.
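A minimal sketch of that scheme in Python (the function and salt names here are made up for illustration; Plausible's actual implementation may differ in details):

```python
import hashlib

def visitor_id(daily_salt: str, ip: str, user_agent: str) -> str:
    # Only this digest is stored; the raw IP and UA are discarded,
    # and the salt is rotated (old salts deleted) every 24 hours.
    return hashlib.sha256((daily_salt + ip + user_agent).encode()).hexdigest()

today = visitor_id("salt-2025-01-01", "203.0.113.7", "Mozilla/5.0")
same_day = visitor_id("salt-2025-01-01", "203.0.113.7", "Mozilla/5.0")
next_day = visitor_id("salt-2025-01-02", "203.0.113.7", "Mozilla/5.0")
assert today == same_day   # same visitor within the day: counted once
assert today != next_day   # after the salt rotates, ids are unlinkable
```

Deleting the salt is what makes yesterday's ids unrecoverable, which is also why (as noted below) a return visit more than 24 hours later counts as a new unique.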
If they remove those every 24 hours, then doesn’t that mean if I made two visits, separated by more than 24 hrs, it would count as 2 unique visits rather than 1?
Yes. I am still not aware of any way to track returning visitors while staying within some privacy framework. Unfortunately, on HN the default answer is to say not to track at all.
I was under the impression that this is the exact kind of thing that violates the GDPR. That is.. processing an identifier (IP address) to do something more (track user actions across multiple requests) than what is required (route traffic to the server).
I had a lot of problems with getting Umami to work with Vercel, in the end I abandoned it and used Vercel Analytics.
First Umami was being blocked by all ad blockers and it caused my web app to crash, then I had to use some workaround which allowed it to work with ad blockers but stopped my Express app from sending responses back to the frontend.
It irritated me so much, because there is not enough documentation for it, although I did find multiple people reporting issues similar to mine. I just called it quits on it; analytics shouldn't be that complex to set up if you are not self-hosting. The Vercel one is serving me okayish for now.
My recent experience with wanting analytics for an enterprise application was that for my limited needs, it was easier to roll my own than to deal with evaluating all the options and integrating with another service. There are already a ton of privacy focused alternatives to google analytics. So many such that finding one that serves a niche is practically ungoogleable.
I like umami. What stood out to me is their customer support. I had some sort of issue where my free limit was always maxed out. They responded and resolved it very quickly. As a free-tier user I expected to wait days/weeks.
I would definitely be open to trying this because it seems to have significantly more features than Plausible, but then I'd lose all my historical stats. The downside of any analytics is this kind of trapped in situation.
There is no alternative to Google Analytics, because people use Google Analytics (or whatever it is called now: Webmaster Tools?) specifically to find out how Google is interacting with their site.
Nobody cares how Umami interacts with their site, because nobody's heard of it.
I'm always curious why people pick project names the way they do.
I'm seeing a trend of slapping a random simple Japanese word to a new project without explanation. Maybe it doesn't need one but at least it would make the name more memorable to me.
Unique names are more easily googleable, and most unique English words are already taken. So, I think it makes sense to consider other languages. Japanese also has simple phonetics (open syllables), making it generally easy to pronounce—unlike, say, Slavic languages.
If this was my aim, I would be choosing a word that doesn't already exist. That's far more likely to be unique and easily googleable than a popular and trendy existing name of something.
Or at least change the spelling to accomplish the same goal. What's wrong with Oomami or Umaami to make it easily identifiable on the web?
Exactly. And with generic words you have little recourse if some bigger fish in another market makes a more popular product with the same name and ends up squeezing you out of search results, unless users explicitly search for HipWord + ProductType, at which point you might as well have used a longer but unique name.
There's Czkawka, Polish for "hiccup", an application that helps find duplicates [1]; the dev behind it also created the file renamer Szyszka, again "cone" in Polish [2]. The backup solution Kopia [3] means "copy" but also "lance" or "spear", hence the pointy thing in its logo; if I recall correctly, one of the devs is Polish.
That's all I can remember. I wouldn't count KDE apps that sound Polish or Slavic just because they "had" to replace the initial letter with K to keep the naming theme.
The people behind the MATE desktop, on the other hand, named apps in their project using Spanish words, e.g. the file manager Caja ("box" or "case") and the document reader Atril ("lectern" or "music stand").
As for the use of Japanese words, it's still the outcome of the anime & manga wave that bloomed at the end of the 90s. What I find surprising is that nothing comparable happened when k-pop and k-dramas rose to popularity; there's a significant fascination with South Korean culture, but not intense enough to show up as Western interest in borrowing its vocabulary the way it happens with Japanese. Perhaps mukbang, "eating broadcast", is the only exception.
Now I'm tempted to name a future project "Strč prst skrz krk", possibly without space to form a single word.
By the way I have a single Japanese word named project, but it's related to the language, and that name has two other connections with the project content.
I think it's the <current year> meme. Just as for a while every JavaScript library had to end in .js, and before that every electronics manufacturer put an i in front of product names until Apple stopped them. Another example is libraries being described as "blazing fast" or "minimal".
It's easier to google something like "CalendarJS" rather than just "Calendar," which could give you anything but the JS library you're looking for. I think it’s simply practical.
You can find a nice list of privacy-respecting analytics tools on European Alternatives [0], including mine, Pirsch [1].
I've been in this space for ~3 1/2 years, so if you have any questions, please let me know :)
[0] https://european-alternatives.eu/category/web-analytics-serv...
[1] https://pirsch.io
Not related to you, but from a description in the first link, in the description for Plausible:
> Because it does not use cookies their is no need to show cookie banner for this service.
This is IMO a rather fundamental misunderstanding of the current situation.
I'd be hesitant to use a product from someone who I think has completely misunderstood what the rules are about. (Again, IMO, and also IANAL, but I have followed GDPR more closely than most people.)
GDPR is about collecting information; as far as I can see, the technical details of how you do it don't matter. It could be pure magic and it would still be illegal.
I've actually had this discussion with Plausible directly back in 2022[1], and more recently with the lawyer they had write a blog post[2] on the topic. I wrote an article on it, that was recently discussed here on HN [3].
The response from Plausible is essentially "we've checked with legal counsel, and stand by the statement". The conversation with the lawyer started out well, but he stopped responding when I asked about the ePD, not GDPR.
There generally seems to be a lot of confusion, even in legal circles, about what ePD requires informed consent for. Many think that only PII requires consent, or think that anonymization bypasses it. That amount of confusion makes it very easy for a layman (e.g. Plausible) to find _someone_ willing to back up their viewpoint.
The EDPB released a guideline in 2023 that explicitly states that what Plausible et al. are doing is covered by the ePD's consent requirement, but that's a little too late: the implementations in member countries already differ massively on whether it's covered[4].
1: https://github.com/plausible/analytics/discussions/1963 2: https://plausible.io/blog/legal-assessment-gdpr-eprivacy 3: https://news.ycombinator.com/item?id=42792485 4: https://matomo.org/faq/general/eprivacy-directive-national-i...
> There generally seems to be a lot of confusion, even in legal circles, about what ePD requires informed consent for.
That seems to be true, going by this comment section and the other ones I've seen.
It's hard to get a non-hyperbolic answer to the question: if everyone is so confused, what's the real-world consequence of best-effort implementation?
Some would say it's the ultimate responsibility of the app owner to understand the law, but how much further can you go than hiring a lawyer?
If more diligence needed to be done than that none of us would get anything built, we'd all just be running around researching the laws around these dumb popups.
What are the real-world consequences of making a mistake here? What kind of boundary would you have to trip over to actually get the authorities to prosecute you for not having a consent popup or doing it badly?
That is unfortunate, and seems to be similar to ADA compliance, as far as what is truly compliant and what is not. It seems like it is up to the courts to decide (speaking as an American, I know GDPR is a European law). I try to do as much as possible to keep up to date with ADA compliance and best practices, but when it comes to tooling around scanning for non-compliance, there seems to be differences. I believe that showing that you made an effort to comply is usually enough to avoid a lawsuit, but it would be nice if things like this were spelled out more clearly for those that need to implement these features.
I have recently gone through a conversation with a client that has been told in NY state (in the US) that something similar to GDPR is coming for those that deal with PII. Both the client and the agency I work for have added various scripts to the website for dynamic forms, tracking (Google Analytics), and newsletter functionality. It's at a point where everything that is 3rd party has to be discovered first, then seeing if there is the ability to anonymize everything (either by default, or with a user consent dialog). Even with current laws, it seems intentional to keep things vague.
Agreed. The company I work for has fought off two "ADA trolls" in the past ~3 years. I'm fully behind accessibility, and we design/develop our website specifically to conform with best-practice; I get, and generally accept, that civil remedies are (currently) the only way to enforce any kind of compliance. I nevertheless call the lawyers targeting us trolls, because their technical analysis was beyond incompetent, and their understanding of accessibility issues woefully out of date. It cost a few days of my + developer time, and I don't know how much lawyer-time, to make them go away.
We (I'm in the US) badly need clarifying regulation. Until then, compliance will mainly be about preventing yourself from being low-hanging fruit for opportunistic litigation - which, to be clear, can generate productive results, but is clearly inefficient.
It is not entirely clear who wrote these descriptions. Maybe it was not the vendor. At least their website https://plausible.io/ has a much better wording.
Correct, it's not so much about Cookies, but how data is collected and what is stored.
We have done a privacy risk analysis with an external lawyer and data protection officer, and concluded that Pirsch is in line with the GDPR, as we do not collect nor store personally identifiable information (PII). Processing things like IP addresses is legal as long as they are not stored and are only cached for a reasonable amount of time (a few milliseconds in our case).
If you're interested, we have extensive documentation on this. You can reach out to support@pirsch.io to get it :)
If anyone is interested in doing something similar: this cost us about €8,000 in Germany.
I guess because you store their fingerprint (for uniques) only 24 hours, it is ok?
This also factors in, yes. If we would store it indefinitely, there is the risk of profiling (estimating who someone is by their behaviour).
> This did cost us about 8,000 € in Germany.
The apparently extensive legal assessment you just described cost just 8,000 euro?
I am sorry but that had to be some hasty review at best. Do you take the full legal risk in case any of your customers would be found in violation of privacy laws because of using your service?
For reference, with similar hourly rates as Germany, reviewing a standard apartment-purchase contract cost me ~3500 euro.
We had someone with a lot of experience in this field working for very large German corporations and got a discount/startup bonus. I wouldn't call it cheap.
Imagine starting a business in Germany. How are you supposed to pay 30-50k for legal questions before selling anything?
Analytics and other forms of tracking are not required to do business. Don't try to skirt the law and you won't have as many legal questions to answer.
You need consent for (not functionally necessary) cookies because of the ePrivacy Directive (the "cookie law"). Additionally, you also need consent for processing, storing or sharing personally identifying information (PII) because of the GDPR. Usually you do both in the same consent popup.
Plausible doesn't store visitors' IPs or any other PII, and doesn't set any cookies. The reasoning given in the quoted paragraph is incomplete, but the result is correct. You only need to mention them in your privacy policy; they don't require any opt-in popups.
PII isn’t a concept in GDPR. GDPR talks about personal data, which on its own might not be identifying, but which in combination with other personal data can successfully identify a person.
Regardless of whether they store it Plausible is exposed to the visitors IP address though isn't it?
At the end of the day it comes down to enforcement. If the rules make no sense, and they can't be enforced. They might as well not exist.
I'm curious: running a static website with no JS-based analytics whatsoever — only Apache logs in standard format (so including IP address and user agent string) — does GDPR require consent banners in this case? If so, doesn't essentially every website require consent banners due to the way websites work?
GDPR does not require a consent banner. If you want to process the user's personal data outside what is strictly necessary, you need permission. One way to get that permission is for the user to specifically consent to it. It does not have to be a banner. (In fact, many banners out there are probably not enough for informed consent anyway, as they provide no information about what data is collected or any reasonable way to opt out.)
Personally identifiable information has nothing to do with javascript, or analytics. Do you have GET requests with parameters containing enough to identify a specific individual? Then your logs are sensitive and you must have a valid contract, informed consent, or provide some important service where this information is necessary.
There are gray areas which can make this difficult, but the basic idea is: enough information to identify an individual. A basic website where you log that IP address A viewed home.html is not enough. The knowledge that a 55-year-old woman with a particular name at a particular street address has an interest in photography and shoe size 9 probably is. The line is somewhere in between.
If I install the Apache web server and accidentally expose the machine to the internet, am I violating GDPR by not having a cookie banner on the "Apache Default Page"?
Probably not.
But if you can still find a way to identify users from the server logs, then probably yes.
GDPR is about collecting personally identifiable information, which is distinct from aggregate data that you can't trace back to the individual. Recital 26:
> The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.
So details definitely matter. Some self-hosted analytics do this by getting rid of the last octet of the IP address, though I doubt that's been tested in courts.
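For what it's worth, a sketch of that last-octet truncation (a common approach, though as noted, untested in court; this is not any particular tool's implementation):

```python
def anonymize_ip(ip: str) -> str:
    # Zero the last octet so the stored address maps to a /24 block
    # rather than a single host (IPv4 only in this sketch).
    parts = ip.split(".")
    parts[-1] = "0"
    return ".".join(parts)

assert anonymize_ip("203.0.113.57") == "203.0.113.0"
```

Whether a /24 block is coarse enough to count as "anonymous" under Recital 26 is exactly the kind of question a court would have to settle.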
If you can figure out how many unique visitors you have, you have a problem. That must involve fingerprinting somehow.
I posted a quotation straight from the recitals of the GDPR that says anonymised data is not covered. I even gave a reference that you can look up. The recital even ends with this:
> This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes.
There is no ambiguity here, aggregate data is completely fine as long as I can't trace it back to you with a reasonable amount of effort.
A DPO would disagree with you depending on the circumstances: if you know a user is unique, then you have a fingerprint; and if you keep that fingerprint forever, then when the user comes back to the site it's trivial to know it is that user.
An anonymous JWT for guests does literally this same thing, no? I mean, if you just track anonymous data.
What I mean is you can track unique visitors of your app without a privacy breach, because you only use anonymous data.
Yeah, not using cookies is irrelevant if you use other means to track users. Also, people like to think they need to show the "cookie banner" for all cookies regardless of how they are used.
Is Pirsch a fork of Plausible? It looks nearly identical.
No, they are comparable, but it's an independent tool. When we started, Plausible wasn't as big as it now is. We also had a focus on deeper integrations via API from the get go, a nicer dashboard, and a few other minor details.
I basically started this for my personal use as a library for Go, which it still is:
https://marvinblum.de/blog/server-side-tracking-without-cook...
The pricing page looks exactly the same as Plausible's. But the prices are lower, which is good.
Funny enough, they seem to have "copied" our structure. I remember when it basically was just a slider, without tiers.
Yes, because I remember suggesting to them that their pricing structure was not simple. Although I am not sure who started the slider pricing UX.
How do you calculate the session duration? Is it the delta between two page hits or similar events?
I tried a couple of the smaller analytics tools, like Plausible, Simple Analytics, Umami, etc., and one thing I always disliked was the way the session duration was calculated. I have a lot of longer articles where the visitor stays for a long time and then leaves. Most of these tools will count that as a bounce, as there are no two hits to calculate the delta between. But for me it is a very important metric to get accurate numbers on, which is impossible with that implementation for sites like mine (very few but long page visits, not a lot of navigation between pages).
Do you handle this the same way? That would be a feature I'd be willing to switch my current tool out for.
Yeah, we also use the delta. However, you can send a custom event on close to update the session duration. The session won't be counted as bounced in our system then and the time is updated.
https://docs.pirsch.io/advanced/events
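The delta approach and the close-event workaround can be sketched like this (names are illustrative, not Pirsch's actual data model):

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    hits: list = field(default_factory=list)  # event timestamps, in seconds

    def record(self, ts: float) -> None:
        self.hits.append(ts)

    @property
    def duration(self) -> float:
        # Delta between first and last hit; a single hit gives 0,
        # which is what makes a one-page visit look like a bounce.
        return self.hits[-1] - self.hits[0] if self.hits else 0.0

s = Session()
s.record(100.0)            # initial page view
assert s.duration == 0.0   # no second hit yet: counted as a bounce
s.record(160.0)            # custom event fired on page close
assert s.duration == 60.0  # the delta now reflects real time on page
```

Client-side, the close event would typically be sent via `navigator.sendBeacon` on `visibilitychange`, since ordinary requests are often cancelled during page unload.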
This is a good list, however very few of them offer support for apps (mobile, desktop, etc.).
For Pirsch, we have a PWA you can install right from your mobile browser :)
Yeah, but it's a PWA; that's the problem.
What's wrong with a PWA?
They simply don't work as well as non-web apps. People continue to insist that they do, but in my experience they just don't have the same smoothness as a native app, and that lack of smoothness gives away that it's a web app.
Sure, for something you're spending hours on like Instagram. For my business data analytics, I don't care. If I'm doing any serious work I'm on a laptop anyway; mobile is just for casual checks.
A native mobile app is a gigantic time, productivity, and cash investment. If a business can get most of the value from a PWA, they will be far better off investing that time and innovation into other parts of their business than building a native app for the "smoothness".
There are lots of ways to make it cross platform pretty easily if you plan to do so from the beginning, such as React Native and Flutter. Even now, if the site is in React, it is not too difficult to port it all to RN, which also has a web version that is quite similar to React proper. Plus, RN and Flutter have PWA support already too.
Do you have examples of such apps? Generally curious since I would assume that there might be other factors at play that make such apps "not smooth".
Try something simple like Instagram via the browser versus as an app, it's simply smoother on the app. I would have to dig up more examples but IG immediately comes to mind as a recent experience.
I'll have a basic "how is this different than the thing they are copying" please.
I guess you have to sign up to a few, test them on your site, and decide which one to use. In the end, they are all slightly different.
If you would like to self-host or have other specific requirements, you can quickly reduce the list to a couple of options of course.
Does it require cookie and/or gdpr consent from the user to use these privacy analytics tools?
No. You can learn more about it here:
https://docs.pirsch.io/privacy
I just shut down a companion app for a game I'd reverse engineered — developed over the last 2-3 months. The companion app, among other things:
- Generated insights — https://bizarre.gg/meta
- Show detailed interactive gameplay logs (from "Umami" analytic events) — https://bizarre.gg/runs/00493ccf-5b96-523c-beb4-06e8154cc158
Thread w/ development overview: https://news.ycombinator.com/item?id=43080066
I used Umami and mention it in the video. Admittedly, it was a mistake for my use case. I had to heavily modify Umami due to lack of features and performance issues. There are also a lot of bugs in the project, which are immediately revealed simply by enabling TypeScript strict flags and some more linting rules. Granted, I was not really using Umami exactly as intended. I do think it's great this project exists, and while I had to heavily modify it for my use case, I did at least help the upstream project diagnose one issue: https://github.com/umami-software/umami/pull/2946#issuecomme...
The company that made the game (Tempo Games) sent you a cease and desist?
That's not very friendly of them.
They also gave him a job offer, but yes, Tempo has been pretty aggressive in trying to keep their game from getting "solved" by third party tooling collecting analytics on the game.
An honest job offer or a "please travel to a jurisdiction where we can more thoroughly fuck you over" offer?
Been using it for my personal website for over a year as a self-hosted solution. Not great if you just want to set it up and forget about it. There are breaking changes every now and then in every part, the DB and the FE. So at some point it just broke for me and stopped showing relevant data. I ended up switching to piratepx, as it was enough for me to see whether there were any visits.
Same here. My self-hosted instance is broken right now and I've not been able to find time to fix it. The pace of change was easy to keep up with when it was just 1 guy.
Now it appears they have built an entire team and raised some VC to build out their SaaS.
>Now it appears they have built an entire team and raised some VC to build out their SaaS.
Is that the case? Just FYI there was a scam claiming to be them and raising funds about Web 3 and Crypto.
Can vouch for this as well. The API has breaking changes all the time and there's no notice whatsoever. We'll transition away soon.
Related, shoutout to Goatcounter - https://www.goatcounter.com/
It's an incredibly bare-bones analytics tracker, but it's free and cloud-hosted which were the two things I was most looking for in an alternative to GA.
I run a website that gets about ~300k pageviews/month. Vercel was eating my wallet alive with their analytics offering. All I wanted with my tracker was to feel motivated by knowing that traffic was going up and to the right. I didn't want to pay hundreds a month for that and I didn't want to manage my own server just to have analytics. Goatcounter addressed my needs well!
I just stood up a new toy project (excepted traffic is next to nothing but I still want to be able to tell) and was just thinking I needed something like this. Thanks!
Something I've noticed about all these privacy-respecting analytics apps is that they all seem to use a similar UI; in fact, the only one I know that uses a different UI is Matomo.
I wonder why that is? My suspicion is that this layout is becoming a "standard UI" for analytics software. Oftentimes I see companies in the same space largely mimic each other's UIs. Things like the "Find Care" option on most healthcare sites look largely the same. The same goes for LLM frontends and time-tracking software. It just seems that each team has individually come to the same conclusion about what the "best UI" is for a given task.
Umami Demo - https://eu.umami.is/share/LGazGOecbDtaIwDr/umami.is
Simple Analytics Demo - https://dashboard.simpleanalytics.com/simpleanalytics.com
Fathom Analytics Demo - https://app.usefathom.com/share/deasaicp/hilarious+platypus?...
To me it's just a case of if it ain't broke don't fix it. I don't agree that "each team has individually come to the same conclusion". You can develop a product way quicker by cloning.
In this case the value prop is in the open source, self host, privacy first. Why try to innovate on the UI?
In my experience, there are only a few good UI libraries out there, which is part of it, but I think generally the "standard UI" is a thing, much like what happened in architecture before we got the reset of brutalism/modernism, to highly generalize. Just thinking aloud: choosing between a novel, unknown, and interesting UI/UX versus a proven, reliable, and commonplace (read: boring) one is always a toss-up, and comes down to the audience. Business interests usually tend toward the latter, which IMO is good, because you want people using the product, not thinking about how to use the product.
Mine also has a different UI, not necessarily better though, but the feature set is different: https://dashboard.uxwizz.com/server/demoLogin.php
https://pico.sh/analytics for a tabular view of analytics in a TUI
I know it's dumb, but when a library requires me to install Yarn to install it, I think so much less of it.
I just hate the idea that I have to install an entire package manager simply to use some Node.js code, when NPM almost certainly could have done the job.
It's not a library, it's an application. You're quoting a section called "Installing from Source". Obviously that'll require you to use the build tools the developers happened to have chosen.
There's also a Docker option if you don't want to do that.
If anything requires me to install Docker, Yarn, or even Node.js, I immediately lose interest. Real programmers use butterflies!
You can pry my JS ecosystem from my cold (& well manicured) dead hands.
Seems like it’s because they’re using the resolutions feature to override a dependency resolution. The alternative would be forking at least 1 package, all the way down to the dependency, to fix the version.
There’s a reason there’s 3 popular package managers for Node that aren’t NPM. Yes, part of that is the culture/ecosystem, but not entirely.
npm has supported the overrides field in package.json for a few years already.
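For reference, it looks roughly like this in package.json (the package name and version here are placeholders, not Umami's actual dependencies):

```json
{
  "overrides": {
    "some-transitive-dep": "2.1.0"
  }
}
```

This pins a transitive dependency to a specific version without forking anything, which is the same problem yarn's resolutions feature solves.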
True. Either way, Umami's been using `yarn` since 2020, before that release of NPM (although for what reason at that time, I don't know).
Being bad thereby creating desirable competition has lasting effects. We could get into when/why/what each thing supports all day, but it's not worth it.
Speaking of what's supported nowadays, installing other package managers is a corepack call away -- literally a whole other feature built into Node.js because NPM is/was/etc subpar. It's experimental, but this is all to say: it doesn't surprise me in the slightest that a project might use something that isn't NPM, and I actively expect it when picking up other's projects.
Related:
https://news.ycombinator.com/item?id=24198329 aug-2020 227 comments
https://news.ycombinator.com/item?id=27181622 may-2021 42 comments
https://news.ycombinator.com/item?id=31284853 may-2022 42 comments
https://news.ycombinator.com/item?id=24184773 aug-2020 9 comments
https://news.ycombinator.com/item?id=24422333 sept-2020 3 comments
Something that specifically has documentation on bypassing anti-tracking security software[0] is not "privacy-focused". Your users have indicated that they do not want you to track them and have gone out of their way to stop you from doing so. Attempting to bypass that is specifically taking steps to undermine their privacy when you absolutely know they do not want that.
A "privacy-focused" solution (not that software that's specifically made for spying can be "privacy-focused". Let's call a spade a spade: it's spyware) would at least use standard endpoints to make it easy for users to opt-out by blocking those endpoints. In this way, GA is actually more privacy friendly.
"Some lists can be overly agressive" is also a bad attempt at gaslighting. Your software watches the way I browse, including tracking purely client-side events and outbound links (see "journeys"). You attempt to track things like device characteristics and what operating system I use. That's creepy and voyeuristic to me, and is exactly what spyware blockers are for.
Actual privacy-friendly analytics looks more like the Steam hardware & software survey where they ask if you'd like to tell them these things, and show you exactly what they are going to collect.
[0] https://umami.is/docs/bypass-ad-blockers
Absolutely, privacy-preserving analytics is an oxymoron. And besides surveys with informed consent, the best solution is to simply track less. Think hard about what information you actually need and will actually look at more than once.
We recently migrated from Matomo to Umami at work after hitting scaling issues with Matomo, even after implementing various MySQL optimizations and archiving reports through cron at a decent interval. Even the most basic tasks like loading the dashboard were painfully slow (before you comment on the resource usage: our instances were quite large and the load was fine).
Surprisingly, Umami has been handling our traffic volume without breaking a sweat on much smaller instances. I suspect PostgreSQL's superior handling of concurrent writes plays a big role here compared to MySQL/MariaDB. Except for the team/user management, everything feels much nicer on Umami.
Shameless plug: As part of the migration, I also took the opportunity to learn some Rust by writing a small utility that uses the Umami API to generate daily/weekly analytics reports and sends them via email[1]. Pretty happy with how it turned out, though I'm still learning Rust so any feedback or suggestions for improvement are welcome!
[1]: https://github.com/Thunderbottom/umami-alerts
I am also curious about the traffic amount and server specs.
In my experience, MySQL still runs very well until you have 10-20m rows (on a single machine, like 8 vCPUs and 32GB RAM); after that it gets trickier to get instant responses.
We had huge servers, with the database and the application itself running on separate instances. IIRC, we had a 32-core, 64GB instance just for the DB itself, which we doubled when we started adding more sites to our configuration, and it still wasn't enough. As for the numbers, our site(s) get heavy traffic every day, in the millions daily, since we are a stock broker.
You’re right about MySQL performing alright for 10-20m rows, but from our perspective those numbers are not that big for a company this size.
> our site(s) get heavy traffic every day, in the millions daily
Yeah, it's hard to run aggregate queries on MySQL once you are talking about hundreds of millions of rows, or billions. That said, if the server has a modern CPU, enough RAM to hold the entire DB, and NVMe storage, it's still OK-ish with the right indexes and optimized queries.
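For illustration, the usual mitigation at that scale is pre-aggregation: the same idea as Matomo's cron-based report archiving mentioned upthread. Here is a toy sketch using SQLite as a stand-in for MySQL; the table and column names are hypothetical:

```python
import sqlite3

# Sketch: pre-aggregating raw pageviews into a daily rollup table so
# dashboard queries never scan the raw table. Table and column names
# are hypothetical, and SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pageview (site_id INTEGER, url TEXT, day TEXT);
    CREATE TABLE daily_rollup (site_id INTEGER, day TEXT, views INTEGER,
                               PRIMARY KEY (site_id, day));
""")

# Simulate raw traffic: hundreds of millions of rows in production, five here.
rows = [(1, "/", "2024-01-01")] * 3 + [(1, "/pricing", "2024-01-02")] * 2
conn.executemany("INSERT INTO pageview VALUES (?, ?, ?)", rows)

# The periodic cron job: collapse raw rows into one row per (site, day).
conn.execute("""
    INSERT OR REPLACE INTO daily_rollup
    SELECT site_id, day, COUNT(*) FROM pageview GROUP BY site_id, day
""")

# Dashboards read the tiny rollup table instead of scanning raw pageviews.
views = dict(conn.execute(
    "SELECT day, views FROM daily_rollup WHERE site_id = 1 ORDER BY day"))
print(views)  # {'2024-01-01': 3, '2024-01-02': 2}
```

The dashboard query then touches one row per site per day regardless of raw traffic volume, which is why column stores like ClickHouse (which do this kind of aggregation cheaply) keep coming up in this thread.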
Thanks for sharing!
Could you describe the load and the server/DB specs a bit? I'm using Plausible right now and I wonder how it would perform with similar specs.
We had separate database and app instances; the DB instance had 32 cores and 64GB of memory, which we doubled to keep up with our requirements. We have tens of millions of visits daily, and our database was close to ~300GB within the first few months.
For plausible I believe that since it runs on Postgres, scaling should not be a problem as long as you scale the resources with it.
For my platform, I found those optimization tips to work quite well: https://docs.uxwizz.com/installation/optimization-tips/mysql...
In all honesty, these optimizations are quite basic. We already used MariaDB instead of MySQL itself. The other things listed in the post are standard across all our databases, well, except for deleting data to speed up the database.
Have you also considered Percona MySQL server? I think they say they have the best performance (but I haven't tested their implementation yet).
No, unfortunately our company’s and external regulatory compliance policies require us to host all data within the country itself, alongside it being required to be run on an infrastructure that is easily auditable. So as a policy within the company, all our internal services are open source and self hosted.
[dead]
Umami supports MySQL as well, and I don't remember there being much difference between Postgres and MySQL as the backend.
Things would hopefully be even better once clickhouse support lands.
I have been using (and self-hosting) Umami for 3 websites for the better part of a year. While good for my use case, which is just having some 'fun' insights into how many visits my pages get and where that traffic is coming from, it mostly fits my kind of profile, I reckon. Would never use it for a business purpose. Also, the UI is somewhat immature still.
So all in all: total fan, otherwise I wouldn't be using it, but it's fairly limited in what it can do.
I really like the simplicity of the ui. I’m curious what makes it immature vs mature in your opinion?
There are options missing from the admin panel or the dashboards that intuitively seem basic. Some examples:
- excluding certain data from your report (such as localhost visits)
- setting a default time range
- on the 'overview' of all your domains, setting a time range that applies to all domains and not just one individually (this particularly felt really counterintuitive)
There is more I'd come up with if I actually pulled it up, but the overall throughline is that it just feels 'too basic' at this moment. Especially for something that goes beyond tracking visits on your personal blog and/or hobby website.
You can exclude your own visits (like localhost visits) by manually setting a localStorage entry on the browser you don't want to report statistics.
https://umami.is/docs/exclude-my-own-visits
>This setting applies per website, so you will need to do this for each website you want to be excluded from.
Would it not be better to just have a blacklist, so it could ignore localhost and whatnot?
Thank you, excluding localhost is a feature I didn't know I needed till your comment. Makes total sense.
Have you tried the similar tools as well, like Matomo?
I have checked them out and decided not to go with it since it was overkill for my situation. Like I said, I don't need a whole lot more than visits, geo, referrer, and track events[1] which work really nicely. If I ever do need a GA4 alternative for 'serious' purposes though, I might consider Matomo as it seems a lot more complete.
[1] https://umami.is/docs/track-events
You can see the influx of presumably hackernews visitors (through GitHub/Google) on the demo instance that's linked from their website :)
https://umami.is/ https://eu.umami.is/share/LGazGOecbDtaIwDr/umami.is
As someone who doesn't really often look at these sort of charts... traffic coming via chatGPT being higher than Bing was quite the surprise to me. Makes of course total sense, but still astonishing to see the actual numbers in comparison.
Is the demo using real data?
And if it is, is there that notable an influx?
Zoom out to 90 days, and then today doesn’t seem much different from any other day.
Yes, it's real data, and by now the influx is apparent in the 90 day view too!
No, umami is a basic taste.
Why can't people be more creative with finding good names? Salt? X? Yes, Apple is a terrible name too, but at least it was historically called Apple Computers and only changed once they had a huge brand already.
> Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting.
https://news.ycombinator.com/newsguidelines.html
Urchin, the software name of the precursor to GA, has an umami flavor. I find the naming quite clever and easy to remember.
Are you remembering the name because you’re remembering the tool or just the flavor? I don’t think hijacking a well known term makes it memorable.
I recognize the reference because of the tool. I remember the name because it's simple. The flavor itself is irrelevant in my thought process.
Urchin - https://en.wikipedia.org/wiki/Urchin_(software) Urchin Software Corp. was acquired by Google in April 2005, forming Google Analytics
yeah, funny how 20 years later the UTM (Urchin Tracking Module) query param is still the de facto standard tracking tag.
There was debate on the team back in ~2009 over changing the query params -- when you estimate how much bandwidth and disk is consumed per day at the scale of Analytics, just for the difference of a character or two on a handful of query params -- even well over a decade ago, it really was astounding.
The migration would never be total, though, because more than a few users had copied the analytics script to their CDN or were in some way depending on a local fork of it. Some backwards compatibility would need to be maintained, perhaps for as long as the product exists. For this and other reasons, changing the params (and supporting both, indefinitely) wasn't pursued.
I have been using Umami and I can't thank the people behind this project enough. It just works! Not only does it work well, they pull no tricks when it comes to self-deployment and running it on your own. Many open source projects have a "run your own" option, but they also want you to give up and use the managed version instead. I understand that part: they want to make money. But Umami just works; you will never feel like there is an option where you need to pay $$. If I ever make enough from my work, I will surely become a paid subscriber to their managed analytics.
Providing some context which I think is missing: it originally started as a Show HN a little over 4 years ago [1]. It was a one-man side project by Mike Cao.
For a long time, maybe at least 2-3 years, it was simply open source, use-as-you-please MIT software with no hosting offering. I remember when it launched, the first thing I noticed was that the graphic design was so much better than other offerings. While I was interested at the time, I ultimately went with Plausible, mostly because it seemed to have sustainable backing, and another major reason was the usage of ClickHouse. If you want the best resource usage / scale / performance for analytics, I don't think there are many other choices. A lot of other solutions, including Plausible, settled on ClickHouse as well.
I recently went to check on them again, and it looks like things have progressed a lot. They have a hosting offering and they are also working on ClickHouse integration.
Between Urchin, Reinvigorated, and another one whose name I can't remember, it's been a long time since I was excited about analytics. I just hope Mike Cao reads this. Congrats and well done.
[1] https://news.ycombinator.com/item?id=24184773
I've used Umami for my low-volume projects and it has worked great. Set it up on its own instance connected to a Postgres DB, and it's been running smoothly for the past year.
Also there is a nice python library for calling the functions directly: https://github.com/mikeckennedy/umami-python
I do wish it would track bot traffic, though. My site of 10m pages is heavily crawled by bots, and I wish I could keep track of them a bit more.
I think you're looking for the `DISABLE_BOT_CHECK` env variable
See https://umami.is/docs/environment-variables#:~:text=DISABLE_...
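If you run the self-hosted Docker Compose setup, that would presumably look something like the fragment below. This is an unverified sketch; check the Umami docs linked above for the exact accepted values:

```yaml
# docker-compose.yml (fragment): record bot traffic instead of filtering it
services:
  umami:
    environment:
      DISABLE_BOT_CHECK: 1
```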
What an oxymoron.
There needs to be a convincing explanation of how Google-style analytics can be privacy focused; otherwise I'll disregard this as a clone with feel-good branding.
It stores no personal data, requires no cookies, and ships nothing to Google. You host it yourself.
1) GA doesn't store personal data either
2) who gives a shit, genuinely who cares about cookies?
3) so it's privacy oriented in that the dev doesn't send their data to google, but users send their data to someone anyways? And why would sending data to a self hosting rando be safer than sending it to google from a user and security perspective?
Google’s entire business is your personal data. I’d much rather my browsing info be sent to some small website for local analysis for their own purposes vs. hoovered up by the largest data broker in the world and aggregated with everything else about me.
Did you read the recent article about how reCAPTCHA is used for tracking you across the internet? I have no reason to believe GA isn’t similar. And just think about how much info they have on you from reading all of your emails.
Stop smoking mids and why don't you see what umami actually does?
It is privacy oriented from the perspective of the company, not the individual. I think there is some value in that. It makes it no more likely to be secure or private for the individual end user visiting the site, though.
Privacy oriented from the perspective of the company is at least more privacy oriented from the perspective of the user. A company harvesting my data for analytics is more private than two companies harvesting my data for analytics.
If I'm going to a site, I'm willingly sharing some of my personal data with that site. I'm not implicitly consenting to third parties harvesting my data.
It doesn't say it provides google style analytics.
Does the no-cookie thing mean you can't count e.g. unique/returning visitors? Or is there any privacy-friendly way of doing that?
This almost always resolves to "not using actual cookies, but exactly the same privacy and security paradigm as cookies"
I mean, you can probably do something clever here, like rainbow tables on fingerprints or something more probabilistic, so you never store individual fingerprints. Would be interesting to know what the solution is.
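One probabilistic scheme along these lines is "linear counting": each visitor hashes to a single bit in a fixed bitmap, and the unique count is estimated from the fraction of bits still zero. This is a hypothetical sketch of the general technique, not what Umami or Plausible actually do:

```python
import hashlib
import math

class LinearCounter:
    """Probabilistic unique-visitor counting (linear counting).

    Each visitor key sets one bit in a fixed bitmap; cardinality is
    estimated from the fraction of bits still zero. Individual
    fingerprints are never stored, only the aggregate bitmap, so the
    structure cannot be used to identify or enumerate visitors.
    """

    def __init__(self, m: int = 1 << 16):
        self.m = m                      # number of bits in the bitmap
        self.bits = bytearray(m // 8)

    def add(self, visitor_key: str) -> None:
        digest = hashlib.sha256(visitor_key.encode()).digest()
        pos = int.from_bytes(digest[:8], "big") % self.m
        self.bits[pos // 8] |= 1 << (pos % 8)

    def estimate(self) -> float:
        ones = sum(bin(b).count("1") for b in self.bits)
        zeros = self.m - ones
        # Standard linear-counting estimator: -m * ln(zeros / m)
        return -self.m * math.log(zeros / self.m)

lc = LinearCounter()
for key in ["1.2.3.4|UA", "1.2.3.4|UA", "5.6.7.8|UA"]:
    lc.add(key)
print(round(lc.estimate()))  # duplicates map to the same bit, so this is ~2
```

The trade-off the reply below points at is real, though: the visitor key fed into `add` still has to distinguish users at ingestion time, even if nothing per-user is retained afterwards.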
Sure, but any probabilistic approach is either relatively inaccurate, which reduces its usefulness for this use case, or accurate, which raises the same identifiability concerns cookies would introduce. I guess my point was: this has been thought about already. (Pseudo-)anonymized attribution is a bit of a solved problem, and you can do it with or without cookies. That's mostly an implementation detail rather than a distinguishing feature.
> (Pseudo) anonymized attribution is a bit of a solved problem and you can do it with or without cookies.
How is it typically done without cookies then?
> any probabilistic approach is either relatively inaccurate reduces it's usefulness for this use case, or accurate which raises the same identifiability concerns cookies would introduce
How so? Even if it's accurate, you wouldn't be storing the identifying information (random ID or fingerprint) for the individual user, so you would only be able to answer with reasonable certainty whether you saw the user before or not. You can't identify anyone from that (other than as a new vs. returning user), so there is no identifiability concern, unless of course one thinks that constitutes a concern in itself, which I don't think the GDPR does.
Maybe we're disconnecting. Cookies are just a standardised way to communicate a small key/value set between client/browser and server through HTTP headers. It's not inherently (in)secure, sensitive, etc. There are zero things you can do with cookies that you cannot do without and there are no inherent differences in security, they're just very convenient if you're in HTTP world.
And yes what you said is exactly right; you're allowed to fingerprint a unique user and track data with that fingerprint as the sole unique identifier without any PII legislation (GDPR, CCPA, etc.) compliance issues. You just cannot store any information that allows linking PII data to that fingerprint in either direction. In other words, attribution to a random UUID that just happens to represent an anonymous user is not an issue.
Circling back to the original comment; there is no (good) argument against cookies if you're basically doing exactly what cookies are doing. Umami using it as a USP is, at best, a little odd.
> you're allowed to fingerprint a unique user and track data with that fingerprint as the sole unique identifier without any PII legislation (GDPR, CCPA, etc.) compliance issues.
That is not true. E.g.
1. https://ico.org.uk/about-the-ico/media-centre/news-and-blogs...
2. https://ico.org.uk/for-organisations/direct-marketing-and-pr... specifically https://ico.org.uk/for-organisations/direct-marketing-and-pr...
This is for the UK, I am not up to date with other European regulators.
> fingerprint a unique user
I don't think this is correct, or at the least it's unfortunately phrased. If your fingerprint is so specific that it can distinguish unique users, it is covered under GDPR compliance. I don't know too much about the CCPA so not sure if it's the same there.
Yes, you are allowed to collect device statistics such as form factor, viewport size etc. But if you can distinguish between two different users with identical devices accessing your site at the same time, under GDPR you have an obligation to inform [14]. And if you can recognize a returning user across sessions, you also need consent.
[14] https://gdpr-info.eu/art-14-gdpr/
If the random user ID is truly anonymous (so it cannot be linked back to an identifiable person, even with other data you have), it is not personal data under GDPR, and no obligation to inform or consent is needed. If the data processor stores any information that makes PII attribution possible, then, and only then, does it fall under GDPR, CCPA, etc. That random ID being persisted on the device, allowing for subsequent attribution, is still not PII-sensitive unless/until the aforementioned identifiability barrier is breached. This is exactly why prominent analytics platforms (Plausible, Matomo, Mixpanel if configured correctly, etc.) all offer data hygiene barriers.
I suspect what's happening here is that the word "user" is making things ambiguous. It was meant in the context of an attributable session, not as the data subject per GDPR language, for example.
As far as I'm aware, any online identifier, even a random number created by me, falls under the GDPR. This answer lays it out pretty well
https://law.stackexchange.com/questions/82133/are-auto-gener...
I don't know about Umami, Plausible describes how they solved this here: https://plausible.io/data-policy, under the section "How we count unique users without cookies"
TL;DR: They derive an identifier from the IP address and User-Agent using a hash, allowing them to have a tracking identifier without storing personal identifiers (the IP address).
This seems like a worse approach to me than a cookie.
My IP and UA don’t change, pretty much ever.
I can delete a cookie anytime I want.
They salt the values and compute id = hash(daily_salt + IP + UA). Then they remove those every 24 hours. I think it sounds like a perfectly reasonable solution.
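Under the assumptions stated in this thread (a daily salt hashed together with IP and User-Agent, with the salt discarded after 24 hours), a minimal sketch might look like this. Names and structure are illustrative, not Plausible's actual code:

```python
import hashlib
import secrets
from datetime import date

class VisitorCounter:
    """Sketch of a cookieless unique-visitor counter.

    visitor_id = hash(daily_salt + site + ip + user_agent). The salt is
    regenerated each day and the old one is discarded, so yesterday's
    IDs can never be recomputed or linked to today's.
    """

    def __init__(self):
        self._salt = secrets.token_bytes(16)
        self._salt_day = date.today()
        self._seen = set()  # hashed visitor IDs seen today

    def _rotate_if_needed(self):
        if date.today() != self._salt_day:
            self._salt = secrets.token_bytes(16)  # old salt is gone forever
            self._salt_day = date.today()
            self._seen.clear()

    def record(self, site: str, ip: str, user_agent: str) -> bool:
        """Record a pageview; returns True if this is a new unique visitor today."""
        self._rotate_if_needed()
        raw = self._salt + site.encode() + ip.encode() + user_agent.encode()
        visitor_id = hashlib.sha256(raw).hexdigest()
        is_new = visitor_id not in self._seen
        self._seen.add(visitor_id)
        return is_new

counter = VisitorCounter()
print(counter.record("example.com", "203.0.113.7", "Mozilla/5.0"))  # True: new visitor
print(counter.record("example.com", "203.0.113.7", "Mozilla/5.0"))  # False: seen today
```

As the follow-up comments note, the consequence of the daily rotation is built into the design: the same person visiting on two different days is counted as two unique visitors, because nothing survives that could link the two IDs.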
If they remove those every 24 hours, then doesn’t that mean if I made two visits, separated by more than 24 hrs, it would count as 2 unique visits rather than 1?
Yes. I am still not aware of how to track returning visitors while staying within some privacy framework. Unfortunately, on HN the default answer is not to track at all.
You mean fortunately? Because that's the only correct answer.
Creating a fingerprint like that will become a personal identifier under the various privacy rules
I was under the impression that this is the exact kind of thing that violates the GDPR. That is.. processing an identifier (IP address) to do something more (track user actions across multiple requests) than what is required (route traffic to the server).
A user resettable cookie _is_ the privacy friendly way of doing it.
I had a lot of problems with getting Umami to work with Vercel, in the end I abandoned it and used Vercel Analytics.
First, Umami was being blocked by all ad blockers, which caused my web app to crash. Then I had to use a workaround that allowed it to work with ad blockers, but it stopped my Express app from sending responses back to the frontend.
It irritated me so much, because there is not enough documentation for it, although I did find multiple people reporting something similar to my issue. I just called it quits; analytics shouldn't be that complex to set up if you are not self-hosting. The Vercel one is serving me okayish for now.
If you're interested, you can find a list of similar, open-source alternatives (including Umami) on https://openalternative.co/categories/website-analytics
Glad to see Umami on HN. I've been using it for the past year or so using their docker-compose setup. Really liking it. A big :+1: from me.
My recent experience with wanting analytics for an enterprise application was that, for my limited needs, it was easier to roll my own than to deal with evaluating all the options and integrating with another service. There are already a ton of privacy-focused alternatives to Google Analytics, so many that finding one that serves a particular niche is practically ungoogleable.
Why is it developed in JavaScript? Wouldn't it be better for response times to write it in a more performant language?
I like umami. What stood out to me is their customer support. I had some sort of issue where my free limit was always maxed out. They responded and resolved it very quickly. As a free-tier user I expected to wait days/weeks.
Would recommend.
I would definitely be open to trying this because it seems to have significantly more features than Plausible, but then I'd lose all my historical stats. The downside of any analytics is this kind of trapped in situation.
The standard move is for incumbents to offer migration features from their competitors.
If the competitors allow exporting your data.
Wanted to also give this product a shout out ($1/month): https://onedollarstats.com
drizzle.team is always delivering.
Why pay for analytics when I can set up Umami with a single command ?
How well does Umami deal with detecting (and ignoring) bots?
Bot traffic is significant and I've noticed different vendors have varying degrees of success filtering out bots versus humans.
Bot detection is enabled automatically.
I really love Umami but the user / team management is incredibly weird and I still haven’t figured it out. Anyone else finding the same?
Too bad it's in node
Easy to deploy with dokploy
There is no alternative to Google Analytics, because people use Google Analytics (or whatever it is called now: Webmaster Tools?) specifically to find out how Google is interacting with their site.
Nobody cares how Umami interacts with their site, because nobody's heard of it.
[dead]
I'm always curious why people pick project names the way they do.
I'm seeing a trend of slapping a random simple Japanese word to a new project without explanation. Maybe it doesn't need one but at least it would make the name more memorable to me.
Unique names are more easily googleable, and most unique English words are already taken. So, I think it makes sense to consider other languages. Japanese also has simple phonetics (open syllables), making it generally easy to pronounce—unlike, say, Slavic languages.
If this were my aim, I would choose a word that doesn't already exist. That's far more likely to be unique and easily googleable than a popular and trendy existing name of something.
Or at least change the spelling to accomplish the same goal. What's wrong with Oomami or Umaami to make it easily identifiable on the web?
Exactly. And with generic words you have little recourse if some bigger fish in another market makes a more popular product with the same name and ends up squeezing you out of search results, unless users explicitly search for HipWord + ProductType, at which point you might as well have used a longer but unique name.
There's Czkawka, Polish for "hiccup", an application that helps find duplicates [1]; the dev behind it also created the file renamer Szyszka, again "cone" in Polish [2]. The backup solution Kopia [3] means "copy" but also "lance" or "spear", hence the pointy thing in its logo; if I recall correctly, one of the devs is Polish.
[1] - https://github.com/qarmin/czkawka [2] - https://github.com/qarmin/szyszka [3] - https://github.com/kopia/kopia
That's all I can remember. I wouldn't count KDE apps that sound Polish or Slavic just because they "had" to replace the initial letter with K to keep the naming theme.
The people behind the MATE desktop, on the other hand, named the apps in their project using Spanish words, e.g. the file manager Caja ("box" or "case") and the document reader Atril ("lectern" or "music stand").
As for the usage of Japanese words, it's still the outcome of the anime & manga wave that bloomed at the end of the '90s. What I find surprising is that nothing comparable happened when K-pop and K-dramas rose to popularity: there's a significant fascination with South Korean culture, but not intense enough to show up as Western interest in using the vocabulary, as happened with Japan. Perhaps mukbang ("eating broadcast") is the only exception.
Now I'm tempted to name a future project "Strč prst skrz krk", possibly without space to form a single word.
By the way, I have a project named with a single Japanese word, but it's related to the language, and the name has two other connections to the project's content.
I think it's the <current year> meme. Just how for a while every JavaScript library had to end in .js, and before that when every electronics manufacturer put an i in front of product names until Apple stopped them. Another example is libraries being described as "blazing fast" or "minimal".
It's basically bankrupted creativity.
It's easier to google something like "CalendarJS" rather than just "Calendar," which could give you anything but the JS library you're looking for. I think it’s simply practical.
So would "JavaScript calendar"
Not really. CalendarJS is a specific library in this case, JavaScript Calendar is all of them.
Yeah, "blazing fast" is a very common description for Rust projects...
Wild guess: it’s a word starting with “u” as a hat tip to the original Urchin (the “u” in utm)
Very wild guess