Not entirely sure of the reason, but the JavaScript contexts for the API frontend servers are constantly crashing. Due to the holiday I won’t have the bandwidth to debug the problem until tomorrow evening, so I’ve temporarily taken the APIs down in the meantime. I should be able to figure out what’s going wrong tomorrow and probably get everything back up by Sunday.
I’ll post updates to this thread if something goes horribly wrong.
Sorry about the downtime.
EDIT: Going to be down until Monday at least; going to add explicit rate-limiting to the API. I’ll have some more details later, still need to run some numbers.
Thanks for the heads up!
Is there any way to determine the API’s current status? Something like https://api.guildwars2.com/v2/status ? That would be really nice.
Does this have any relation to the recent issues with the in-game TP where requests seem to stall and have to be repeated? That has been happening for some weeks now.
One small suggestion: if the service is off, maybe you could respond with a 503 Service Unavailable status so we can better handle this case in the future.
I should adjust that. Been turning off the APIs by disabling all endpoints via config; I should add an explicit off switch that sends the proper response.
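For client authors, here is a minimal sketch of how a 503 could be handled once the off switch sends one. It assumes the API answers with status 503 and possibly the standard `Retry-After` header; nothing in this thread confirms that header will be sent. The function name `retry_delay` and the 60-second default are hypothetical.

```python
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str],
                default: float = 60.0) -> Optional[float]:
    """Return how many seconds to wait before retrying, or None if not a 503."""
    if status != 503:
        return None
    value = headers.get("Retry-After", "")
    try:
        # Retry-After may also be an HTTP date; this sketch only handles
        # the delay-in-seconds form and falls back to `default` otherwise.
        return max(0.0, float(int(value)))
    except ValueError:
        return default
```

A polling script could call this with the status code and headers of each response and sleep for the returned delay instead of hammering a disabled endpoint.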
The root (https://api.guildwars2.com) kind of works like this? When the v1/v2 endpoints are disabled they’re removed from the list. But using an explicit status code to indicate “the world is on fire” is probably easier on the client side.
AFAIK the API failure was totally self-contained and shouldn’t have affected other systems.
Looks like at least certain endpoints have been up and down today. How big is the fire? I.e., would you appreciate it if people held off work on new features today that would hit the endpoints that seem to be up (or were up the last time my services ran)?
They should be entirely down, except around 24h ago when I briefly turned them on to get some metrics out.
Regarding the sporadic uptime, queicherius provided a graph, and that correlates strongly with a minor config change I made (to turn the APIs back on for the office IP addresses). I’m not sure why or how there’s a correlation there, but I’ve undone it, so the behavior should go back to the solid red bar of “down”.
hmm, here’s the list of times I’ve gotten a response since Saturday morning (hourly checks):
everything fine through 05:09 then nothing. Then sporadic successes:
2017/03/18 23:09:15
2017/03/19 06:09:15
2017/03/19 08:09:15
2017/03/19 10:09:15
2017/03/19 19:14:31 (new server, new time, one day I’ll fix that, but probably not)
I can’t see from here whether those all correlate. With that much green, I feel unlucky about the big break in hits today.
I found that Nginx is great as a rate-limiting reverse proxy. Just look up “rate limiting with nginx” to get started. It’s extra nice that you can partition the rate limiting: clients above 5 reqs/sec I slowed a little for a couple of seconds, but anything above 20 reqs/sec got throttled to 4 reqs/sec for a big chunk of time.
Good luck, I feel your pain. This stuff always happens at 2AM on a holiday.
I use Overlay timer for WvW; since the reset there are no more WvW maps, so it’s out of order. I know that Overlay isn’t an ANet product, so my question is: is there a link between the API shutdown and the WvW maps not working?
Sorry for my poor English.
Yes, the API is shut down right now. That is why your overlay is not working.
Could you do some sort of caching that, when the API is down, can still serve all the resources as they were frozen at takedown?
Beyond setting reasonable cache headers that’s really the client’s job. I know I maintain a local cache for my API consumers.
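A minimal sketch of the kind of client-side cache described above: serve a fresh local copy when there is one, otherwise fetch, and fall back to a stale copy if the fetch fails. The function name `fetch_with_fallback`, the one-file-per-key JSON layout, and the one-hour freshness window are all assumptions for illustration, not how any particular consumer does it.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable


def fetch_with_fallback(key: str, fetch: Callable[[], Any],
                        cache_dir: Path, max_age: float = 3600.0) -> Any:
    """Return cached data if fresh, else fetch; on failure serve the stale copy."""
    path = cache_dir / f"{key}.json"
    # Fresh enough: skip the network entirely.
    if path.exists() and time.time() - path.stat().st_mtime < max_age:
        return json.loads(path.read_text())
    try:
        data = fetch()
    except Exception:
        # API unreachable: a stale copy beats no data at all.
        if path.exists():
            return json.loads(path.read_text())
        raise
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))
    return data
```

This fits static endpoints best, since their data rarely changes; `fetch` is whatever callable performs the real request.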
True.
It still bugs me that the API went down the morning after I implemented a caching bit into my script. All I would need right now to continue working on my school assignment is being able to run my script just once. (Fyi, I only need the static endpoints for my tool to function)
Could we please have a rough ETA as to how long the API will be down for? I understand that if ya give an ETA and things don’t turn out the way ya expect, it may take longer. Any info would be appreciated.
any info when this will be up? i need my tradepost stuff ;(
I’m trying to get it back up today; I still need to get someone to review my changes to make sure I don’t cascade the failure to other systems if I turn everything back on.
ETA was yesterday, except during testing there was a small bug in my implementation that caused it to use the wrong instance of the centralized rate-limiting service (i.e. it used the one that everything else uses instead of the API-specific one). So I punted on the deploy until we’re sure that the changes aren’t going to overwhelm anything.
Fair enough. Thanks for the update. I fully understand that in programming things happen that are beyond your control. I’d rather wait a bit longer for a good release than have a crappy release. Are you thinking tomorrow, hopefully?
#hugops to you and a sympathetic nod re: distsys being hard.
(also, glad to see rate limiting coming. I saw the frequency of requests some sites were making just for my one account and the ops person in me went AAAAAAAAAA)
Okay, the API is tentatively back up with a limit of 60 reqs/minute while I double-check some things.
Will bump that up to the full 600 reqs/min in an hour or so when I’m confident that nothing has caught fire (which might happen, but doesn’t look likely at this point).
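For anyone updating a client to stay under the new limit, here is a rough sketch of a sliding-window throttle. The 600 requests per 60 seconds default comes from the numbers above; how the server actually counts requests (per IP, per key, fixed or sliding window) is not stated in this thread, so treat the window model as an assumption. The class name `RateLimiter` is hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque


class RateLimiter:
    """Allow at most `limit` calls per `window` seconds (sliding window)."""

    def __init__(self, limit: int = 600, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()  # send times still inside the window

    def wait(self) -> float:
        """Block until a request may be sent; return the seconds slept."""
        now = self.clock()
        # Forget calls that have already left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        delay = 0.0
        if len(self.calls) >= self.limit:
            # Sleep until the oldest recorded call leaves the window.
            delay = self.window - (now - self.calls[0])
            self.sleep(delay)
            self.calls.popleft()
        self.calls.append(now + delay)
        return delay
```

Calling `limiter.wait()` before every request keeps a single-threaded client under the limit; while the temporary cap is in place, `RateLimiter(limit=60)` would match it.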
^This, very much this. I truly appreciate the hard work you put into the API, and even show it off to IT coworkers when I need to explain to them why the whole world needs well documented APIs for everything. Your efforts are strongly appreciated by all of us geeks in the community.