
Dynamic tags applied like health checks. #1048

Open

fidian opened this issue Jun 19, 2015 · 57 comments
Labels
theme/service-metadata (Anything related to management/tracking of service metadata), thinking (More time is needed to research by the Consul Contributors)

Comments

@fidian

fidian commented Jun 19, 2015

In issue #867 I suggested an idea to make tags that depend on the result of scripts, just like health checks.

I run mongo in the cloud with multiple machines all spun up from the same image. On boot they will query for mongodb.service.consul and join the cluster. That all works flawlessly. Being a good Ops person, I have a cron job that kills random machines in my infrastructure at random times. It will eventually hit the mongodb master, the system will hiccup, and a slave will be promoted automatically. Life is fantastic.

In comes Legacy Software that must connect directly to the master mongodb instance. I would like to have master.mongodb.service.consul resolve to the one IP of the master in the cluster.

Current solution (runs via cron on all machines; a rough sketch follows the list):

  1. Get my service definition through API
  2. Check the status of the cluster. This determines if we should or should not have a tag.
  3. Determine if the service definition's tag list needs to be updated.
  4. If an update is required, POST data back to the API.
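
A rough sketch of that cron job, using the official Go API client (github.com/hashicorp/consul/api); only the "mongodb" service name and the mongo-is-master.sh helper come from this issue, everything else is illustrative:

package main

import (
    "log"
    "os/exec"

    "github.com/hashicorp/consul/api"
)

// isMongoMaster runs the helper script; exit code 0 means
// "this node is currently the master".
func isMongoMaster() bool {
    return exec.Command("/usr/local/bin/mongo-is-master.sh").Run() == nil
}

func main() {
    client, err := api.NewClient(api.DefaultConfig())
    if err != nil {
        log.Fatal(err)
    }

    // 1. Get the current service definition through the agent API.
    services, err := client.Agent().Services()
    if err != nil {
        log.Fatal(err)
    }
    svc, ok := services["mongodb"]
    if !ok {
        log.Fatal("mongodb service is not registered")
    }

    // 2. and 3. Check the cluster status and decide whether the tag list
    // needs to be updated.
    wantMaster := isMongoMaster()
    hasMaster := false
    tags := make([]string, 0, len(svc.Tags)+1)
    for _, t := range svc.Tags {
        if t == "master" {
            hasMaster = true
            continue
        }
        tags = append(tags, t)
    }
    if wantMaster {
        tags = append(tags, "master")
    }
    if wantMaster == hasMaster {
        return // nothing to update
    }

    // 4. POST the updated definition back to the API (re-register).
    // A real script would also carry over the health checks, omitted here.
    err = client.Agent().ServiceRegister(&api.AgentServiceRegistration{
        ID:      svc.ID,
        Name:    svc.Service,
        Tags:    tags,
        Port:    svc.Port,
        Address: svc.Address,
    })
    if err != nil {
        log.Fatal(err)
    }
}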

Ideal solution:

  1. Set up my service definition with dynamic tags.
  2. Write a script that returns the status of the cluster, with an exit code of 0 meaning to apply the tag.
  3. Let consul update itself automatically.

Sample JSON (one static tag, one dynamic tag):

{
    "service": {
        "name": "mongodb",
        "tags": [
            "fault-tolerant",
            {
                "name": "master",
                "script": "/usr/local/bin/mongo-is-master.sh",
                "interval": "10s"
            }
        ],
        "address": "127.0.0.1",
        "port": 8000,
        "checks": [
            {
                "script": "/usr/local/bin/mongo-health-check.sh",
                "interval": "10s"
            }
        ]
    }
}

This sort of solution could apply to issues #155 and #867, and possibly others.

@ryanuber
Member

Interesting idea. I think the work-around you mentioned is a decent way of doing this, but I'm going to leave this open as a thought ticket for now. Thanks!

@ryanuber added the thinking (More time is needed to research by the Consul Contributors) label on Jun 22, 2015
@Kosta-Github

@fidian with respect to your statement: "On boot they will query for mongodb.service.consul and join the cluster."

Can you describe this a bit more, since I want to set up something similar for a redis cluster? Do you use a handcrafted script (e.g., via consul-template or the REST API) to query mongodb.service.consul and get all registered nodes for that service, or are you relying on the DNS mechanism? At least one problem with relying solely on DNS is that if the node registers itself (e.g., with registrator) in the consul cluster before it does the DNS lookup for mongodb.service.consul, it might get back its own IP address, which would not be helpful for joining the cluster... :-)

@walrusVision

This would be useful for services like zookeeper, which dynamically elect a leader node among themselves every time a node joins or leaves the cluster, and where the leader can be configured to stop accepting client connections. Having dynamic tags applied via a check would make it so I could query consul for the non-leader nodes and never have a client trying to connect to the leader at all.

@fidian
Author

fidian commented Jun 23, 2015

@Kosta-Github asked how I manage to auto-cluster my mongo instances.

  1. Consul is hooked up through dnsmasq.
  2. Consul is started before mongo.
  3. The health check fails unless mongo reports success and mongo is part of a cluster. This second part is vital - the health check fails until mongo is in a cluster.
  4. The init script for mongo queries DNS for other members in the cluster. This will only report mongo instances that are already in a replica set.
  • If IPs are found, become a slave and connect to the IP that we found.
  • With no IPs, configure as a master and enable the replica set, which then makes the health check pass.

The only snag is that I must start one instance of mongo initially so it will bootstrap the replica set. Once it is running I am able to add and remove instances in my replica set.
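
A hypothetical sketch of that boot-time logic in Go (the mongo shell commands and the node's own address are illustrative, not taken from fidian's actual scripts):

package main

import (
    "log"
    "net"
    "os/exec"
)

// mongoEval runs a JavaScript snippet against a mongo instance.
func mongoEval(host, js string) error {
    return exec.Command("mongo", "--quiet", "--host", host, "--eval", js).Run()
}

func main() {
    // Only instances whose health check passes (i.e. that are already part
    // of a replica set) resolve here, because of step 3 above.
    ips, err := net.LookupIP("mongodb.service.consul")
    if err != nil || len(ips) == 0 {
        // No members found: bootstrap the replica set on this node,
        // which then makes the health check pass.
        if err := mongoEval("127.0.0.1", "rs.initiate()"); err != nil {
            log.Fatal(err)
        }
        return
    }
    // Members exist: ask the cluster to add this node as a secondary.
    // (rs.add must run on the primary; a real script would locate it first.)
    self := "10.0.0.5:27017" // placeholder for this node's own address
    if err := mongoEval(ips[0].String(), `rs.add("`+self+`")`); err != nil {
        log.Fatal(err)
    }
}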

@Kosta-Github

@fidian thanks for the explanation; just one more question: what does your dnsmasq config look like? :-)

@fidian
Author

fidian commented Jun 23, 2015

@Kosta-Github it looks like the following. I'm also happy to answer questions off this issue. Feel free to email me directly at [email protected] so we don't continue to pollute this thread.

server=/consul./127.0.0.1#8600
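
(This forwards any lookup under the consul. domain, such as mongodb.service.consul, to the local Consul agent's DNS interface on 127.0.0.1 port 8600.)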

@igoratencompass

+1 for this feature request

@eloycoto

+1

4 similar comments
@hugochinchilla

+1

@xakraz

xakraz commented Aug 20, 2015

+1

@jh409

jh409 commented Sep 24, 2015

+1

@adbourne

adbourne commented Oct 8, 2015

+1

@memelet

memelet commented Nov 28, 2015

This would be very, very nice. There are all kinds of things for which clients need to connect to the master explicitly. A dynamic tag would be so elegant, and so much better than a bunch of ad hoc scripts to tweak tags.

@danielbenzvi

+1

1 similar comment
@wyhysj

wyhysj commented Dec 8, 2015

+1

@123BLiN

123BLiN commented Dec 14, 2015

+1, a tag plus a script would be very useful to implement custom DNS response logic

@richard-hulm

Currently I have to run two 'services' for a similar situation: a "redis" service which includes all nodes in the cluster, and then a "redis-master" service.

This has the unfortunate side effect that most of the redis nodes are always 'failing' the health check because they're not the master.

Would definitely appreciate this feature as a way around this.

@slackpad
Contributor

Consul 0.6 added a "tag override" feature that's useful for implementing schemes like this, though the logic runs outside of Consul, not from Consul itself as suggested here. Here's the issue that brought it in: #1102.

Here's a bit of the documentation, from https://www.consul.io/docs/agent/services.html:

The enableTagOverride can optionally be specified to disable the anti-entropy feature for this service. If enableTagOverride is set to TRUE then external agents can update this service in the catalog and modify the tags. Subsequent local sync operations by this agent will ignore the updated tags.

This would let an external agent, like a script working with redis-sentinel, apply the tags to the current master via Consul's catalog API.
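
For example, a minimal sketch with the Go client, assuming the redis service was registered locally with "enableTagOverride": true and assuming the node name/address; a sentinel-triggered script could push the tag like this:

package main

import (
    "log"

    "github.com/hashicorp/consul/api"
)

func main() {
    client, err := api.NewClient(api.DefaultConfig())
    if err != nil {
        log.Fatal(err)
    }

    // Update only the tags of the existing catalog entry for the new master.
    // SkipNodeUpdate leaves the node's own data untouched.
    _, err = client.Catalog().Register(&api.CatalogRegistration{
        Node:           "redis-node-1", // assumed node name
        Address:        "10.0.0.11",    // assumed node address
        SkipNodeUpdate: true,
        Service: &api.AgentService{
            ID:                "redis",
            Service:           "redis",
            Tags:              []string{"master"},
            Port:              6379,
            EnableTagOverride: true,
        },
    }, nil)
    if err != nil {
        log.Fatal(err)
    }
}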

@mvanderlee

+1 Would love to see this instead of the workaround with tag overriding.

@jcua

jcua commented Apr 19, 2016

+1

1 similar comment
@PedroAlvarado

+1

@onnimonni

This is a brilliant idea :), I would also want this for a redis cluster!

@nickwales
Contributor

nickwales commented May 17, 2016

+1 This would give us the ability to determine which application version should receive LB traffic in marathon.

@rafaelcapucho

+1

1 similar comment
@tomwganem

+1

avdva added a commit to avdva/consul that referenced this issue Sep 15, 2016
avdva added a commit to avdva/consul that referenced this issue Sep 15, 2016
avdva added a commit to avdva/consul that referenced this issue Sep 15, 2016
@avdva

avdva commented Sep 16, 2016

Hi,
I've added support for dynamic tags here, branch dynamic-tags.
If you are interested in this feature, please build and test it, any critique is appreciated. If everything is ok, I'll make a PR.
The syntax for service registration is the following:

{
    "service": {
        "name": "mongodb",
        "tags": ["tag1"],
        "dynamictags": [
            {
                "name": "master",
                "script": "/usr/local/bin/mongo-is-master.sh",
                "interval": "10s"
            }
        ],
        "address": "127.0.0.1",
        "port": 8000,
        "checks": [
            {
                "script": "/usr/local/bin/mongo-health-check.sh",
                "interval": "10s"
            }
        ]
    }
}

@andremarianiello

This feature would be great for my use case. I would really like to see this merged in eventually.

@slackpad added the theme/service-metadata (Anything related to management/tracking of service metadata) label on May 2, 2017
@Sieabah

Sieabah commented Jun 23, 2017

+1. Even with the two-service method, changes take 15 to 30 minutes to propagate to the UI, API, and DNS.

@slackpad
Contributor

@Sieabah that sounds like a function of DNS caching somewhere - you can adjust the TTL value to maybe improve that. The API/UI shouldn't have any delay.

@Sieabah

Sieabah commented Jun 23, 2017

@slackpad I have all of the DNS caching set to 0. Querying the API and ignoring DNS takes about the same amount of time to resolve.

I'm sure there is something misconfigured, as when I monitor the two boxes they're saying "synced service:mongo" and "synced service:primary-mongo". With the current service definition I'm able to get it down to 5 minutes. During that time both services actually say they're the primary (in the UI and API) even though in the logs they switch immediately.

{
  "service": {
    "name": "primary-mongo",
    "tags": ["primary", "mongo"],
    "port": 27017,
    "check": {
      "name": "primary",
      "script": "python ~/consul_check_tags.py $(mongo --eval 'db.isMaster().ismaster' | grep 'true')",
      "interval": "5s",
      "timeout": "1s"
    }
  }
}

I've tried re-registering via the API, reloading the config during the health check, and reloading from the API. I don't know what is making it take 5 minutes to propagate to a cluster of 3 servers and 2 clients, other than the anti-entropy timeout of syncing only every 1 minute?

@slackpad
Contributor

@Sieabah we have a few issues we are looking into like #2970, but it may be worthwhile for you to open a new GH issue so we can try to track down what's happening to you. Better to do it on a different issue than this one.

@caquino

caquino commented Aug 22, 2017

👍 this would simplify a lot of the "workarounds" we did to achieve this master/slave tag functionality

@adamlc

adamlc commented Nov 6, 2017

Did we get anywhere with this? I'm looking for something similar at the moment where I have a service that has a master / slave type setup.

@ramukima

Not sure if Prepared Queries can be used to apply such rules. However, dynamic tagging is a good idea. Any plans to get it in?

@drawks
Contributor

drawks commented Jan 7, 2019

This still looks like a great idea, but I see no indication of any traction to having it merged. Anyone care to give us an update?

@nicholasamorim

This still looks like a great idea, but I see no indication of any traction to having it merged. Anyone care to give us an update?

@ShimmerGlass
Contributor

This would be useful for us as well. What are your thoughts on how to design this?
IMO a simple way would be to add a field like "output_as_tag": true to the check declaration struct. When set to true, the check output (as seen in Output in a /v1/health/service/<service> query, for example) would be captured and set as a tag, either on the node for a node check, or on the service for a service check.
If the value changes, the previously set tag would be removed and the new one added.
These tags would also be applied to sidecar services to ensure compatibility with Connect.
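
For example, a check declaration under this proposal might look like the following (the output_as_tag field does not exist in Consul today, and the script name is made up; the script would simply print "leader" or "follower"):

{
    "check": {
        "name": "zookeeper-role",
        "script": "/usr/local/bin/zookeeper-role.sh",
        "interval": "10s",
        "output_as_tag": true
    }
}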

There are a few points to address though:

  • If a command goes crazy and outputs 3 KB we probably don't want that as a tag, so some filtering is probably needed.
  • We need to store the fact that a tag comes from a check: if the output changes from "leader" to "follower", we need to remove the "leader" tag and add the "follower" tag. This needs to survive agent restart and be stored in the agent state files on disk, possibly requiring an update to the file format.
  • This should also work for HTTP checks, especially since command checks can be dangerous (we have them disabled in our shop). Do we want the bodies of 50x responses as tags? I'd argue that we probably don't, since in many cases a 50x returns a default "it's not working" HTML page, so dropping the tag on connection refused / HTTP error codes seems the most sensible to me.

@pierresouchay
Contributor

@Aestek I agree, this kind of feature would be really useful. For now we have lots of services such as:

  • mycluster-zookeeper (all green in normal circumstances)
  • mycluster-zookeeper-leader (which has the same members, but different checks, meaning that mycluster-zookeeper-leader always has 4 instances in warning state and 1 passing)

Having a way to merge those services into 1 single service and just add a leader tag would be great.

I know several systems where the checks for this kind of feature can also be simple HTTP checks, so limiting it to scripts is a bit less interesting.
I am not convinced by scraping the output of regular checks to get the new tags because:

  • If you can set several checks with this value, what happens when one disagrees with another check?
  • How do you add/remove existing tags?

The approach in #1048 (comment) looks sensible (I mean, not linked to existing checks), because:

  • for each dynamic check, it describes explicitly what tag would be added
  • it avoids conflicts between the outputs of several checks not agreeing on tags
  • it would not change the output of existing checks
  • it would allow checks to be not only scripts, but HTTP, TCP and so on

I did not check in detail what has been done in https://github.com/avdva/consul/tree/dynamic-tags but it sounds to me like the right approach. While limiting the ability to have very dynamic things, it would greatly ease implementation (most notably by avoiding conflicts between several checks).
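
Purely as an illustration of that approach (this syntax does not exist in Consul; it just extends the dynamictags sketch above to a non-script check, with a made-up endpoint):

{
    "service": {
        "name": "mycluster-zookeeper",
        "dynamictags": [
            {
                "name": "leader",
                "http": "http://localhost:8080/commands/leader",
                "interval": "10s"
            }
        ]
    }
}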

@avdva

avdva commented Oct 14, 2019

I'll try to resurrect my branch soon. We'll see if it still works.

@pierresouchay
Contributor

@avdva we are really interested in this, tell us when you do so ;)

@ShimmerGlass
Contributor

@avdva Did you have time to resurrect your branch? Hope it's not too complicated with all the conflicts there must be since 2016.

@RedStalker

+1

@hanshasselberg
Member

This is an interesting idea and I could imagine us adding such a feature. The best way to get it in is to create a PR so that we have something to discuss. That would also make it easier to see the impact.

@exFalso

exFalso commented Mar 10, 2020

+1. I think @ShimmerGlass's suggestion is great: the tag should come from the script itself. This covers the OP's use case but would also solve additional ones.
In our case, we have a dynamically generated ID in some of our services (the ID comes from dedicated hardware and must be generated within the service), and it'd be great if we could propagate this ID to consul. A great way to solve this is to have a periodic script return the tag(s) to be applied.

@chris93111

+1

1 similar comment
@EmPRio93

EmPRio93 commented Oct 5, 2020

+1

@benvanstaveren

I'll throw my 2 cents in (in 2021) and say that this feature would still be very much welcome, because it would replace at least 4 of my setups that currently rely on a bunch of external scripts to update tags.

@nbari

nbari commented Oct 8, 2021

How does Vault mark the service as active, standby, or initialized? Does it re-register the service?

duckhan pushed a commit to duckhan/consul that referenced this issue Oct 24, 2021
@blake
Member

blake commented Jan 22, 2022

How does Vault mark the service as active, standby, or initialized? Does it re-register the service?

Yes, Vault re-registers the service in order to update any parameters that changed since the last registration. See the reconcileConsul function in [vault/serviceregistration/consul/consul_service_registration.go](https://github.com/hashicorp/vault/blob/v1.9.2/serviceregistration/consul/consul_service_registration.go) for details.
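
The same pattern can be reproduced with the Go client; a minimal sketch (not Vault's actual code; the service ID and port are assumed):

package roletag

import "github.com/hashicorp/consul/api"

// updateRoleTag re-registers the service whenever the node's role changes,
// swapping the "active"/"standby" tag. This mirrors the re-registration
// pattern described above.
func updateRoleTag(client *api.Client, active bool) error {
    tag := "standby"
    if active {
        tag = "active"
    }
    return client.Agent().ServiceRegister(&api.AgentServiceRegistration{
        ID:   "vault", // assumed service ID
        Name: "vault",
        Tags: []string{tag},
        Port: 8200,
    })
}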

@xenofree

We are really interested in this; any chance of seeing this feature in the future?

@moonbaseDelta

I can't understand why the hell this feature has been missing for years while it's even been half-implemented already.

@sri4kanne

Came across this issue while we were trying to implement a similar solution through other means. It would be a good enhancement to have for sure!!

@drawks
Contributor

drawks commented Nov 19, 2024

This issue has now been open for almost a decade.

Can someone from HashiCorp please indicate whether this is actually on anyone's radar for planning? It is the lukewarmest of feedback to give this issue the "thinking" tag after it had been open for a week and then not do ANYTHING with the request for 9 years, even with people suggesting actual implementations. There is basically no feedback about why this may or may not be considered or rejected for inclusion in the project.

With nearly 50 participants in the issue, you'd think that HashiCorp could manage to at least have some meaningful engagement with the open source community.

I'm gonna tag some recent contributors with merge/commit access, as well as some execs, to maybe prompt some actual engagement with this issue from anyone: @sarahalsmiller @rboyer @xwa153 @dhiaayachi @jmurret @Amier3 @armon
