Cross-Cluster Service Connectivity Fails with "Host is Unreachable" Despite Successful DNS Resolution in Submariner GlobalNet Setup #3204
Comments
Thanks for reaching out @aswinayyolath.
A. As mentioned in the Slack discussion, the inter-cluster Libreswan tunnel is up and communication between the GW nodes is fine, while communication from a non-GW node to a GW node is failing. Further datapath investigation is needed here. I assume that, for some reason (maybe an infra firewall or connection tracking), ingress packets are being dropped in the gwnode@clusterX to nongwnode@clusterX segment. Can you please run ping from the non-GW node@sub1 to the GW node@sub2 (for the GW node@sub2 IP address, use the endpoint healthcheck IP == 242.0.255.254) and tcpdump the GW node and the non-GW node on cluster sub1?
B. Also, this is not relevant to the datapath issue, but I noticed that Submariner detected the CNI as generic instead of flannel. Submariner uses this code to discover network details for the flannel CNI. |
DaemonSet List:
Checked Pods
CNI Configuration
|
Is there flannel daemonset in another namespace? |
Yes
|
The kube-flannel-ds DaemonSet has the following volumes:
volumes:
- name: run
  hostPath:
    path: /run/flannel
- name: cni-plugin
  hostPath:
    path: /opt/cni/bin
- name: cni
  hostPath:
    path: /etc/cni/net.d
- name: flannel-cfg
  configMap:
    name: kube-flannel-cfg
- name: xtables-lock
  hostPath:
    path: /run/xtables.lock
    type: FileOrCreate
CM details
apiVersion: v1
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "EnableNFTables": false,
      "Backend": {
        "Type": "vxlan"
      }
    }
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"cni-conf.json":"{\n \"name\": \"cbr0\",\n \"cniVersion\": \"0.3.1\",\n \"plugins\": [\n {\n \"type\": \"flannel\",\n \"delegate\": {\n \"hairpinMode\": true,\n \"isDefaultGateway\": true\n }\n },\n {\n \"type\": \"portmap\",\n \"capabilities\": {\n \"portMappings\": true\n }\n }\n ]\n}\n","net-conf.json":"{\n \"Network\": \"10.244.0.0/16\",\n \"EnableNFTables\": false,\n \"Backend\": {\n \"Type\": \"vxlan\"\n }\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"app":"flannel","k8s-app":"flannel","tier":"node"},"name":"kube-flannel-cfg","namespace":"kube-flannel"}}
  creationTimestamp: "2024-11-03T16:24:55Z"
  labels:
    app: flannel
    k8s-app: flannel
    tier: node
  name: kube-flannel-cfg
  namespace: kube-flannel
  resourceVersion: "282"
  uid: f0058e4b-4ba9-49be-a759-fd0c9843a88d
|
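For anyone following the flannel discovery part of this thread, here is a minimal Go sketch of pulling the pod network out of the net-conf.json value shown in the ConfigMap above. It is not Submariner's discovery code; the type flannelNetConf and the helper parseFlannelNetConf are hypothetical names made up for illustration, and only the Network and Backend.Type fields are decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/netip"
)

// flannelNetConf is a hypothetical struct that mirrors only the
// net-conf.json fields used in this sketch.
type flannelNetConf struct {
	Network string `json:"Network"`
	Backend struct {
		Type string `json:"Type"`
	} `json:"Backend"`
}

// parseFlannelNetConf (hypothetical helper) decodes the raw net-conf.json
// string and validates that Network is a well-formed CIDR.
func parseFlannelNetConf(raw string) (netip.Prefix, string, error) {
	var conf flannelNetConf
	if err := json.Unmarshal([]byte(raw), &conf); err != nil {
		return netip.Prefix{}, "", fmt.Errorf("decoding net-conf.json: %w", err)
	}
	prefix, err := netip.ParsePrefix(conf.Network)
	if err != nil {
		return netip.Prefix{}, "", fmt.Errorf("parsing Network %q: %w", conf.Network, err)
	}
	return prefix, conf.Backend.Type, nil
}

func main() {
	// Same content as the net-conf.json key in kube-flannel-cfg.
	raw := `{"Network": "10.244.0.0/16", "EnableNFTables": false, "Backend": {"Type": "vxlan"}}`
	prefix, backend, err := parseFlannelNetConf(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("pod network:", prefix, "backend:", backend)
}
```

In a real cluster the raw string would come from the ConfigMap's data map rather than a literal.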
Thanks for the information. Regarding flannel discovery, it looks like we need to update the flannel discovery code. QQ: does kubectl get ds -A -l k8s-app=flannel return the flannel ds? Could you please report a new issue for flannel CNI discovery? Please attach the relevant information; we welcome any code contribution here :-). As for the datapath issue, traffic initiated at nongw node@clusterA towards the remote cluster is encapsulated in VxLAN (port 4800, interface vx-submariner) towards gw node@clusterA, and the gw node should forward it to the remote cluster's gw. Can you double check (maybe use tcpdump -pi ) that no packet is sent on the non-GW node? I can see that on the gw node the iptables (filter table) packet counter for input traffic on the vx-submariner interface is > 0, check [1]. [1] |
QQ: does kubectl get ds -A -l k8s-app=flannel return flannel ds ?
I will report a new issue and see if I can contribute (I guess changes should be relatively small) ...
Packet Transmission on the Non-GW Node in sub1 (ClusterA)
Run Packet Capture on the Non-Gateway Node in ClusterA
Verified Reception on the Gateway Node in ClusterA
Is this what you want me to do? I am not 100% sure. |
I have created a new issue: #3210. A draft change has been pushed here: #3268. @yboaron, I haven't yet looked into linting, unit tests, or e2e testing; I'm just checking whether the changes look roughly right (draft linked above). I also modified the loop structure from
for k := range daemonsets.Items {
	if strings.Contains(daemonsets.Items[k].Name, "flannel") {
		volumes = daemonsets.Items[k].Spec.Template.Spec.Volumes
to
for _, ds := range daemonsets.Items {
	if strings.Contains(ds.Name, "flannel") {
		flannelDaemonSet = &ds
		volumes = ds.Spec.Template.Spec.Volumes
		break
	}
}
to enhance code readability and clarity. I think this approach makes it clear that ds represents a DaemonSet object, eliminating the need for indexing. Additionally, storing a pointer to the found DaemonSet and breaking out of the loop as soon as it is found avoids scanning the remaining items and, I believe, reduces the risk of errors associated with accessing elements via an index. |
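As a side sketch for the loop discussed above, here is one self-contained variant of the search. It is not the code from the draft PR: DaemonSet is a simplified, hypothetical stand-in for the Kubernetes type, and findFlannelDaemonSet is an illustrative name. It returns a pointer to the slice element itself, since a range variable such as ds holds a copy of the element.

```go
package main

import (
	"fmt"
	"strings"
)

// DaemonSet is a simplified, hypothetical stand-in for appsv1.DaemonSet,
// keeping only the fields this sketch needs.
type DaemonSet struct {
	Namespace string
	Name      string
	Volumes   []string
}

// findFlannelDaemonSet returns a pointer to the first item whose name
// contains "flannel", or nil when no item matches.
func findFlannelDaemonSet(items []DaemonSet) *DaemonSet {
	for i := range items {
		ds := &items[i] // points into the slice, not at a per-iteration copy
		if strings.Contains(ds.Name, "flannel") {
			return ds // returning here stops the scan at the first match
		}
	}
	return nil
}

func main() {
	// Illustrative data: names follow the thread, the list itself is made up.
	items := []DaemonSet{
		{Namespace: "kube-system", Name: "kube-proxy"},
		{
			Namespace: "kube-flannel",
			Name:      "kube-flannel-ds",
			Volumes:   []string{"run", "cni-plugin", "cni", "flannel-cfg", "xtables-lock"},
		},
	}
	if ds := findFlannelDaemonSet(items); ds != nil {
		fmt.Println(ds.Namespace, ds.Name, ds.Volumes)
	} else {
		fmt.Println("no flannel DaemonSet found")
	}
}
```

Returning nil for the no-match case lets the caller fall back to generic CNI handling explicitly.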
Submariner only handles egress routing, and only for packets destined to remote clusters (the destination IP is from the remote pod or service CIDRs; in your case it is the Globalnet CIDR of the remote cluster). Please run tcpdump while pinging the remote endpoint healthcheck IP address. |
Can you try running |
I am seeing a lot of output from
|
Hmmmm, it's ICMP/IPv6 traffic. Don't you get any ICMP/IPv4 ( |
|
Hmm, strange, I can't see the IPv4 ICMP being sent to the remote cluster. |
I don't have the cluster with me 😔. But I will create one (in fact, two). @yboaron, I would like to check with you whether the steps I am following are correct. Could you please review the steps here (https://kubernetes.slack.com/archives/C010RJV694M/p1730390398271589?thread_ts=1730390376.380879&cid=C010RJV694M) and let me know if I am missing anything? |
I would also like to test the same setup in AWS across two regions. I just want to know whether the steps I followed are correct. I will try them both on the VMs I used before and on two EKS clusters that I will create in two different AWS regions, and see if that works. |
Yep, looks fine. Can you try reinstalling without adding the --globalnet-cidr 242.0.0.0/16 flag to the subctl join command for both clusters? |
Hello @yboaron. Since @aswinayyolath is busy with some other tasks, I'm looking at this issue. We're on the same team, working on the same project. Since we have the same CIDRs on our K8s clusters, we cannot run Submariner without Globalnet. To work around this, we created an AWS account and tried to run Submariner on EKS. But this does not work and gives these outputs while running diagnostics
I suspect that there is some issue with setting up the subnets. What should I try next to get Submariner up and running on AWS? |
Maybe you can follow this link? In case the deployment fails, please attach debug details from the clusters (subctl gather, subctl diagnose all). |
What happened:
I deployed Submariner with GlobalNet across two Kubernetes clusters. DNS resolution works as expected, but connectivity to services across clusters fails with a Host is unreachable error.
More info is available at the link below:
https://kubernetes.slack.com/archives/C010RJV694M/p1730390376380879
What you expected to happen:
curl requests from a pod in cluster2 to a service exposed via Submariner in cluster1 should succeed, indicating that cross-cluster communication is functioning.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
Diagnose information (use subctl diagnose all):
Cluster 1 info
Cluster 2 info
Gather information (use subctl gather):
sub1.zip
sub2.zip
K8S is installed on ubuntu VM
OS INFO